diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md index 90b9befa261..fe4b06dd263 100644 --- a/CONTRIBUTORS.md +++ b/CONTRIBUTORS.md @@ -40,11 +40,13 @@ Guidelines for modifications: * Amr Mousa * Andrej Orsula * Anton Bjørndahl Mortensen +* Antonin Raffin * Arjun Bhardwaj * Ashwin Varghese Kuruttukulam * Bikram Pandit * Bingjie Tang * Brayden Zhang +* Brian Bingham * Cameron Upright * Calvin Yu * Cheng-Rong Lai @@ -74,6 +76,7 @@ Guidelines for modifications: * Jinqi Wei * Johnson Sun * Kaixi Bao * Kourosh Darvish * Kousheek Chakraborty +* Kris Wilson * Lionel Gulich @@ -86,6 +89,7 @@ Guidelines for modifications: * Miguel Alonso Jr * Muhong Guo * Nicola Loi +* Norbert Cygiert * Nuoyan Chen (Alvin) * Nuralem Abizov * Ori Gadot @@ -102,6 +106,7 @@ Guidelines for modifications: * Rosario Scalise * Ryley McCarroll * Shafeef Omar +* Shaoshu Su * Shundo Kishi * Stefan Van de Mosselaer * Stephan Pleines @@ -119,6 +124,7 @@ Guidelines for modifications: * Zhengyu Zhang * Ziqi Fan * Zoe McCarthy +* David Leon ## Acknowledgements diff --git a/README.md b/README.md index 94fb5bddbea..aae3d840dd3 100644 --- a/README.md +++ b/README.md @@ -93,12 +93,13 @@ or opening a question on its [forums](https://forums.developer.nvidia.com/c/agx- ## Connect with the NVIDIA Omniverse Community -Have a project or resource you'd like to share more widely? We'd love to hear from you! Reach out to the -NVIDIA Omniverse Community team at OmniverseCommunity@nvidia.com to discuss potential opportunities -for broader dissemination of your work. +Do you have a project or resource you'd like to share more widely? We'd love to hear from you! +Reach out to the NVIDIA Omniverse Community team at OmniverseCommunity@nvidia.com to explore opportunities +to spotlight your work. -Join us in building a vibrant, collaborative ecosystem where creativity and technology intersect. Your -contributions can make a significant impact on the Isaac Lab community and beyond! +You can also join the conversation on the [Omniverse Discord](https://discord.com/invite/nvidiaomniverse) to +connect with other developers, share your projects, and help grow a vibrant, collaborative ecosystem +where creativity and technology intersect. Your contributions can make a meaningful impact on the Isaac Lab community and beyond! ## License diff --git a/docs/source/_static/tasks/locomotion/agility_digit_flat.jpg b/docs/source/_static/tasks/locomotion/agility_digit_flat.jpg new file mode 100644 index 00000000000..f98c15e014a Binary files /dev/null and b/docs/source/_static/tasks/locomotion/agility_digit_flat.jpg differ diff --git a/docs/source/_static/tasks/locomotion/agility_digit_loco_manip.jpg b/docs/source/_static/tasks/locomotion/agility_digit_loco_manip.jpg new file mode 100644 index 00000000000..5d6abe59865 Binary files /dev/null and b/docs/source/_static/tasks/locomotion/agility_digit_loco_manip.jpg differ diff --git a/docs/source/_static/tasks/locomotion/agility_digit_rough.jpg b/docs/source/_static/tasks/locomotion/agility_digit_rough.jpg new file mode 100644 index 00000000000..85b9d8fe24b Binary files /dev/null and b/docs/source/_static/tasks/locomotion/agility_digit_rough.jpg differ diff --git a/docs/source/how-to/add_own_library.rst b/docs/source/how-to/add_own_library.rst index e1ca232704e..2606abe9a13 100644 --- a/docs/source/how-to/add_own_library.rst +++ b/docs/source/how-to/add_own_library.rst @@ -63,7 +63,7 @@ Integrating a new library Adding a new library to Isaac Lab is similar to using a different version of a library. 
You can install the library in your Python environment and use it in your experiments. However, if you want to integrate the library with -Isaac Lab, you can will first need to make a wrapper for the library, as explained in +Isaac Lab, you will first need to make a wrapper for the library, as explained in :ref:`how-to-env-wrappers`. The following steps can be followed to integrate a new library with Isaac Lab: diff --git a/docs/source/overview/core-concepts/actuators.rst b/docs/source/overview/core-concepts/actuators.rst index 4f874830036..72d686a632b 100644 --- a/docs/source/overview/core-concepts/actuators.rst +++ b/docs/source/overview/core-concepts/actuators.rst @@ -35,6 +35,7 @@ maximum effort: .. math:: \tau_{j, computed} & = k_p * (q_{des} - q) + k_d * (\dot{q}_{des} - \dot{q}) + \tau_{ff} \\ + \tau_{j, max} & = \gamma \times \tau_{motor, max} \\ \tau_{j, applied} & = clip(\tau_{j, computed}, -\tau_{j, max}, \tau_{j, max}) diff --git a/docs/source/overview/environments.rst b/docs/source/overview/environments.rst index 2f41da00066..476b571a732 100644 --- a/docs/source/overview/environments.rst +++ b/docs/source/overview/environments.rst @@ -214,6 +214,12 @@ We provide environments for both disassembly and assembly. wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run sudo sh cuda_12.8.0_570.86.10_linux.run + When using conda, the CUDA Toolkit can be installed with: + + .. code-block:: bash + + conda install cudatoolkit + For additional instructions and Windows installation, please refer to the `CUDA installation page `_. * |disassembly-link|: The plug starts inserted in the socket. A low-level controller lifts the plug out and moves it to a random position. This process is purely scripted and does not involve any learned policy. Therefore, it does not require policy training or evaluation. The resulting trajectories serve as demonstrations for the reverse process, i.e., learning to assemble. To run disassembly for a specific task: ``python source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_disassembly_w_id.py --assembly_id=ASSEMBLY_ID --disassembly_dir=DISASSEMBLY_DIR``. All generated trajectories are saved to a local directory ``DISASSEMBLY_DIR``. @@ -288,6 +294,12 @@ Environments based on legged locomotion tasks. 
+------------------------------+----------------------------------------------+------------------------------------------------------------------------------+ | |velocity-rough-g1| | |velocity-rough-g1-link| | Track a velocity command on rough terrain with the Unitree G1 robot | +------------------------------+----------------------------------------------+------------------------------------------------------------------------------+ + | |velocity-flat-digit| | |velocity-flat-digit-link| | Track a velocity command on flat terrain with the Agility Digit robot | + +------------------------------+----------------------------------------------+------------------------------------------------------------------------------+ + | |velocity-rough-digit| | |velocity-rough-digit-link| | Track a velocity command on rough terrain with the Agility Digit robot | + +------------------------------+----------------------------------------------+------------------------------------------------------------------------------+ + | |tracking-loco-manip-digit| | |tracking-loco-manip-digit-link| | Track a root velocity and hand pose command with the Agility Digit robot | + +------------------------------+----------------------------------------------+------------------------------------------------------------------------------+ .. |velocity-flat-anymal-b-link| replace:: `Isaac-Velocity-Flat-Anymal-B-v0 `__ .. |velocity-rough-anymal-b-link| replace:: `Isaac-Velocity-Rough-Anymal-B-v0 `__ @@ -318,6 +330,9 @@ Environments based on legged locomotion tasks. .. |velocity-flat-g1-link| replace:: `Isaac-Velocity-Flat-G1-v0 `__ .. |velocity-rough-g1-link| replace:: `Isaac-Velocity-Rough-G1-v0 `__ +.. |velocity-flat-digit-link| replace:: `Isaac-Velocity-Flat-Digit-v0 `__ +.. |velocity-rough-digit-link| replace:: `Isaac-Velocity-Rough-Digit-v0 `__ +.. |tracking-loco-manip-digit-link| replace:: `Isaac-Tracking-LocoManip-Digit-v0 `__ .. |velocity-flat-anymal-b| image:: ../_static/tasks/locomotion/anymal_b_flat.jpg .. |velocity-rough-anymal-b| image:: ../_static/tasks/locomotion/anymal_b_rough.jpg @@ -336,6 +351,9 @@ Environments based on legged locomotion tasks. .. |velocity-rough-h1| image:: ../_static/tasks/locomotion/h1_rough.jpg .. |velocity-flat-g1| image:: ../_static/tasks/locomotion/g1_flat.jpg .. |velocity-rough-g1| image:: ../_static/tasks/locomotion/g1_rough.jpg +.. |velocity-flat-digit| image:: ../_static/tasks/locomotion/agility_digit_flat.jpg +.. |velocity-rough-digit| image:: ../_static/tasks/locomotion/agility_digit_rough.jpg +.. |tracking-loco-manip-digit| image:: ../_static/tasks/locomotion/agility_digit_loco_manip.jpg Navigation ~~~~~~~~~~ @@ -554,6 +572,10 @@ Environments based on fixed-arm manipulation tasks. Comprehensive List of Environments ================================== +For environments that have a different task name listed under ``Inference Task Name``, please use the Inference Task Name +provided when running ``play.py`` or any other inference workflow. These tasks provide configurations better suited for +inference, including reading from an already trained checkpoint and disabling runtime perturbations used for training. + .. 
list-table:: :widths: 33 25 19 25 @@ -761,6 +783,10 @@ Comprehensive List of Environments - - Manager Based - + * - Isaac-Tracking-LocoManip-Digit-v0 + - Isaac-Tracking-LocoManip-Digit-Play-v0 + - Manager Based + - **rsl_rl** (PPO) * - Isaac-Navigation-Flat-Anymal-C-v0 - Isaac-Navigation-Flat-Anymal-C-Play-v0 - Manager Based @@ -869,6 +895,10 @@ Comprehensive List of Environments - Isaac-Velocity-Flat-Cassie-Play-v0 - Manager Based - **rsl_rl** (PPO), **skrl** (PPO) + * - Isaac-Velocity-Flat-Digit-v0 + - Isaac-Velocity-Flat-Digit-Play-v0 + - Manager Based + - **rsl_rl** (PPO) * - Isaac-Velocity-Flat-G1-v0 - Isaac-Velocity-Flat-G1-Play-v0 - Manager Based @@ -884,7 +914,7 @@ Comprehensive List of Environments * - Isaac-Velocity-Flat-Unitree-A1-v0 - Isaac-Velocity-Flat-Unitree-A1-Play-v0 - Manager Based - - **rsl_rl** (PPO), **skrl** (PPO) + - **rsl_rl** (PPO), **skrl** (PPO), **sb3** (PPO) * - Isaac-Velocity-Flat-Unitree-Go1-v0 - Isaac-Velocity-Flat-Unitree-Go1-Play-v0 - Manager Based @@ -913,6 +943,10 @@ Comprehensive List of Environments - Isaac-Velocity-Rough-Cassie-Play-v0 - Manager Based - **rsl_rl** (PPO), **skrl** (PPO) + * - Isaac-Velocity-Rough-Digit-v0 + - Isaac-Velocity-Rough-Digit-Play-v0 + - Manager Based + - **rsl_rl** (PPO) * - Isaac-Velocity-Rough-G1-v0 - Isaac-Velocity-Rough-G1-Play-v0 - Manager Based @@ -924,7 +958,7 @@ Comprehensive List of Environments * - Isaac-Velocity-Rough-Unitree-A1-v0 - Isaac-Velocity-Rough-Unitree-A1-Play-v0 - Manager Based - - **rsl_rl** (PPO), **skrl** (PPO) + - **rsl_rl** (PPO), **skrl** (PPO), **sb3** (PPO) * - Isaac-Velocity-Rough-Unitree-Go1-v0 - Isaac-Velocity-Rough-Unitree-Go1-Play-v0 - Manager Based diff --git a/docs/source/overview/reinforcement-learning/rl_existing_scripts.rst b/docs/source/overview/reinforcement-learning/rl_existing_scripts.rst index b48fb494b9c..4ff23a7a7b7 100644 --- a/docs/source/overview/reinforcement-learning/rl_existing_scripts.rst +++ b/docs/source/overview/reinforcement-learning/rl_existing_scripts.rst @@ -187,7 +187,7 @@ Stable-Baselines3 - Training an agent with `Stable-Baselines3 `__ - on ``Isaac-Cartpole-v0``: + on ``Isaac-Velocity-Flat-Unitree-A1-v0``: .. 
tab-set:: :sync-group: os @@ -200,14 +200,13 @@ Stable-Baselines3 # install python module (for stable-baselines3) ./isaaclab.sh -i sb3 # run script for training - # note: we set the device to cpu since SB3 doesn't optimize for GPU anyway - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/train.py --task Isaac-Cartpole-v0 --headless --device cpu + ./isaaclab.sh -p scripts/reinforcement_learning/sb3/train.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless # run script for playing with 32 environments - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Cartpole-v0 --num_envs 32 --checkpoint /PATH/TO/model.zip + ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --num_envs 32 --checkpoint /PATH/TO/model.zip # run script for playing a pre-trained checkpoint with 32 environments - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Cartpole-v0 --num_envs 32 --use_pretrained_checkpoint + ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --num_envs 32 --use_pretrained_checkpoint # run script for recording video of a trained agent (requires installing `ffmpeg`) - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Cartpole-v0 --headless --video --video_length 200 + ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless --video --video_length 200 .. tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows @@ -217,14 +216,13 @@ Stable-Baselines3 :: install python module (for stable-baselines3) isaaclab.bat -i sb3 :: run script for training - :: note: we set the device to cpu since SB3 doesn't optimize for GPU anyway - isaaclab.bat -p scripts\reinforcement_learning\sb3\train.py --task Isaac-Cartpole-v0 --headless --device cpu + isaaclab.bat -p scripts\reinforcement_learning\sb3\train.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless :: run script for playing with 32 environments - isaaclab.bat -p scripts\reinforcement_learning\sb3\play.py --task Isaac-Cartpole-v0 --num_envs 32 --checkpoint /PATH/TO/model.zip + isaaclab.bat -p scripts\reinforcement_learning\sb3\play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --num_envs 32 --checkpoint /PATH/TO/model.zip :: run script for playing a pre-trained checkpoint with 32 environments - isaaclab.bat -p scripts\reinforcement_learning\sb3\play.py --task Isaac-Cartpole-v0 --num_envs 32 --use_pretrained_checkpoint + isaaclab.bat -p scripts\reinforcement_learning\sb3\play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --num_envs 32 --use_pretrained_checkpoint :: run script for recording video of a trained agent (requires installing `ffmpeg`) - isaaclab.bat -p scripts\reinforcement_learning\sb3\play.py --task Isaac-Cartpole-v0 --headless --video --video_length 200 + isaaclab.bat -p scripts\reinforcement_learning\sb3\play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless --video --video_length 200 All the scripts above log the training progress to `Tensorboard`_ in the ``logs`` directory in the root of the repository. 
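For reference, TensorBoard can then be pointed directly at a run directory (a minimal sketch; the exact sub-path under ``logs`` depends on the library and task you trained, following the pattern described next):

.. code-block:: bash

   # launch TensorBoard via the Isaac Lab Python wrapper (the log path shown is illustrative)
   ./isaaclab.sh -p -m tensorboard.main --logdir logs/sb3/Isaac-Velocity-Flat-Unitree-A1-v0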
The logs directory follows the pattern ``logs///``, where ```` diff --git a/docs/source/overview/reinforcement-learning/rl_frameworks.rst b/docs/source/overview/reinforcement-learning/rl_frameworks.rst index 00ceb8bb458..34d47c17cdc 100644 --- a/docs/source/overview/reinforcement-learning/rl_frameworks.rst +++ b/docs/source/overview/reinforcement-learning/rl_frameworks.rst @@ -71,17 +71,18 @@ Training Performance -------------------- We performed training with each RL library on the same ``Isaac-Humanoid-v0`` environment -with ``--headless`` on a single RTX 4090 GPU +with ``--headless`` on a single RTX PRO 6000 GPU using 4096 environments and logged the total training time for 65.5M steps for each RL library. + +--------------------+-----------------+ | RL Library | Time in seconds | +====================+=================+ -| RL-Games | 203 | +| RL-Games | 207 | +--------------------+-----------------+ -| SKRL | 204 | +| SKRL | 208 | +--------------------+-----------------+ -| RSL RL | 207 | +| RSL RL | 199 | +--------------------+-----------------+ -| Stable-Baselines3 | 6320 | +| Stable-Baselines3 | 322 | +--------------------+-----------------+ diff --git a/docs/source/setup/installation/binaries_installation.rst b/docs/source/setup/installation/binaries_installation.rst index 94ec856ed9e..5910fd151f7 100644 --- a/docs/source/setup/installation/binaries_installation.rst +++ b/docs/source/setup/installation/binaries_installation.rst @@ -451,6 +451,10 @@ On Windows machines, please terminate the process from Command Prompt using If you see this, then the installation was successful! |:tada:| +If you see an error ``ModuleNotFoundError: No module named 'isaacsim'``, ensure that the conda environment is activated +and ``source _isaac_sim/setup_conda_env.sh`` has been executed. + + Train a robot! ~~~~~~~~~~~~~~~ diff --git a/docs/source/setup/walkthrough/api_env_design.rst b/docs/source/setup/walkthrough/api_env_design.rst index 4f4967aee63..a2fbb070e59 100644 --- a/docs/source/setup/walkthrough/api_env_design.rst +++ b/docs/source/setup/walkthrough/api_env_design.rst @@ -78,7 +78,7 @@ the prim path with ``/World/envs/env_.*/Robot`` we are implicitly saying that ev The Environment ----------------- -Next, let's take a look at the contents of the other python file in our task directory: ``isaac_lab_tutorial_env_cfg.py`` +Next, let's take a look at the contents of the other python file in our task directory: ``isaac_lab_tutorial_env.py`` .. code-block:: python diff --git a/docs/source/setup/walkthrough/training_jetbot_gt.rst b/docs/source/setup/walkthrough/training_jetbot_gt.rst index f31684a78c0..05e89ef4564 100644 --- a/docs/source/setup/walkthrough/training_jetbot_gt.rst +++ b/docs/source/setup/walkthrough/training_jetbot_gt.rst @@ -104,7 +104,7 @@ Next, we need to expand the initialization and setup steps to construct the data Most of this is setting up the bookkeeping for the commands and markers, but the command initialization and the yaw calculations are worth diving into. The commands are sampled from a multivariate normal distribution via ``torch.randn`` with the z component fixed to zero and then normalized to unit length. In order to point our -command markers along these vectors, we need to rotate the base arrow mesh appropriately. This means we need to define a `quaternion `_` that will rotate the arrow +command markers along these vectors, we need to rotate the base arrow mesh appropriately. 
This means we need to define a `quaternion `_ that will rotate the arrow prim about the z axis by some angle defined by the command. By convention, rotations about the z axis are called "yaw" rotations (akin to roll and pitch). Luckily for us, Isaac Lab provides a utility to generate a quaternion from an axis of rotation and an angle: :func:`isaaclab.utils.math.quat_from_axis_angle`, so the only diff --git a/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst b/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst index 25f39ee8c93..bf9d734270b 100644 --- a/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst +++ b/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst @@ -75,6 +75,8 @@ from linear algebra! Replace the contents of ``_get_observations`` with the foll observations = {"policy": obs} return observations +We also need to edit the ``IsaacLabTutorialEnvCfg`` **to set the observation space back to 3**, which includes the dot product, the z component of the cross product, and the forward speed. + The dot or inner product tells us how aligned two vectors are as a single scalar quantity. If they are very aligned and pointed in the same direction, then the inner product will be large and positive, but if they are aligned and in opposite directions, it will be large and negative. If two vectors are perpendicular, the inner product is zero. This means that the inner product between the forward vector and the command vector can tell us diff --git a/scripts/reinforcement_learning/rl_games/play.py b/scripts/reinforcement_learning/rl_games/play.py index fae93845c92..38c8061d6a4 100644 --- a/scripts/reinforcement_learning/rl_games/play.py +++ b/scripts/reinforcement_learning/rl_games/play.py @@ -188,7 +188,7 @@ def main(): s[:, dones, :] = 0.0 if args_cli.video: timestep += 1 - # Exit the play loop after recording one video + # exit the play loop after recording one video if timestep == args_cli.video_length: break diff --git a/scripts/reinforcement_learning/rl_games/train.py b/scripts/reinforcement_learning/rl_games/train.py index 79053fbc845..eb350382979 100644 --- a/scripts/reinforcement_learning/rl_games/train.py +++ b/scripts/reinforcement_learning/rl_games/train.py @@ -194,10 +194,11 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen entity=args_cli.wandb_entity, name=experiment_name, sync_tensorboard=True, - config=agent_cfg, monitor_gym=True, save_code=True, ) + wandb.config.update({"env_cfg": env_cfg.to_dict()}) + wandb.config.update({"agent_cfg": agent_cfg}) if args_cli.checkpoint is not None: runner.run({"train": True, "play": False, "sigma": train_sigma, "checkpoint": resume_path}) diff --git a/scripts/reinforcement_learning/sb3/play.py b/scripts/reinforcement_learning/sb3/play.py index 78b958b31ca..0c3246a327e 100644 --- a/scripts/reinforcement_learning/sb3/play.py +++ b/scripts/reinforcement_learning/sb3/play.py @@ -8,6 +8,7 @@ """Launch Isaac Sim Simulator first.""" import argparse +from pathlib import Path from isaaclab.app import AppLauncher @@ -32,6 +33,12 @@ help="When no checkpoint provided, use the last saved model. 
Otherwise use the best saved model.", ) parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.") +parser.add_argument( + "--keep_all_info", + action="store_true", + default=False, + help="Use a slower SB3 wrapper but keep all the extra training info.", +) # append AppLauncher cli args AppLauncher.add_app_launcher_args(parser) # parse the arguments @@ -47,7 +54,6 @@ """Rest everything follows.""" import gymnasium as gym -import numpy as np import os import time import torch @@ -57,12 +63,13 @@ from isaaclab.envs import DirectMARLEnv, multi_agent_to_single_agent from isaaclab.utils.dict import print_dict +from isaaclab.utils.io import load_yaml from isaaclab.utils.pretrained_checkpoint import get_published_pretrained_checkpoint from isaaclab_rl.sb3 import Sb3VecEnvWrapper, process_sb3_cfg import isaaclab_tasks # noqa: F401 -from isaaclab_tasks.utils.parse_cfg import get_checkpoint_path, load_cfg_from_registry, parse_env_cfg +from isaaclab_tasks.utils.parse_cfg import get_checkpoint_path, parse_env_cfg # PLACEHOLDER: Extension template (do not remove this comment) @@ -73,35 +80,38 @@ def main(): env_cfg = parse_env_cfg( args_cli.task, device=args_cli.device, num_envs=args_cli.num_envs, use_fabric=not args_cli.disable_fabric ) - agent_cfg = load_cfg_from_registry(args_cli.task, "sb3_cfg_entry_point") task_name = args_cli.task.split(":")[-1] + train_task_name = task_name.replace("-Play", "") # directory for logging into - log_root_path = os.path.join("logs", "sb3", task_name) + log_root_path = os.path.join("logs", "sb3", train_task_name) log_root_path = os.path.abspath(log_root_path) # checkpoint and log_dir stuff if args_cli.use_pretrained_checkpoint: - checkpoint_path = get_published_pretrained_checkpoint("sb3", task_name) + checkpoint_path = get_published_pretrained_checkpoint("sb3", train_task_name) if not checkpoint_path: print("[INFO] Unfortunately a pre-trained checkpoint is currently unavailable for this task.") return elif args_cli.checkpoint is None: + # FIXME: last checkpoint doesn't seem to really use the last one if args_cli.use_last_checkpoint: checkpoint = "model_.*.zip" else: checkpoint = "model.zip" - checkpoint_path = get_checkpoint_path(log_root_path, ".*", checkpoint) + checkpoint_path = get_checkpoint_path(log_root_path, ".*", checkpoint, sort_alpha=False) else: checkpoint_path = args_cli.checkpoint log_dir = os.path.dirname(checkpoint_path) - # post-process agent configuration - agent_cfg = process_sb3_cfg(agent_cfg) - # create isaac environment env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None) + # load the exact config used for training (instead of the default config) + agent_cfg = load_yaml(os.path.join(log_dir, "params", "agent.yaml")) + # post-process agent configuration + agent_cfg = process_sb3_cfg(agent_cfg, env.unwrapped.num_envs) + # convert to single-agent instance if required by the RL algorithm if isinstance(env.unwrapped, DirectMARLEnv): env = multi_agent_to_single_agent(env) @@ -118,18 +128,25 @@ def main(): print_dict(video_kwargs, nesting=4) env = gym.wrappers.RecordVideo(env, **video_kwargs) # wrap around environment for stable baselines - env = Sb3VecEnvWrapper(env) + env = Sb3VecEnvWrapper(env, fast_variant=not args_cli.keep_all_info) + + vec_norm_path = checkpoint_path.replace("/model", "/model_vecnormalize").replace(".zip", ".pkl") + vec_norm_path = Path(vec_norm_path) # normalize environment (if needed) - if "normalize_input" in agent_cfg: + if 
vec_norm_path.exists(): + print(f"Loading saved normalization: {vec_norm_path}") + env = VecNormalize.load(vec_norm_path, env) + # do not update them at test time + env.training = False + # reward normalization is not needed at test time + env.norm_reward = False + elif "normalize_input" in agent_cfg: env = VecNormalize( env, training=True, norm_obs="normalize_input" in agent_cfg and agent_cfg.pop("normalize_input"), - norm_reward="normalize_value" in agent_cfg and agent_cfg.pop("normalize_value"), clip_obs="clip_obs" in agent_cfg and agent_cfg.pop("clip_obs"), - gamma=agent_cfg["gamma"], - clip_reward=np.inf, ) # create agent from stable baselines diff --git a/scripts/reinforcement_learning/sb3/train.py b/scripts/reinforcement_learning/sb3/train.py index b4ab9fe8e7f..0b723d51d57 100644 --- a/scripts/reinforcement_learning/sb3/train.py +++ b/scripts/reinforcement_learning/sb3/train.py @@ -3,17 +3,16 @@ # # SPDX-License-Identifier: BSD-3-Clause -"""Script to train RL agent with Stable Baselines3. -Since Stable-Baselines3 does not support buffers living on GPU directly, -we recommend using smaller number of environments. Otherwise, -there will be significant overhead in GPU->CPU transfer. -""" +"""Script to train RL agent with Stable Baselines3.""" """Launch Isaac Sim Simulator first.""" import argparse +import contextlib +import signal import sys +from pathlib import Path from isaaclab.app import AppLauncher @@ -25,7 +24,14 @@ parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") parser.add_argument("--task", type=str, default=None, help="Name of the task.") parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment") +parser.add_argument("--log_interval", type=int, default=100_000, help="Log data every n timesteps.") parser.add_argument("--max_iterations", type=int, default=None, help="RL Policy training iterations.") +parser.add_argument( + "--keep_all_info", + action="store_true", + default=False, + help="Use a slower SB3 wrapper but keep all the extra training info.", +) # append AppLauncher cli args AppLauncher.add_app_launcher_args(parser) # parse the arguments @@ -41,6 +47,24 @@ app_launcher = AppLauncher(args_cli) simulation_app = app_launcher.app + +def cleanup_pbar(*args): + """ + A small helper to stop training and + clean up the progress bar properly on ctrl+c + """ + import gc + + tqdm_objects = [obj for obj in gc.get_objects() if "tqdm" in type(obj).__name__] + for tqdm_object in tqdm_objects: + if "tqdm_rich" in type(tqdm_object).__name__: + tqdm_object.close() + raise KeyboardInterrupt + + +# disable KeyboardInterrupt override +signal.signal(signal.SIGINT, cleanup_pbar) + """Rest everything follows.""" import gymnasium as gym @@ -50,8 +74,7 @@ from datetime import datetime from stable_baselines3 import PPO -from stable_baselines3.common.callbacks import CheckpointCallback -from stable_baselines3.common.logger import configure +from stable_baselines3.common.callbacks import CheckpointCallback, LogEveryNTimesteps from stable_baselines3.common.vec_env import VecNormalize from isaaclab.envs import ( @@ -104,8 +127,12 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen dump_pickle(os.path.join(log_dir, "params", "env.pkl"), env_cfg) dump_pickle(os.path.join(log_dir, "params", "agent.pkl"), agent_cfg) + # save command used to run the script + command = " ".join(sys.orig_argv) + (Path(log_dir) / "command.txt").write_text(command) + # post-process agent configuration - agent_cfg 
= process_sb3_cfg(agent_cfg) + agent_cfg = process_sb3_cfg(agent_cfg, env_cfg.scene.num_envs) # read configurations about the agent-training policy_arch = agent_cfg.pop("policy") n_timesteps = agent_cfg.pop("n_timesteps") @@ -130,31 +157,49 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen env = gym.wrappers.RecordVideo(env, **video_kwargs) # wrap around environment for stable baselines - env = Sb3VecEnvWrapper(env) + env = Sb3VecEnvWrapper(env, fast_variant=not args_cli.keep_all_info) + + norm_keys = {"normalize_input", "normalize_value", "clip_obs"} + norm_args = {} + for key in norm_keys: + if key in agent_cfg: + norm_args[key] = agent_cfg.pop(key) - if "normalize_input" in agent_cfg: + if norm_args and norm_args.get("normalize_input"): + print(f"Normalizing input, {norm_args=}") env = VecNormalize( env, training=True, - norm_obs="normalize_input" in agent_cfg and agent_cfg.pop("normalize_input"), - norm_reward="normalize_value" in agent_cfg and agent_cfg.pop("normalize_value"), - clip_obs="clip_obs" in agent_cfg and agent_cfg.pop("clip_obs"), + norm_obs=norm_args["normalize_input"], + norm_reward=norm_args.get("normalize_value", False), + clip_obs=norm_args.get("clip_obs", 100.0), gamma=agent_cfg["gamma"], clip_reward=np.inf, ) # create agent from stable baselines - agent = PPO(policy_arch, env, verbose=1, **agent_cfg) - # configure the logger - new_logger = configure(log_dir, ["stdout", "tensorboard"]) - agent.set_logger(new_logger) + agent = PPO(policy_arch, env, verbose=1, tensorboard_log=log_dir, **agent_cfg) # callbacks for agent checkpoint_callback = CheckpointCallback(save_freq=1000, save_path=log_dir, name_prefix="model", verbose=2) + callbacks = [checkpoint_callback, LogEveryNTimesteps(n_steps=args_cli.log_interval)] + # train the agent - agent.learn(total_timesteps=n_timesteps, callback=checkpoint_callback) + with contextlib.suppress(KeyboardInterrupt): + agent.learn( + total_timesteps=n_timesteps, + callback=callbacks, + progress_bar=True, + log_interval=None, + ) # save the final model agent.save(os.path.join(log_dir, "model")) + print("Saving to:") + print(os.path.join(log_dir, "model.zip")) + + if isinstance(env, VecNormalize): + print("Saving normalization") + env.save(os.path.join(log_dir, "model_vecnormalize.pkl")) # close the simulator env.close() diff --git a/source/isaaclab/config/extension.toml b/source/isaaclab/config/extension.toml index ed7c2997613..b4719ba9424 100644 --- a/source/isaaclab/config/extension.toml +++ b/source/isaaclab/config/extension.toml @@ -1,7 +1,7 @@ [package] # Note: Semantic Versioning is used: https://semver.org/ -version = "0.40.5" +version = "0.40.12" # Description title = "Isaac Lab framework for Robot Learning" diff --git a/source/isaaclab/docs/CHANGELOG.rst b/source/isaaclab/docs/CHANGELOG.rst index f515d3411ae..d560029f05d 100644 --- a/source/isaaclab/docs/CHANGELOG.rst +++ b/source/isaaclab/docs/CHANGELOG.rst @@ -1,6 +1,84 @@ Changelog --------- +0.40.12 (2025-07-03) +~~~~~~~~~~~~~~~~~~~~ + +Changed +^^^^^^^ + +* Updated gymnasium to v1.2.0. This update includes fixes for a memory leak that appears when recording + videos with the ``--video`` flag. + + +0.40.11 (2025-06-27) +~~~~~~~~~~~~~~~~~~~~ + +Added +^^^^^ + +* Added unit test for :func:`~isaaclab.utils.math.quat_inv`. + +Fixed +^^^^^ + +* Fixed the implementation mistake in :func:`~isaaclab.utils.math.quat_inv`. 
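The corrected implementation divides the conjugate by the squared norm instead of normalizing it, so it is exact for non-unit quaternions as well (for a unit quaternion it reduces to the conjugate). A minimal sketch mirroring the ``isaaclab/utils/math.py`` hunk later in this diff:

.. code-block:: python

   # q^{-1} = conj(q) / ||q||^2; the eps clamp guards against division by zero
   q_inv = quat_conjugate(q) / q.pow(2).sum(dim=-1, keepdim=True).clamp(min=eps)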
+ + +0.40.10 (2025-06-25) +~~~~~~~~~~~~~~~~~~~~ + +Fixed +^^^^^ + +* Fixed :func:`~isaaclab.utils.dict.update_class_from_dict` so that it no longer prevents setting flat iterables with different lengths. + + +0.40.9 (2025-06-25) +~~~~~~~~~~~~~~~~~~~ + +Added +^^^^^ + +* Added ``sample_bias_per_component`` flag to :class:`~isaaclab.utils.noise.noise_model.NoiseModelWithAdditiveBias` to enable independent per-component bias + sampling, which is now the default behavior. If set to False, the previous behavior of sharing the same bias value across all components is retained. + + +0.40.8 (2025-06-18) +~~~~~~~~~~~~~~~~~~~ + +Fixed +^^^^^ + +* Fixed data inconsistencies between the read_body, read_link, and read_com properties when write_body, write_com, or write_joint operations are performed in + :class:`~isaaclab.assets.Articulation`, :class:`~isaaclab.assets.RigidObject`, and + :class:`~isaaclab.assets.RigidObjectCollection` +* Added a pytest that checks for these data inconsistencies + + +0.40.7 (2025-06-24) +~~~~~~~~~~~~~~~~~~~ + +Added +^^^^^ + +* Added :class:`~isaaclab.utils.noise.NoiseModel` support for manager-based workflows. + +Changed +^^^^^^^ + +* Renamed :func:`~isaaclab.utils.noise.NoiseModel.apply` method to :func:`~isaaclab.utils.noise.NoiseModel.__call__`. + + +0.40.6 (2025-06-12) +~~~~~~~~~~~~~~~~~~~ + +Fixed +^^^^^ + +* Fixed potential issues in :func:`~isaaclab.envs.mdp.events.randomize_visual_texture_material` related to handling visual prims during texture randomization. + + 0.40.5 (2025-05-22) ~~~~~~~~~~~~~~~~~~~ diff --git a/source/isaaclab/isaaclab/app/app_launcher.py b/source/isaaclab/isaaclab/app/app_launcher.py index 2bdeb427e83..2c47175614b 100644 --- a/source/isaaclab/isaaclab/app/app_launcher.py +++ b/source/isaaclab/isaaclab/app/app_launcher.py @@ -661,7 +661,8 @@ def _resolve_device_settings(self, launcher_args: dict): # pass command line variable to kit sys.argv.append(f"--/plugins/carb.tasking.plugin/threadCount={num_threads_per_process}") - # set physics and rendering device + # set rendering device. We do not need to set physics_gpu because it will automatically pick the same one + # as the active_gpu device. Setting physics_gpu explicitly may result in a different device being used. launcher_args["physics_gpu"] = self.device_id launcher_args["active_gpu"] = self.device_id diff --git a/source/isaaclab/isaaclab/assets/articulation/articulation.py b/source/isaaclab/isaaclab/assets/articulation/articulation.py index f12e9abbf10..44b0315e946 100644 --- a/source/isaaclab/isaaclab/assets/articulation/articulation.py +++ b/source/isaaclab/isaaclab/assets/articulation/articulation.py @@ -517,6 +517,12 @@ def write_joint_position_to_sim( # set into internal buffers self._data.joint_pos[env_ids, joint_ids] = position # Need to invalidate the buffers to trigger the update with the new joint positions. 
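# note: these cached views appear to be lazily evaluated, timestamp-gated buffers; forcing each
# timestamp to -1.0 marks the cache stale so the next access recomputes it from simulation state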
+ self._data._body_com_vel_w.timestamp = -1.0 + self._data._body_link_vel_w.timestamp = -1.0 + self._data._body_com_pose_b.timestamp = -1.0 + self._data._body_com_pose_w.timestamp = -1.0 + self._data._body_link_pose_w.timestamp = -1.0 + self._data._body_state_w.timestamp = -1.0 self._data._body_link_state_w.timestamp = -1.0 self._data._body_com_state_w.timestamp = -1.0 diff --git a/source/isaaclab/isaaclab/assets/rigid_object/rigid_object.py b/source/isaaclab/isaaclab/assets/rigid_object/rigid_object.py index ba0f2e5f0a8..f348f4b9193 100644 --- a/source/isaaclab/isaaclab/assets/rigid_object/rigid_object.py +++ b/source/isaaclab/isaaclab/assets/rigid_object/rigid_object.py @@ -220,11 +220,18 @@ def write_root_link_pose_to_sim(self, root_pose: torch.Tensor, env_ids: Sequence self._data.root_link_state_w[env_ids, :7] = self._data.root_link_pose_w[env_ids] if self._data._root_state_w.data is not None: self._data.root_state_w[env_ids, :7] = self._data.root_link_pose_w[env_ids] - + if self._data._root_com_state_w.data is not None: + expected_com_pos, expected_com_quat = math_utils.combine_frame_transforms( + self._data.root_link_pose_w[env_ids, :3], + self._data.root_link_pose_w[env_ids, 3:7], + self.data.body_com_pos_b[env_ids, 0, :], + self.data.body_com_quat_b[env_ids, 0, :], + ) + self._data.root_com_state_w[env_ids, :3] = expected_com_pos + self._data.root_com_state_w[env_ids, 3:7] = expected_com_quat # convert root quaternion from wxyz to xyzw root_poses_xyzw = self._data.root_link_pose_w.clone() root_poses_xyzw[:, 3:] = math_utils.convert_quat(root_poses_xyzw[:, 3:], to="xyzw") - # set into simulation self.root_physx_view.set_transforms(root_poses_xyzw, indices=physx_env_ids) @@ -301,9 +308,10 @@ def write_root_com_velocity_to_sim(self, root_velocity: torch.Tensor, env_ids: S self._data.root_com_state_w[env_ids, 7:] = self._data.root_com_vel_w[env_ids] if self._data._root_state_w.data is not None: self._data.root_state_w[env_ids, 7:] = self._data.root_com_vel_w[env_ids] + if self._data._root_link_state_w.data is not None: + self._data.root_link_state_w[env_ids, 7:] = self._data.root_com_vel_w[env_ids] # make the acceleration zero to prevent reporting old values self._data.body_com_acc_w[env_ids] = 0.0 - # set into simulation self.root_physx_view.set_velocities(self._data.root_com_vel_w, indices=physx_env_ids) diff --git a/source/isaaclab/isaaclab/assets/rigid_object_collection/rigid_object_collection.py b/source/isaaclab/isaaclab/assets/rigid_object_collection/rigid_object_collection.py index e719c7cdb0e..aad72582bd0 100644 --- a/source/isaaclab/isaaclab/assets/rigid_object_collection/rigid_object_collection.py +++ b/source/isaaclab/isaaclab/assets/rigid_object_collection/rigid_object_collection.py @@ -317,6 +317,18 @@ def write_object_link_pose_to_sim( self._data.object_link_state_w[env_ids[:, None], object_ids, :7] = object_pose.clone() if self._data._object_state_w.data is not None: self._data.object_state_w[env_ids[:, None], object_ids, :7] = object_pose.clone() + if self._data._object_com_state_w.data is not None: + # get CoM pose in link frame + com_pos_b = self.data.object_com_pos_b[env_ids[:, None], object_ids] + com_quat_b = self.data.object_com_quat_b[env_ids[:, None], object_ids] + com_pos, com_quat = math_utils.combine_frame_transforms( + object_pose[..., :3], + object_pose[..., 3:7], + com_pos_b, + com_quat_b, + ) + self._data.object_com_state_w[env_ids[:, None], object_ids, :3] = com_pos + self._data.object_com_state_w[env_ids[:, None], object_ids, 3:7] = com_quat # convert 
the quaternion from wxyz to xyzw poses_xyzw = self._data.object_link_pose_w.clone() @@ -415,6 +427,8 @@ def write_object_com_velocity_to_sim( self._data.object_com_state_w[env_ids[:, None], object_ids, 7:] = object_velocity.clone() if self._data._object_state_w.data is not None: self._data.object_state_w[env_ids[:, None], object_ids, 7:] = object_velocity.clone() + if self._data._object_link_state_w.data is not None: + self._data.object_link_state_w[env_ids[:, None], object_ids, 7:] = object_velocity.clone() # make the acceleration zero to prevent reporting old values self._data.object_com_acc_w[env_ids[:, None], object_ids] = 0.0 diff --git a/source/isaaclab/isaaclab/envs/direct_marl_env.py b/source/isaaclab/isaaclab/envs/direct_marl_env.py index 3f4867bb864..81f06b7cec7 100644 --- a/source/isaaclab/isaaclab/envs/direct_marl_env.py +++ b/source/isaaclab/isaaclab/envs/direct_marl_env.py @@ -356,7 +356,7 @@ def step(self, actions: dict[AgentID, ActionType]) -> EnvStepReturn: if self.cfg.action_noise_model: for agent, action in actions.items(): if agent in self._action_noise_model: - actions[agent] = self._action_noise_model[agent].apply(action) + actions[agent] = self._action_noise_model[agent](action) # process actions self._pre_physics_step(actions) @@ -409,7 +409,7 @@ def step(self, actions: dict[AgentID, ActionType]) -> EnvStepReturn: if self.cfg.observation_noise_model: for agent, obs in self.obs_dict.items(): if agent in self._observation_noise_model: - self.obs_dict[agent] = self._observation_noise_model[agent].apply(obs) + self.obs_dict[agent] = self._observation_noise_model[agent](obs) # return observations, rewards, resets and extras return self.obs_dict, self.reward_dict, self.terminated_dict, self.time_out_dict, self.extras diff --git a/source/isaaclab/isaaclab/envs/direct_rl_env.py b/source/isaaclab/isaaclab/envs/direct_rl_env.py index 81d7b02ebfc..147bc31ef4e 100644 --- a/source/isaaclab/isaaclab/envs/direct_rl_env.py +++ b/source/isaaclab/isaaclab/envs/direct_rl_env.py @@ -329,7 +329,7 @@ def step(self, action: torch.Tensor) -> VecEnvStepReturn: action = action.to(self.device) # add action noise if self.cfg.action_noise_model: - action = self._action_noise_model.apply(action) + action = self._action_noise_model(action) # process actions self._pre_physics_step(action) @@ -386,7 +386,7 @@ def step(self, action: torch.Tensor) -> VecEnvStepReturn: # add observation noise # note: we apply no noise to the state space (since it is used for critic networks) if self.cfg.observation_noise_model: - self.obs_buf["policy"] = self._observation_noise_model.apply(self.obs_buf["policy"]) + self.obs_buf["policy"] = self._observation_noise_model(self.obs_buf["policy"]) # return observations, rewards, resets and extras return self.obs_buf, self.reward_buf, self.reset_terminated, self.reset_time_outs, self.extras diff --git a/source/isaaclab/isaaclab/envs/mdp/events.py b/source/isaaclab/isaaclab/envs/mdp/events.py index 120dbd70d53..22b7a995212 100644 --- a/source/isaaclab/isaaclab/envs/mdp/events.py +++ b/source/isaaclab/isaaclab/envs/mdp/events.py @@ -1156,6 +1156,8 @@ def reset_scene_to_default(env: ManagerBasedEnv, env_ids: torch.Tensor): default_joint_pos = articulation_asset.data.default_joint_pos[env_ids].clone() default_joint_vel = articulation_asset.data.default_joint_vel[env_ids].clone() # set into the physics simulation + articulation_asset.set_joint_position_target(default_joint_pos, env_ids=env_ids) + articulation_asset.set_joint_velocity_target(default_joint_vel, env_ids=env_ids) 
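# note: presumably the joint targets are reset together with the joint state so that the
# implicit PD actuators do not keep driving the joints toward stale, pre-reset targets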
articulation_asset.write_joint_state_to_sim(default_joint_pos, default_joint_vel, env_ids=env_ids) # deformable objects for deformable_object in env.scene.deformable_objects.values(): @@ -1229,8 +1231,25 @@ def __init__(self, cfg: EventTermCfg, env: ManagerBasedEnv): body_names_regex = ".*" # create the affected prim path - # TODO: Remove the hard-coded "/visuals" part. - prim_path = f"{asset.cfg.prim_path}/{body_names_regex}/visuals" + # Check if the pattern with '/visuals' yields results when matching `body_names_regex`. + # If not, fall back to a broader pattern without '/visuals'. + asset_main_prim_path = asset.cfg.prim_path + # Try the pattern with '/visuals' first for the generic case + pattern_with_visuals = f"{asset_main_prim_path}/{body_names_regex}/visuals" + # Use sim_utils to check if any prims currently match this pattern + matching_prims = sim_utils.find_matching_prim_paths(pattern_with_visuals) + if matching_prims: + # If matches are found, use the pattern with /visuals + prim_path = pattern_with_visuals + else: + # If no matches found, fall back to the broader pattern without /visuals + # This pattern (e.g., /World/envs/env_.*/Table/.*) should match visual prims + # whether they end in /visuals or have other structures. + prim_path = f"{asset_main_prim_path}/.*" + carb.log_info( + f"Pattern '{pattern_with_visuals}' found no prims. Falling back to '{prim_path}' for texture" + " randomization." + ) # Create the omni-graph node for the randomization term def rep_texture_randomization(): @@ -1240,7 +1259,6 @@ def rep_texture_randomization(): rep.randomizer.texture( textures=texture_paths, project_uvw=True, texture_rotate=rep.distribution.uniform(*texture_rotation) ) - return prims_group.node # Register the event to the replicator diff --git a/source/isaaclab/isaaclab/envs/mdp/rewards.py b/source/isaaclab/isaaclab/envs/mdp/rewards.py index 5f0cdda5836..20df2a557eb 100644 --- a/source/isaaclab/isaaclab/envs/mdp/rewards.py +++ b/source/isaaclab/isaaclab/envs/mdp/rewards.py @@ -268,6 +268,16 @@ def undesired_contacts(env: ManagerBasedRLEnv, threshold: float, sensor_cfg: Sce return torch.sum(is_contact, dim=1) +def desired_contacts(env: ManagerBasedRLEnv, sensor_cfg: SceneEntityCfg, threshold: float = 1.0) -> torch.Tensor: + """Penalize if none of the desired contacts are present.""" + contact_sensor: ContactSensor = env.scene.sensors[sensor_cfg.name] + contacts = ( + contact_sensor.data.net_forces_w_history[:, :, sensor_cfg.body_ids, :].norm(dim=-1).max(dim=1)[0] > threshold + ) + zero_contact = (~contacts).all(dim=1) + return 1.0 * zero_contact + + def contact_forces(env: ManagerBasedRLEnv, threshold: float, sensor_cfg: SceneEntityCfg) -> torch.Tensor: """Penalize contact forces as the amount of violations of the net contact force.""" # extract the used quantities (to enable type-hinting) diff --git a/source/isaaclab/isaaclab/managers/manager_term_cfg.py b/source/isaaclab/isaaclab/managers/manager_term_cfg.py index 9927f91ce1a..137d91aae59 100644 --- a/source/isaaclab/isaaclab/managers/manager_term_cfg.py +++ b/source/isaaclab/isaaclab/managers/manager_term_cfg.py @@ -14,7 +14,7 @@ from isaaclab.utils import configclass from isaaclab.utils.modifiers import ModifierCfg -from isaaclab.utils.noise import NoiseCfg +from isaaclab.utils.noise import NoiseCfg, NoiseModelCfg from .scene_entity_cfg import SceneEntityCfg @@ -165,7 +165,7 @@ class ObservationTermCfg(ManagerTermBaseCfg): For more information on modifiers, see the :class:`~isaaclab.utils.modifiers.ModifierCfg` class. 
""" - noise: NoiseCfg | None = None + noise: NoiseCfg | NoiseModelCfg | None = None """The noise to add to the observation. Defaults to None, in which case no noise is added.""" clip: tuple[float, float] | None = None diff --git a/source/isaaclab/isaaclab/managers/observation_manager.py b/source/isaaclab/isaaclab/managers/observation_manager.py index 3524260b241..9b3f5fcaa2a 100644 --- a/source/isaaclab/isaaclab/managers/observation_manager.py +++ b/source/isaaclab/isaaclab/managers/observation_manager.py @@ -14,7 +14,7 @@ from prettytable import PrettyTable from typing import TYPE_CHECKING -from isaaclab.utils import class_to_dict, modifiers +from isaaclab.utils import class_to_dict, modifiers, noise from isaaclab.utils.buffers import CircularBuffer from .manager_base import ManagerBase, ManagerTermBase @@ -239,7 +239,7 @@ def reset(self, env_ids: Sequence[int] | None = None) -> dict[str, float]: if term_name in self._group_obs_term_history_buffer[group_name]: self._group_obs_term_history_buffer[group_name][term_name].reset(batch_ids=env_ids) # call all modifiers that are classes - for mod in self._group_obs_class_modifiers: + for mod in self._group_obs_class_instances: mod.reset(env_ids=env_ids) # nothing to log here @@ -320,8 +320,10 @@ def compute_group(self, group_name: str) -> torch.Tensor | dict[str, torch.Tenso if term_cfg.modifiers is not None: for modifier in term_cfg.modifiers: obs = modifier.func(obs, **modifier.params) - if term_cfg.noise: + if isinstance(term_cfg.noise, noise.NoiseCfg): obs = term_cfg.noise.func(obs, term_cfg.noise) + elif isinstance(term_cfg.noise, noise.NoiseModelCfg) and term_cfg.noise.func is not None: + obs = term_cfg.noise.func(obs) if term_cfg.clip: obs = obs.clip_(min=term_cfg.clip[0], max=term_cfg.clip[1]) if term_cfg.scale is not None: @@ -384,9 +386,9 @@ def _prepare_terms(self): self._group_obs_concatenate_dim: dict[str, int] = dict() self._group_obs_term_history_buffer: dict[str, dict] = dict() - # create a list to store modifiers that are classes + # create a list to store classes instances, e.g., for modifiers and noise models # we store it as a separate list to only call reset on them and prevent unnecessary calls - self._group_obs_class_modifiers: list[modifiers.ModifierBase] = list() + self._group_obs_class_instances: list[modifiers.ModifierBase | noise.NoiseModel] = list() # make sure the simulation is playing since we compute obs dims which needs asset quantities if not self._env.sim.is_playing(): @@ -497,7 +499,7 @@ def _prepare_terms(self): mod_cfg.func = mod_cfg.func(cfg=mod_cfg, data_dim=obs_dims, device=self._env.device) # add to list of class modifiers - self._group_obs_class_modifiers.append(mod_cfg.func) + self._group_obs_class_instances.append(mod_cfg.func) else: raise TypeError( f"Modifier configuration '{mod_cfg}' of observation term '{term_name}' is not of" @@ -527,6 +529,20 @@ def _prepare_terms(self): f" and optional parameters: {args_with_defaults}, but received: {term_params}." ) + # prepare noise model classes + if term_cfg.noise is not None and isinstance(term_cfg.noise, noise.NoiseModelCfg): + noise_model_cls = term_cfg.noise.class_type + if not issubclass(noise_model_cls, noise.NoiseModel): + raise TypeError( + f"Class type for observation term '{term_name}' NoiseModelCfg" + f" is not a subclass of 'NoiseModel'. Received: '{type(noise_model_cls)}'." 
+ ) + # initialize func to be the noise model class instance + term_cfg.noise.func = noise_model_cls( + term_cfg.noise, num_envs=self._env.num_envs, device=self._env.device + ) + self._group_obs_class_instances.append(term_cfg.noise.func) + # create history buffers and calculate history term dimensions if term_cfg.history_length > 0: group_entry_history_buffer[term_name] = CircularBuffer( diff --git a/source/isaaclab/isaaclab/sim/simulation_cfg.py b/source/isaaclab/isaaclab/sim/simulation_cfg.py index 5a98a01e22c..e56389b3262 100644 --- a/source/isaaclab/isaaclab/sim/simulation_cfg.py +++ b/source/isaaclab/isaaclab/sim/simulation_cfg.py @@ -309,6 +309,11 @@ class SimulationCfg: Note: When enabled, the GUI will not update the physics parameters in real-time. To enable real-time updates, please set this flag to :obj:`False`. + + When using GPU simulation, it is required to enable Fabric to visualize updates in the renderer. + Transform updates are propagated to the renderer through Fabric. If Fabric is disabled with GPU simulation, + the renderer will not be able to render any updates in the simulation, although simulation will still be + running under the hood. """ physx: PhysxCfg = PhysxCfg() diff --git a/source/isaaclab/isaaclab/sim/spawners/from_files/from_files.py b/source/isaaclab/isaaclab/sim/spawners/from_files/from_files.py index 26643df3408..c10cdf7c92f 100644 --- a/source/isaaclab/isaaclab/sim/spawners/from_files/from_files.py +++ b/source/isaaclab/isaaclab/sim/spawners/from_files/from_files.py @@ -11,7 +11,13 @@ import isaacsim.core.utils.stage as stage_utils import omni.kit.commands import omni.log -from pxr import Gf, Sdf, Semantics, Usd +from pxr import Gf, Sdf, Usd + +# from Isaac Sim 4.2 onwards, pxr.Semantics is deprecated +try: + import Semantics +except ModuleNotFoundError: + from pxr import Semantics from isaaclab.sim import converters, schemas from isaaclab.sim.utils import bind_physics_material, bind_visual_material, clone, select_usd_variants diff --git a/source/isaaclab/isaaclab/utils/dict.py b/source/isaaclab/isaaclab/utils/dict.py index fe178b54573..6e3a3d71046 100644 --- a/source/isaaclab/isaaclab/utils/dict.py +++ b/source/isaaclab/isaaclab/utils/dict.py @@ -9,7 +9,7 @@ import hashlib import json import torch -from collections.abc import Iterable, Mapping +from collections.abc import Iterable, Mapping, Sized from typing import Any from .array import TENSOR_TYPE_CONVERSIONS, TENSOR_TYPES @@ -90,47 +90,79 @@ def update_class_from_dict(obj, data: dict[str, Any], _ns: str = "") -> None: for key, value in data.items(): # key_ns is the full namespace of the key key_ns = _ns + "/" + key - # check if key is present in the object - if hasattr(obj, key) or isinstance(obj, dict): + + # -- A) if key is present in the object ------------------------------------ + if hasattr(obj, key) or (isinstance(obj, dict) and key in obj): obj_mem = obj[key] if isinstance(obj, dict) else getattr(obj, key) + + # -- 1) nested mapping → recurse --------------------------- if isinstance(value, Mapping): # recursively call if it is a dictionary update_class_from_dict(obj_mem, value, _ns=key_ns) continue + + # -- 2) iterable (list / tuple / etc.) 
--------------------- if isinstance(value, Iterable) and not isinstance(value, str): - # check length of value to be safe - if len(obj_mem) != len(value) and obj_mem is not None: + + # ---- 2a) flat iterable → replace wholesale ---------- + if all(not isinstance(el, Mapping) for el in value): + out_val = tuple(value) if isinstance(obj_mem, tuple) else value + if isinstance(obj, dict): + obj[key] = out_val + else: + setattr(obj, key, out_val) + continue + + # ---- 2b) existing value is None → abort ------------- + if obj_mem is None: + raise ValueError( + f"[Config]: Cannot merge list under namespace: {key_ns} because the existing value is None." + ) + + # ---- 2c) length mismatch → abort ------------------- + if isinstance(obj_mem, Sized) and isinstance(value, Sized) and len(obj_mem) != len(value): raise ValueError( f"[Config]: Incorrect length under namespace: {key_ns}." f" Expected: {len(obj_mem)}, Received: {len(value)}." ) + + # ---- 2d) keep tuple/list parity & recurse ---------- if isinstance(obj_mem, tuple): value = tuple(value) else: set_obj = True - # recursively call if iterable contains dictionaries + # recursively call if iterable contains Mappings for i in range(len(obj_mem)): - if isinstance(value[i], dict): + if isinstance(value[i], Mapping): update_class_from_dict(obj_mem[i], value[i], _ns=key_ns) set_obj = False # do not set value to obj, otherwise it overwrites the cfg class with the dict if not set_obj: continue + + # -- 3) callable attribute → resolve string -------------- elif callable(obj_mem): # update function name value = string_to_callable(value) - elif isinstance(value, type(obj_mem)) or value is None: + + # -- 4) simple scalar / explicit None --------------------- + elif value is None or isinstance(value, type(obj_mem)): pass + + # -- 5) type mismatch → abort ----------------------------- else: raise ValueError( f"[Config]: Incorrect type under namespace: {key_ns}." f" Expected: {type(obj_mem)}, Received: {type(value)}." ) - # set value + + # -- 6) final assignment --------------------------------- if isinstance(obj, dict): obj[key] = value else: setattr(obj, key, value) + + # -- B) if key is not present ------------------------------------ else: raise KeyError(f"[Config]: Key not found under namespace: {key_ns}.") @@ -267,6 +299,8 @@ def replace_slices_with_strings(data: dict) -> dict: """ if isinstance(data, dict): return {k: replace_slices_with_strings(v) for k, v in data.items()} + elif isinstance(data, list): + return [replace_slices_with_strings(v) for v in data] elif isinstance(data, slice): return f"slice({data.start},{data.stop},{data.step})" else: @@ -284,6 +318,8 @@ def replace_strings_with_slices(data: dict) -> dict: """ if isinstance(data, dict): return {k: replace_strings_with_slices(v) for k, v in data.items()} + elif isinstance(data, list): + return [replace_strings_with_slices(v) for v in data] elif isinstance(data, str) and data.startswith("slice("): return string_to_slice(data) else: diff --git a/source/isaaclab/isaaclab/utils/math.py b/source/isaaclab/isaaclab/utils/math.py index 23d31bda5dc..c1e879a87a4 100644 --- a/source/isaaclab/isaaclab/utils/math.py +++ b/source/isaaclab/isaaclab/utils/math.py @@ -254,16 +254,17 @@ def quat_conjugate(q: torch.Tensor) -> torch.Tensor: @torch.jit.script -def quat_inv(q: torch.Tensor) -> torch.Tensor: - """Compute the inverse of a quaternion. +def quat_inv(q: torch.Tensor, eps: float = 1e-9) -> torch.Tensor: + """Computes the inverse of a quaternion. Args: q: The quaternion orientation in (w, x, y, z). 
Shape is (N, 4). + eps: A small value to avoid division by zero. Defaults to 1e-9. Returns: The inverse quaternion in (w, x, y, z). Shape is (N, 4). """ - return normalize(quat_conjugate(q)) + return quat_conjugate(q) / q.pow(2).sum(dim=-1, keepdim=True).clamp(min=eps) @torch.jit.script @@ -429,7 +430,9 @@ def matrix_from_euler(euler_angles: torch.Tensor, convention: str) -> torch.Tens @torch.jit.script -def euler_xyz_from_quat(quat: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]: +def euler_xyz_from_quat( + quat: torch.Tensor, wrap_to_2pi: bool = False +) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]: """Convert rotations given as quaternions to Euler angles in radians. Note: @@ -437,6 +440,9 @@ Args: quat: The quaternion orientation in (w, x, y, z). Shape is (N, 4). + wrap_to_2pi: Whether to wrap the output Euler angles into [0, 2π). If + False, angles are returned in the default ``atan2`` range (−π, π]. Defaults to + False. Returns: A tuple containing roll-pitch-yaw. Each element is a tensor of shape (N,). @@ -459,7 +465,9 @@ cos_yaw = 1 - 2 * (q_y * q_y + q_z * q_z) yaw = torch.atan2(sin_yaw, cos_yaw) - return roll % (2 * torch.pi), pitch % (2 * torch.pi), yaw % (2 * torch.pi) # TODO: why not wrap_to_pi here ? + if wrap_to_2pi: + return roll % (2 * torch.pi), pitch % (2 * torch.pi), yaw % (2 * torch.pi) + return roll, pitch, yaw @torch.jit.script diff --git a/source/isaaclab/isaaclab/utils/noise/noise_cfg.py b/source/isaaclab/isaaclab/utils/noise/noise_cfg.py index 2f996af92d0..0c49828b3ff 100644 --- a/source/isaaclab/isaaclab/utils/noise/noise_cfg.py +++ b/source/isaaclab/isaaclab/utils/noise/noise_cfg.py @@ -78,6 +78,19 @@ class NoiseModelCfg: noise_cfg: NoiseCfg = MISSING """The noise configuration to use.""" + func: Callable[[torch.Tensor], torch.Tensor] | None = None + """Function or callable class used by this noise model. + + The function must take a single ``torch.Tensor`` (the batch of observations) as input + and return a ``torch.Tensor`` of the same shape with noise applied. + + It also supports `callable classes `_, + i.e. classes that implement the ``__call__()`` method. In this case, the class should inherit from the + :class:`NoiseModel` class and implement the required methods. + + This field is used internally by :class:`~isaaclab.managers.ObservationManager` and is not meant to be set directly. + """ + @configclass class NoiseModelWithAdditiveBiasCfg(NoiseModelCfg): @@ -90,3 +103,9 @@ Based on this configuration, the bias is sampled at every reset of the noise model. """ + + sample_bias_per_component: bool = True + """Whether to sample a separate bias for each data component. + + Defaults to True. + """ diff --git a/source/isaaclab/isaaclab/utils/noise/noise_model.py b/source/isaaclab/isaaclab/utils/noise/noise_model.py index 6a6d4973237..dae36b55c72 100644 --- a/source/isaaclab/isaaclab/utils/noise/noise_model.py +++ b/source/isaaclab/isaaclab/utils/noise/noise_model.py @@ -130,7 +130,7 @@ def reset(self, env_ids: Sequence[int] | None = None): """ pass - def apply(self, data: torch.Tensor) -> torch.Tensor: + def __call__(self, data: torch.Tensor) -> torch.Tensor: """Apply the noise to the data. 
        Args:
@@ -154,6 +154,8 @@ def __init__(self, noise_model_cfg: noise_cfg.NoiseModelWithAdditiveBiasCfg, num
        # store the bias noise configuration
        self._bias_noise_cfg = noise_model_cfg.bias_noise_cfg
        self._bias = torch.zeros((num_envs, 1), device=self._device)
+        self._num_components: int | None = None
+        self._sample_bias_per_component = noise_model_cfg.sample_bias_per_component

    def reset(self, env_ids: Sequence[int] | None = None):
        """Reset the noise model.
@@ -170,7 +172,7 @@ def reset(self, env_ids: Sequence[int] | None = None):
        # reset the bias term
        self._bias[env_ids] = self._bias_noise_cfg.func(self._bias[env_ids], self._bias_noise_cfg)

-    def apply(self, data: torch.Tensor) -> torch.Tensor:
+    def __call__(self, data: torch.Tensor) -> torch.Tensor:
        """Apply bias noise to the data.

        Args:
@@ -179,4 +181,11 @@ def apply(self, data: torch.Tensor) -> torch.Tensor:
        Returns:
            The data with the noise applied. Shape is the same as the input data.
        """
-        return super().apply(data) + self._bias
+        # if sample_bias_per_component, on the first apply, expand the bias to match the last dim of the data
+        if self._sample_bias_per_component and self._num_components is None:
+            *_, self._num_components = data.shape
+            # expand bias from (num_envs, 1) to (num_envs, num_components)
+            self._bias = self._bias.repeat(1, self._num_components)
+            # now re-sample that expanded bias in-place
+            self.reset()
+        return super().__call__(data) + self._bias
diff --git a/source/isaaclab/setup.py b/source/isaaclab/setup.py
index a5be0711a4a..6c972ab776d 100644
--- a/source/isaaclab/setup.py
+++ b/source/isaaclab/setup.py
@@ -27,7 +27,7 @@
    # devices
    "hidapi==0.14.0.post2",
    # reinforcement learning
-    "gymnasium>=1.0",
+    "gymnasium==1.2.0",
    # procedural-generation
    "trimesh",
    "pyglet<2",
diff --git a/source/isaaclab/test/assets/test_articulation.py b/source/isaaclab/test/assets/test_articulation.py
index 33ec8ec7537..5f8a3b7847a 100644
--- a/source/isaaclab/test/assets/test_articulation.py
+++ b/source/isaaclab/test/assets/test_articulation.py
@@ -1608,5 +1608,88 @@ def test_setting_invalid_articulation_root_prim_path(self):
        sim.reset()


+@pytest.mark.parametrize("num_articulations", [1, 2])
+@pytest.mark.parametrize("device", ["cuda:0", "cpu"])
+@pytest.mark.parametrize("gravity_enabled", [False])
+def test_write_joint_state_data_consistency(sim, num_articulations, device, gravity_enabled):
+    """Test data-buffer consistency after writing the joint state to the simulation.
+
+    This test verifies that after ``write_joint_state_to_sim`` operations:
+
+    1. state, com_state, and link_state values are consistent with each other
+    2. body_pose, body_link_pose, and body_com_pose match the corresponding state tensors
+
+    Args:
+        sim: The simulation fixture
+        num_articulations: Number of articulations to test
+        device: The device to run the simulation on
+        gravity_enabled: Whether gravity is enabled in the simulation
+    """
+    sim._app_control_on_stop_handle = None
+    articulation_cfg = generate_articulation_cfg(articulation_type="anymal")
+    articulation, env_pos = generate_articulation(articulation_cfg, num_articulations, device)
+    env_idx = torch.tensor([x for x in range(num_articulations)])
+
+    # Play sim
+    sim.reset()
+
+    limits = torch.zeros(num_articulations, articulation.num_joints, 2, device=device)
+    limits[..., 0] = (torch.rand(num_articulations, articulation.num_joints, device=device) + 5.0) * -1.0
+    limits[..., 1] = torch.rand(num_articulations, articulation.num_joints, device=device) + 5.0
+    articulation.write_joint_position_limit_to_sim(limits)
+
+    from torch.distributions import Uniform
+
+    pos_dist = Uniform(articulation.data.joint_pos_limits[..., 0], articulation.data.joint_pos_limits[..., 1])
+    vel_dist = Uniform(-articulation.data.joint_vel_limits, articulation.data.joint_vel_limits)
+
+    original_body_states = articulation.data.body_state_w.clone()
+
+    rand_joint_pos = pos_dist.sample()
+    rand_joint_vel = vel_dist.sample()
+
+    articulation.write_joint_state_to_sim(rand_joint_pos, rand_joint_vel)
+    articulation.root_physx_view.get_jacobians()
+    # make sure the values were updated
+    assert torch.count_nonzero(original_body_states[:, 1:] != articulation.data.body_state_w[:, 1:]) > (
+        len(original_body_states[:, 1:]) / 2
+    )
+    # validate body - link consistency
+    torch.testing.assert_close(articulation.data.body_state_w[..., :7], articulation.data.body_link_state_w[..., :7])
+    # skip 7:10 because they differ from the link frame; this is fine because we are only checking
+    # whether the velocity update is triggered, which can be determined by comparing angular velocities
+    torch.testing.assert_close(articulation.data.body_state_w[..., 10:], articulation.data.body_link_state_w[..., 10:])
+
+    # validate link - com consistency
+    expected_com_pos, expected_com_quat = math_utils.combine_frame_transforms(
+        articulation.data.body_link_state_w[..., :3].view(-1, 3),
+        articulation.data.body_link_state_w[..., 3:7].view(-1, 4),
+        articulation.data.body_com_pos_b.view(-1, 3),
+        articulation.data.body_com_quat_b.view(-1, 4),
+    )
+    torch.testing.assert_close(expected_com_pos.view(len(env_idx), -1, 3), articulation.data.body_com_pos_w)
+    torch.testing.assert_close(expected_com_quat.view(len(env_idx), -1, 4), articulation.data.body_com_quat_w)
+
+    # validate body - com consistency
+    torch.testing.assert_close(articulation.data.body_state_w[..., 7:10], articulation.data.body_com_lin_vel_w)
+    torch.testing.assert_close(articulation.data.body_state_w[..., 10:], articulation.data.body_com_ang_vel_w)
+
+    # validate pos_w, quat_w, pos_b, quat_b are consistent with pose_w and pose_b
+    expected_com_pose_w = torch.cat((articulation.data.body_com_pos_w, articulation.data.body_com_quat_w), dim=2)
+    expected_com_pose_b = torch.cat((articulation.data.body_com_pos_b, articulation.data.body_com_quat_b), dim=2)
+    expected_body_pose_w = torch.cat((articulation.data.body_pos_w, articulation.data.body_quat_w), dim=2)
+    expected_body_link_pose_w = torch.cat(
+        (articulation.data.body_link_pos_w, articulation.data.body_link_quat_w), dim=2
+    )
+    torch.testing.assert_close(articulation.data.body_com_pose_w, expected_com_pose_w)
+    torch.testing.assert_close(articulation.data.body_com_pose_b, expected_com_pose_b)
+    torch.testing.assert_close(articulation.data.body_pose_w, expected_body_pose_w)
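+    # layout reminder (assumed convention, matching the slices used below):
+    #   state[..., 0:3] -> position, state[..., 3:7] -> quaternion (w, x, y, z),
+    #   state[..., 7:10] -> linear velocity, state[..., 10:13] -> angular velocity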
+    torch.testing.assert_close(articulation.data.body_link_pose_w, expected_body_link_pose_w)
+
+    # validate pose_w and vel_w are consistent with state[..., :7] and state[..., 7:]
+    torch.testing.assert_close(articulation.data.body_pose_w, articulation.data.body_state_w[..., :7])
+    torch.testing.assert_close(articulation.data.body_vel_w, articulation.data.body_state_w[..., 7:])
+    torch.testing.assert_close(articulation.data.body_link_pose_w, articulation.data.body_link_state_w[..., :7])
+    torch.testing.assert_close(articulation.data.body_com_pose_w, articulation.data.body_com_state_w[..., :7])
+
+
 if __name__ == "__main__":
    pytest.main([__file__, "-v", "--maxfail=1"])
diff --git a/source/isaaclab/test/assets/test_rigid_object.py b/source/isaaclab/test/assets/test_rigid_object.py
index 5a11f15d165..6dc7b0b7d77 100644
--- a/source/isaaclab/test/assets/test_rigid_object.py
+++ b/source/isaaclab/test/assets/test_rigid_object.py
@@ -28,7 +28,15 @@
 from isaaclab.sim import build_simulation_context
 from isaaclab.sim.spawners import materials
 from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR, ISAACLAB_NUCLEUS_DIR
-from isaaclab.utils.math import default_orientation, quat_apply_inverse, quat_mul, random_orientation
+from isaaclab.utils.math import (
+    combine_frame_transforms,
+    default_orientation,
+    quat_apply_inverse,
+    quat_inv,
+    quat_mul,
+    quat_rotate,
+    random_orientation,
+)


 def generate_cubes_scene(
@@ -910,3 +918,109 @@ def test_write_root_state(num_cubes, device, with_offset, state_location):
        torch.testing.assert_close(rand_state, cube_object.data.root_com_state_w)
    elif state_location == "link":
        torch.testing.assert_close(rand_state, cube_object.data.root_link_state_w)
+
+
+@pytest.mark.parametrize("num_cubes", [1, 2])
+@pytest.mark.parametrize("device", ["cuda:0", "cpu"])
+@pytest.mark.parametrize("with_offset", [True])
+@pytest.mark.parametrize("state_location", ["com", "link", "root"])
+def test_write_state_functions_data_consistency(num_cubes, device, with_offset, state_location):
+    """Test the setters for root_state using both the link frame and center of mass as reference frame."""
+    with build_simulation_context(device=device, gravity_enabled=False, auto_add_lighting=True) as sim:
+        sim._app_control_on_stop_handle = None
+        # Create a scene with random cubes
+        cube_object, env_pos = generate_cubes_scene(num_cubes=num_cubes, height=0.0, device=device)
+        env_idx = torch.tensor([x for x in range(num_cubes)])
+
+        # Play sim
+        sim.reset()
+
+        # Check if cube_object is initialized
+        assert cube_object.is_initialized
+
+        # change center of mass offset from link frame
+        if with_offset:
+            offset = torch.tensor([0.1, 0.0, 0.0], device=device).repeat(num_cubes, 1)
+        else:
+            offset = torch.tensor([0.0, 0.0, 0.0], device=device).repeat(num_cubes, 1)
+
+        com = cube_object.root_physx_view.get_coms()
+        com[..., :3] = offset.to("cpu")
+        cube_object.root_physx_view.set_coms(com, env_idx)
+
+        # check that the center of mass has been set
+        torch.testing.assert_close(cube_object.root_physx_view.get_coms(), com)
+
+        rand_state = torch.rand_like(cube_object.data.root_state_w)
+        rand_state[..., :3] += env_pos
+        # make quaternion a unit vector
+        rand_state[..., 3:7] = torch.nn.functional.normalize(rand_state[..., 3:7], dim=-1)
+
+        env_idx = env_idx.to(device)
+
+        # perform step
+        sim.step()
+        # update buffers
+        cube_object.update(sim.cfg.dt)
+
+        if state_location == "com":
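+            # write the randomized state with the center of mass as the reference frame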
+            cube_object.write_root_com_state_to_sim(rand_state)
+        elif state_location == "link":
+            cube_object.write_root_link_state_to_sim(rand_state)
+        elif state_location == "root":
+            cube_object.write_root_state_to_sim(rand_state)
+
+        if state_location == "com":
+            expected_root_link_pos, expected_root_link_quat = combine_frame_transforms(
+                cube_object.data.root_com_state_w[:, :3],
+                cube_object.data.root_com_state_w[:, 3:7],
+                quat_rotate(
+                    quat_inv(cube_object.data.body_com_pose_b[:, 0, 3:7]), -cube_object.data.body_com_pose_b[:, 0, :3]
+                ),
+                quat_inv(cube_object.data.body_com_pose_b[:, 0, 3:7]),
+            )
+            expected_root_link_pose = torch.cat((expected_root_link_pos, expected_root_link_quat), dim=1)
+            # check that both root_pose and root_link_state_w are updated when root_com_state_w is written
+            torch.testing.assert_close(expected_root_link_pose, cube_object.data.root_link_state_w[:, :7])
+            # skip 7:10 because they differ from the link frame; this is fine because we are only checking
+            # whether the velocity update is triggered, which can be determined by comparing angular velocities
+            torch.testing.assert_close(
+                cube_object.data.root_com_state_w[:, 10:], cube_object.data.root_link_state_w[:, 10:]
+            )
+            torch.testing.assert_close(expected_root_link_pose, cube_object.data.root_state_w[:, :7])
+            torch.testing.assert_close(cube_object.data.root_com_state_w[:, 10:], cube_object.data.root_state_w[:, 10:])
+        elif state_location == "link":
+            expected_com_pos, expected_com_quat = combine_frame_transforms(
+                cube_object.data.root_link_state_w[:, :3],
+                cube_object.data.root_link_state_w[:, 3:7],
+                cube_object.data.body_com_pose_b[:, 0, :3],
+                cube_object.data.body_com_pose_b[:, 0, 3:7],
+            )
+            expected_com_pose = torch.cat((expected_com_pos, expected_com_quat), dim=1)
+            # check that both root_pose and root_com_state_w are updated when root_link_state_w is written
+            torch.testing.assert_close(expected_com_pose, cube_object.data.root_com_state_w[:, :7])
+            # skip 7:10 because they differ from the link frame; this is fine because we are only checking
+            # whether the velocity update is triggered, which can be determined by comparing angular velocities
+            torch.testing.assert_close(
+                cube_object.data.root_link_state_w[:, 10:], cube_object.data.root_com_state_w[:, 10:]
+            )
+            torch.testing.assert_close(cube_object.data.root_link_state_w[:, :7], cube_object.data.root_state_w[:, :7])
+            torch.testing.assert_close(
+                cube_object.data.root_link_state_w[:, 10:], cube_object.data.root_state_w[:, 10:]
+            )
+        elif state_location == "root":
+            expected_com_pos, expected_com_quat = combine_frame_transforms(
+                cube_object.data.root_state_w[:, :3],
+                cube_object.data.root_state_w[:, 3:7],
+                cube_object.data.body_com_pose_b[:, 0, :3],
+                cube_object.data.body_com_pose_b[:, 0, 3:7],
+            )
+            expected_com_pose = torch.cat((expected_com_pos, expected_com_quat), dim=1)
+            # check that both root_com_state_w and root_link_state_w are updated when root_pose is written
+            torch.testing.assert_close(expected_com_pose, cube_object.data.root_com_state_w[:, :7])
+            torch.testing.assert_close(cube_object.data.root_state_w[:, 7:], cube_object.data.root_com_state_w[:, 7:])
+            torch.testing.assert_close(cube_object.data.root_state_w[:, :7], cube_object.data.root_link_state_w[:, :7])
+            torch.testing.assert_close(
+                cube_object.data.root_state_w[:, 10:], cube_object.data.root_link_state_w[:, 10:]
+            )
diff --git a/source/isaaclab/test/assets/test_rigid_object_collection.py b/source/isaaclab/test/assets/test_rigid_object_collection.py
index 21e5d832ee8..eb690a19d42 100644
--- a/source/isaaclab/test/assets/test_rigid_object_collection.py
+++ b/source/isaaclab/test/assets/test_rigid_object_collection.py
@@ -26,7 +26,16 @@
 from isaaclab.assets import RigidObjectCfg, RigidObjectCollection, RigidObjectCollectionCfg
 from isaaclab.sim import build_simulation_context
 from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR
-from isaaclab.utils.math import default_orientation, quat_apply_inverse, quat_mul, random_orientation
+from isaaclab.utils.math import (
+    combine_frame_transforms,
+    default_orientation,
+    quat_apply_inverse,
+    quat_inv,
+    quat_mul,
+    quat_rotate,
+    random_orientation,
+    subtract_frame_transforms,
+)


 def generate_cubes_scene(
@@ -601,3 +610,128 @@ def test_gravity_vec_w(sim, num_envs, num_cubes, device, gravity_enabled):
    # Check the body accelerations are correct
    torch.testing.assert_close(object_collection.data.object_acc_w, gravity)
+
+
+@pytest.mark.parametrize("num_envs", [1, 3])
+@pytest.mark.parametrize("num_cubes", [1, 2])
+@pytest.mark.parametrize("device", ["cuda:0", "cpu"])
+@pytest.mark.parametrize("with_offset", [True])
+@pytest.mark.parametrize("state_location", ["com", "link", "root"])
+@pytest.mark.parametrize("gravity_enabled", [False])
+def test_write_object_state_functions_data_consistency(
+    sim, num_envs, num_cubes, device, with_offset, state_location, gravity_enabled
+):
+    """Test the setters for object_state using both the link frame and center of mass as reference frame."""
+    # Create a scene with random cubes
+    cube_object, env_pos = generate_cubes_scene(num_envs=num_envs, num_cubes=num_cubes, height=0.0, device=device)
+    # one view index per cube across all environments
+    view_ids = torch.tensor([x for x in range(num_envs * num_cubes)])
+    env_ids = torch.tensor([x for x in range(num_envs)])
+    object_ids = torch.tensor([x for x in range(num_cubes)])
+
+    sim.reset()
+
+    # Check if cube_object is initialized
+    assert cube_object.is_initialized
+
+    # change center of mass offset from link frame
+    offset = (
+        torch.tensor([0.1, 0.0, 0.0], device=device).repeat(num_envs, num_cubes, 1)
+        if with_offset
+        else torch.tensor([0.0, 0.0, 0.0], device=device).repeat(num_envs, num_cubes, 1)
+    )
+
+    com = cube_object.reshape_view_to_data(cube_object.root_physx_view.get_coms())
+    com[..., :3] = offset.to("cpu")
+    cube_object.root_physx_view.set_coms(cube_object.reshape_data_to_view(com.clone()), view_ids)
+
+    # check that the center of mass has been set
+    torch.testing.assert_close(cube_object.reshape_view_to_data(cube_object.root_physx_view.get_coms()), com)
+
+    rand_state = torch.rand_like(cube_object.data.object_link_state_w)
+    rand_state[..., :3] += cube_object.data.object_link_pos_w
+    # make quaternion a unit vector
+    rand_state[..., 3:7] = torch.nn.functional.normalize(rand_state[..., 3:7], dim=-1)
+
+    env_ids = env_ids.to(device)
+    object_ids = object_ids.to(device)
+    sim.step()
+    cube_object.update(sim.cfg.dt)
+
+    object_link_to_com_pos, object_link_to_com_quat = subtract_frame_transforms(
+        cube_object.data.object_link_state_w[..., :3].view(-1, 3),
+        cube_object.data.object_link_state_w[..., 3:7].view(-1, 4),
+        cube_object.data.object_com_state_w[..., :3].view(-1, 3),
+        cube_object.data.object_com_state_w[..., 3:7].view(-1, 4),
+    )
+
+    if state_location == "com":
+        cube_object.write_object_com_state_to_sim(rand_state, env_ids=env_ids, object_ids=object_ids)
+    elif state_location == "link":
+        cube_object.write_object_link_state_to_sim(rand_state, env_ids=env_ids, object_ids=object_ids)
+    elif state_location == "root":
+        cube_object.write_object_state_to_sim(rand_state, env_ids=env_ids, object_ids=object_ids)
+
+    if state_location == "com":
+        expected_root_link_pos, expected_root_link_quat = combine_frame_transforms(
+            cube_object.data.object_com_state_w[..., :3].view(-1, 3),
+            cube_object.data.object_com_state_w[..., 3:7].view(-1, 4),
+            quat_rotate(quat_inv(object_link_to_com_quat), -object_link_to_com_pos),
+            quat_inv(object_link_to_com_quat),
+        )
+        expected_object_link_pose = torch.cat((expected_root_link_pos, expected_root_link_quat), dim=1).view(
+            num_envs, -1, 7
+        )
+        # check that both object_pose and object_link_state_w are updated when object_com_state_w is written
+        torch.testing.assert_close(expected_object_link_pose, cube_object.data.object_link_state_w[..., :7])
+        # skip 7:10 because they differ from the link frame; this is fine because we are only checking
+        # whether the velocity update is triggered, which can be determined by comparing angular velocities
+        torch.testing.assert_close(
+            cube_object.data.object_com_state_w[..., 10:], cube_object.data.object_link_state_w[..., 10:]
+        )
+        torch.testing.assert_close(expected_object_link_pose, cube_object.data.object_state_w[..., :7])
+        torch.testing.assert_close(
+            cube_object.data.object_com_state_w[..., 10:], cube_object.data.object_state_w[..., 10:]
+        )
+    elif state_location == "link":
+        expected_com_pos, expected_com_quat = combine_frame_transforms(
+            cube_object.data.object_link_state_w[..., :3].view(-1, 3),
+            cube_object.data.object_link_state_w[..., 3:7].view(-1, 4),
+            object_link_to_com_pos,
+            object_link_to_com_quat,
+        )
+        expected_object_com_pose = torch.cat((expected_com_pos, expected_com_quat), dim=1).view(num_envs, -1, 7)
+        # check that both object_pose and object_com_state_w are updated when object_link_state_w is written
+        torch.testing.assert_close(expected_object_com_pose, cube_object.data.object_com_state_w[..., :7])
+        # skip 7:10 because they differ from the link frame; this is fine because we are only checking
+        # whether the velocity update is triggered, which can be determined by comparing angular velocities
+        torch.testing.assert_close(
+            cube_object.data.object_link_state_w[..., 10:], cube_object.data.object_com_state_w[..., 10:]
+        )
+        torch.testing.assert_close(
+            cube_object.data.object_link_state_w[..., :7], cube_object.data.object_state_w[..., :7]
+        )
+        torch.testing.assert_close(
+            cube_object.data.object_link_state_w[..., 10:], cube_object.data.object_state_w[..., 10:]
+        )
+    elif state_location == "root":
+        expected_object_com_pos, expected_object_com_quat = combine_frame_transforms(
+            cube_object.data.object_state_w[..., :3].view(-1, 3),
+            cube_object.data.object_state_w[..., 3:7].view(-1, 4),
+            object_link_to_com_pos,
+            object_link_to_com_quat,
+        )
+        expected_object_com_pose = torch.cat((expected_object_com_pos, expected_object_com_quat), dim=1).view(
+            num_envs, -1, 7
+        )
+        # check that both object_com_state_w and object_link_state_w are updated when object_pose is written
+        torch.testing.assert_close(expected_object_com_pose, cube_object.data.object_com_state_w[..., :7])
+        torch.testing.assert_close(
+            cube_object.data.object_state_w[..., 7:], cube_object.data.object_com_state_w[..., 7:]
+        )
+        torch.testing.assert_close(
+            cube_object.data.object_state_w[..., :7], cube_object.data.object_link_state_w[..., :7]
+        )
+        torch.testing.assert_close(
+            cube_object.data.object_state_w[..., 10:], cube_object.data.object_link_state_w[..., 10:]
+        )
diff --git a/source/isaaclab/test/envs/test_texture_randomization.py
b/source/isaaclab/test/envs/test_texture_randomization.py index a034929f145..2f8fc580e26 100644 --- a/source/isaaclab/test/envs/test_texture_randomization.py +++ b/source/isaaclab/test/envs/test_texture_randomization.py @@ -20,6 +20,7 @@ import math import torch import unittest +from unittest.mock import patch import omni.usd @@ -127,6 +128,36 @@ class EventCfg: ) +@configclass +class EventCfgFallback: + """Configuration for events that tests the fallback mechanism.""" + + # Test fallback when /visuals pattern doesn't match + test_fallback_texture_randomizer = EventTerm( + func=mdp.randomize_visual_texture_material, + mode="reset", + params={ + "asset_cfg": SceneEntityCfg("robot", body_names=["slider"]), + "texture_paths": [ + f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Bamboo_Planks/Bamboo_Planks_BaseColor.png", + f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Cherry/Cherry_BaseColor.png", + ], + "event_name": "test_fallback_texture_randomizer", + "texture_rotation": (0.0, 0.0), + }, + ) + + reset_cart_position = EventTerm( + func=mdp.reset_joints_by_offset, + mode="reset", + params={ + "asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"]), + "position_range": (-1.0, 1.0), + "velocity_range": (-0.1, 0.1), + }, + ) + + @configclass class CartpoleEnvCfg(ManagerBasedEnvCfg): """Configuration for the cartpole environment.""" @@ -150,6 +181,29 @@ def __post_init__(self): self.sim.dt = 0.005 # sim step every 5ms: 200Hz +@configclass +class CartpoleEnvCfgFallback(ManagerBasedEnvCfg): + """Configuration for the cartpole environment that tests fallback mechanism.""" + + # Scene settings + scene = CartpoleSceneCfg(env_spacing=2.5) + + # Basic settings + actions = ActionsCfg() + observations = ObservationsCfg() + events = EventCfgFallback() + + def __post_init__(self): + """Post initialization.""" + # viewer settings + self.viewer.eye = (4.5, 0.0, 6.0) + self.viewer.lookat = (0.0, 0.0, 2.0) + # step settings + self.decimation = 4 # env step every 4 sim steps: 200Hz / 4 = 50Hz + # simulation settings + self.sim.dt = 0.005 # sim step every 5ms: 200Hz + + class TestTextureRandomization(unittest.TestCase): """Test for texture randomization""" @@ -186,6 +240,46 @@ def test_texture_randomization(self): env.close() + def test_texture_randomization_fallback(self): + """Test texture randomization fallback mechanism when /visuals pattern doesn't match.""" + + def mock_find_matching_prim_paths(pattern): + """Mock function that simulates a case where /visuals pattern doesn't match.""" + # If the pattern contains '/visuals', return empty list to trigger fallback + if pattern.endswith("/visuals"): + return [] + return None + + for device in ["cpu", "cuda"]: + with self.subTest(device=device): + # create a new stage + omni.usd.get_context().new_stage() + + # set the arguments - use fallback config + env_cfg = CartpoleEnvCfgFallback() + env_cfg.scene.num_envs = 16 + env_cfg.scene.replicate_physics = False + env_cfg.sim.device = device + + with patch.object( + mdp.events.sim_utils, "find_matching_prim_paths", side_effect=mock_find_matching_prim_paths + ): + # This should trigger the fallback mechanism and log the fallback message + env = ManagerBasedEnv(cfg=env_cfg) + + # simulate physics + with torch.inference_mode(): + for count in range(20): # shorter test for fallback + # reset every few steps to check nothing breaks + if count % 10 == 0: + env.reset() + # sample random actions + joint_efforts = torch.randn_like(env.action_manager.action) + # step the environment + env.step(joint_efforts) + + 
+                env.close()
+
    def test_texture_randomization_failure_replicate_physics(self):
        """Test texture randomization failure when replicate physics is set to True."""
        # create a new stage
diff --git a/source/isaaclab/test/sensors/test_tiled_camera.py b/source/isaaclab/test/sensors/test_tiled_camera.py
index a00bc44a2d6..79f70a78809 100644
--- a/source/isaaclab/test/sensors/test_tiled_camera.py
+++ b/source/isaaclab/test/sensors/test_tiled_camera.py
@@ -25,7 +25,13 @@
 import omni.replicator.core as rep
 import pytest
 from isaacsim.core.prims import SingleGeometryPrim, SingleRigidPrim
-from pxr import Gf, Semantics, UsdGeom
+from pxr import Gf, UsdGeom
+
+# from Isaac Sim 4.2 onwards, pxr.Semantics is deprecated
+try:
+    import Semantics
+except ModuleNotFoundError:
+    from pxr import Semantics

 import isaaclab.sim as sim_utils
 from isaaclab.sensors.camera import Camera, CameraCfg, TiledCamera, TiledCameraCfg
diff --git a/source/isaaclab/test/utils/test_configclass.py b/source/isaaclab/test/utils/test_configclass.py
index 81e876bd97e..6fbfb4ee8f9 100644
--- a/source/isaaclab/test/utils/test_configclass.py
+++ b/source/isaaclab/test/utils/test_configclass.py
@@ -643,6 +643,28 @@ def test_config_update_nested_dict():
    assert isinstance(cfg.list_1[1].viewer, ViewerCfg)


+def test_config_update_different_iterable_lengths():
+    """Iterables are replaced wholesale, even if their lengths differ."""
+
+    # the original cfg has a length-6 tuple and list
+    cfg = RobotDefaultStateCfg()
+    assert cfg.dof_pos == (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+    assert cfg.dof_vel == [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
+
+    # patch uses different lengths
+    patch = {
+        "dof_pos": (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0),  # longer tuple
+        "dof_vel": [9.0, 8.0, 7.0],  # shorter list
+    }
+
+    # should not raise
+    update_class_from_dict(cfg, patch)
+
+    # the whole sequences are replaced
+    assert cfg.dof_pos == (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0)
+    assert cfg.dof_vel == [9.0, 8.0, 7.0]
+
+
 def test_config_update_dict_using_internal():
    """Test updating configclass from a dictionary using configclass method."""
    cfg = BasicDemoCfg()
diff --git a/source/isaaclab/test/utils/test_math.py b/source/isaaclab/test/utils/test_math.py
index a78d735fb13..b159182121f 100644
--- a/source/isaaclab/test/utils/test_math.py
+++ b/source/isaaclab/test/utils/test_math.py
@@ -593,6 +593,35 @@ def test_quat_apply_inverse(device):
    torch.testing.assert_close(scipy_result.to(device=device), apply_result, atol=2e-4, rtol=2e-4)


+@pytest.mark.parametrize("device", ["cpu", "cuda:0"])
+def test_quat_inv(device):
+    """Test for quat_inv method.
+
+    For random unit and non-unit quaternions q, the Hamilton products
+    q ⊗ q⁻¹ and q⁻¹ ⊗ q must both equal the identity quaternion (1,0,0,0)
+    within numerical precision.
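+
+    As a concrete example: for the non-unit quaternion q = (2, 0, 0, 0) with ‖q‖ = 2,
+    the inverse is conj(q) / ‖q‖² = (0.5, 0, 0, 0), and q ⊗ q⁻¹ = (1, 0, 0, 0).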
+ """ + num = 2048 + + # -------- non-unit sample (average ‖q‖ ≈ 10) -------- + q_nonunit = torch.randn(num, 4, device=device) * 5.0 + + # -------- unit sample (‖q‖ = 1) -------- + q_unit = torch.randn(num, 4, device=device) + q_unit = q_unit / q_unit.norm(dim=-1, keepdim=True) + + identity = torch.tensor([1.0, 0.0, 0.0, 0.0], device=device) + + for q in (q_nonunit, q_unit): + q_inv = math_utils.quat_inv(q) + + id_batch = identity.expand_as(q) + + # left and right products must both be identity + torch.testing.assert_close(math_utils.quat_mul(q, q_inv), id_batch, atol=1e-4, rtol=1e-4) + torch.testing.assert_close(math_utils.quat_mul(q_inv, q), id_batch, atol=1e-4, rtol=1e-4) + + def test_quat_apply_benchmarks(): """Test for quat_apply and quat_apply_inverse methods compared to old methods using torch.bmm and torch.einsum. The new implementation uses :meth:`torch.einsum` instead of `torch.bmm` which allows @@ -847,3 +876,57 @@ def test_interpolate_rotations(): # Assert that the result is almost equal to the expected quaternion np.testing.assert_array_almost_equal(result_axis_angle.cpu(), expected, decimal=DECIMAL_PRECISION) + + +def test_euler_xyz_from_quat(): + """Test euler_xyz_from_quat function. + + This test checks the output from the :meth:`~isaaclab.utils.math_utils.euler_xyz_from_quat` function + against the expected output for various quaternions. + The test includes quaternions representing different rotations around the x, y, and z axes. + The test is performed for both the default output range (-π, π] and the wrapped output range [0, 2π). + """ + quats = [ + torch.Tensor([[1.0, 0.0, 0.0, 0.0]]), # 0° around x, y, z + torch.Tensor([ + [0.9238795, 0.3826834, 0.0, 0.0], # 45° around x + [0.9238795, 0.0, -0.3826834, 0.0], # -45° around y + [0.9238795, 0.0, 0.0, -0.3826834], # -45° around z + ]), + torch.Tensor([ + [0.7071068, -0.7071068, 0.0, 0.0], # -90° around x + [0.7071068, 0.0, 0.0, -0.7071068], # -90° around z + ]), + torch.Tensor([ + [0.3826834, -0.9238795, 0.0, 0.0], # -135° around x + [0.3826834, 0.0, 0.0, -0.9238795], # -135° around y + ]), + ] + + expected_euler_angles = [ + torch.Tensor([[0.0, 0.0, 0.0]]), # identity + torch.Tensor([ + [torch.pi / 4, 0.0, 0.0], # 45° about x + [0.0, -torch.pi / 4, 0.0], # -45° about y + [0.0, 0.0, -torch.pi / 4], # -45° about z + ]), + torch.Tensor([ + [-torch.pi / 2, 0.0, 0.0], # -90° about x + [0.0, 0.0, -torch.pi / 2], # -90° about z + ]), + torch.Tensor([ + [-3 * torch.pi / 4, 0.0, 0.0], # -135° about x + [0.0, 0.0, -3 * torch.pi / 4], # -135° about y + ]), + ] + + # Test 1: default no-wrap range from (-π, π] + for quat, expected in zip(quats, expected_euler_angles): + output = torch.stack(math_utils.euler_xyz_from_quat(quat), dim=-1) + torch.testing.assert_close(output, expected) + + # Test 2: wrap to [0, 2π) + for quat, expected in zip(quats, expected_euler_angles): + wrapped = expected % (2 * torch.pi) + output = torch.stack(math_utils.euler_xyz_from_quat(quat, wrap_to_2pi=True), dim=-1) + torch.testing.assert_close(output, wrapped) diff --git a/source/isaaclab_assets/isaaclab_assets/robots/agility.py b/source/isaaclab_assets/isaaclab_assets/robots/agility.py new file mode 100644 index 00000000000..090298d5996 --- /dev/null +++ b/source/isaaclab_assets/isaaclab_assets/robots/agility.py @@ -0,0 +1,39 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+import isaaclab.sim as sim_utils
+from isaaclab.actuators import ImplicitActuatorCfg
+from isaaclab.assets.articulation import ArticulationCfg
+from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR
+
+LEG_JOINT_NAMES = [
+    ".*_hip_roll",
+    ".*_hip_yaw",
+    ".*_hip_pitch",
+    ".*_knee",
+    ".*_toe_a",
+    ".*_toe_b",
+]
+
+ARM_JOINT_NAMES = [".*_arm_.*"]
+
+
+DIGIT_V4_CFG = ArticulationCfg(
+    spawn=sim_utils.UsdFileCfg(
+        usd_path=f"{ISAAC_NUCLEUS_DIR}/Robots/Agility/Digit/digit_v4.usd",
+        activate_contact_sensors=True,
+    ),
+    init_state=ArticulationCfg.InitialStateCfg(
+        pos=(0.0, 0.0, 1.05),
+    ),
+    soft_joint_pos_limit_factor=0.9,
+    actuators={
+        "all": ImplicitActuatorCfg(
+            joint_names_expr=[".*"],
+            stiffness=None,
+            damping=None,
+        ),
+    },
+)
diff --git a/source/isaaclab_mimic/setup.py b/source/isaaclab_mimic/setup.py
index e8bc75b4ab2..510e16a8dba 100644
--- a/source/isaaclab_mimic/setup.py
+++ b/source/isaaclab_mimic/setup.py
@@ -29,7 +29,7 @@
 # Check if the platform is Linux and add the dependency
 if platform.system() == "Linux":
-    EXTRAS_REQUIRE["robomimic"].append("robomimic@git+https://github.com/ARISE-Initiative/robomimic.git")
+    EXTRAS_REQUIRE["robomimic"].append("robomimic@git+https://github.com/ARISE-Initiative/robomimic.git@v0.4.0")

 # Cumulation of all extra-requires
 EXTRAS_REQUIRE["all"] = list(itertools.chain.from_iterable(EXTRAS_REQUIRE.values()))
diff --git a/source/isaaclab_rl/config/extension.toml b/source/isaaclab_rl/config/extension.toml
index 6eeca9c1a97..9e2c873573f 100644
--- a/source/isaaclab_rl/config/extension.toml
+++ b/source/isaaclab_rl/config/extension.toml
@@ -1,7 +1,7 @@
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.1.4"
+version = "0.1.6"

 # Description
 title = "Isaac Lab RL"
diff --git a/source/isaaclab_rl/docs/CHANGELOG.rst b/source/isaaclab_rl/docs/CHANGELOG.rst
index 9ade85682f0..c39bed61f50 100644
--- a/source/isaaclab_rl/docs/CHANGELOG.rst
+++ b/source/isaaclab_rl/docs/CHANGELOG.rst
@@ -1,6 +1,23 @@
 Changelog
 ---------

+0.1.6 (2025-06-26)
+~~~~~~~~~~~~~~~~~~
+
+Fixed
+^^^^^
+
+* Relaxed upper range pin for protobuf python dependency for more permissive installation.
+
+
+0.1.5 (2025-04-11)
+~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Optimized the Stable-Baselines3 wrapper ``Sb3VecEnvWrapper`` (now 4x faster) by using NumPy buffers and only logging episode and truncation information by default.
+* Upgraded the minimum SB3 version to 2.6.0 and added optional dependencies for the progress bar.
+
 0.1.4 (2025-04-10)
 ~~~~~~~~~~~~~~~~~~
diff --git a/source/isaaclab_rl/isaaclab_rl/sb3.py b/source/isaaclab_rl/isaaclab_rl/sb3.py
index d4974c4c3d1..46322a154f6 100644
--- a/source/isaaclab_rl/isaaclab_rl/sb3.py
+++ b/source/isaaclab_rl/isaaclab_rl/sb3.py
@@ -22,6 +22,7 @@
 import numpy as np
 import torch
 import torch.nn as nn  # noqa: F401
+import warnings
 from typing import Any

 from stable_baselines3.common.utils import constant_fn
@@ -29,16 +30,20 @@
 from isaaclab.envs import DirectRLEnv, ManagerBasedRLEnv

+# silence the SB3 warning about running PPO on the GPU: with larger networks, PPO does benefit from the GPU
+warnings.filterwarnings("ignore", message="You are trying to run PPO on the GPU")
+
 """
 Configuration Parser.
 """


-def process_sb3_cfg(cfg: dict) -> dict:
+def process_sb3_cfg(cfg: dict, num_envs: int) -> dict:
    """Convert simple YAML types to Stable-Baselines classes/components.

    Args:
        cfg: A configuration dictionary.
+        num_envs: The number of parallel environments. Used to compute ``batch_size``
+            from the desired number of minibatches (``n_minibatches``).

    Returns:
        A dictionary containing the converted configuration.
@@ -54,19 +59,24 @@ def update_dict(hyperparams: dict[str, Any]) -> dict[str, Any]:
        else:
            if key in ["policy_kwargs", "replay_buffer_class", "replay_buffer_kwargs"]:
                hyperparams[key] = eval(value)
-            elif key in ["learning_rate", "clip_range", "clip_range_vf", "delta_std"]:
+            elif key in ["learning_rate", "clip_range", "clip_range_vf"]:
                if isinstance(value, str):
                    _, initial_value = value.split("_")
                    initial_value = float(initial_value)
                    hyperparams[key] = lambda progress_remaining: progress_remaining * initial_value
                elif isinstance(value, (float, int)):
-                    # Negative value: ignore (ex: for clipping)
+                    # negative value: ignore (ex: for clipping)
                    if value < 0:
                        continue
                    hyperparams[key] = constant_fn(float(value))
                else:
                    raise ValueError(f"Invalid value for {key}: {hyperparams[key]}")

+    # Convert to a desired batch_size (n_steps=2048 by default for SB3 PPO)
+    if "n_minibatches" in hyperparams:
+        hyperparams["batch_size"] = (hyperparams.get("n_steps", 2048) * num_envs) // hyperparams["n_minibatches"]
+        del hyperparams["n_minibatches"]
+
    return hyperparams

 # parse agent configuration and convert to classes
@@ -89,8 +99,8 @@ class Sb3VecEnvWrapper(VecEnv):
    Note:
        While Stable-Baselines3 supports the Gym 0.26+ API, its vectorized environment
-        still uses the old API (i.e. it is closer to Gym 0.21). Thus, we implement
-        the old API for the vectorized environment.
+        uses its own API (i.e. it is closer to Gym 0.21). Thus, we implement
+        that API for the vectorized environment.

    We also add monitoring functionality that computes the un-discounted episode
    return and length. This information is added to the info dicts under key `episode`.
@@ -123,12 +133,13 @@ class Sb3VecEnvWrapper(VecEnv):
    """

-    def __init__(self, env: ManagerBasedRLEnv | DirectRLEnv):
+    def __init__(self, env: ManagerBasedRLEnv | DirectRLEnv, fast_variant: bool = True):
        """Initialize the wrapper.

        Args:
            env: The environment to wrap around.
-
+            fast_variant: Whether to use the fast variant for processing the info dicts
+                (only episodic reward, episode length, and truncation info are included).

        Raises:
            ValueError: When the environment is not an instance of :class:`ManagerBasedRLEnv` or :class:`DirectRLEnv`.
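+
+        Note:
+            With ``fast_variant`` enabled, the per-environment info dicts are only populated for
+            environments that reset during the step, which avoids per-step Python overhead.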
""" @@ -140,6 +151,7 @@ def __init__(self, env: ManagerBasedRLEnv | DirectRLEnv): ) # initialize the wrapper self.env = env + self.fast_variant = fast_variant # collect common information self.num_envs = self.unwrapped.num_envs self.sim_device = self.unwrapped.device @@ -156,8 +168,8 @@ def __init__(self, env: ManagerBasedRLEnv | DirectRLEnv): # initialize vec-env VecEnv.__init__(self, self.num_envs, observation_space, action_space) # add buffer for logging episodic information - self._ep_rew_buf = torch.zeros(self.num_envs, device=self.sim_device) - self._ep_len_buf = torch.zeros(self.num_envs, device=self.sim_device) + self._ep_rew_buf = np.zeros(self.num_envs) + self._ep_len_buf = np.zeros(self.num_envs) def __str__(self): """Returns the wrapper name and the :attr:`env` representation string.""" @@ -190,11 +202,11 @@ def unwrapped(self) -> ManagerBasedRLEnv | DirectRLEnv: def get_episode_rewards(self) -> list[float]: """Returns the rewards of all the episodes.""" - return self._ep_rew_buf.cpu().tolist() + return self._ep_rew_buf.tolist() def get_episode_lengths(self) -> list[int]: """Returns the number of time-steps of all the episodes.""" - return self._ep_len_buf.cpu().tolist() + return self._ep_len_buf.tolist() """ Operations - MDP @@ -206,8 +218,8 @@ def seed(self, seed: int | None = None) -> list[int | None]: # noqa: D102 def reset(self) -> VecEnvObs: # noqa: D102 obs_dict, _ = self.env.reset() # reset episodic information buffers - self._ep_rew_buf.zero_() - self._ep_len_buf.zero_() + self._ep_rew_buf = np.zeros(self.num_envs) + self._ep_len_buf = np.zeros(self.num_envs) # convert data types to numpy depending on backend return self._process_obs(obs_dict) @@ -224,28 +236,30 @@ def step_async(self, actions): # noqa: D102 def step_wait(self) -> VecEnvStepReturn: # noqa: D102 # record step information obs_dict, rew, terminated, truncated, extras = self.env.step(self._async_actions) - # update episode un-discounted return and length - self._ep_rew_buf += rew - self._ep_len_buf += 1 # compute reset ids dones = terminated | truncated - reset_ids = (dones > 0).nonzero(as_tuple=False) # convert data types to numpy depending on backend # note: ManagerBasedRLEnv uses torch backend (by default). 
obs = self._process_obs(obs_dict) - rew = rew.detach().cpu().numpy() + rewards = rew.detach().cpu().numpy() terminated = terminated.detach().cpu().numpy() truncated = truncated.detach().cpu().numpy() dones = dones.detach().cpu().numpy() + + reset_ids = dones.nonzero()[0] + + # update episode un-discounted return and length + self._ep_rew_buf += rewards + self._ep_len_buf += 1 # convert extra information to list of dicts infos = self._process_extras(obs, terminated, truncated, extras, reset_ids) # reset info for terminated environments - self._ep_rew_buf[reset_ids] = 0 + self._ep_rew_buf[reset_ids] = 0.0 self._ep_len_buf[reset_ids] = 0 - return obs, rew, dones, infos + return obs, rewards, dones, infos def close(self): # noqa: D102 self.env.close() @@ -279,7 +293,8 @@ def env_method(self, method_name: str, *method_args, indices=None, **method_kwar return env_method(*method_args, indices=indices, **method_kwargs) def env_is_wrapped(self, wrapper_class, indices=None): # noqa: D102 - raise NotImplementedError("Checking if environment is wrapped is not supported.") + # fake implementation to be able to use `evaluate_policy()` helper + return [False] def get_images(self): # noqa: D102 raise NotImplementedError("Getting images is not supported.") @@ -306,6 +321,29 @@ def _process_extras( self, obs: np.ndarray, terminated: np.ndarray, truncated: np.ndarray, extras: dict, reset_ids: np.ndarray ) -> list[dict[str, Any]]: """Convert miscellaneous information into dictionary for each sub-environment.""" + # faster version: only process env that terminated and add bootstrapping info + if self.fast_variant: + infos = [{} for _ in range(self.num_envs)] + + for idx in reset_ids: + # fill-in episode monitoring info + infos[idx]["episode"] = { + "r": self._ep_rew_buf[idx], + "l": self._ep_len_buf[idx], + } + + # fill-in bootstrap information + infos[idx]["TimeLimit.truncated"] = truncated[idx] and not terminated[idx] + + # add information about terminal observation separately + if isinstance(obs, dict): + terminal_obs = {key: value[idx] for key, value in obs.items()} + else: + terminal_obs = obs[idx] + infos[idx]["terminal_observation"] = terminal_obs + + return infos + # create empty list of dictionaries to fill infos: list[dict[str, Any]] = [dict.fromkeys(extras.keys()) for _ in range(self.num_envs)] # fill-in information for each sub-environment diff --git a/source/isaaclab_rl/setup.py b/source/isaaclab_rl/setup.py index 5cff8720605..8320cb866b0 100644 --- a/source/isaaclab_rl/setup.py +++ b/source/isaaclab_rl/setup.py @@ -19,12 +19,10 @@ # Minimum dependencies required prior to installation INSTALL_REQUIRES = [ # generic - "numpy", + "numpy<2", "torch==2.5.1", "torchvision>=0.14.1", # ensure compatibility with torch 1.13.1 - # 5.26.0 introduced a breaking change, so we restricted it for now. - # See issue https://github.com/tensorflow/tensorboard/issues/6808 for details. 
- "protobuf>=3.20.2, < 5.0.0", + "protobuf>=3.20.2,!=5.26.0", # configuration management "hydra-core", # data collection @@ -41,7 +39,7 @@ # Extra dependencies for RL agents EXTRAS_REQUIRE = { - "sb3": ["stable-baselines3>=2.1"], + "sb3": ["stable-baselines3>=2.6", "tqdm", "rich"], # tqdm/rich for progress bar "skrl": ["skrl>=1.4.2"], "rl-games": ["rl-games==1.6.1", "gym"], # rl-games still needs gym :( "rsl-rl": ["rsl-rl-lib==2.3.3"], diff --git a/source/isaaclab_tasks/config/extension.toml b/source/isaaclab_tasks/config/extension.toml index 12b09689da3..d827c066495 100644 --- a/source/isaaclab_tasks/config/extension.toml +++ b/source/isaaclab_tasks/config/extension.toml @@ -1,7 +1,7 @@ [package] # Note: Semantic Versioning is used: https://semver.org/ -version = "0.10.34" +version = "0.10.36" # Description title = "Isaac Lab Environments" diff --git a/source/isaaclab_tasks/docs/CHANGELOG.rst b/source/isaaclab_tasks/docs/CHANGELOG.rst index ed2cc69e1a1..d5f59b1b7ac 100644 --- a/source/isaaclab_tasks/docs/CHANGELOG.rst +++ b/source/isaaclab_tasks/docs/CHANGELOG.rst @@ -1,6 +1,24 @@ Changelog --------- +0.10.36 (2025-06-26) +~~~~~~~~~~~~~~~~~~~~ + +Fixed +^^^^^ + +* Relaxed upper range pin for protobuf python dependency for more permissive installation. + + +0.10.35 (2025-05-22) +~~~~~~~~~~~~~~~~~~~~ + +Fixed +^^^^^ + +* Fixed redundant body_names assignment in rough_env_cfg.py for H1 robot. + + 0.10.34 (2025-06-16) ~~~~~~~~~~~~~~~~~~~~ diff --git a/source/isaaclab_tasks/isaaclab_tasks/direct/automate/disassembly_env.py b/source/isaaclab_tasks/isaaclab_tasks/direct/automate/disassembly_env.py index 8c79914f76c..c74b5da124b 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/direct/automate/disassembly_env.py +++ b/source/isaaclab_tasks/isaaclab_tasks/direct/automate/disassembly_env.py @@ -555,7 +555,7 @@ def _move_gripper_to_eef_pose(self, env_ids, goal_pos, goal_quat, sim_steps, if_ for _ in range(sim_steps): if if_log: self._log_robot_state_per_timestep() - # print('finger', self.fingertip_midpoint_pos[0], 'goal', goal_pos[0]) + # Compute error to target. pos_error, axis_angle_error = fc.get_pose_error( fingertip_midpoint_pos=self.fingertip_midpoint_pos[env_ids], @@ -567,10 +567,7 @@ def _move_gripper_to_eef_pose(self, env_ids, goal_pos, goal_quat, sim_steps, if_ ) delta_hand_pose = torch.cat((pos_error, axis_angle_error), dim=-1) - # print('delta hand pose', delta_hand_pose[0]) self.actions *= 0.0 - # print('action shape', self.actions[env_ids, :6].shape) - # print('delta hand shape', delta_hand_pose.shape) self.actions[env_ids, :6] = delta_hand_pose is_rendering = self.sim.has_gui() or self.sim.has_rtx_sensors() @@ -746,7 +743,6 @@ def _disassemble_plug_from_socket(self): if_intersect = (self.held_pos[:, 2] < self.fixed_pos[:, 2] + self.disassembly_dists).cpu().numpy() env_ids = np.argwhere(if_intersect == 0).reshape(-1) - # print('env ids', env_ids) self._randomize_gripper_pose(self.cfg_task.move_gripper_sim_steps, env_ids) def _lift_gripper(self, lift_distance, sim_steps, env_ids=None): @@ -880,6 +876,9 @@ def _save_log_traj(self): with open(log_filename, "w+") as out_file: json.dump(log_item, out_file, indent=6) + print(f"Trajectory collection complete! Collected {len(self.log_arm_dof_pos)} trajectories!") exit(0) else: - print("current logging item num: ", len(self.log_arm_dof_pos)) + print( + f"Collected {len(self.log_arm_dof_pos)} trajectories so far (target: > {self.cfg_task.num_log_traj})." 
+ ) diff --git a/source/isaaclab_tasks/isaaclab_tasks/direct/automate/industreal_algo_utils.py b/source/isaaclab_tasks/isaaclab_tasks/direct/automate/industreal_algo_utils.py index 6acd883017f..89cf4d2553d 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/direct/automate/industreal_algo_utils.py +++ b/source/isaaclab_tasks/isaaclab_tasks/direct/automate/industreal_algo_utils.py @@ -38,6 +38,8 @@ Not intended to be executed as a standalone script. """ +# Force garbage collection for large arrays +import gc import numpy as np import os @@ -150,8 +152,14 @@ def get_sdf_reward( sdf_reward[i] = torch.mean(sdf_dist) + del mesh_copy + del mesh_points + del mesh_indices + del sampled_points + sdf_reward = -torch.log(sdf_reward) + gc.collect() # Force garbage collection to free memory return sdf_reward diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml b/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml index b6db545ff22..0259e5240f8 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml @@ -1,22 +1,22 @@ -# Reference: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml#L245 +# Adapted from rsl_rl config seed: 42 - -policy: 'MlpPolicy' +policy: "MlpPolicy" n_timesteps: !!float 5e7 -batch_size: 256 -n_steps: 512 +# For 4 minibatches with 4096 envs +# batch_size = (n_envs * n_steps) / n_minibatches = 32768 +n_minibatches: 4 +n_steps: 32 gamma: 0.99 -learning_rate: !!float 2.5e-4 +learning_rate: !!float 5e-4 ent_coef: 0.0 clip_range: 0.2 -n_epochs: 10 +n_epochs: 5 gae_lambda: 0.95 max_grad_norm: 1.0 vf_coef: 0.5 -device: "cuda:0" policy_kwargs: "dict( - log_std_init=-1, - ortho_init=False, - activation_fn=nn.ReLU, - net_arch=dict(pi=[256, 256], vf=[256, 256]) - )" + activation_fn=nn.ELU, + net_arch=[400, 200, 100], + optimizer_kwargs=dict(eps=1e-8), + ortho_init=False, + )" diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/__init__.py new file mode 100644 index 00000000000..cb907a3f0c8 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/__init__.py @@ -0,0 +1,8 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +"""Locomotion environments for legged robots.""" + +from .tracking import * # noqa diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/__init__.py new file mode 100644 index 00000000000..2e924fbf1b1 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+# +# SPDX-License-Identifier: BSD-3-Clause diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/__init__.py new file mode 100644 index 00000000000..2e924fbf1b1 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/__init__.py new file mode 100644 index 00000000000..df0a1cfdf72 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/__init__.py @@ -0,0 +1,32 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +import gymnasium as gym + +from . import agents + +## +# Register Gym environments. +## +gym.register( + id="Isaac-Tracking-LocoManip-Digit-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.loco_manip_env_cfg:DigitLocoManipEnvCfg", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:DigitLocoManipPPORunnerCfg", + }, +) + + +gym.register( + id="Isaac-Tracking-LocoManip-Digit-Play-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.loco_manip_env_cfg:DigitLocoManipEnvCfg_PLAY", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:DigitLocoManipPPORunnerCfg", + }, +) diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/agents/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/agents/__init__.py new file mode 100644 index 00000000000..2e924fbf1b1 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/agents/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/agents/rsl_rl_ppo_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/agents/rsl_rl_ppo_cfg.py new file mode 100644 index 00000000000..cb898b1e89c --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/agents/rsl_rl_ppo_cfg.py @@ -0,0 +1,37 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+# +# SPDX-License-Identifier: BSD-3-Clause + +from isaaclab.utils import configclass + +from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg + + +@configclass +class DigitLocoManipPPORunnerCfg(RslRlOnPolicyRunnerCfg): + num_steps_per_env = 24 + max_iterations = 2000 + save_interval = 50 + experiment_name = "digit_loco_manip" + empirical_normalization = False + policy = RslRlPpoActorCriticCfg( + init_noise_std=1.0, + actor_hidden_dims=[256, 128, 128], + critic_hidden_dims=[256, 128, 128], + activation="elu", + ) + algorithm = RslRlPpoAlgorithmCfg( + value_loss_coef=1.0, + use_clipped_value_loss=True, + clip_param=0.2, + entropy_coef=0.01, + num_learning_epochs=5, + num_mini_batches=4, + learning_rate=1.0e-3, + schedule="adaptive", + gamma=0.99, + lam=0.95, + desired_kl=0.01, + max_grad_norm=1.0, + ) diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/loco_manip_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/loco_manip_env_cfg.py new file mode 100644 index 00000000000..8a04320822b --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/loco_manipulation/tracking/config/digit/loco_manip_env_cfg.py @@ -0,0 +1,250 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +import math + +from isaaclab_assets.robots.agility import ARM_JOINT_NAMES, LEG_JOINT_NAMES + +from isaaclab.managers import EventTermCfg +from isaaclab.managers import ObservationGroupCfg as ObsGroup +from isaaclab.managers import ObservationTermCfg as ObsTerm +from isaaclab.managers import RewardTermCfg as RewTerm +from isaaclab.managers import SceneEntityCfg +from isaaclab.utils import configclass +from isaaclab.utils.noise import AdditiveUniformNoiseCfg as Unoise + +import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp +import isaaclab_tasks.manager_based.manipulation.reach.mdp as manipulation_mdp +from isaaclab_tasks.manager_based.locomotion.velocity.config.digit.rough_env_cfg import DigitRewards, DigitRoughEnvCfg +from isaaclab_tasks.manager_based.locomotion.velocity.velocity_env_cfg import EventCfg + + +@configclass +class DigitLocoManipRewards(DigitRewards): + joint_deviation_arms = None + + joint_vel_hip_yaw = RewTerm( + func=mdp.joint_vel_l2, + weight=-0.001, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=[".*_leg_hip_yaw"])}, + ) + + left_ee_pos_tracking = RewTerm( + func=manipulation_mdp.position_command_error, + weight=-2.0, + params={ + "asset_cfg": SceneEntityCfg("robot", body_names="left_arm_wrist_yaw"), + "command_name": "left_ee_pose", + }, + ) + + left_ee_pos_tracking_fine_grained = RewTerm( + func=manipulation_mdp.position_command_error_tanh, + weight=2.0, + params={ + "asset_cfg": SceneEntityCfg("robot", body_names="left_arm_wrist_yaw"), + "std": 0.05, + "command_name": "left_ee_pose", + }, + ) + + left_end_effector_orientation_tracking = RewTerm( + func=manipulation_mdp.orientation_command_error, + weight=-0.2, + params={ + "asset_cfg": SceneEntityCfg("robot", body_names="left_arm_wrist_yaw"), + "command_name": "left_ee_pose", + }, + ) + + right_ee_pos_tracking = RewTerm( + func=manipulation_mdp.position_command_error, + weight=-2.0, + params={ + "asset_cfg": SceneEntityCfg("robot", body_names="right_arm_wrist_yaw"), + "command_name": "right_ee_pose", + }, + ) + + 
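+    # the remaining right-arm terms mirror the corresponding left-arm rewards above
+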
right_ee_pos_tracking_fine_grained = RewTerm( + func=manipulation_mdp.position_command_error_tanh, + weight=2.0, + params={ + "asset_cfg": SceneEntityCfg("robot", body_names="right_arm_wrist_yaw"), + "std": 0.05, + "command_name": "right_ee_pose", + }, + ) + + right_end_effector_orientation_tracking = RewTerm( + func=manipulation_mdp.orientation_command_error, + weight=-0.2, + params={ + "asset_cfg": SceneEntityCfg("robot", body_names="right_arm_wrist_yaw"), + "command_name": "right_ee_pose", + }, + ) + + +@configclass +class DigitLocoManipObservations: + + @configclass + class PolicyCfg(ObsGroup): + base_lin_vel = ObsTerm( + func=mdp.base_lin_vel, + noise=Unoise(n_min=-0.1, n_max=0.1), + ) + base_ang_vel = ObsTerm( + func=mdp.base_ang_vel, + noise=Unoise(n_min=-0.2, n_max=0.2), + ) + projected_gravity = ObsTerm( + func=mdp.projected_gravity, + noise=Unoise(n_min=-0.05, n_max=0.05), + ) + velocity_commands = ObsTerm( + func=mdp.generated_commands, + params={"command_name": "base_velocity"}, + ) + left_ee_pose_command = ObsTerm( + func=mdp.generated_commands, + params={"command_name": "left_ee_pose"}, + ) + right_ee_pose_command = ObsTerm( + func=mdp.generated_commands, + params={"command_name": "right_ee_pose"}, + ) + joint_pos = ObsTerm( + func=mdp.joint_pos_rel, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=LEG_JOINT_NAMES + ARM_JOINT_NAMES)}, + noise=Unoise(n_min=-0.01, n_max=0.01), + ) + joint_vel = ObsTerm( + func=mdp.joint_vel_rel, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=LEG_JOINT_NAMES + ARM_JOINT_NAMES)}, + noise=Unoise(n_min=-1.5, n_max=1.5), + ) + actions = ObsTerm(func=mdp.last_action) + + def __post_init__(self): + self.enable_corruption = True + self.concatenate_terms = True + + policy = PolicyCfg() + + +@configclass +class DigitLocoManipCommands: + base_velocity = mdp.UniformVelocityCommandCfg( + asset_name="robot", + resampling_time_range=(10.0, 10.0), + rel_standing_envs=0.25, + rel_heading_envs=1.0, + heading_command=True, + debug_vis=True, + ranges=mdp.UniformVelocityCommandCfg.Ranges( + lin_vel_x=(-1.0, 1.0), + lin_vel_y=(-1.0, 1.0), + ang_vel_z=(-1.0, 1.0), + heading=(-math.pi, math.pi), + ), + ) + + left_ee_pose = mdp.UniformPoseCommandCfg( + asset_name="robot", + body_name="left_arm_wrist_yaw", + resampling_time_range=(1.0, 3.0), + debug_vis=True, + ranges=mdp.UniformPoseCommandCfg.Ranges( + pos_x=(0.10, 0.50), + pos_y=(0.05, 0.50), + pos_z=(-0.20, 0.20), + roll=(-0.1, 0.1), + pitch=(-0.1, 0.1), + yaw=(math.pi / 2.0 - 0.1, math.pi / 2.0 + 0.1), + ), + ) + + right_ee_pose = mdp.UniformPoseCommandCfg( + asset_name="robot", + body_name="right_arm_wrist_yaw", + resampling_time_range=(1.0, 3.0), + debug_vis=True, + ranges=mdp.UniformPoseCommandCfg.Ranges( + pos_x=(0.10, 0.50), + pos_y=(-0.50, -0.05), + pos_z=(-0.20, 0.20), + roll=(-0.1, 0.1), + pitch=(-0.1, 0.1), + yaw=(-math.pi / 2.0 - 0.1, -math.pi / 2.0 + 0.1), + ), + ) + + +@configclass +class DigitEvents(EventCfg): + # Add an external force to simulate a payload being carried. 
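+    # assumption: each time the interval event fires, a new force/torque is sampled from the given ranges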
+    left_hand_force = EventTermCfg(
+        func=mdp.apply_external_force_torque,
+        mode="interval",
+        interval_range_s=(10.0, 15.0),
+        params={
+            "asset_cfg": SceneEntityCfg("robot", body_names="left_arm_wrist_yaw"),
+            "force_range": (-10.0, 10.0),
+            "torque_range": (-1.0, 1.0),
+        },
+    )
+
+    right_hand_force = EventTermCfg(
+        func=mdp.apply_external_force_torque,
+        mode="interval",
+        interval_range_s=(10.0, 15.0),
+        params={
+            "asset_cfg": SceneEntityCfg("robot", body_names="right_arm_wrist_yaw"),
+            "force_range": (-10.0, 10.0),
+            "torque_range": (-1.0, 1.0),
+        },
+    )
+
+
+@configclass
+class DigitLocoManipEnvCfg(DigitRoughEnvCfg):
+    rewards: DigitLocoManipRewards = DigitLocoManipRewards()
+    observations: DigitLocoManipObservations = DigitLocoManipObservations()
+    commands: DigitLocoManipCommands = DigitLocoManipCommands()
+    events: DigitEvents = DigitEvents()
+
+    def __post_init__(self):
+        super().__post_init__()
+
+        self.episode_length_s = 14.0
+
+        # Rewards:
+        self.rewards.flat_orientation_l2.weight = -10.5
+        self.rewards.termination_penalty.weight = -100.0
+
+        # Change terrain to flat.
+        self.scene.terrain.terrain_type = "plane"
+        self.scene.terrain.terrain_generator = None
+        # Remove height scanner.
+        self.scene.height_scanner = None
+        self.observations.policy.height_scan = None
+        # Remove terrain curriculum.
+        self.curriculum.terrain_levels = None
+
+
+@configclass
+class DigitLocoManipEnvCfg_PLAY(DigitLocoManipEnvCfg):
+
+    def __post_init__(self) -> None:
+        super().__post_init__()
+
+        # Make a smaller scene for play.
+        self.scene.num_envs = 50
+        self.scene.env_spacing = 2.5
+        # Disable randomization for play.
+        self.observations.policy.enable_corruption = False
+        # Remove random pushing.
+        self.events.base_external_force_torque = None
+        self.events.push_robot = None
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/__init__.py
index 6f9591a9466..8a89b4bb5c1 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/__init__.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/__init__.py
@@ -19,6 +19,7 @@
         "env_cfg_entry_point": f"{__name__}.flat_env_cfg:UnitreeA1FlatEnvCfg",
         "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:UnitreeA1FlatPPORunnerCfg",
         "skrl_cfg_entry_point": f"{agents.__name__}:skrl_flat_ppo_cfg.yaml",
+        "sb3_cfg_entry_point": f"{agents.__name__}:sb3_ppo_cfg.yaml",
     },
 )
 
@@ -30,6 +31,7 @@
         "env_cfg_entry_point": f"{__name__}.flat_env_cfg:UnitreeA1FlatEnvCfg_PLAY",
         "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:UnitreeA1FlatPPORunnerCfg",
         "skrl_cfg_entry_point": f"{agents.__name__}:skrl_flat_ppo_cfg.yaml",
+        "sb3_cfg_entry_point": f"{agents.__name__}:sb3_ppo_cfg.yaml",
     },
 )
 
@@ -41,6 +43,7 @@
         "env_cfg_entry_point": f"{__name__}.rough_env_cfg:UnitreeA1RoughEnvCfg",
         "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:UnitreeA1RoughPPORunnerCfg",
         "skrl_cfg_entry_point": f"{agents.__name__}:skrl_rough_ppo_cfg.yaml",
+        "sb3_cfg_entry_point": f"{agents.__name__}:sb3_ppo_cfg.yaml",
     },
 )
 
@@ -52,5 +55,6 @@
         "env_cfg_entry_point": f"{__name__}.rough_env_cfg:UnitreeA1RoughEnvCfg_PLAY",
         "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:UnitreeA1RoughPPORunnerCfg",
         "skrl_cfg_entry_point": f"{agents.__name__}:skrl_rough_ppo_cfg.yaml",
+        "sb3_cfg_entry_point": f"{agents.__name__}:sb3_ppo_cfg.yaml",
     },
 )
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/agents/sb3_ppo_cfg.yaml b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/agents/sb3_ppo_cfg.yaml
new file mode 100644
index 00000000000..bbaa36c2fb7
--- /dev/null
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/agents/sb3_ppo_cfg.yaml
@@ -0,0 +1,23 @@
+# Adapted from rsl_rl config
+seed: 42
+n_timesteps: !!float 5e7
+policy: 'MlpPolicy'
+n_steps: 24
+n_minibatches: 4  # batch_size = n_envs * n_steps / n_minibatches = 4096 * 24 / 4 = 24576
+gae_lambda: 0.95
+gamma: 0.99
+n_epochs: 5
+ent_coef: 0.005
+learning_rate: !!float 1e-3
+clip_range: !!float 0.2
+policy_kwargs: "dict(
+    activation_fn=nn.ELU,
+    net_arch=[512, 256, 128],
+    optimizer_kwargs=dict(eps=1e-8),
+    ortho_init=False,
+  )"
+vf_coef: 1.0
+max_grad_norm: 1.0
+normalize_input: True
+normalize_value: False
+clip_obs: 10.0
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/rough_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/rough_env_cfg.py
index 684ee2b6476..238e3bd03ea 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/rough_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/rough_env_cfg.py
@@ -46,6 +46,7 @@ def __post_init__(self):
                 "yaw": (0.0, 0.0),
             },
         }
+        self.events.base_com = None
 
         # rewards
         self.rewards.feet_air_time.params["sensor_cfg"].body_names = ".*_foot"
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/cassie/rough_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/cassie/rough_env_cfg.py
index 7ab1f855eb5..f99f36f642d 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/cassie/rough_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/cassie/rough_env_cfg.py
@@ -77,6 +77,7 @@ def __post_init__(self):
                 "yaw": (0.0, 0.0),
             },
         }
+        self.events.base_com = None
 
         # terminations
         self.terminations.base_contact.params["sensor_cfg"].body_names = [".*pelvis"]
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/__init__.py
new file mode 100644
index 00000000000..1f9915fb27e
--- /dev/null
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/__init__.py
@@ -0,0 +1,54 @@
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+import gymnasium as gym
+
+from . import agents
+
+##
+# Register Gym environments.
+## +gym.register( + id="Isaac-Velocity-Flat-Digit-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.flat_env_cfg:DigitFlatEnvCfg", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:DigitFlatPPORunnerCfg", + }, +) + + +gym.register( + id="Isaac-Velocity-Flat-Digit-Play-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.flat_env_cfg:DigitFlatEnvCfg_PLAY", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:DigitFlatPPORunnerCfg", + }, +) + + +gym.register( + id="Isaac-Velocity-Rough-Digit-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.rough_env_cfg:DigitRoughEnvCfg", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:DigitRoughPPORunnerCfg", + }, +) + + +gym.register( + id="Isaac-Velocity-Rough-Digit-Play-v0", + entry_point="isaaclab.envs:ManagerBasedRLEnv", + disable_env_checker=True, + kwargs={ + "env_cfg_entry_point": f"{__name__}.rough_env_cfg:DigitRoughEnvCfg_PLAY", + "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:DigitRoughPPORunnerCfg", + }, +) diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/agents/__init__.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/agents/__init__.py new file mode 100644 index 00000000000..2e924fbf1b1 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/agents/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/agents/rsl_rl_ppo_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/agents/rsl_rl_ppo_cfg.py new file mode 100644 index 00000000000..ab23e2c7b71 --- /dev/null +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/agents/rsl_rl_ppo_cfg.py @@ -0,0 +1,49 @@ +# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from isaaclab.utils import configclass
+
+from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg
+
+
+@configclass
+class DigitRoughPPORunnerCfg(RslRlOnPolicyRunnerCfg):
+    num_steps_per_env = 24
+    max_iterations = 3000
+    save_interval = 50
+    experiment_name = "digit_rough"
+    empirical_normalization = False
+    policy = RslRlPpoActorCriticCfg(
+        init_noise_std=1.0,
+        actor_hidden_dims=[512, 256, 128],
+        critic_hidden_dims=[512, 256, 128],
+        activation="elu",
+    )
+    algorithm = RslRlPpoAlgorithmCfg(
+        value_loss_coef=1.0,
+        use_clipped_value_loss=True,
+        clip_param=0.2,
+        entropy_coef=0.01,
+        num_learning_epochs=5,
+        num_mini_batches=4,
+        learning_rate=1.0e-3,
+        schedule="adaptive",
+        gamma=0.99,
+        lam=0.95,
+        desired_kl=0.01,
+        max_grad_norm=1.0,
+    )
+
+
+@configclass
+class DigitFlatPPORunnerCfg(DigitRoughPPORunnerCfg):
+    def __post_init__(self):
+        super().__post_init__()
+
+        self.max_iterations = 2000
+        self.experiment_name = "digit_flat"
+
+        self.policy.actor_hidden_dims = [128, 128, 128]
+        self.policy.critic_hidden_dims = [128, 128, 128]
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/flat_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/flat_env_cfg.py
new file mode 100644
index 00000000000..5d94e7008a8
--- /dev/null
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/flat_env_cfg.py
@@ -0,0 +1,38 @@
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from isaaclab.utils import configclass
+
+from .rough_env_cfg import DigitRoughEnvCfg
+
+
+@configclass
+class DigitFlatEnvCfg(DigitRoughEnvCfg):
+    def __post_init__(self):
+        super().__post_init__()
+
+        # Change terrain to flat.
+        self.scene.terrain.terrain_type = "plane"
+        self.scene.terrain.terrain_generator = None
+        # Remove height scanner.
+        self.scene.height_scanner = None
+        self.observations.policy.height_scan = None
+        # Remove terrain curriculum.
+        self.curriculum.terrain_levels = None
+
+
+@configclass
+class DigitFlatEnvCfg_PLAY(DigitFlatEnvCfg):
+    def __post_init__(self) -> None:
+        super().__post_init__()
+
+        # Make a smaller scene for play.
+        self.scene.num_envs = 50
+        self.scene.env_spacing = 2.5
+        # Disable randomization for play.
+        self.observations.policy.enable_corruption = False
+        # Remove random pushing.
+        self.events.base_external_force_torque = None
+        self.events.push_robot = None
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/rough_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/rough_env_cfg.py
new file mode 100644
index 00000000000..524e03c6fe8
--- /dev/null
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/digit/rough_env_cfg.py
@@ -0,0 +1,269 @@
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+# +# SPDX-License-Identifier: BSD-3-Clause + +import math + +from isaaclab_assets.robots.agility import ARM_JOINT_NAMES, DIGIT_V4_CFG, LEG_JOINT_NAMES + +from isaaclab.managers import ObservationGroupCfg, ObservationTermCfg, RewardTermCfg, SceneEntityCfg, TerminationTermCfg +from isaaclab.utils import configclass +from isaaclab.utils.noise import AdditiveUniformNoiseCfg as Unoise + +import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp +from isaaclab_tasks.manager_based.locomotion.velocity.velocity_env_cfg import LocomotionVelocityRoughEnvCfg + + +@configclass +class DigitRewards: + termination_penalty = RewardTermCfg( + func=mdp.is_terminated, + weight=-100.0, + ) + track_lin_vel_xy_exp = RewardTermCfg( + func=mdp.track_lin_vel_xy_yaw_frame_exp, + weight=1.0, + params={"command_name": "base_velocity", "std": math.sqrt(0.25)}, + ) + track_ang_vel_z_exp = RewardTermCfg( + func=mdp.track_ang_vel_z_world_exp, + weight=1.0, + params={ + "command_name": "base_velocity", + "std": math.sqrt(0.25), + }, + ) + feet_air_time = RewardTermCfg( + func=mdp.feet_air_time_positive_biped, + weight=0.25, + params={ + "sensor_cfg": SceneEntityCfg("contact_forces", body_names=".*_leg_toe_roll"), + "threshold": 0.8, + "command_name": "base_velocity", + }, + ) + feet_slide = RewardTermCfg( + func=mdp.feet_slide, + weight=-0.25, + params={ + "sensor_cfg": SceneEntityCfg("contact_forces", body_names=".*_leg_toe_roll"), + "asset_cfg": SceneEntityCfg("robot", body_names=".*_leg_toe_roll"), + }, + ) + dof_torques_l2 = RewardTermCfg( + func=mdp.joint_torques_l2, + weight=-1.0e-6, + ) + dof_acc_l2 = RewardTermCfg( + func=mdp.joint_acc_l2, + weight=-2.0e-7, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=LEG_JOINT_NAMES + ARM_JOINT_NAMES)}, + ) + action_rate_l2 = RewardTermCfg( + func=mdp.action_rate_l2, + weight=-0.008, + ) + flat_orientation_l2 = RewardTermCfg( + func=mdp.flat_orientation_l2, + weight=-2.5, + ) + stand_still = RewardTermCfg( + func=mdp.stand_still_joint_deviation_l1, + weight=-0.4, + params={ + "command_name": "base_velocity", + "asset_cfg": SceneEntityCfg("robot", joint_names=LEG_JOINT_NAMES), + }, + ) + lin_vel_z_l2 = RewardTermCfg( + func=mdp.lin_vel_z_l2, + weight=-2.0, + ) + ang_vel_xy_l2 = RewardTermCfg( + func=mdp.ang_vel_xy_l2, + weight=-0.1, + ) + no_jumps = RewardTermCfg( + func=mdp.desired_contacts, + weight=-0.5, + params={"sensor_cfg": SceneEntityCfg("contact_forces", body_names=[".*_leg_toe_roll"])}, + ) + dof_pos_limits = RewardTermCfg( + func=mdp.joint_pos_limits, + weight=-1.0, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=[".*_leg_toe_roll", ".*_leg_toe_pitch", ".*_tarsus"])}, + ) + joint_deviation_hip_roll = RewardTermCfg( + func=mdp.joint_deviation_l1, + weight=-0.1, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=".*_leg_hip_roll")}, + ) + joint_deviation_hip_yaw = RewardTermCfg( + func=mdp.joint_deviation_l1, + weight=-0.2, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=".*_leg_hip_yaw")}, + ) + joint_deviation_knee = RewardTermCfg( + func=mdp.joint_deviation_l1, + weight=-0.2, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=".*_tarsus")}, + ) + joint_deviation_feet = RewardTermCfg( + func=mdp.joint_deviation_l1, + weight=-0.1, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=[".*_toe_a", ".*_toe_b"])}, + ) + joint_deviation_arms = RewardTermCfg( + func=mdp.joint_deviation_l1, + weight=-0.2, + params={ + "asset_cfg": SceneEntityCfg("robot", joint_names=".*_arm_.*"), + }, + ) + + undesired_contacts 
= RewardTermCfg( + func=mdp.undesired_contacts, + weight=-0.1, + params={ + "sensor_cfg": SceneEntityCfg("contact_forces", body_names=[".*_rod", ".*_tarsus"]), + "threshold": 1.0, + }, + ) + + +@configclass +class DigitObservations: + @configclass + class PolicyCfg(ObservationGroupCfg): + base_lin_vel = ObservationTermCfg( + func=mdp.base_lin_vel, + noise=Unoise(n_min=-0.1, n_max=0.1), + ) + base_ang_vel = ObservationTermCfg( + func=mdp.base_ang_vel, + noise=Unoise(n_min=-0.2, n_max=0.2), + ) + projected_gravity = ObservationTermCfg( + func=mdp.projected_gravity, + noise=Unoise(n_min=-0.05, n_max=0.05), + ) + velocity_commands = ObservationTermCfg( + func=mdp.generated_commands, + params={"command_name": "base_velocity"}, + ) + joint_pos = ObservationTermCfg( + func=mdp.joint_pos_rel, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=LEG_JOINT_NAMES + ARM_JOINT_NAMES)}, + noise=Unoise(n_min=-0.01, n_max=0.01), + ) + joint_vel = ObservationTermCfg( + func=mdp.joint_vel_rel, + params={"asset_cfg": SceneEntityCfg("robot", joint_names=LEG_JOINT_NAMES + ARM_JOINT_NAMES)}, + noise=Unoise(n_min=-1.5, n_max=1.5), + ) + actions = ObservationTermCfg(func=mdp.last_action) + height_scan = ObservationTermCfg( + func=mdp.height_scan, + params={"sensor_cfg": SceneEntityCfg("height_scanner")}, + noise=Unoise(n_min=-0.1, n_max=0.1), + clip=(-1.0, 1.0), + ) + + def __post_init__(self): + self.enable_corruption = True + self.concatenate_terms = True + + # Observation groups: + policy: PolicyCfg = PolicyCfg() + + +@configclass +class TerminationsCfg: + """Termination terms for the MDP.""" + + time_out = TerminationTermCfg(func=mdp.time_out, time_out=True) + base_contact = TerminationTermCfg( + func=mdp.illegal_contact, + params={ + "sensor_cfg": SceneEntityCfg("contact_forces", body_names=["torso_base"]), + "threshold": 1.0, + }, + ) + base_orientation = TerminationTermCfg( + func=mdp.bad_orientation, + params={"limit_angle": 0.7}, + ) + + +@configclass +class ActionsCfg: + """Action specifications for the MDP.""" + + joint_pos = mdp.JointPositionActionCfg( + asset_name="robot", + joint_names=LEG_JOINT_NAMES + ARM_JOINT_NAMES, + scale=0.5, + use_default_offset=True, + ) + + +@configclass +class DigitRoughEnvCfg(LocomotionVelocityRoughEnvCfg): + rewards: DigitRewards = DigitRewards() + observations: DigitObservations = DigitObservations() + terminations: TerminationsCfg = TerminationsCfg() + actions: ActionsCfg = ActionsCfg() + + def __post_init__(self): + super().__post_init__() + self.decimation = 4 + self.sim.dt = 0.005 + + # Scene + self.scene.robot = DIGIT_V4_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot") + self.scene.height_scanner.prim_path = "{ENV_REGEX_NS}/Robot/torso_base" + self.scene.contact_forces.history_length = self.decimation + self.scene.contact_forces.update_period = self.sim.dt + self.scene.height_scanner.update_period = self.decimation * self.sim.dt + + # Events: + self.events.add_base_mass.params["asset_cfg"] = SceneEntityCfg("robot", body_names="torso_base") + self.events.base_external_force_torque.params["asset_cfg"] = SceneEntityCfg("robot", body_names="torso_base") + # Don't randomize the initial joint positions because we have closed loops. 
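+        # reset_robot_joints (mdp.reset_joints_by_scale in the base velocity cfg) scales the
+        # default joint positions by a factor drawn from position_range, so (1.0, 1.0) keeps
+        # every joint exactly at its default pose and the closed-chain linkages consistent.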
+        self.events.reset_robot_joints.params["position_range"] = (1.0, 1.0)
+        # Remove COM randomization.
+        self.events.base_com = None
+
+        # Commands
+        self.commands.base_velocity.ranges.lin_vel_x = (-0.8, 0.8)
+        self.commands.base_velocity.ranges.lin_vel_y = (-0.5, 0.5)
+        self.commands.base_velocity.ranges.ang_vel_z = (-1.0, 1.0)
+        self.commands.base_velocity.rel_standing_envs = 0.1
+        self.commands.base_velocity.resampling_time_range = (3.0, 8.0)
+
+
+@configclass
+class DigitRoughEnvCfg_PLAY(DigitRoughEnvCfg):
+    def __post_init__(self):
+        super().__post_init__()
+
+        # Make a smaller scene for play.
+        self.scene.num_envs = 50
+        self.scene.env_spacing = 2.5
+        # Spawn the robot randomly in the grid (instead of at their terrain levels).
+        self.scene.terrain.max_init_terrain_level = None
+        # Reduce the number of terrains to save memory.
+        if self.scene.terrain.terrain_generator is not None:
+            self.scene.terrain.terrain_generator.num_rows = 5
+            self.scene.terrain.terrain_generator.num_cols = 5
+            self.scene.terrain.terrain_generator.curriculum = False
+
+        # Disable randomization for play.
+        self.observations.policy.enable_corruption = False
+        # Remove random pushing.
+        self.events.base_external_force_torque = None
+        self.events.push_robot = None
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/g1/rough_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/g1/rough_env_cfg.py
index 5023ecc4dc6..4ef14c81542 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/g1/rough_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/g1/rough_env_cfg.py
@@ -127,6 +127,7 @@ def __post_init__(self):
                 "yaw": (0.0, 0.0),
             },
         }
+        self.events.base_com = None
 
         # Rewards
         self.rewards.lin_vel_z_l2.weight = 0.0
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/go1/rough_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/go1/rough_env_cfg.py
index bae64af0257..01236ae7d69 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/go1/rough_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/go1/rough_env_cfg.py
@@ -46,6 +46,7 @@ def __post_init__(self):
                 "yaw": (0.0, 0.0),
             },
         }
+        self.events.base_com = None
 
         # rewards
         self.rewards.feet_air_time.params["sensor_cfg"].body_names = ".*_foot"
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/go2/rough_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/go2/rough_env_cfg.py
index 707b3f9a3f9..22fb69cff44 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/go2/rough_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/go2/rough_env_cfg.py
@@ -46,6 +46,7 @@ def __post_init__(self):
                 "yaw": (0.0, 0.0),
             },
         }
+        self.events.base_com = None
 
         # rewards
         self.rewards.feet_air_time.params["sensor_cfg"].body_names = ".*_foot"
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/h1/rough_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/h1/rough_env_cfg.py
index bd3b96195b4..31864701c47 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/h1/rough_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/h1/rough_env_cfg.py
@@ -95,9 +95,7 @@ def __post_init__(self):
                 "yaw": (0.0, 0.0),
             },
         }
-
-        # Terminations
-        self.terminations.base_contact.params["sensor_cfg"].body_names = [".*torso_link"]
+        self.events.base_com = None
 
         # Rewards
         self.rewards.undesired_contacts = None
@@ -111,7 +109,7 @@ def __post_init__(self):
         self.commands.base_velocity.ranges.lin_vel_y = (0.0, 0.0)
         self.commands.base_velocity.ranges.ang_vel_z = (-1.0, 1.0)
 
-        # terminations
+        # Terminations
         self.terminations.base_contact.params["sensor_cfg"].body_names = ".*torso_link"
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/spot/flat_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/spot/flat_env_cfg.py
index 7afcb432fcc..4b2f5509bab 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/spot/flat_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/spot/flat_env_cfg.py
@@ -296,7 +296,7 @@ class SpotTerminationsCfg:
 
 @configclass
 class SpotFlatEnvCfg(LocomotionVelocityRoughEnvCfg):
-    # Basic settings'
+    # Basic settings
     observations: SpotObservationsCfg = SpotObservationsCfg()
     actions: SpotActionsCfg = SpotActionsCfg()
     commands: SpotCommandsCfg = SpotCommandsCfg()
@@ -375,5 +375,2 @@ def __post_init__(self) -> None:
         # disable randomization for play
         self.observations.policy.enable_corruption = False
-        # remove random pushing event
-        # self.events.base_external_force_torque = None
-        # self.events.push_robot = None
diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/mdp/rewards.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/mdp/rewards.py
index 52383e43045..7a1fc12a1db 100644
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/mdp/rewards.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/mdp/rewards.py
@@ -14,6 +14,7 @@
 import torch
 from typing import TYPE_CHECKING
 
+from isaaclab.envs import mdp
 from isaaclab.managers import SceneEntityCfg
 from isaaclab.sensors import ContactSensor
 from isaaclab.utils.math import quat_apply_inverse, yaw_quat
@@ -104,3 +105,12 @@ def track_ang_vel_z_world_exp(
     asset = env.scene[asset_cfg.name]
     ang_vel_error = torch.square(env.command_manager.get_command(command_name)[:, 2] - asset.data.root_ang_vel_w[:, 2])
     return torch.exp(-ang_vel_error / std**2)
+
+
+def stand_still_joint_deviation_l1(
+    env, command_name: str, command_threshold: float = 0.06, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
+) -> torch.Tensor:
+    """Penalize offsets from the default joint positions when the command is very small."""
+    command = env.command_manager.get_command(command_name)
+    # Apply the joint-deviation penalty only while the commanded planar velocity stays below command_threshold.
+    return mdp.joint_deviation_l1(env, asset_cfg) * (torch.norm(command[:, :2], dim=1) < command_threshold)
diff --git a/source/isaaclab_tasks/setup.py b/source/isaaclab_tasks/setup.py
index 31933298a75..3f17fd21016 100644
--- a/source/isaaclab_tasks/setup.py
+++ b/source/isaaclab_tasks/setup.py
@@ -18,12 +18,11 @@
 # Minimum dependencies required prior to installation
 INSTALL_REQUIRES = [
     # generic
-    "numpy",
+    "numpy<2",
    "torch==2.5.1",
    "torchvision>=0.14.1",  # ensure compatibility with torch 1.13.1
-    # 5.26.0 introduced a breaking change, so we restricted it for now.
-    # See issue https://github.com/tensorflow/tensorboard/issues/6808 for details.
-    "protobuf>=3.20.2, < 5.0.0",
+    # protobuf 5.26.0 breaks tensorboard (see https://github.com/tensorflow/tensorboard/issues/6808).
+    "protobuf>=3.20.2,!=5.26.0",
     # basic logger
     "tensorboard",
     # automate