Segmentation fault (core dumped) when MJWarp Training for Locomotion (Velocity Tracking) #21


Great work!

I ran into a problem when running:
source scripts/source_mujoco_setup.sh
python src/holosoma/holosoma/train_agent.py \
    exp:g1-29dof-fast-sac \
    simulator:mjwarp \
    logger:wandb
The output is as follows:

==========================================================================
(hsmujoco) root@autel:/workspace/holosoma/holosoma# ./run_train_MJWarp.sh
MuJoCo environment activated successfully
MuJoCo version: 3.4.0
PyTorch version: 2.7.0+cu128
MuJoCo Warp commit: 09ec1da
CurriculumManagerCfg(params={'num_compute_average_epl': 1000}, setup_terms={'average_episode_tracker': CurriculumTermCfg(func='holosoma.managers.curriculum.terms.locomotion:AverageEpisodeLengthTracker', params={}), 'penalty_curriculum': CurriculumTermCfg(func='holosoma.managers.curriculum.terms.locomotion:PenaltyCurriculum', params={'enabled': True, 'tag': 'penalty_curriculum', 'initial_scale': 0.1, 'min_scale': 0.0, 'max_scale': 1.0, 'level_down_threshold': 150.0, 'level_up_threshold': 750.0, 'degree': 0.00025})}, reset_terms={}, step_terms={})

simulator type:  SimulatorConfig(_target_='holosoma.simulator.mujoco.mujoco.MuJoCo', _recursive_=False, config=SimulatorInitConfig(name='mujoco', sim=SimEngineConfig(fps=200, control_decimation=4, substeps=1, physx=PhysxConfig(solver_type=1, num_position_iterations=4, num_velocity_iterations=0, num_threads=4, enable_dof_force_sensors=False, bounce_threshold_velocity=0.5), render_mode='fake', render_interval=1, max_episode_length_s=20.0), debug_viz=True, scene=SceneConfig(replicate_physics=True, asset_root=None, scene_files=None, rigid_objects=None, env_spacing=20.0), reset_manager=ResetManagerConfig(events=[]), contact_sensor_history_length=3, robot_mjcf_filter=MujocoXMLFilterCfg(enable=False, remove_lights=True, remove_ground=True, ground_names=['floor', 'ground', 'plane']), mujoco_backend=<MujocoBackend.WARP: 'warp'>, mujoco_warp=MujocoWarpConfig(nconmax_per_env=96, njmax_per_env=None), bridge=BridgeConfig(enabled=False, use_joystick=False, joystick_device=0, joystick_type='xbox', domain_id=0, interface=None, rate_limit_dt=None, use_ros=False), virtual_gantry=VirtualGantryCfg(enabled=False, attachment_body_names=['Trunk', 'torso_link', 'torso', 'base_link', 'pelvis', 'base'], stiffness=200.0, damping=100.0, height=3.0, point=None, length=0.0, apply_force=0.0, apply_force_sign=-1)))
/root/.holosoma_deps/miniconda3/envs/hsmujoco/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the `Field()` function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
  warnings.warn(
/root/.holosoma_deps/miniconda3/envs/hsmujoco/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
  warnings.warn(
INFO:root:Setting seed: 42
2025-12-27 19:58:47.424 | INFO     | holosoma.utils.common:seeding:95 - Setting seed: 42
2025-12-27 19:58:47.425 | INFO     | __main__:train:212 - Saving wandb logs to logs/hv-g1-manager/20251227_115847-g1_29dof_manager-locomotion/.wandb
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:git.util:sys.platform='linux', git_executable='git'
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--show-toplevel'], cwd=/workspace/holosoma/holosoma, stdin=None, shell=False, universal_newlines=False)
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.wandb.ai:443
DEBUG:urllib3.connectionpool:https://api.wandb.ai:443 "POST /graphql HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://api.wandb.ai:443 "POST /graphql HTTP/1.1" 200 None
wandb: Currently logged in as: 891303908 (891303908-autel-robotics) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
DEBUG:git.util:sys.platform='linux', git_executable='git'
DEBUG:git.cmd:Popen(['git', 'cat-file', '--batch-check'], cwd=/workspace/holosoma/holosoma, stdin=<valid stream>, shell=False, universal_newlines=False)
wandb: Tracking run with wandb version 0.22.0
wandb: Run data is saved locally in logs/hv-g1-manager/20251227_115847-g1_29dof_manager-locomotion/.wandb/wandb/run-20251227_195848-3bv4dgzi
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run 20251227_115847_g1_29dof_manager_default_g1_29dof
wandb: ⭐️ View project at https://wandb.ai/891303908-autel-robotics/hv-g1-manager
wandb: 🚀 View run at https://wandb.ai/891303908-autel-robotics/hv-g1-manager/runs/3bv4dgzi
Warp 1.10.0 initialized:
   CUDA Toolkit 12.8, Driver 12.6
   Devices:
     "cpu"      : "x86_64"
     "cuda:0"   : "NVIDIA RTX 6000 Ada Generation" (47 GiB, sm_89, mempool enabled)
   Kernel cache:
     /root/.cache/warp/1.10.0
generating randomized terrains 199 / 200     
 generated all randomized terrains!
2025-12-27 19:58:54.925 | INFO     | holosoma.simulator.mujoco.mujoco:__init__:108 - === MuJoCo Simulator Initialization Started ===
2025-12-27 19:58:54.925 | INFO     | holosoma.simulator.mujoco.mujoco:__init__:109 - Device: cuda:0
2025-12-27 19:58:54.926 | INFO     | holosoma.simulator.mujoco.mujoco:__init__:110 - Simulator config: SimulatorInitConfig(name='mujoco', sim=SimEngineConfig(fps=200, control_decimation=4, substeps=1, physx=PhysxConfig(solver_type=1, num_position_iterations=4, num_velocity_iterations=0, num_threads=4, enable_dof_force_sensors=False, bounce_threshold_velocity=0.5), render_mode='fake', render_interval=1, max_episode_length_s=20.0), debug_viz=True, scene=SceneConfig(replicate_physics=True, asset_root=None, scene_files=None, rigid_objects=None, env_spacing=20.0), reset_manager=ResetManagerConfig(events=[]), contact_sensor_history_length=3, robot_mjcf_filter=MujocoXMLFilterCfg(enable=False, remove_lights=True, remove_ground=True, ground_names=['floor', 'ground', 'plane']), mujoco_backend=<MujocoBackend.WARP: 'warp'>, mujoco_warp=MujocoWarpConfig(nconmax_per_env=96, njmax_per_env=None), bridge=BridgeConfig(enabled=False, use_joystick=False, joystick_device=0, joystick_type='xbox', domain_id=0, interface=None, rate_limit_dt=None, use_ros=False), virtual_gantry=VirtualGantryCfg(enabled=False, attachment_body_names=['Trunk', 'torso_link', 'torso', 'base_link', 'pelvis', 'base'], stiffness=200.0, damping=100.0, height=3.0, point=None, length=0.0, apply_force=0.0, apply_force_sign=-1))
2025-12-27 19:58:54.926 | INFO     | holosoma.simulator.mujoco.mujoco:__init__:155 - === MuJoCo Simulator Initialization Completed ===
2025-12-27 19:58:54.926 | INFO     | holosoma.simulator.mujoco.mujoco:load_assets:318 - === Loading assets ===
2025-12-27 19:58:54.934 | INFO     | holosoma.simulator.mujoco.scene_manager:_create_hfield:274 - Shifted heightfield by 0.070m to ensure non-negative heights
2025-12-27 19:58:55.066 | INFO     | holosoma.simulator.mujoco.scene_manager:_create_hfield:292 - Created heightfield: 2400x1600, size=[{0.5 * total_length:.2f}, {0.5 * total_width:.2f}, {height_range:.3f}, {min_height_final:.3f}]
2025-12-27 19:58:55.068 | INFO     | holosoma.simulator.mujoco.scene_manager:add_robot:344 - Adding robot from: /workspace/holosoma/holosoma/src/holosoma/holosoma/data/robots/g1/g1_29dof.xml with prefix: robot_
2025-12-27 19:58:55.071 | INFO     | holosoma.simulator.mujoco.scene_manager:_configure_robot_collisions:432 - Applied collision settings to 29 geoms across 33 bodies
2025-12-27 19:58:55.071 | INFO     | holosoma.simulator.mujoco.scene_manager:compile:494 - Compiling world model using MjSpec
2025-12-27 19:58:55.640 | INFO     | holosoma.simulator.mujoco.mujoco:load_assets:342 - Initializing WarpBackend (GPU multi-environment)
2025-12-27 19:58:57.355 | INFO     | holosoma.simulator.mujoco.backends.warp_backend:__init__:106 - Initializing WarpBackend: 4096 envs on cuda:0
2025-12-27 19:58:57.355 | INFO     | holosoma.simulator.mujoco.backends.warp_backend:__init__:117 - GPU memory allocation: nconmax=96 per env, njmax=576 per env
2025-12-27 19:58:57.381 | INFO     | holosoma.simulator.mujoco.backends.warp_backend:__init__:159 - Capturing CUDA graph for simulation step...
Module mujoco_warp._src.smooth 4d4b71e load on device 'cuda:0' took 3216.47 ms  (compiled)
Module mujoco_warp._src.collision_driver e72006d load on device 'cuda:0' took 156.64 ms  (compiled)
Module _nxn_broadphase__locals__kernel_5e1f554f 5e1f554 load on device 'cuda:0' took 327.96 ms  (compiled)
Module ccd_kernel_builder__locals__ccd_kernel_81523608 8152360 load on device 'cuda:0' took 8976.50 ms  (compiled)
./run_train_MJWarp.sh: line 15: 54786 Segmentation fault      (core dumped) python src/holosoma/holosoma/train_agent.py exp:g1-29dof simulator:mjwarp logger:wandb

==========================================================================

The install script setup_mujoco.sh is unchanged except around line 99, where I commented out the CPU-only PyTorch install and installed the CUDA 12.8 build instead:
  #Scientific computing stack (ensure compatibility)
  #pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
  pip install -U torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
  pip install numpy scipy matplotlib
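
One detail that may or may not be related: the Warp banner in the log reports "CUDA Toolkit 12.8, Driver 12.6", and the PyTorch wheel I installed is also a cu128 build. I have not confirmed that this is the cause of the crash. The snippet below is only a small sketch for pulling those two versions out of a saved log and flagging when the toolkit is newer than the driver; the function names are my own and are not part of holosoma, Warp, or MuJoCo Warp.

```python
import re


def parse_warp_cuda_versions(log_text):
    """Extract (toolkit, driver) CUDA versions from a Warp startup banner.

    Returns a pair of (major, minor) tuples, or None if the banner line
    is not present in the given text.
    """
    match = re.search(r"CUDA Toolkit (\d+)\.(\d+), Driver (\d+)\.(\d+)", log_text)
    if match is None:
        return None
    tk_major, tk_minor, drv_major, drv_minor = (int(g) for g in match.groups())
    return (tk_major, tk_minor), (drv_major, drv_minor)


def toolkit_newer_than_driver(log_text):
    """Return True when the reported toolkit version exceeds the driver version."""
    versions = parse_warp_cuda_versions(log_text)
    if versions is None:
        return False
    toolkit, driver = versions
    return toolkit > driver


# The banner line copied from the log in this issue.
banner = "   CUDA Toolkit 12.8, Driver 12.6"
print(parse_warp_cuda_versions(banner))   # ((12, 8), (12, 6))
print(toolkit_newer_than_driver(banner))  # True
```

If this turns out to matter, updating the NVIDIA driver or switching to wheels built for an older CUDA version would be the things I could try next.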

I have tried both in a Docker container and on the host machine, and the same problem occurs in both.
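
The log stops right after the CUDA kernels are compiled and gives no Python traceback. To narrow down where the crash happens, Python's standard faulthandler module can dump the Python-level stack when the process receives SIGSEGV. Below is a minimal sketch of how that could be enabled; the helper name enable_crash_traceback and the log path are hypothetical and are not part of holosoma.

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Enable faulthandler so a segfault prints the Python stack of all threads.

    If log_path is given, the traceback is written to that file and the open
    file object is returned; the caller must keep it open for as long as
    faulthandler is enabled. Otherwise the traceback goes to stderr.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling this helper near the top of the training entry point, or launching the same command with `python -X faulthandler`, should show which Python call was active at the time of the fault. It does not show the native (C/CUDA) stack; for that, the core dump would need to be opened in a debugger such as gdb.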

Have you encountered a similar problem, or could you point out what might be causing it?

Many thanks.

