Segmentation fault (core dumped) when MJWarp Training for Locomotion (Velocity Tracking) #21

@qingchenwuhou

Great work!

I ran into a problem when running:
source scripts/source_mujoco_setup.sh
python src/holosoma/holosoma/train_agent.py \
    exp:g1-29dof-fast-sac \
    simulator:mjwarp \
    logger:wandb
The output is as follows:

==========================================================================
(hsmujoco) root@autel:/workspace/holosoma/holosoma# ./run_train_MJWarp.sh
MuJoCo environment activated successfully
MuJoCo version: 3.4.0
PyTorch version: 2.7.0+cu128
MuJoCo Warp commit: 09ec1da
CurriculumManagerCfg(params={'num_compute_average_epl': 1000}, setup_terms={'average_episode_tracker': CurriculumTermCfg(func='holosoma.managers.curriculum.terms.locomotion:AverageEpisodeLengthTracker', params={}), 'penalty_curriculum': CurriculumTermCfg(func='holosoma.managers.curriculum.terms.locomotion:PenaltyCurriculum', params={'enabled': True, 'tag': 'penalty_curriculum', 'initial_scale': 0.1, 'min_scale': 0.0, 'max_scale': 1.0, 'level_down_threshold': 150.0, 'level_up_threshold': 750.0, 'degree': 0.00025})}, reset_terms={}, step_terms={})

simulator type: SimulatorConfig(target='holosoma.simulator.mujoco.mujoco.MuJoCo', recursive=False, config=SimulatorInitConfig(name='mujoco', sim=SimEngineConfig(fps=200, control_decimation=4, substeps=1, physx=PhysxConfig(solver_type=1, num_position_iterations=4, num_velocity_iterations=0, num_threads=4, enable_dof_force_sensors=False, bounce_threshold_velocity=0.5), render_mode='fake', render_interval=1, max_episode_length_s=20.0), debug_viz=True, scene=SceneConfig(replicate_physics=True, asset_root=None, scene_files=None, rigid_objects=None, env_spacing=20.0), reset_manager=ResetManagerConfig(events=[]), contact_sensor_history_length=3, robot_mjcf_filter=MujocoXMLFilterCfg(enable=False, remove_lights=True, remove_ground=True, ground_names=['floor', 'ground', 'plane']), mujoco_backend=<MujocoBackend.WARP: 'warp'>, mujoco_warp=MujocoWarpConfig(nconmax_per_env=96, njmax_per_env=None), bridge=BridgeConfig(enabled=False, use_joystick=False, joystick_device=0, joystick_type='xbox', domain_id=0, interface=None, rate_limit_dt=None, use_ros=False), virtual_gantry=VirtualGantryCfg(enabled=False, attachment_body_names=['Trunk', 'torso_link', 'torso', 'base_link', 'pelvis', 'base'], stiffness=200.0, damping=100.0, height=3.0, point=None, length=0.0, apply_force=0.0, apply_force_sign=-1)))
/root/.holosoma_deps/miniconda3/envs/hsmujoco/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the Field() function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using Annotated metadata or by assignment. This may have happened because an Annotated type alias using the type statement was used, or if the Field() function was attached to a single member of a union type.
warnings.warn(
/root/.holosoma_deps/miniconda3/envs/hsmujoco/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the Field() function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using Annotated metadata or by assignment. This may have happened because an Annotated type alias using the type statement was used, or if the Field() function was attached to a single member of a union type.
warnings.warn(
INFO:root:Setting seed: 42
2025-12-27 19:58:47.424 | INFO | holosoma.utils.common:seeding:95 - Setting seed: 42
2025-12-27 19:58:47.425 | INFO | __main__:train:212 - Saving wandb logs to logs/hv-g1-manager/20251227_115847-g1_29dof_manager-locomotion/.wandb
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:git.util:sys.platform='linux', git_executable='git'
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--show-toplevel'], cwd=/workspace/holosoma/holosoma, stdin=None, shell=False, universal_newlines=False)
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.wandb.ai:443
DEBUG:urllib3.connectionpool:https://api.wandb.ai:443 "POST /graphql HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://api.wandb.ai:443 "POST /graphql HTTP/1.1" 200 None
wandb: Currently logged in as: 891303908 (891303908-autel-robotics) to https://api.wandb.ai. Use wandb login --relogin to force relogin
DEBUG:git.util:sys.platform='linux', git_executable='git'
DEBUG:git.cmd:Popen(['git', 'cat-file', '--batch-check'], cwd=/workspace/holosoma/holosoma, stdin=, shell=False, universal_newlines=False)
wandb: Tracking run with wandb version 0.22.0
wandb: Run data is saved locally in logs/hv-g1-manager/20251227_115847-g1_29dof_manager-locomotion/.wandb/wandb/run-20251227_195848-3bv4dgzi
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run 20251227_115847_g1_29dof_manager_default_g1_29dof
wandb: ⭐️ View project at https://wandb.ai/891303908-autel-robotics/hv-g1-manager
wandb: 🚀 View run at https://wandb.ai/891303908-autel-robotics/hv-g1-manager/runs/3bv4dgzi
Warp 1.10.0 initialized:
CUDA Toolkit 12.8, Driver 12.6
Devices:
"cpu" : "x86_64"
"cuda:0" : "NVIDIA RTX 6000 Ada Generation" (47 GiB, sm_89, mempool enabled)
Kernel cache:
/root/.cache/warp/1.10.0
generating randomized terrains 199 / 200
generated all randomized terrains!
2025-12-27 19:58:54.925 | INFO | holosoma.simulator.mujoco.mujoco:init:108 - === MuJoCo Simulator Initialization Started ===
2025-12-27 19:58:54.925 | INFO | holosoma.simulator.mujoco.mujoco:init:109 - Device: cuda:0
2025-12-27 19:58:54.926 | INFO | holosoma.simulator.mujoco.mujoco:init:110 - Simulator config: SimulatorInitConfig(name='mujoco', sim=SimEngineConfig(fps=200, control_decimation=4, substeps=1, physx=PhysxConfig(solver_type=1, num_position_iterations=4, num_velocity_iterations=0, num_threads=4, enable_dof_force_sensors=False, bounce_threshold_velocity=0.5), render_mode='fake', render_interval=1, max_episode_length_s=20.0), debug_viz=True, scene=SceneConfig(replicate_physics=True, asset_root=None, scene_files=None, rigid_objects=None, env_spacing=20.0), reset_manager=ResetManagerConfig(events=[]), contact_sensor_history_length=3, robot_mjcf_filter=MujocoXMLFilterCfg(enable=False, remove_lights=True, remove_ground=True, ground_names=['floor', 'ground', 'plane']), mujoco_backend=<MujocoBackend.WARP: 'warp'>, mujoco_warp=MujocoWarpConfig(nconmax_per_env=96, njmax_per_env=None), bridge=BridgeConfig(enabled=False, use_joystick=False, joystick_device=0, joystick_type='xbox', domain_id=0, interface=None, rate_limit_dt=None, use_ros=False), virtual_gantry=VirtualGantryCfg(enabled=False, attachment_body_names=['Trunk', 'torso_link', 'torso', 'base_link', 'pelvis', 'base'], stiffness=200.0, damping=100.0, height=3.0, point=None, length=0.0, apply_force=0.0, apply_force_sign=-1))
2025-12-27 19:58:54.926 | INFO | holosoma.simulator.mujoco.mujoco:init:155 - === MuJoCo Simulator Initialization Completed ===
2025-12-27 19:58:54.926 | INFO | holosoma.simulator.mujoco.mujoco:load_assets:318 - === Loading assets ===
2025-12-27 19:58:54.934 | INFO | holosoma.simulator.mujoco.scene_manager:_create_hfield:274 - Shifted heightfield by 0.070m to ensure non-negative heights
2025-12-27 19:58:55.066 | INFO | holosoma.simulator.mujoco.scene_manager:create_hfield:292 - Created heightfield: 2400x1600, size=[{0.5 * total_length:.2f}, {0.5 * total_width:.2f}, {height_range:.3f}, {min_height_final:.3f}]
2025-12-27 19:58:55.068 | INFO | holosoma.simulator.mujoco.scene_manager:add_robot:344 - Adding robot from: /workspace/holosoma/holosoma/src/holosoma/holosoma/data/robots/g1/g1_29dof.xml with prefix: robot

2025-12-27 19:58:55.071 | INFO | holosoma.simulator.mujoco.scene_manager:_configure_robot_collisions:432 - Applied collision settings to 29 geoms across 33 bodies
2025-12-27 19:58:55.071 | INFO | holosoma.simulator.mujoco.scene_manager:compile:494 - Compiling world model using MjSpec
2025-12-27 19:58:55.640 | INFO | holosoma.simulator.mujoco.mujoco:load_assets:342 - Initializing WarpBackend (GPU multi-environment)
2025-12-27 19:58:57.355 | INFO | holosoma.simulator.mujoco.backends.warp_backend:init:106 - Initializing WarpBackend: 4096 envs on cuda:0
2025-12-27 19:58:57.355 | INFO | holosoma.simulator.mujoco.backends.warp_backend:init:117 - GPU memory allocation: nconmax=96 per env, njmax=576 per env
2025-12-27 19:58:57.381 | INFO | holosoma.simulator.mujoco.backends.warp_backend:init:159 - Capturing CUDA graph for simulation step...
Module mujoco_warp._src.smooth 4d4b71e load on device 'cuda:0' took 3216.47 ms (compiled)
Module mujoco_warp._src.collision_driver e72006d load on device 'cuda:0' took 156.64 ms (compiled)
Module _nxn_broadphase__locals__kernel_5e1f554f 5e1f554 load on device 'cuda:0' took 327.96 ms (compiled)
Module ccd_kernel_builder__locals__ccd_kernel_81523608 8152360 load on device 'cuda:0' took 8976.50 ms (compiled)
./run_train_MJWarp.sh: line 15: 54786 Segmentation fault (core dumped) python src/holosoma/holosoma/train_agent.py exp:g1-29dof simulator:mjwarp logger:wandb

==========================================================================
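
One detail in the log above that may be worth checking: the Warp banner reports "CUDA Toolkit 12.8, Driver 12.6", so the toolkit version is newer than the version reported for the driver. Whether this is related to the segmentation fault is not established here. A minimal sketch for flagging this pattern in a saved log follows; the function names are hypothetical, and the banner format is assumed to match the line shown above.

```python
import re


def parse_warp_cuda_line(line):
    """Extract ((toolkit_major, toolkit_minor), (driver_major, driver_minor))
    from a banner line such as 'CUDA Toolkit 12.8, Driver 12.6'.

    Returns None if the line does not match the assumed format.
    """
    m = re.search(r"CUDA Toolkit (\d+)\.(\d+), Driver (\d+)\.(\d+)", line)
    if m is None:
        return None
    tk_major, tk_minor, drv_major, drv_minor = map(int, m.groups())
    return (tk_major, tk_minor), (drv_major, drv_minor)


def toolkit_newer_than_driver(line):
    """Return True when the reported toolkit version exceeds the driver version."""
    parsed = parse_warp_cuda_line(line)
    if parsed is None:
        return False
    toolkit, driver = parsed
    # Tuples compare element by element: major first, then minor.
    return toolkit > driver
```

For the banner in this log, `toolkit_newer_than_driver("CUDA Toolkit 12.8, Driver 12.6")` flags the line, which only says the two versions differ, not that the difference causes the crash.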

The only change I made to the install script, setup_mujoco.sh, is at line 99:
#Scientific computing stack (ensure compatibility)
#pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install -U torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
pip install numpy scipy matplotlib

I have tried both in a Docker container and on the host machine; the same problem occurs in both.
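
The shell output only reports "Segmentation fault (core dumped)" with no Python traceback. To narrow down where the crash happens, one option is Python's standard `faulthandler` module, which prints the Python-level stack to stderr when a fatal signal arrives. A minimal sketch, assuming a POSIX system; the helper names `run_with_faulthandler` and `died_from_segfault` are hypothetical and not part of holosoma:

```python
import signal
import subprocess
import sys


def run_with_faulthandler(args):
    """Run a child Python interpreter with faulthandler enabled.

    `args` are the arguments after the interpreter, e.g. a script path
    followed by its options. Returns (returncode, stderr_text). On a
    fatal signal such as SIGSEGV, faulthandler writes the Python
    traceback to stderr before the process dies.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def died_from_segfault(returncode):
    # On POSIX, subprocess reports death by signal N as returncode == -N.
    return returncode == -signal.SIGSEGV
```

The same effect is available directly from the shell by adding the interpreter flag, for example `python -X faulthandler src/holosoma/holosoma/train_agent.py exp:g1-29dof-fast-sac simulator:mjwarp logger:wandb`. The resulting traceback shows which Python call was active when the native code crashed, which should indicate whether the fault occurs during CUDA graph capture or elsewhere.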

Have you encountered a similar problem, or could you point out where the problem is?

Many thanks.
