Performance degradation when controlling 6 core joints with control signal density >5, inconsistent with the paper's reported results #31

@Litt1eMarsun

Description

First of all, thank you for your excellent work and open-sourcing the code of OmniControl, which has brought great inspiration to my research on controllable human motion generation.

I have recently been reproducing the experiments in your ICLR 2024 paper. Under the settings described below, the generation performance is significantly worse than the results reported in the paper, and I would like to ask for your advice on the possible causes and solutions.

Core Problem

When I set the control signal density to 5 or more keyframes (5 frames, 49 frames/25% density, and 196 frames/100% density) and specify the 6 core interactive joints mentioned in the paper as the controllable joints, the generated motion falls far short of the paper's results in both control accuracy and motion realism.

Reproduction Environment

| Item | Details |
| --- | --- |
| Hardware | NVIDIA RTX 3090 |
| OS | Ubuntu 20.04 |
| PyTorch Version | 1.13.1 |
| CUDA Version | 11.7 |
| Checkpoint | Official pre-trained Ours (on all) checkpoint (for all joints control) |
| Dataset | HumanML3D (processed with the official preprocessing code) |
| Inference Hyperparameters | All default values from the paper: T=1000, T_s=10, K_e=10, K_l=500, guidance strength τ calculated with the official formula |

Key Experimental Settings

  1. Controllable Joints Setting
    I set controllable_joints = np.array([0, 10, 11, 15, 20, 21]), which corresponds to the 6 core joints mentioned in the paper:

    • 0: pelvis
    • 10: left foot
    • 11: right foot
    • 15: head
    • 20: left wrist
    • 21: right wrist
      This is completely consistent with the joint selection in the paper's "Ours (on all)" experiments.
  2. Control Signal Setting

    • The control signals are extracted from the ground-truth motion sequences in the HumanML3D test set (consistent with the evaluation protocol in the paper)
    • Tested 3 density levels with keyframe number >= 5: 5 frames, 49 frames (25% density), and 196 frames (100% density)
    • The mask of the control signal is set correctly: valid values for the target joints at the keyframes, and 0 for the rest.
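The control signal setup described above can be sketched roughly as follows. This is a minimal illustration rather than the repository's code: the function name `build_control_signal`, the `(frames, joints, 3)` array layout, and the uniform random choice of keyframes are all assumptions made for the example.

```python
import numpy as np

# Joint indices from the setting above (pelvis, feet, head, wrists).
CONTROLLABLE_JOINTS = np.array([0, 10, 11, 15, 20, 21])


def build_control_signal(gt_joints, n_keyframes, joints=CONTROLLABLE_JOINTS, rng=None):
    """Build a sparse control hint and its mask from ground-truth joint positions.

    gt_joints: array of shape (n_frames, n_joints, 3), global joint positions.
    Returns (hint, mask): hint has the same shape as gt_joints and is zero
    everywhere except at (keyframe, controlled joint) entries; mask is a
    boolean array of shape (n_frames, n_joints).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_frames = gt_joints.shape[0]
    # Assumption: keyframes are sampled uniformly without replacement.
    n_keyframes = min(n_keyframes, n_frames)
    keyframes = np.sort(rng.choice(n_frames, size=n_keyframes, replace=False))

    mask = np.zeros(gt_joints.shape[:2], dtype=bool)
    mask[np.ix_(keyframes, joints)] = True          # valid only at keyframes x joints
    hint = np.where(mask[..., None], gt_joints, 0.0)  # zero for everything else
    return hint, mask
```

With a 196-frame sequence, `n_keyframes=49` corresponds to the 25% density case and `n_keyframes=196` to the 100% case.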

Observed Problem Phenomena

  1. Quantitative Performance Gap
    The evaluation metrics are far worse than the results reported in Table 1 of the paper (the case below was tested at density = 5):

    • Avg. err. of the controlled joints is roughly 5-10 times the average value of 0.0404 reported in the paper, and the foot skating ratio is 0.2109
  2. Visualization Phenomena

    • The controlled joints (especially the wrists and feet) have a large position deviation from the input control signal, and cannot follow the preset trajectory
    • Severe foot sliding, unnatural limb stretching, and incoherent whole-body motion
    • The motion semantics are inconsistent with the text prompt in some cases
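For clarity about what the two numbers above are meant to measure, here is a minimal sketch of both metrics. It is not the official evaluation code: the function names, the y-up axis convention, and the 5 cm height / 2.5 cm per-frame sliding thresholds are assumptions, so absolute values may differ from the repository's evaluator.

```python
import numpy as np


def average_error(pred, hint, mask):
    """Mean Euclidean distance between generated and target joint positions,
    taken only over the controlled (frame, joint) entries.

    pred, hint: (n_frames, n_joints, 3); mask: boolean (n_frames, n_joints).
    """
    dist = np.linalg.norm(pred - hint, axis=-1)  # (n_frames, n_joints)
    return float(dist[mask].mean())


def foot_skating_ratio(pred, foot_joints=(10, 11), height_thresh=0.05,
                       slide_thresh=0.025, up_axis=1):
    """Fraction of frame transitions in which at least one foot stays near
    the ground while moving horizontally by more than slide_thresh.

    Thresholds are in metres and are assumptions for this sketch.
    """
    feet = pred[:, list(foot_joints)]                      # (n_frames, 2, 3)
    horiz = [a for a in range(3) if a != up_axis]
    slide = np.linalg.norm(feet[1:][..., horiz] - feet[:-1][..., horiz], axis=-1)
    on_ground = (feet[1:, :, up_axis] < height_thresh) & \
                (feet[:-1, :, up_axis] < height_thresh)
    skating = (on_ground & (slide > slide_thresh)).any(axis=1)
    return float(skating.mean())
```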

Questions to the Authors

  1. For the Ours (on all) model that supports 6-joint control, is there any special strategy in the training phase for controlling multiple joints at once? For example, the weight of the loss function, the sampling method of the control signal for different joints, or a joint-specific guidance strength?
  2. When performing dense control with density >5 (49/196 frames) for multiple joints, do we need to adjust the inference hyperparameters (such as τ, the number of iterations K in spatial guidance)? Is the default parameter in the paper only optimized for single-joint control, not for multi-joint dense control?
  3. Is there a possible mismatch in the joint index? Is the index of the 6 core joints in the HumanML3D dataset used in the paper consistent with the SMPL-H 22-joint index I used above?
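Regarding question 3, the index-to-name mapping I am assuming can be written down as a small lookup. The list below follows the ordering of the first 22 SMPL body joints; whether the repository uses exactly this ordering is the assumption I would like confirmed, and `describe_joints` is only a helper name for this example.

```python
# Assumed ordering: the first 22 SMPL body joints.
SMPL_22_JOINT_NAMES = [
    "pelvis", "left_hip", "right_hip", "spine1",
    "left_knee", "right_knee", "spine2",
    "left_ankle", "right_ankle", "spine3",
    "left_foot", "right_foot", "neck",
    "left_collar", "right_collar", "head",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
]


def describe_joints(indices):
    """Map joint indices to names under the assumed ordering."""
    return {int(i): SMPL_22_JOINT_NAMES[int(i)] for i in indices}


if __name__ == "__main__":
    for idx, name in describe_joints([0, 10, 11, 15, 20, 21]).items():
        print(idx, name)
```

If the repository's skeleton uses a different ordering, some of the six indices above would point at other joints (for example an ankle instead of a foot), which could explain part of the error.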

I can provide the complete reproduction code, full evaluation logs, and visualization videos of the generated motion at any time. Thank you again for your great work; I look forward to your reply!
