Performance degradation when controlling 6 core joints with control signal density >5, inconsistent with the paper's reported results


## Description
First of all, thank you for your excellent work and open-sourcing the code of OmniControl, which has brought great inspiration to my research on controllable human motion generation.

I have recently been reproducing the experiments from your ICLR 2024 paper and found that, under the settings below, the generation performance differs significantly from the reported results. I would appreciate your advice on the possible causes and solutions.

### Core Problem
When I set the **control signal density >5 (i.e., number of keyframes >5, including 49 frames/25% density and 196 frames/100% density)** and specify the 6 core interactive joints mentioned in the paper as the controllable joints, the generated motion falls far short of the paper's results in both control accuracy and motion realism.

### Reproduction Environment
| Item | Details |
|------|---------|
| Hardware | NVIDIA RTX 3090 |
| OS | Ubuntu 20.04 |
| PyTorch Version | 1.13.1 |
| CUDA Version | 11.7 |
| Checkpoint | Official pre-trained `Ours (on all)` checkpoint (for control of all joints) |
| Dataset | HumanML3D (processed with the official preprocessing code) |
| Inference Hyperparameters | All default values from the paper: `T=1000`, `T_s=10`, `K_e=10`, `K_l=500`, guidance strength `τ` calculated with the official formula |

### Key Experimental Settings
1. **Controllable Joints Setting**
   I set `controllable_joints = np.array([0, 10, 11, 15, 20, 21])`, which corresponds to the 6 core joints mentioned in the paper:
   - 0: pelvis
   - 10: left foot
   - 11: right foot
   - 15: head
   - 20: left wrist
   - 21: right wrist
   This matches the joint selection in the paper's "Ours (on all)" experiments.

2. **Control Signal Setting**
   - The control signals are extracted from the ground-truth motion sequences in the HumanML3D test set (consistent with the evaluation protocol in the paper)
   - Tested 3 density levels with keyframe number >=5: 5 frames, 49 frames (25% density), and 196 frames (100% density)
   - The mask of the control signal is set correctly: valid values for the target joints at the keyframes, and 0 for the rest.
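For reference, the control signal and mask described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `build_control_signal`, the `(n_frames, 22, 3)` layout, the boolean mask shape, and the uniform random choice of keyframes are all my own assumptions.

```python
import numpy as np

N_JOINTS = 22  # assumed HumanML3D skeleton size
controllable_joints = np.array([0, 10, 11, 15, 20, 21])


def build_control_signal(gt_joints, joints, n_keyframes, rng):
    """Hypothetical helper: build a sparse control hint and its mask.

    gt_joints: (n_frames, 22, 3) ground-truth global joint positions.
    Returns (hint, mask); hint is zero wherever mask is False.
    """
    n_frames = gt_joints.shape[0]
    n_keyframes = min(n_keyframes, n_frames)
    # Assumption: keyframes are drawn uniformly at random without replacement.
    keyframes = np.sort(rng.choice(n_frames, size=n_keyframes, replace=False))
    mask = np.zeros((n_frames, N_JOINTS, 1), dtype=bool)
    mask[np.ix_(keyframes, joints)] = True
    # Keep ground-truth positions at controlled entries, 0 for the rest.
    hint = np.where(mask, gt_joints, 0.0)
    return hint, mask


rng = np.random.default_rng(0)
gt = rng.normal(size=(196, N_JOINTS, 3))  # stand-in for a real test sequence
hint, mask = build_control_signal(gt, controllable_joints, 49, rng)
print(hint.shape, int(mask.sum()))  # 49 keyframes x 6 joints controlled entries
```

With 196 frames, 49 keyframes corresponds to the 25% density level and 196 keyframes to the 100% level.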

### Observed Problem Phenomena
1. **Quantitative Performance Gap**
   The evaluation metrics are far worse than the results reported in Table 1 of the paper (the case below was tested with density=5):
   - `Avg. err.` of the controlled joints is roughly 5-10 times the `0.0404` average value reported, and the foot skating ratio is 0.2109

2. **Visualization Phenomena**
   - The controlled joints (especially the wrists and feet) have a large position deviation from the input control signal, and cannot follow the preset trajectory
   - Severe foot sliding, unnatural limb stretching, and incoherent whole-body motion
   - The motion semantics are inconsistent with the text prompt in some cases
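To make clear what I mean by the two metrics above, here is a minimal sketch of how they can be computed. It is not the official evaluation code: the function names, the choice of joints 10 and 11 as the feet, the y-up convention, and the thresholds (5 cm contact height, 2.5 cm slide per frame, as I understand the paper's description) are assumptions on my side.

```python
import numpy as np


def avg_control_error(pred, hint, mask):
    """Mean Euclidean distance over controlled (frame, joint) entries.

    pred, hint: (n_frames, n_joints, 3); mask: (n_frames, n_joints, 1) bool.
    """
    dist = np.linalg.norm(pred - hint, axis=-1)  # (n_frames, n_joints)
    return float(dist[mask[..., 0]].mean())


def foot_skating_ratio(pred, foot_joints=(10, 11),
                       height_thresh=0.05, slide_thresh=0.025):
    """Fraction of frame transitions in which either foot slides while grounded.

    Assumes y is the up axis and positions are in metres.
    """
    feet = pred[:, list(foot_joints), :]  # (n_frames, n_feet, 3)
    # Horizontal (x, z) displacement between consecutive frames.
    slide = np.linalg.norm(feet[1:, :, [0, 2]] - feet[:-1, :, [0, 2]], axis=-1)
    # A foot counts as in contact if it is below the height threshold in both frames.
    in_contact = (feet[1:, :, 1] < height_thresh) & (feet[:-1, :, 1] < height_thresh)
    skating = (in_contact & (slide > slide_thresh)).any(axis=1)
    return float(skating.mean())
```

If the official evaluation uses different foot joints or thresholds, the numbers I report above would not be directly comparable, so a pointer to the exact definition would also help.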

---

## Questions to the Authors
1. For the `Ours (on all)` model that supports 6-joint control, is there any special training strategy for controlling multiple joints simultaneously? For example, loss weights, the sampling method of the control signal for different joints, or a joint-specific guidance strength?
2. When performing dense control with density >5 (49/196 frames) for multiple joints, do we need to adjust the inference hyperparameters (such as `τ` or the number of iterations `K` in spatial guidance)? Are the default parameters in the paper optimized only for single-joint control rather than multi-joint dense control?
3. Is there a possible mismatch in the joint indices? Are the indices of the 6 core joints in the HumanML3D dataset used in the paper consistent with the SMPL-H 22-joint indices I used above?
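To make question 3 concrete, this is the index-to-name mapping I assumed when choosing the joints. The list below is the standard SMPL body-joint order as I understand it; the constant name `SMPL_22_JOINT_NAMES` is mine, and whether the repository uses the same ordering is exactly what I would like to confirm.

```python
# Assumed 22-joint ordering (standard SMPL body joints, no hands).
SMPL_22_JOINT_NAMES = [
    "pelvis", "left_hip", "right_hip", "spine1", "left_knee", "right_knee",
    "spine2", "left_ankle", "right_ankle", "spine3", "left_foot", "right_foot",
    "neck", "left_collar", "right_collar", "head", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
]

controllable_joints = [0, 10, 11, 15, 20, 21]

# Look up the name of each controlled index under the assumed ordering.
selected = {idx: SMPL_22_JOINT_NAMES[idx] for idx in controllable_joints}
for idx, name in selected.items():
    print(f"{idx:2d}: {name}")
```

If the checkpoint was trained with a different ordering, the same indices would select different joints, which could explain part of the deviation I observe.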

I can provide the complete reproduction code, full evaluation logs, and visualization videos of the generated motion at any time. Thank you again for your great work; I look forward to your reply!



