Generated video not consistent with LiDAR control when using AV-Sample model (control weight = 1.0) #226

Hi,
I’m trying to generate a video with the AV-Sample model, using only a text prompt and LiDAR control.
The control weight is set to 1.0, so I would expect the generated video to strictly follow the control signal (LiDAR, in this case).

However, I’ve noticed that the generated video deviates from the LiDAR control even at this maximum control weight.
In contrast, when using depth control, the generated output remains much more consistent with the input control.

**Here is the command I’m running:**
```bash
PYTHONPATH="$(pwd)" torchrun --nproc_per_node="${NUM_GPU}" --nnodes=1 --node_rank=0 cosmos_transfer1/diffusion/inference/transfer.py --checkpoint_dir "${CHECKPOINT_DIR}" \
  --video_save_folder "my_output" \
  --controlnet_specs "cosmos_lidar_defult.json" \
  --is_av_sample \
  --sigma_max 80 \
  --fps 30 \
  --num_gpus "${NUM_GPU}" \
  --batch_input_path "waymo_reg_3_spec.json"
```

**Here are the spec files:**
`cosmos_lidar_defult.json`
```
{
    "prompt": "The video is captured from a camera mounted on a car. The camera is facing forward. The video depicts a road with a clear blue sky overhead and a few scattered clouds. The road is lined with palm trees and power lines, and there are a few cars driving in both directions. The road appears to be in a suburban area, with houses and buildings visible on either side. The weather is sunny and clear, with no signs of rain or clouds. The time of day is not specified, but the lighting suggests it is daytime.",
    "lidar": {
        "input_control": "lidar_control_vid_defult.mp4",
        "control_weight": 1.0
    }
}
```
`waymo_reg_2_spec.json`
```
{"prompt": "The video is captured from a camera mounted on a car. The camera is facing forward. The video depicts a road with a clear blue sky overhead and a few scattered clouds. The road is lined with palm trees and power lines, and there are a few cars driving in both directions. The road appears to be in a suburban area, with houses and buildings visible on either side. The weather is sunny and clear, with no signs of rain or clouds. The time of day is not specified, but the lighting suggests it is daytime.", "control_overrides": {"lidar": {"input_control": "lidar_control_vid_0.mp4", "control_weight": 1.0}}, "video_save_name": "0_gen_vid"}
{"prompt": "The video is captured from a camera mounted on a car. The camera is facing forward. The video depicts a road with a clear blue sky overhead and a few scattered clouds. The road is lined with palm trees and power lines, and there are a few cars driving in both directions. The road appears to be in a suburban area, with houses and buildings visible on either side. The weather is sunny and clear, with no signs of rain or clouds. The time of day is not specified, but the lighting suggests it is daytime.", "control_overrides": {"lidar": {"input_control": "lidar_control_vid_1.mp4", "control_weight": 1.0}}, "video_save_name": "1_gen_vid"}

```
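The batch file above is in JSON Lines form: one object per line, each with `prompt`, `control_overrides`, and `video_save_name`. Below is a minimal sketch for generating such a file for several LiDAR control clips and reading it back with a basic sanity check. It assumes the key layout copied from the example is all `transfer.py` needs; the helper names `write_batch_spec` and `load_batch_spec`, and the `[0, 1]` range check on `control_weight`, are hypothetical and not part of Cosmos-Transfer1.

```python
import json
import tempfile
from pathlib import Path


def write_batch_spec(path, prompt, control_videos, control_weight=1.0):
    """Write one JSON object per line, mirroring the batch spec layout above.

    Hypothetical helper: key names are copied from the example file,
    not from a documented Cosmos-Transfer1 schema.
    """
    with open(path, "w", encoding="utf-8") as f:
        for i, video in enumerate(control_videos):
            entry = {
                "prompt": prompt,
                "control_overrides": {
                    "lidar": {"input_control": str(video), "control_weight": control_weight}
                },
                "video_save_name": f"{i}_gen_vid",
            }
            f.write(json.dumps(entry) + "\n")


def load_batch_spec(path):
    """Parse a JSON Lines batch spec and check every control_weight."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing empty line
            entry = json.loads(line)
            for name, ctrl in entry["control_overrides"].items():
                weight = ctrl["control_weight"]
                # Assumption: weights live in [0, 1]; 1.0 is the maximum used in this issue.
                if not 0.0 <= weight <= 1.0:
                    raise ValueError(f"line {lineno}: {name} control_weight={weight} is out of range")
            entries.append(entry)
    return entries


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is written next to the real specs.
    with tempfile.TemporaryDirectory() as tmp:
        spec = Path(tmp) / "waymo_reg_spec.json"
        write_batch_spec(spec, "A forward-facing dashcam view of a suburban road.",
                         ["lidar_control_vid_0.mp4", "lidar_control_vid_1.mp4"])
        for e in load_batch_spec(spec):
            print(e["video_save_name"], e["control_overrides"]["lidar"]["input_control"])
```

Generating the file from a list of clips keeps the prompt and `control_weight` identical across entries, so any per-clip difference in LiDAR adherence cannot come from a typo in one line of the spec.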

**Here are the results:**

<img width="2000" height="618" alt="Image" src="https://github.com/user-attachments/assets/618482b9-5167-42a9-b575-6f310072f1a7" />

<img width="2000" height="526" alt="Image" src="https://github.com/user-attachments/assets/3530e83b-b939-4b97-9693-988174a20e9f" />


The highlighted zoomed-in regions show clear inconsistencies between the LiDAR signal and the generated video
(left to right: generated video, LiDAR control, LiDAR overlaid on generated video).
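The third panel (LiDAR overlaid on the generated video) can be reproduced per frame with a simple alpha blend. This is a minimal NumPy sketch, assuming both frames are already decoded to `uint8` arrays of identical `(H, W, 3)` shape and that pixels without a LiDAR return are pure black in the control video. The function names and that masking rule are assumptions, not the tooling used to produce the figures above.

```python
import numpy as np


def lidar_mask(lidar_frame):
    """True where the LiDAR control frame has a return (any non-zero channel).

    Assumption: empty pixels are pure black in the control video.
    """
    return lidar_frame.max(axis=-1) > 0


def overlay_lidar(generated_frame, lidar_frame, alpha=0.6):
    """Blend LiDAR pixels over the generated frame; all other pixels are unchanged."""
    if generated_frame.shape != lidar_frame.shape:
        raise ValueError("frames must have identical (H, W, 3) shapes")
    mask = lidar_mask(lidar_frame)
    out = generated_frame.astype(np.float32)
    # Alpha-blend only where LiDAR returns exist.
    out[mask] = alpha * lidar_frame[mask] + (1.0 - alpha) * out[mask]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Frames can be decoded from the two MP4s with any video reader (for example OpenCV's `cv2.VideoCapture`); as long as both videos are read the same way, the channel order does not affect the mask.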

**My question is:**
Is there a way to make the generated video more strictly consistent with the LiDAR control signal (even if this results in a less visually pleasing generation)?

@caotians1 @pjannaty 

Thanks!
