Conversation

@SamratThapa120 commented Aug 26, 2025

This pull request introduces significant improvements and updates to the BEVFusion-CL base and offline configurations, focusing on data preprocessing, augmentation, model configuration, and documentation. The changes enhance training stability, improve augmentation consistency, and update documentation to reflect the latest evaluation results.

Key changes include:

Data Augmentation and Preprocessing

  • Refactored the sample_augmentation method in transforms_3d.py to handle scalar resize_lim values, ensuring consistent resizing behavior and more robust augmentation during training and testing. Rotation now uses bicubic resampling to reduce artifacts. (projects/BEVFusion/bevfusion/transforms_3d.py) [1] [2] [3]
  • Updated the ImageAug3D pipeline in both training and testing to use a scalar resize limit (resize_lim=0.02), allowing training on images with varying aspect ratios. (projects/BEVFusion/configs/t4dataset/BEVFusion-CL-offline/bevfusion_camera_lidar_offline_voxel_second_secfpn_4xb8_base.py, projects/BEVFusion/configs/t4dataset/BEVFusion-CL/bevfusion_camera_lidar_voxel_second_secfpn_4xb8_base.py) [1] [2] [3] [4]
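As an illustration of the scalar-vs-tuple handling, here is a minimal sketch. The function name and the exact jitter semantics are assumptions for illustration, not the code in transforms_3d.py; the idea is that a scalar `resize_lim` becomes a symmetric jitter around the base scale that makes the source image cover the target canvas, while a `(low, high)` tuple keeps the original behavior.

```python
import random


def sample_resize_scale(resize_lim, img_size, target_size):
    """Hypothetical sketch of handling a scalar ``resize_lim``.

    A scalar is treated as +/- jitter around the base "cover" scale,
    so images of any aspect ratio are resized consistently; a
    (low, high) tuple is used directly, as before.
    """
    img_h, img_w = img_size
    target_h, target_w = target_size
    # Smallest scale at which the resized image covers the target canvas,
    # regardless of the source aspect ratio.
    base = max(target_w / img_w, target_h / img_h)
    if isinstance(resize_lim, (int, float)):
        low, high = base - resize_lim, base + resize_lim
    else:
        low, high = resize_lim
    return random.uniform(low, high)
```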

Model and Config Updates

  • Major overhaul of the BEVFusion-CL-offline config: increased train_gpu_size, adjusted image_size, feature_size, dbound, and other model parameters for better performance and scalability; added filter_cfg to filter frames with missing images; and enabled automatic learning rate scaling. (projects/BEVFusion/configs/t4dataset/BEVFusion-CL-offline/bevfusion_camera_lidar_offline_voxel_second_secfpn_4xb8_base.py) [1] [2] [3] [4] [5] [6] [7] [8] [9]
  • Updated the BEVFusion-CL base config with new image_size, feature_size, and dbound values, and enabled automatic learning rate scaling. (projects/BEVFusion/configs/t4dataset/BEVFusion-CL/bevfusion_camera_lidar_voxel_second_secfpn_4xb8_base.py) [1] [2] [3]

Documentation and Evaluation Results

  • Added a new documentation page summarizing the deployed BEVFusion-CL-offline base/2.X model, including training and evaluation details, metrics, and links to resources. (projects/BEVFusion/docs/BEVFusion-CL-offline/v2/base.md)
  • Updated the BEVFusion-CL base documentation to include results for a new evaluation split (C). (projects/BEVFusion/docs/BEVFusion-CL/v2/base.md)

These changes collectively improve the robustness, reproducibility, and clarity of the BEVFusion-CL and BEVFusion-CL-offline pipelines, and provide up-to-date documentation for users and collaborators.

Improvement in BEVFusion-CL base/2.0.0 before and after the changes

| Eval range: 120m | mAP | car | truck | bus | bicycle | pedestrian |
| --- | --- | --- | --- | --- | --- | --- |
| BEVFusion-CL base/2.0.0 (B) | 75.03 | 79.62 | 61.20 | 86.67 | 69.99 | 77.62 |
| BEVFusion-CL base/2.0.0 (C) | 76.3 | 80.50 | 61.90 | 85.90 | 74.70 | 78.70 |
| BEVFusion-CL-offline base/2.0.0 (C) | 77.8 | 87.30 | 61.60 | 85.90 | 73.20 | 80.90 |

  • BEVFusion-CL base/2.0.0 (B): Without intensity, training pedestrians without pooling
  • BEVFusion-CL base/2.0.0 (C): Same as BEVFusion-CL base/2.0.0 (B), with improved image ROI cropping and augmentation parameter fixes

@SamratThapa120 changed the title from "chore(bevfusion): update image rois appropriately for t4datasets" to "chore(bevfusion): update parameters for improved bevfusion-cl training" Aug 26, 2025
@SamratThapa120 marked this pull request as ready for review September 8, 2025 09:06

@KSeangTan left a comment


LGTM overall, just need to tidy up documentation a little bit

```diff
 if flip:
     img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
-img = img.rotate(rotate)
+img = img.rotate(rotate, resample=Image.BICUBIC)  # Default rotation introduces artifacts.
```

Would be nice if you have examples showing artifacts with the default rotation

```diff
 zbound=[-10.0, 10.0, 20.0],
 # dbound=[1.0, 60.0, 0.5],
-dbound=[1.0, 166.2, 1.4],
+dbound=[1.0, 134, 1.4],
```

Any reason we changed it to 134? I am thinking we should make the depth range and bin size even smaller, and make sure the range is evenly divisible by the bin size.


bin size: 1.4 could be a little too large
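The divisibility concern above can be checked with a quick helper. This is a sketch, assuming `dbound` follows the `[d_min, d_max, step]` convention shown in the config snippet; the helper name is made up for illustration.

```python
def depth_bins(dbound):
    """Number of discrete depth bins an LSS-style view transform
    creates from dbound = [d_min, d_max, step]."""
    d_min, d_max, step = dbound
    return (d_max - d_min) / step


# dbound=[1.0, 134, 1.4] gives (134 - 1) / 1.4 = 95 bins exactly,
# while dbound=[1.0, 166.2, 1.4] gives (166.2 - 1) / 1.4 = 118 bins.
```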

```diff
-# - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
-# auto_scale_lr = dict(enable=False, base_batch_size=32)
-auto_scale_lr = dict(enable=False, base_batch_size=train_gpu_size * train_batch_size)
+auto_scale_lr = dict(enable=True, base_batch_size=4)
```

Keep the comment. Also, any reason we set it to True? Does it show any significant improvement in performance or training stability?

```diff
 if train_gpu_size > 1:
     sync_bn = "torch"

-randomness = dict(seed=0, diff_rank_seed=False, deterministic=True)
```

Any reason we delete it? I believe we need to keep it for reproducibility

```diff
-# - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
-# auto_scale_lr = dict(enable=False, base_batch_size=32)
-auto_scale_lr = dict(enable=False, base_batch_size=train_gpu_size * train_batch_size)
+auto_scale_lr = dict(enable=True, base_batch_size=32)
```

Same above


base_batch_size should be the batch size per GPU, which should be train_batch_size, according to here:
https://github.com/open-mmlab/mmengine/blob/main/mmengine/_strategy/base.py#L696


- BEVFusion-CL base/2.0.0 (A): Without intensity and training pedestrians with pooling pedestrians
- BEVFusion-CL base/2.0.0 (B): Same as `BEVFusion-CL base/2.0.0 (A)` without pooling pedestrians
- BEVFusion-CL base/2.0.0 (C): Same as `BEVFusion-CL base/2.0.0 (B)` with improved image ROI cropping, and augmentation parameter fixes.

I suppose you meant that BEVFusion-CL base/2.0.0 (A) is without pooling pedestrians, and BEVFusion-CL base/2.0.0 (B) is with pooling pedestrians? Otherwise, the pedestrian performance doesn't make sense to me.

```diff
 zbound=[-10.0, 10.0, 20.0],
 # dbound=[1.0, 60.0, 0.5],
-dbound=[1.0, 166.2, 1.4],
+dbound=[1.0, 134, 1.4],
```

Same above
