chore(bevfusion): update parameters for improved bevfusion-cl training #88
base: main
Conversation
Signed-off-by: Samrat Thapa <[email protected]>
LGTM overall, just need to tidy up the documentation a little bit.
if flip:
    img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
- img = img.rotate(rotate)
+ img = img.rotate(rotate, resample=Image.BICUBIC)  # Default rotation introduces artifacts.
It would be nice to have examples showing the artifacts caused by the default rotation.
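To make that comparison concrete, here is a minimal sketch for generating side-by-side examples; the input path and rotation angle are placeholder values, and PIL's `rotate` falls back to nearest-neighbour resampling when no `resample` argument is given:

```python
from PIL import Image

# Placeholder input; any camera frame from the dataset would do.
img = Image.open("sample_camera_frame.jpg")
angle = 5.4  # degrees; illustrative magnitude, similar to typical ImageAug3D rotations

# Default resampling (nearest neighbour) tends to leave jagged edges and aliasing.
rotated_default = img.rotate(angle)

# Bicubic resampling interpolates smoothly and avoids most of those artifacts.
rotated_bicubic = img.rotate(angle, resample=Image.BICUBIC)

rotated_default.save("rotated_nearest.png")
rotated_bicubic.save("rotated_bicubic.png")
```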
zbound=[-10.0, 10.0, 20.0],
# dbound=[1.0, 60.0, 0.5],
- dbound=[1.0, 166.2, 1.4],
+ dbound=[1.0, 134, 1.4],
Any reason we changed it to 134? I am thinking we should make the depth range and the bin size even smaller, and make sure the range is evenly divisible by the bin size.
A bin size of 1.4 could be a little too large.
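For the divisibility question, here is a quick sketch of the bin counts implied by each `dbound = [lower, upper, step]` setting; this is just the arithmetic, assuming the view transform builds one bin per `step` over the depth range (the actual frustum-generation code in the repo may differ slightly):

```python
def depth_bin_count(dbound, tol=1e-6):
    """Bin count implied by dbound = [lower, upper, step]; flags uneven division."""
    lower, upper, step = dbound
    n = (upper - lower) / step
    if abs(n - round(n)) > tol:
        print(f"{dbound}: {n:.3f} bins -> range is not an exact multiple of the step")
    return round(n)

print(depth_bin_count([1.0, 60.0, 0.5]))   # 118 bins (the commented-out original)
print(depth_bin_count([1.0, 166.2, 1.4]))  # 118 bins, 165.2 / 1.4 divides evenly
print(depth_bin_count([1.0, 134, 1.4]))    # 95 bins,  133.0 / 1.4 divides evenly
```

So both the old and the new upper bounds divide evenly by 1.4; shrinking the step (e.g. back toward 0.5) would mainly trade memory/compute for finer depth resolution.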
# - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
# auto_scale_lr = dict(enable=False, base_batch_size=32)
- auto_scale_lr = dict(enable=False, base_batch_size=train_gpu_size * train_batch_size)
+ auto_scale_lr = dict(enable=True, base_batch_size=4)
Keep the comment. Also, any reason we set it to True? Does it show any significant improvement or stability gain for training?
if train_gpu_size > 1:
    sync_bn = "torch"

- randomness = dict(seed=0, diff_rank_seed=False, deterministic=True)
Any reason we deleted it? I believe we need to keep it for reproducibility.
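For reference, a short annotated sketch of what the removed block controls; the comments describe MMEngine's documented `randomness` options in general, not behaviour specific to this repo:

```python
# Seeding options consumed by MMEngine's Runner via the `randomness` field:
randomness = dict(
    seed=0,                # fixed global seed for the Python / NumPy / PyTorch RNGs
    diff_rank_seed=False,  # if True, each rank would use a different (seed + rank) value
    deterministic=True,    # request deterministic cuDNN kernels (slower, but reproducible)
)
```

Dropping it typically means every run reseeds differently, which makes before/after comparisons like the ones in this PR harder to attribute.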
# - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
# auto_scale_lr = dict(enable=False, base_batch_size=32)
- auto_scale_lr = dict(enable=False, base_batch_size=train_gpu_size * train_batch_size)
+ auto_scale_lr = dict(enable=True, base_batch_size=32)
Same as above.
`base_batch_size` should be the batch size per GPU, which should be `train_batch_size`, according to https://github.com/open-mmlab/mmengine/blob/main/mmengine/_strategy/base.py#L696
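For context, a rough paraphrase of the linear scaling rule MMEngine applies when `enable=True`; in the real code the GPU count and per-GPU batch size are read from the runner and dataloader rather than passed in, and the base LR value below is purely illustrative:

```python
def scaled_lr(base_lr, num_gpus, samples_per_gpu, base_batch_size):
    """Linear LR scaling as applied by auto_scale_lr (paraphrase, not MMEngine source)."""
    real_batch_size = num_gpus * samples_per_gpu
    return base_lr * real_batch_size / base_batch_size

# With base_batch_size=4, a 2 GPU x 2 samples/GPU run keeps the base LR unchanged,
# while an 8 GPU x 4 samples/GPU run scales it by 8x. (Illustrative numbers only.)
print(scaled_lr(base_lr=2e-4, num_gpus=2, samples_per_gpu=2, base_batch_size=4))  # 0.0002
print(scaled_lr(base_lr=2e-4, num_gpus=8, samples_per_gpu=4, base_batch_size=4))  # 0.0016
```

Whether `base_batch_size` means the per-GPU batch size or the total batch size the base LR was tuned for is exactly what decides between `train_batch_size` and the `(8 GPUs) x (4 samples per GPU)` value in the comment, so it is worth confirming against the linked MMEngine code.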
- BEVFusion-CL base/2.0.0 (A): Without intensity and training pedestrians with pooling pedestrians
- BEVFusion-CL base/2.0.0 (B): Same as `BEVFusion-CL base/2.0.0 (A)` without pooling pedestrians
- BEVFusion-CL base/2.0.0 (C): Same as `BEVFusion-CL base/2.0.0 (B)` with improved image ROI cropping, and augmentation parameter fixes.
I suppose you meant `BEVFusion-CL base/2.0.0 (A)` is without pooling pedestrians, and `BEVFusion-CL base/2.0.0 (B)` is with pooling pedestrians? Otherwise, the performance on pedestrians doesn't make sense to me.
zbound=[-10.0, 10.0, 20.0],
# dbound=[1.0, 60.0, 0.5],
- dbound=[1.0, 166.2, 1.4],
+ dbound=[1.0, 134, 1.4],
Same as above.
This pull request introduces significant improvements and updates to the BEVFusion-CL base and offline configurations, focusing on data preprocessing, augmentation, model configuration, and documentation. The changes enhance training stability, improve augmentation consistency, and update documentation to reflect the latest evaluation results.
Key changes include:
Data Augmentation and Preprocessing
- Updated the `sample_augmentation` method in `transforms_3d.py` to handle scalar `resize_lim` values, ensuring consistent resizing behavior and more robust augmentation during training and testing. Rotation now uses bicubic resampling to reduce artifacts. (`projects/BEVFusion/bevfusion/transforms_3d.py`)
- Updated the `ImageAug3D` pipeline in both training and testing to use a maximum resize limit scalar `resize_lim=0.02`, allowing training with images of various aspect ratios; see the sketch after this list for how a scalar limit can be applied. (`projects/BEVFusion/configs/t4dataset/BEVFusion-CL-offline/bevfusion_camera_lidar_offline_voxel_second_secfpn_4xb8_base.py`, `projects/BEVFusion/configs/t4dataset/BEVFusion-CL/bevfusion_camera_lidar_voxel_second_secfpn_4xb8_base.py`)
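As referenced above, a minimal sketch of how a scalar resize limit can be applied; the function name, the crop size, and the exact formula are illustrative assumptions, not the code in `transforms_3d.py`:

```python
import random

def sample_resize_scale(orig_hw, target_hw, resize_lim, is_train=True):
    """Illustrative handling of a scalar resize_lim (not the repo's implementation).

    The base scale is chosen so the resized image still covers the target crop for
    any input aspect ratio; during training it is jittered by at most +/- resize_lim.
    """
    orig_h, orig_w = orig_hw
    target_h, target_w = target_hw
    base_scale = max(target_h / orig_h, target_w / orig_w)
    if is_train:
        base_scale += random.uniform(-resize_lim, resize_lim)
    return base_scale

# Example: a 1080x1920 frame mapped toward a hypothetical 256x704 crop with resize_lim=0.02.
print(sample_resize_scale((1080, 1920), (256, 704), resize_lim=0.02))
```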
Model and Config Updates
- Updated `train_gpu_size`; adjusted `image_size`, `feature_size`, `dbound`, and other model parameters for better performance and scalability; added `filter_cfg` to filter frames with missing images (see the sketch after this list); and enabled automatic learning rate scaling. (`projects/BEVFusion/configs/t4dataset/BEVFusion-CL-offline/bevfusion_camera_lidar_offline_voxel_second_secfpn_4xb8_base.py`)
- Updated the `image_size`, `feature_size`, and `dbound` values, and enabled automatic learning rate scaling. (`projects/BEVFusion/configs/t4dataset/BEVFusion-CL/bevfusion_camera_lidar_voxel_second_secfpn_4xb8_base.py`)
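As a side note on the `filter_cfg` addition, the sketch below shows what filtering frames with missing images amounts to; both the key name and the helper are hypothetical and are not taken from the actual config schema:

```python
from pathlib import Path

# Hypothetical settings; the real filter_cfg keys in the config may differ.
filter_cfg = dict(drop_frames_with_missing_images=True)

def keep_frame(frame_info, cfg=filter_cfg):
    """Return True if a frame has all of its camera images on disk (illustration only)."""
    if cfg.get("drop_frames_with_missing_images", False):
        return all(Path(p).exists() for p in frame_info.get("img_paths", []))
    return True

# Example over a hypothetical list of frame records:
frames = [{"img_paths": ["data/cam_front.jpg", "data/cam_back.jpg"]}]
frames = [f for f in frames if keep_frame(f)]
```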
Documentation and Evaluation Results
- Updated the documentation to reflect the latest evaluation results. (`projects/BEVFusion/docs/BEVFusion-CL-offline/v2/base.md`, `projects/BEVFusion/docs/BEVFusion-CL/v2/base.md`)

These changes collectively improve the robustness, reproducibility, and clarity of the BEVFusion-CL and BEVFusion-CL-offline pipelines, and provide up-to-date documentation for users and collaborators.
Improvement in BEVFusion-CL base/2.0.0 before and after the changes:
BEVFusion-CL base/2.0.0 (B) with improved image ROI cropping and augmentation parameter fixes.