knzo25
Contributor

@knzo25 knzo25 commented May 19, 2025

Summary

This PR ports Pointcept's PTv3 with the following features:

  • t4dataset support
  • ONNX deployment support
  • most of the original codebase removed, since we only need PTv3

Change point

Same as the summary

Note

Since the ONNX-compatible spconv had to be modified, BEVFusion and other spconv-dependent modules should from now on be trained with spconv instead of mmcv's implementation.

Test performed

Before NaN fix

Logs [TIER IV INTERNAL LINK]

[2025-04-25 02:10:16,386 INFO test.py line 339 2191] Val result: mIoU/mAcc/allAcc 0.7411/0.8754/0.9103
[2025-04-25 02:10:16,386 INFO test.py line 345 2191] Class_0 - vehicle Result: iou/accuracy 0.9688/0.9838
[2025-04-25 02:10:16,386 INFO test.py line 345 2191] Class_1 - bicycle Result: iou/accuracy 0.3464/0.8544
[2025-04-25 02:10:16,386 INFO test.py line 345 2191] Class_2 - pedestrian Result: iou/accuracy 0.6848/0.7068
[2025-04-25 02:10:16,386 INFO test.py line 345 2191] Class_3 - road Result: iou/accuracy 0.9278/0.9616
[2025-04-25 02:10:16,386 INFO test.py line 345 2191] Class_4 - vegetation Result: iou/accuracy 0.7076/0.8744
[2025-04-25 02:10:16,386 INFO test.py line 345 2191] Class_5 - obstacle Result: iou/accuracy 0.8111/0.8714

After NaN fix

Logs [TIER IV INTERNAL LINK]

[2025-10-06 07:42:24,038 INFO test.py line 226 4469] Test: 1346372774a4ace253a23e3a2e66fe5f [696/696]-178114 Batch 4.134 (4.030) Accuracy 0.9575 (0.8678) mIoU 0.7423 (0.8008)
[2025-10-06 07:42:24,103 INFO test.py line 243 4469] Syncing ...
[2025-10-06 07:42:24,104 INFO test.py line 269 4469] Val result: mIoU/mAcc/allAcc 0.8008/0.8678/0.9285
[2025-10-06 07:42:24,105 INFO test.py line 271 4469] Class_0 - vehicle Result: iou/accuracy 0.9717/0.9880
[2025-10-06 07:42:24,105 INFO test.py line 271 4469] Class_1 - bicycle Result: iou/accuracy 0.4604/0.5701
[2025-10-06 07:42:24,105 INFO test.py line 271 4469] Class_2 - pedestrian Result: iou/accuracy 0.8322/0.8944
[2025-10-06 07:42:24,105 INFO test.py line 271 4469] Class_3 - road Result: iou/accuracy 0.9336/0.9540
[2025-10-06 07:42:24,105 INFO test.py line 271 4469] Class_4 - vegetation Result: iou/accuracy 0.7542/0.8863
[2025-10-06 07:42:24,105 INFO test.py line 271 4469] Class_5 - obstacle Result: iou/accuracy 0.8523/0.9138
[2025-10-06 07:42:24,105 INFO test.py line 279 4469] <<<<<<<<<<<<<<<<< End Evaluation <<<<<<<<<<<<<<<<<
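
As a sanity check on the two logs above, the reported mIoU and mAcc appear to be plain unweighted means of the six per-class values. The sketch below recomputes them from the printed numbers and lists the per-class IoU change. The dictionary and function names are illustrative only and are not taken from test.py; a difference in the last digit is expected because the logged per-class values are already rounded.

```python
from statistics import mean

# Per-class (IoU, accuracy) pairs copied from the logs above.
BEFORE = {
    "vehicle": (0.9688, 0.9838),
    "bicycle": (0.3464, 0.8544),
    "pedestrian": (0.6848, 0.7068),
    "road": (0.9278, 0.9616),
    "vegetation": (0.7076, 0.8744),
    "obstacle": (0.8111, 0.8714),
}
AFTER = {
    "vehicle": (0.9717, 0.9880),
    "bicycle": (0.4604, 0.5701),
    "pedestrian": (0.8322, 0.8944),
    "road": (0.9336, 0.9540),
    "vegetation": (0.7542, 0.8863),
    "obstacle": (0.8523, 0.9138),
}


def summarize(per_class):
    """Return (mIoU, mAcc) as unweighted means over classes."""
    ious = [iou for iou, _ in per_class.values()]
    accs = [acc for _, acc in per_class.values()]
    return mean(ious), mean(accs)


miou_before, macc_before = summarize(BEFORE)
miou_after, macc_after = summarize(AFTER)
print(f"before: mIoU={miou_before:.4f} mAcc={macc_before:.4f}")
print(f"after:  mIoU={miou_after:.4f} mAcc={macc_after:.4f}")

# Per-class IoU change, largest gain first.
iou_delta = {name: AFTER[name][0] - BEFORE[name][0] for name in BEFORE}
for name, delta in sorted(iou_delta.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11} {delta:+.4f}")
```

On these numbers the largest IoU gains are for pedestrian and bicycle, while bicycle accuracy drops from 0.8544 to 0.5701.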

knzo25 and others added 12 commits April 16, 2025 00:26
…ve more and awml-fy it (can train/test)

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
…neralize yet. no idea how many errors will appear in tensorrt yet

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
 - limited range on eval
 - used max spatial shape throughout the network for tensorrt generalization. inference may have changed somewhat so may need to retrain

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
@amadeuszsz
Collaborator

@knzo25
Sorry for the late response!
Are you still able to run the environment and deploy ONNX for the latest model (link)? I followed your instructions in the README file, but it seems the deployment script doesn't work due to a missing ConcatDataset (I guess the true issue lies somewhere else).

@scepter914
Collaborator

Memo
In terms of the overall design of AWML, the changes in this PR look great to me.
I asked @amadeuszsz to do the code-level review 🙏

@knzo25
Contributor Author

knzo25 commented Jun 15, 2025

@amadeuszsz
Can you look for a model compatible with the one I submitted in autowarefoundation/autoware_universe#10600?

The one you provided is 5cm per voxel, but for "real time" I recommend the 10cm one
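
To illustrate why the voxel size matters for runtime, here is a minimal sketch that counts occupied voxels for the same points at 5 cm and 10 cm. It assumes voxelization is a plain integer division of coordinates by the grid size and uses a hypothetical dense test volume in millimetres; it is not the voxelization code from this PR.

```python
from itertools import product


def count_voxels(points_mm, grid_mm):
    """Count occupied voxels for integer millimetre points and grid size."""
    return len({(x // grid_mm, y // grid_mm, z // grid_mm) for x, y, z in points_mm})


# Hypothetical dense test volume: one point every 25 mm inside a 400 mm cube.
axis = range(0, 400, 25)
points = list(product(axis, axis, axis))

voxels_5cm = count_voxels(points, 50)
voxels_10cm = count_voxels(points, 100)
print(voxels_5cm, voxels_10cm, voxels_5cm / voxels_10cm)
```

For a densely filled volume, doubling the voxel edge cuts the voxel count by a factor of 8. Real LiDAR scans are sparse, so the reduction on actual data is usually smaller.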

@amadeuszsz
Collaborator

@amadeuszsz Can you look for a model compatible with the one I submitted in autowarefoundation/autoware_universe#10600?

The one you provided is 5cm per voxel, but for "real time" I recommend the 10cm one

@knzo25
I confirm that the two available models use a grid size of 5 cm. Apart from these models, I can't find anything else in the provided documentation.

@amadeuszsz amadeuszsz removed the request for review from scepter914 July 3, 2025 06:26
Collaborator

@amadeuszsz amadeuszsz left a comment

Great PR overall, but I couldn't test it as we are missing some files.
Unfortunately @knzo25 is not available to look into it, so we may have to delve deeper into this issue ourselves (if time can be allocated).

@KSeangTan
Collaborator

Hi @amadeuszsz @knzo25
Is the PR still ongoing? Otherwise, we can assign someone else to take over if you don't mind.

@amadeuszsz
Collaborator

amadeuszsz commented Sep 10, 2025

@KSeangTan

The code changes attached in the review solve most of the issues and I can push them. However, the NaN loss issue still exists. Unfortunately, I have no spare time right now to investigate it in depth, so if someone else can take a look at this dataset issue, please let me know 🙏🏻

EDIT:
Already pushed the fixes; the NaN issue still has to be solved.

Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
@KSeangTan
Collaborator

Thanks @amadeuszsz
Do you think we can close the PR first, and leave a TODO and take a look at this once we have more buffer?

@amadeuszsz
Collaborator

@KSeangTan
Ok, then let me look at it once again. If I am not able to find the source of the issue with our dataset, we merge it with a TODO comment.

Signed-off-by: Amadeusz Szymko <[email protected]>
@amadeuszsz
Collaborator

amadeuszsz commented Sep 24, 2025

For now, we also have another issue: we can export to ONNX, but when our ROS node builds the engine, the TRT backend somehow assigns a static shape to the input tensors, even though I can see correctly defined dynamic axes in the ONNX file.

I see that this static shape coincides with one of the GEMM block constants (~160k). I believe the ONNX backend uses the concrete value from the sample input data and bakes it into the graph as a Constant node. Then in TRT:

[V] [TRT] Parsing node: /model/backbone/enc/enc0/block0/cpe/cpe.1/Constant [Constant]
[V] [TRT] /model/backbone/enc/enc0/block0/cpe/cpe.1/Constant [Constant] outputs: [/model/backbone/enc/enc0/block0/cpe/cpe.1/Constant_output_0 -> (161089, 32)[FLOAT]]

which in turn results in Nx161089 input tensor shapes.

Now we can't deploy ONNX properly, so the ROS node cannot be merged either... I will try to find the root cause.

Edit: Fixed. I had accidentally used the wrong spconv implementation. Now I just need to make this project able to train and export without code modification. I am also currently testing a fix for the crash during training.
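
For anyone who runs into the same symptom, a verbose TensorRT log can be scanned for Constant nodes whose shape contains a suspiciously large dimension. The sketch below is stdlib-only; the regex is written against the two log lines quoted above only (other TensorRT versions may format them differently), and the function name and the `min_dim` threshold are hypothetical.

```python
import re

# Matches the "outputs:" line quoted above for a parsed Constant node, e.g.
# "... [Constant] outputs: [name -> (161089, 32)[FLOAT]]"
OUTPUT_RE = re.compile(
    r"\[TRT\] (?P<node>\S+) \[Constant\] outputs: "
    r"\[\S+ -> \((?P<shape>[\d, ]+)\)\[(?P<dtype>\w+)\]\]"
)


def find_large_constants(log_text, min_dim=10_000):
    """Return (node, shape, dtype) for Constant outputs with any dim >= min_dim."""
    hits = []
    for line in log_text.splitlines():
        match = OUTPUT_RE.search(line)
        if match is None:
            continue
        shape = tuple(int(d) for d in match["shape"].split(",") if d.strip())
        if any(dim >= min_dim for dim in shape):
            hits.append((match["node"], shape, match["dtype"]))
    return hits


SAMPLE_LOG = """\
[V] [TRT] Parsing node: /model/backbone/enc/enc0/block0/cpe/cpe.1/Constant [Constant]
[V] [TRT] /model/backbone/enc/enc0/block0/cpe/cpe.1/Constant [Constant] outputs: [/model/backbone/enc/enc0/block0/cpe/cpe.1/Constant_output_0 -> (161089, 32)[FLOAT]]
"""

large_constants = find_large_constants(SAMPLE_LOG)
for node, shape, dtype in large_constants:
    print(node, shape, dtype)
```

A constant whose leading dimension equals the number of voxels in the sample input is a hint that a data-dependent size was frozen into the graph at export time.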

Collaborator

@amadeuszsz amadeuszsz left a comment

LGTM!

Note:

  • We can deploy the model.
  • We still have an issue with NaNs during training, which later causes the training loop to crash. This issue is under investigation and I hope we can address it soon.
  • Code cleanup after the NaN issue is fixed.
  • Need to add dataset description.
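
While a NaN issue like the one in the notes above is being investigated, a generic guard can record which steps produce a non-finite loss so the offending samples can be inspected. This thread does not say what causes the NaNs, so the sketch below is only a diagnostic aid under that caveat, not the fix used in this PR; the helper name and the loss trace are hypothetical. With a PyTorch loss tensor, the scalar would be obtained with `loss.item()`.

```python
import math


def finite_or_skip(loss_value, step, skipped_steps):
    """Return True if the step should proceed; record and skip non-finite losses."""
    if math.isfinite(loss_value):
        return True
    skipped_steps.append(step)
    return False


# Hypothetical loss trace with one NaN and one inf.
losses = [0.91, 0.85, float("nan"), 0.80, float("inf"), 0.77]
skipped = []
used = [loss for step, loss in enumerate(losses) if finite_or_skip(loss, step, skipped)]
print(used, skipped)
```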

@amadeuszsz
Collaborator

@knzo25
NaN loss solved with 95f859f. Updated logs in PR description

amadeuszsz and others added 11 commits October 6, 2025 22:58
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
Signed-off-by: Amadeusz Szymko <[email protected]>
@amadeuszsz amadeuszsz requested review from KSeangTan and removed request for SamratThapa120 October 8, 2025 04:57
Collaborator

@KSeangTan KSeangTan left a comment

LGTM overall, let's approve and merge first

@amadeuszsz amadeuszsz merged commit f25b474 into tier4:main Oct 9, 2025
2 checks passed