
Incorrect Keypoint and Bounding Box Outputs with RetinaFace Custom Parser in DeepStream 6.3 #36

sowmiya-masterworks opened this issue Aug 21, 2024 · 7 comments



I'm using the RetinaFace custom parser from the face-recognition-deepstream repo with DeepStream 6.3 and am encountering several issues with the keypoint and bounding-box outputs.

Environment
Hardware: NVIDIA GeForce RTX 3060
Driver Version: 555.42.06
CUDA Version: 12.5
DeepStream Version: 6.3
Operating System: [Please specify your OS, e.g., Ubuntu 20.04]
Test Applications: DeepStream Test5 (both C++ and Python versions)
Models and Weights
Models Tried: Both ResNet50 and MobileNet architectures were tested.
Weights: The model weights used are sourced from the [Pytorch_Retinaface repository](https://github.com/biubug6/Pytorch_Retinaface)
Expected Behavior
Accurate detection and output of face landmarks and bounding boxes in the video stream.

Actual Behavior
Landmarks: Many keypoint coordinates are either zero or negative, which does not correspond to valid pixel values.
Bounding Boxes: Outputs are often unrealistic (e.g., exceedingly large dimensions).
Video Output: No detections appear in the output video.
Steps to Reproduce
Tested the setup using the DeepStream Test5 application with the model as both the primary (pgie) and secondary (sgie) inference engine.
Also tested using the Python3 main.py script provided in the repository.
In all tests, inappropriate bounding boxes and keypoints were observed across different setups and models.
Console Output
Raw output array:
output[0] = 0.937012
output[1] = 1.41797
output[2] = -1.57422
output[3] = -0.820801
output[4] = 1.58301
output[5] = 0.605469
output[6] = -0.943848
output[7] = -0.15332
output[8] = -1.27832
output[9] = 1.35547
output[10] = -0.252197
output[11] = -0.817383
output[12] = 0.0300903
output[13] = 1.14355
output[14] = -0.243774
output[15] = -0.335449
Raw output array:
output[0] = 0.937012
output[1] = 1.41797
output[2] = -1.57422
output[3] = -0.820801
output[4] = 1.58301
output[5] = 0.605469
output[6] = -0.943848
output[7] = -0.15332
output[8] = -1.27832
output[9] = 1.35547
output[10] = -0.252197
output[11] = -0.817383
output[12] = 0.0300903
output[13] = 1.14355
output[14] = -0.243774
output[15] = -0.335449
Clipped BBox: 1.41797, 0, 0, 1.58301
Detection:
Top: 0
Left: 1
Width: 4.29497e+09
Height: 1
Confidence: 0.605469
Landmarks: 0 0 -1 1 0 0 0 1 0 0
Raw output array:
output[0] = 0.935547
output[1] = 1.41797
output[2] = -1.57324
output[3] = -0.819336
output[4] = 1.58203
output[5] = 0.60498
output[6] = -0.943359
output[7] = -0.15332
output[8] = -1.27734
output[9] = 1.35352
output[10] = -0.25293
output[11] = -0.816895
output[12] = 0.0317383
output[13] = 1.14355
output[14] = -0.243042
output[15] = -0.334473
Raw output array:
output[0] = 0.935547
output[1] = 1.41797
output[2] = -1.57324
output[3] = -0.819336
output[4] = 1.58203
output[5] = 0.60498
output[6] = -0.943359
output[7] = -0.15332
output[8] = -1.27734
output[9] = 1.35352
output[10] = -0.25293
output[11] = -0.816895
output[12] = 0.0317383
output[13] = 1.14355
output[14] = -0.243042
output[15] = -0.334473
Clipped BBox: 1.41797, 0, 0, 1.58203
Detection:
Top: 0
Left: 1
Width: 4.29497e+09
Height: 1
Confidence: 0.60498
Landmarks: 0 0 -1 1 0 0 0 1 0 0


Athuliva commented Aug 21, 2024

With RetinaFace ResNet50 I am getting correct bounding boxes. Can you tell me how you generated the engine file from https://github.com/biubug6/Pytorch_Retinaface?


sowmiya-masterworks commented Aug 21, 2024

@Athuliva I used https://github.com/biubug6/Pytorch_Retinaface/blob/master/convert_to_onnx.py for the ONNX conversion, and for the engine conversion:
/usr/src/tensorrt/bin/trtexec --onnx=FaceDetector.onnx --explicitBatch --workspace=204 --saveEngine=FaceDetector.engine --fp16
(run inside the DeepStream 6.3 Docker container)


sowmiya-masterworks commented Aug 21, 2024

@Athuliva When I used an engine file generated via https://github.com/wang-xinyu/tensorrtx/tree/master/retinaface, I hit an error when running it with the DeepStream Test5 application!


Athuliva commented Aug 21, 2024

@sowmiya-masterworks Have you loaded libdecodeplugin.so while using https://github.com/wang-xinyu/tensorrtx/tree/master/retinaface?

import ctypes
ctypes.cdll.LoadLibrary('/VA/retinaface_r50_63/R50/libdecodeplugin.so')

zhouyuchong (Owner) commented:

@sowmiya-masterworks try this. By the way, how do you decode those bboxes and landmarks? Since RetinaFace is an anchor-based model, the raw outputs must be post-processed; otherwise they are unreadable.
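For reference, the anchor grid the decode step needs can be generated like this. This is a minimal sketch modeled on the PriorBox class in the Pytorch_Retinaface repo (the min_sizes/steps values are that repo's defaults; the 640×640 input size and the function name generate_priors are assumptions for illustration):

```python
from math import ceil
import numpy as np

def generate_priors(image_size=(640, 640),
                    min_sizes=((16, 32), (64, 128), (256, 512)),
                    steps=(8, 16, 32)):
    """Return priors as an (N, 4) array of (cx, cy, w, h), normalized to [0, 1]."""
    img_h, img_w = image_size
    anchors = []
    for step, sizes in zip(steps, min_sizes):
        # feature-map size for this stride
        fm_h, fm_w = ceil(img_h / step), ceil(img_w / step)
        for i in range(fm_h):
            for j in range(fm_w):
                for s in sizes:  # two anchor scales per cell
                    cx = (j + 0.5) * step / img_w
                    cy = (i + 0.5) * step / img_h
                    anchors.append([cx, cy, s / img_w, s / img_h])
    return np.array(anchors, dtype=np.float32)

priors = generate_priors((640, 640))
print(priors.shape)  # 2 anchors per cell over three feature maps
```

Each raw output row must be matched against its prior before the coordinates mean anything, which is why the unprocessed values in the log above look like noise.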

sowmiya-masterworks (Author) commented:

@zhouyuchong, thanks for the suggestion! Could you provide some guidance or recommend a repository for decoding the bounding boxes and landmarks from RetinaFace inside the DeepStream environment? Since it's an anchor-based model, I understand that the raw outputs need post-processing to be interpretable, and any pointers on how to approach this within DeepStream would be greatly appreciated.

zhouyuchong (Owner) commented:

@sowmiya-masterworks For the C++ version: use custom-lib-path in the nvinfer config file; note that the official data structures have no support for landmarks.
For the Python version: do the post-processing yourself. To apply it in DeepStream, just get the raw outputs (which I think you already know how to do), then do the post-processing in a gst-probe callback function.
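As a concrete starting point, here is a hedged NumPy sketch of that decode step, following the decode()/decode_landm() formulas in the Pytorch_Retinaface repo (the variances (0.1, 0.2) are that repo's defaults; the function names are illustrative, not the repo's exact API):

```python
import numpy as np

def decode_boxes(loc, priors, variances=(0.1, 0.2)):
    """loc: (N, 4) raw box regressions; priors: (N, 4) as normalized (cx, cy, w, h)."""
    cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
    wh = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])
    # convert (cx, cy, w, h) -> (x1, y1, x2, y2), still normalized to [0, 1]
    return np.concatenate([cxcy - wh / 2.0, cxcy + wh / 2.0], axis=1)

def decode_landmarks(landm, priors, variances=(0.1, 0.2)):
    """landm: (N, 10) raw regressions for 5 landmark points."""
    pts = [priors[:, :2] + landm[:, 2 * k:2 * k + 2] * variances[0] * priors[:, 2:]
           for k in range(5)]
    return np.concatenate(pts, axis=1)
```

Inside a gst-probe callback you would pull the raw loc/conf/landm tensors from the tensor meta, decode them against precomputed priors, scale the results by the frame width and height, threshold on the face-class score, and run NMS before attaching object meta.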
