
Incorrect Keypoint and Bounding Box Outputs with RetinaFace Custom Parser in DeepStream 6.3 #36

sowmiya-masterworks opened this issue Aug 21, 2024 · 7 comments



I'm using the RetinaFace custom parser from the face-recognition-deepstream repo with DeepStream 6.3 and am encountering several issues with the keypoint and bounding-box outputs.

Environment
Hardware: NVIDIA GeForce RTX 3060
Driver Version: 555.42.06
CUDA Version: 12.5
DeepStream Version: 6.3
Operating System: [Please specify your OS, e.g., Ubuntu 20.04]
Test Applications: DeepStream Test5 (both C++ and Python versions)
Models and Weights
Models Tried: Both ResNet50 and MobileNet architectures were tested.
Weights: The model weights used are sourced from the [Pytorch_Retinaface repository](https://github.com/biubug6/Pytorch_Retinaface)
Expected Behavior
Accurate detection and output of face landmarks and bounding boxes in the video stream.

Actual Behavior
Landmarks: Many keypoint coordinates are either zero or negative, which does not correspond to valid pixel values.
Bounding Boxes: Outputs are often unrealistic (e.g., exceedingly large dimensions).
Video Output: No detections appear in the output video.
Steps to Reproduce
Tested the setup using the DeepStream Test5 application with the model as both the primary (pgie) and secondary (sgie) inference engine.
Also tested using the Python3 main.py script provided in the repository.
In all tests, inappropriate bounding boxes and keypoints were observed across different setups and models.
Console Output
Raw output array:
output[0] = 0.937012
output[1] = 1.41797
output[2] = -1.57422
output[3] = -0.820801
output[4] = 1.58301
output[5] = 0.605469
output[6] = -0.943848
output[7] = -0.15332
output[8] = -1.27832
output[9] = 1.35547
output[10] = -0.252197
output[11] = -0.817383
output[12] = 0.0300903
output[13] = 1.14355
output[14] = -0.243774
output[15] = -0.335449
Raw output array:
output[0] = 0.937012
output[1] = 1.41797
output[2] = -1.57422
output[3] = -0.820801
output[4] = 1.58301
output[5] = 0.605469
output[6] = -0.943848
output[7] = -0.15332
output[8] = -1.27832
output[9] = 1.35547
output[10] = -0.252197
output[11] = -0.817383
output[12] = 0.0300903
output[13] = 1.14355
output[14] = -0.243774
output[15] = -0.335449
Clipped BBox: 1.41797, 0, 0, 1.58301
Detection:
Top: 0
Left: 1
Width: 4.29497e+09
Height: 1
Confidence: 0.605469
Landmarks: 0 0 -1 1 0 0 0 1 0 0
Raw output array:
output[0] = 0.935547
output[1] = 1.41797
output[2] = -1.57324
output[3] = -0.819336
output[4] = 1.58203
output[5] = 0.60498
output[6] = -0.943359
output[7] = -0.15332
output[8] = -1.27734
output[9] = 1.35352
output[10] = -0.25293
output[11] = -0.816895
output[12] = 0.0317383
output[13] = 1.14355
output[14] = -0.243042
output[15] = -0.334473
Raw output array:
output[0] = 0.935547
output[1] = 1.41797
output[2] = -1.57324
output[3] = -0.819336
output[4] = 1.58203
output[5] = 0.60498
output[6] = -0.943359
output[7] = -0.15332
output[8] = -1.27734
output[9] = 1.35352
output[10] = -0.25293
output[11] = -0.816895
output[12] = 0.0317383
output[13] = 1.14355
output[14] = -0.243042
output[15] = -0.334473
Clipped BBox: 1.41797, 0, 0, 1.58203
Detection:
Top: 0
Left: 1
Width: 4.29497e+09
Height: 1
Confidence: 0.60498
Landmarks: 0 0 -1 1 0 0 0 1 0 0


Athuliva commented Aug 21, 2024

With RetinaFace ResNet50 I am getting correct bounding boxes. Can you tell me how you generated the engine file from https://github.com/biubug6/Pytorch_Retinaface?


sowmiya-masterworks commented Aug 21, 2024

@Athuliva I used https://github.com/biubug6/Pytorch_Retinaface/blob/master/convert_to_onnx.py for the ONNX conversion, and for the engine conversion:
/usr/src/tensorrt/bin/trtexec --onnx=FaceDetector.onnx --explicitBatch --workspace=204 --saveEngine=FaceDetector.engine --fp16
(run inside the DeepStream 6.3 Docker container)


sowmiya-masterworks commented Aug 21, 2024

@Athuliva When I used an engine file generated via https://github.com/wang-xinyu/tensorrtx/tree/master/retinaface, I hit an error when running it with the DeepStream Test5 application!


Athuliva commented Aug 21, 2024

@sowmiya-masterworks Have you loaded libdecodeplugin.so while using https://github.com/wang-xinyu/tensorrtx/tree/master/retinaface?

import ctypes
ctypes.cdll.LoadLibrary('/VA/retinaface_r50_63/R50/libdecodeplugin.so')

zhouyuchong (Owner) commented:

@sowmiya-masterworks try this. By the way, how do you decode those bboxes and landmarks? Since RetinaFace is an anchor-based model, the raw outputs must be post-processed; otherwise they are unreadable.
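For reference, the anchor grid the decode step needs can be generated like this. This is a minimal sketch modeled on the PriorBox class in the Pytorch_Retinaface repo (the min_sizes/steps values are that repo's defaults; the 640×640 input size and the function name generate_priors are assumptions for illustration):

```python
from math import ceil
import numpy as np

def generate_priors(image_size=(640, 640),
                    min_sizes=((16, 32), (64, 128), (256, 512)),
                    steps=(8, 16, 32)):
    """Return priors as an (N, 4) array of (cx, cy, w, h), normalized to [0, 1]."""
    img_h, img_w = image_size
    anchors = []
    for step, sizes in zip(steps, min_sizes):
        # feature-map size for this stride
        fm_h, fm_w = ceil(img_h / step), ceil(img_w / step)
        for i in range(fm_h):
            for j in range(fm_w):
                for s in sizes:  # two anchor scales per cell
                    cx = (j + 0.5) * step / img_w
                    cy = (i + 0.5) * step / img_h
                    anchors.append([cx, cy, s / img_w, s / img_h])
    return np.array(anchors, dtype=np.float32)

priors = generate_priors((640, 640))
print(priors.shape)  # 2 anchors per cell over three feature maps
```

Each raw output row must be matched against its prior before the coordinates mean anything, which is why the unprocessed values in the log above look like noise.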

sowmiya-masterworks (Author) commented:

@zhouyuchong, thanks for the suggestion! Could you provide some guidance or recommend a repository for decoding the bounding boxes and landmarks from RetinaFace inside the DeepStream environment? Since it's an anchor-based model, I understand that the raw outputs need post-processing to be interpretable, and any pointers on how to approach this within DeepStream would be greatly appreciated.

zhouyuchong (Owner) commented:

@sowmiya-masterworks For the C++ version: use custom-lib-path in the nvinfer config file; note that the official data structures have no support for landmarks.
For the Python version: do the post-processing yourself. To apply it in DeepStream, just get the raw outputs (which I think you already know how to do), then do the post-processing in a gst-probe callback function.
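As a concrete starting point, here is a hedged NumPy sketch of that decode step, following the decode()/decode_landm() formulas in the Pytorch_Retinaface repo (the variances (0.1, 0.2) are that repo's defaults; the function names are illustrative, not the repo's exact API):

```python
import numpy as np

def decode_boxes(loc, priors, variances=(0.1, 0.2)):
    """loc: (N, 4) raw box regressions; priors: (N, 4) as normalized (cx, cy, w, h)."""
    cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
    wh = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])
    # convert (cx, cy, w, h) -> (x1, y1, x2, y2), still normalized to [0, 1]
    return np.concatenate([cxcy - wh / 2.0, cxcy + wh / 2.0], axis=1)

def decode_landmarks(landm, priors, variances=(0.1, 0.2)):
    """landm: (N, 10) raw regressions for 5 landmark points."""
    pts = [priors[:, :2] + landm[:, 2 * k:2 * k + 2] * variances[0] * priors[:, 2:]
           for k in range(5)]
    return np.concatenate(pts, axis=1)
```

Inside a gst-probe callback you would pull the raw loc/conf/landm tensors from the tensor meta, decode them against precomputed priors, scale the results by the frame width and height, threshold on the face-class score, and run NMS before attaching object meta.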
