This repository contains the code for the PedRecNet (Paper: https://arxiv.org/pdf/2204.11548.pdf) as well as EHPI3D. It is the successor of our EHPI2D work (https://github.com/noboevbo/ehpi_action_recognition). The PedRecNet is a multi-purpose network that provides the following functions:
- Human BB Detection (via YoloV4)
- Human Tracking
- 2D Human Pose Estimation
- 3D Human Pose Estimation
- Human Body Orientation (currently only Phi) Estimation
- Human Head Orientation (currently only Phi) Estimation
- "Pedestrian recognizes the camera" estimation
- Human Action Recognition (via EHPI3D)
Note: This work is currently unpublished. It is part of my PhD dissertation, and we are currently preparing one or more papers. Note also that I am no longer active in research for now, so this code is provided as is.
Please cite the following paper if this code is helpful in your research.
D. Burgermeister and C. Curio, “PedRecNet: Multi-task deep neural network for full 3D human pose and orientation estimation,” in 2022 IEEE Intelligent Vehicles Symposium (IV), 2022.
- Python 3.9 (venv suggested)
- working CUDA / CUDNN
- Clone this repository
- cd PedRec
- pip install -r requirements.txt
- Download the pretrained models if you want to run the PedRecNet
- Download the required datasets, dataframes and maybe some of the checkpoints (see Dataset Download section)
- YoloV4 weights (adapted from https://github.com/Tianxiaomo/pytorch-YOLOv4) - place them in data/models/yolo_v4/yolov4.pth.
- PedRecNet weights - place them in data/models/pedrec/experiment_pedrec_p2d3d_c_o_h36m_sim_mebow_0_net.pth.
- EHPI3D weights - place them in models/ehpi3d/ehpi_3d_sim_c01_actionrec_gt_pred_64frames.pth.
Not required to run the network, but needed for some experiments / training runs:
- Simple Baselines for Human Pose Estimation Weights - adapted from https://github.com/microsoft/human-pose-estimation.pytorch - place it in data/models/human_pose_baseline/pose_resnet_50_256x192.pth.tar.
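Before starting the demo, it can help to verify that the three required weight files are where the network expects them. The following is a minimal sketch, not part of the repository: `EXPECTED_WEIGHTS` and `find_missing_weights` are hypothetical names, and the paths are copied as written in the list above, relative to the repository root.

```python
from pathlib import Path

# Paths copied from the list above, relative to the repository root.
EXPECTED_WEIGHTS = {
    "YoloV4": "data/models/yolo_v4/yolov4.pth",
    "PedRecNet": "data/models/pedrec/experiment_pedrec_p2d3d_c_o_h36m_sim_mebow_0_net.pth",
    "EHPI3D": "models/ehpi3d/ehpi_3d_sim_c01_actionrec_gt_pred_64frames.pth",
}


def find_missing_weights(repo_root="."):
    """Return {model name: expected path} for every weight file that is absent."""
    root = Path(repo_root)
    return {
        name: str(root / rel)
        for name, rel in EXPECTED_WEIGHTS.items()
        if not (root / rel).is_file()
    }


if __name__ == "__main__":
    missing = find_missing_weights()
    for name, path in missing.items():
        print(f"missing {name} weights: {path}")
    if not missing:
        print("all required weight files found")
```

Run it from the repository root; it only checks that the files exist, not that they load correctly.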
- If you want to train the network(s) yourself, you need the following datasets:
- COCO (2017)
- Additionally: MEBOW body orientation annotations - train_hoe.json and val_hoe.json need to be placed in COCO/annotations
- Human3.6m
- Additionally: train/36m_train_pedrec.pkl in h36m_dir/train/ and train/36m_val_pedrec.pkl in h36m_dir/val/
- ROMb (SIM-ROM)
- RT3DValidate (SIM-Circle)
- TUD - cvpr10_multiview_pedestrians
- COCO (2017)
- For action recognition:
- SIM-C01 Pose Data (raw image data not published, but you only require the skeleton dataframe for training!)
Download the datasets and place the additional .pkls in the appropriate folders. Update the paths in experiment_path_helper.py and execute one of the experiments in training/. If you do not start with experiment_pedrec_2d, you might need some intermediate weights. You can find them at https://dennisnotes.com/files/pedrec/single_results/filename.pth (replace filename with the name of the weights file you need).
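Fetching an intermediate weights file could be scripted roughly as follows. This is a sketch under assumptions: `weights_url` and `download_weights` are hypothetical helpers, the target directory is your choice, and `filename.pth` stays a placeholder for the actual file name you need.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Base URL taken from the paragraph above.
BASE_URL = "https://dennisnotes.com/files/pedrec/single_results/"


def weights_url(filename):
    """Build the download URL for one intermediate weights file."""
    return BASE_URL + quote(filename)


def download_weights(filename, target_dir):
    """Download one weights file into target_dir and return the local path."""
    target = Path(target_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(weights_url(filename), str(target))
    return target


if __name__ == "__main__":
    # "filename.pth" is a placeholder, not a real file on the server.
    print(weights_url("filename.pth"))
```

Calling `download_weights("filename.pth", "some/target/dir")` with a real file name performs the actual download.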
- Some C01 real examples - place them in data/demo/ or update the paths in demo_actionrec_dev.
- Pedestrians crossing a street - place them in data/demo/ or update the paths in demo_actionrec_dev.
I currently recommend using a pip environment instead of Anaconda. I tried the (recommended) Anaconda environment for PyTorch several times, but its performance was far inferior to the pip environment on my system(s): with Anaconda I get about 9 FPS on videos with a single human, compared to 25 FPS with pip. The difference shrinks as more people appear in a video, so with 7+ people the two environments perform almost equally. If you have an idea what the problem could be, please let me know. Things tested:
- CUDA / CUDNN are working, enabled, and recognized by PyTorch in both environments
- Pillow-SIMD installed
- Usage of opencv-contrib-python-headless instead of the Conda version.
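To compare two environments on your own system, a small timing helper is enough. The sketch below is not the measurement code used for the numbers above; `measure_fps` and its `process_frame` argument are hypothetical stand-ins for whatever per-frame function you want to time.

```python
import time


def measure_fps(process_frame, frames, warmup=5):
    """Return frames per second for process_frame over frames, excluding warm-up."""
    frames = list(frames)
    for frame in frames[:warmup]:
        process_frame(frame)  # warm-up runs are not timed
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        process_frame(frame)
    # Guard against a zero interval for extremely cheap functions.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(timed) / elapsed


if __name__ == "__main__":
    # Dummy workload: sleeping 10 ms per frame stands in for network inference.
    fps = measure_fps(lambda frame: time.sleep(0.01), range(30))
    print(f"{fps:.1f} FPS")
```

When timing GPU inference, call `torch.cuda.synchronize()` inside `process_frame`, since CUDA kernels run asynchronously and the timings would otherwise be too optimistic.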
Check out the demo_actionrec_dev.py file. It contains examples of how to run the application on videos, image dirs, images and a webcam via the "input providers". Example (if you've downloaded the demo videos!):
python pedrec/demo_actionrec_dev.py
Check out the pandas dataframes (e.g. the rt_conti_01_train_FIN.pkl from the SIM-C01 dataset, or the pkls from the H36M dataset). If you provide a dataset of the same structure, you can just use the pedrec dataset class. You can find some scripts I used to generate the dataframes in tools/datasets/... but I have not tested them in a while. The same applies to EHPI3D action recognition data: check out the dataframes from the rt_conti_01_train_FIN.pkl file! You might want to check out the notebook dataset_rtsim_conti01_ehpi as well. You can find the result files (e.g. the C01F_train_pred_df_experiment_pedrec_p2d3d_c_o_h36m_sim_mebow_0_allframes.pkl) at https://dennisnotes.com/files/pedrec/result_dfs/filename.
I've just pasted a few of my notebooks in the notebooks folder. They are not cleaned up and may contain absolute paths etc., but they may help some readers understand some concepts / validation results.
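A quick way to inspect the structure of one of these dataframes is to load the pickle with pandas and print its columns. This is a minimal sketch: `summarize_dataframe` is a hypothetical helper, and the path in the usage example is an assumption about where you stored the file.

```python
import pandas as pd


def summarize_dataframe(pkl_path, max_rows=3):
    """Load a pickled dataframe and print its size, column dtypes, and first rows."""
    df = pd.read_pickle(pkl_path)
    print(f"{len(df)} rows, {len(df.columns)} columns")
    print(df.dtypes)
    print(df.head(max_rows))
    return df


if __name__ == "__main__":
    # Adjust the path to wherever you placed the downloaded dataframe.
    summarize_dataframe("rt_conti_01_train_FIN.pkl")
```

Only load pickles from sources you trust, since unpickling can execute arbitrary code.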
Note: probably outdated information! Need to recheck this part.
Note: these are not really datatypes. For performance reasons the values are stored in numpy arrays, and helper methods provide more user-friendly access to them (e.g. joint_helper(_3d), bb_helper). These are the types used internally in the PedRecNet application; the types used elsewhere, e.g. in datasets, may differ.
"Datatype" name | Shape |
---|---|
bb_2d | center_x, center_y, width, height, confidence, class_idx |
joint_2d | x, y, confidence |
joint_3d | x, y, z, confidence |
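As an illustration of working with these array layouts, the sketch below converts a bb_2d array from center format to corner coordinates. `bb_to_corners` is a hypothetical helper written for this example and is not the repository's bb_helper; only the field order is taken from the table above.

```python
import numpy as np

# Field order taken from the table above.
BB_2D_FIELDS = ("center_x", "center_y", "width", "height", "confidence", "class_idx")


def bb_to_corners(bb):
    """Convert one bb_2d array to (x_min, y_min, x_max, y_max)."""
    center_x, center_y, width, height = bb[:4]
    return np.array(
        [
            center_x - width / 2,
            center_y - height / 2,
            center_x + width / 2,
            center_y + height / 2,
        ],
        dtype=np.float32,
    )


# A 40x80 box centered at (100, 50) with confidence 0.9 and class index 0.
bb = np.array([100.0, 50.0, 40.0, 80.0, 0.9, 0.0], dtype=np.float32)
print(bb_to_corners(bb))
```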
note: n = dataset length
dataset name | Shape | DType | Description |
---|---|---|---|
img_paths | (n) | str | img path, relative to the dataset root |
joints2d | (n,17,4) | float32 | 17 = joints, 4 = x, y, confidence, visibility (coordinates in pixels, starting from top left of the image) |
skeleton_3d_hip_normalized | (n,17,5) | float32 | 17 = joints, 5 = x, y, z, confidence, visibility (coordinates in mm) |
env_position | (n,3) | float32 | 3 = x, y, z (mm) |
body_orientation | (n,4) | float32 | 4 = theta, phi, confidence, visibility |
head_orientation | (n,4) | float32 | 4 = theta, phi, confidence, visibility |
bbs | (n,6) | float32 | 6 = center_x, center_y, width, height, confidence, class_idx |
scene_idx_range | (n,2) | uint32 | 2 = scene_idx_start, scene_idx_stop: the index range in the hdf5 file containing data from the same scene |
actions | (n) | uint32 | List = dynamic sized list of action ids, e.g. [[1, 2], [3, 4, 5]] |
movements | (n) | uint32 | ids, see constants for ID <-> NAME mapping |
movement_speeds | (n) | uint32 | ids, see constants for ID <-> NAME mapping |
genders | (n) | uint32 | ids, see constants for ID <-> NAME mapping |
skin_colors | (n) | uint32 | ids, see constants for ID <-> NAME mapping |
sizes | (n) | uint32 | ids, see constants for ID <-> NAME mapping |
weights | (n) | uint32 | ids, see constants for ID <-> NAME mapping |
ages | (n) | uint32 | ids, see constants for ID <-> NAME mapping |
frame_nr_locals | (n) | uint32 | frame number of the current scene |
frame_nr_global | (n) | uint32 | frame number of the complete record |
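The scene_idx_range entry makes it possible to collect all frames of the scene a given sample belongs to. The sketch below uses in-memory dummy arrays with the shapes from the table instead of a real hdf5 file, and it assumes that scene_idx_stop is exclusive; check this against the actual data. `scene_frames` is a hypothetical helper.

```python
import numpy as np

n = 6  # dataset length (dummy value)
joints2d = np.zeros((n, 17, 4), dtype=np.float32)  # x, y, confidence, visibility
# Two dummy scenes: rows 0-2 and rows 3-5 (stop treated as exclusive here).
scene_idx_range = np.array([[0, 3]] * 3 + [[3, 6]] * 3, dtype=np.uint32)


def scene_frames(data, scene_idx_range, idx):
    """Return all rows of data that belong to the same scene as row idx."""
    start, stop = (int(v) for v in scene_idx_range[idx])
    return data[start:stop]


clip = scene_frames(joints2d, scene_idx_range, idx=4)
print(clip.shape)  # (3, 17, 4)
```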
Some notes on the original datasets. Important: these notes do NOT apply to internal PedRec usage. The original datasets are converted to PedRec datasets before use, so these notes can usually be ignored.
They use a binary mask containing 1s in the bounding box area.
- 0 = 'Hips'
- 1 = 'RightUpLeg'
- 2 = 'RightLeg'
- 3 = 'RightFoot'
- 4 = 'RightToeBase'
- 5 = 'Site' - ????
- 6 = 'LeftUpLeg'
- 7 = 'LeftLeg'
- 8 = 'LeftFoot'
- 9 = 'LeftToeBase'
- 10 = 'Site' - ????
- 11 = 'Spine'
- 12 = 'Spine1'
- 13 = 'Neck'
- 14 = 'Head'
- 15 = 'Site'
- 16 = 'LShoulder'
- 17 = 'LeftArm'
- 18 = 'LeftForeArm'
- 19 = 'LeftHand'
- 20 = 'LeftHandThumb'
- 21 = 'Site'
- 22 = 'L_Wrist_End'
- 23 = 'Site'
- 24 = 'RightShoulder'
- 25 = 'RightArm'
- 26 = 'RightForeArm'
- 27 = 'RightHand'
- 28 = 'RightHandThumb'
- 29 = 'Site'
- 30 = 'R_Wrist_End'
- 31 = 'Site'
- YoloV4 object detection: https://github.com/Tianxiaomo/pytorch-YOLOv4
- Pose-resnet as base network: https://github.com/microsoft/human-pose-estimation.pytorch
- Skeleton by Wolf Böse from the Noun Project
- Head by Naveen from the Noun Project
- body by Makarenko Andrey from the Noun Project
- Eye by Simon Sim from the Noun Project
- jogging by Adrien Coquet from the Noun Project
- Walk by Adrien Coquet from the Noun Project
- stand by Gan Khoon Lay from the Noun Project
- sit by Adrien Coquet from the Noun Project
- Dennis Burgermeister, Cognitive Systems Research Group, Reutlingen University (no longer active)
- Cristóbal Curio, Cognitive Systems Research Group, Reutlingen University
This project was funded by Continental AG.