TEDdet

Pytorch implementation of our IEEE publication "Temporal Feature Exchange and Difference Network for online real-time action detection" has been released.

Y. Liu, F. Yang and D. Ginhac, "TEDdet: Temporal Feature Exchange and Difference Network for Online Real-Time Action Detection," in IEEE Access, vol. 10, pp. 37870-37881, 2022, doi: 10.1109/ACCESS.2022.3164730.

TEDdet Overview

we propose a lightweight action tubelet detector coined TEDdet which unifies complementary feature aggregation and motion modeling modules. Specifically, our Temporal Feature Exchange module facilitates feature interaction by aggregating action-specific visual patterns over successive frames, enabling spatiotemporal modeling on top of 2D CNN. To address actors' location shift in the sequence, our Temporal Feature Difference module approximates pair-wise motion among target frames in their abstract latent space. These modules can be easily integrated with an existing anchor-free detector (CenterNet) to cooperatively model action instances' categories, sizes and trajectories for precise tubelet generation. TEDdet exploits larger temporal strides to efficiently infer actions in a coarse-to-fine and online manner.

We present two lightweight temporal modeling modules: Temporal Feature Exchange (TE) and Temporal Feature Difference (TD) to facilitate learning action-specific spatiotemporal pattern and trajectory.
We propose TEDdet, an integrated action tubelet detector on top of 2D CenterNet and TE-TD plug-in. Our detector operates in a coarse-to-fine manner. Alongside the online tube generation algorithm, TEDdet's detection speed well exceeds real-time requirement (89 FPS).
Comprehensive analysis in terms of TEDdet's accuracy, robustness, and efficiency are conducted on public UCF-24 and JHMDB-21 datasets. Without relying on any 3D CNN or optical flow, our action detector achieves competitive accuracy at an unprecedented speed, suggesting a much more feasible solution pertinent to realistic applications.

TEDdet Usage

1. Installation and Dataset

Please refer to https://github.com/MCG-NJU/MOC-Detector for detailed instructions.

2. Train

The current version of TEDdet support ResNet18 as the feature extraction backbone. To proceed with training, first download the COCO pretrained weights in Google Drive. COCO pretrained models come from CenterNet. Move pretrained models to ${TEDdet_ROOT}/experiment/modelzoo/.

To train your own model, run the train.py script along with relevant input arguments. For instance:

$python train.py --K 5 --exp_id K5_model --rgb_model TED_K5 --batch_size 16 --master_batch 16 --lr 2.5e-4 --gpus 0 --num_worker 16 --num_epochs 10 --lr_step 5 --dataset hmdb --split 1 --down_ratio 8 --lr_drop 0.1 --ninput 1 --ninputrgb 5 --auto_stop --pretrain coco

Note that by setting --auto_save, a validation step will be carried out at the end of each training epoch in order to save the best-performing model (i.e., model_best.pth). Instead, using --save_all will save every epoch's training model. For more details on possible input arguments, refer to ${TEDdet_ROOT}/src/opts.py.

3. Evaluation

To evaluate a trained model, one performs two sequential steps from the det.py and ACT.py scripts. The former performs model inference and saves detection results (i.e., .pkl) accordingly. The latter loads the detection results and evaluates them based on designated metrics. For instance:

Model inference

$python det.py --task stream --K 5 --gpus 0 --batch_size 1 --master_batch 1 --num_workers 1 --rgb_model TED_K5/model_best.pth --inference_dir ../data0/TED_K5_stream --ninput 1 --dataset hmdb --split 1 --down_ratio 8 --ninputrgb 5

Note that TEDdet supports two inference modes: normal and stream mode. Stream mode takes into account a FIFO-based feature-caching mechanism to more efficiently process incoming video frames (the reported speed in the article corresponds to this mode). The mode needs to be specified explicitly during both inference and evaluation.

Evaluating mAP

$python ACT.py --task frameAP --K 5 --th 0.5 --inference_dir ../data0/TED_K5_stream --dataset hmdb --split 1 --evaluation_model trimmed --inference_mode stream --ninputrgb 5

To compute videoAP, one needs to additionally generate action tubes using ACT.py by first configuring --task to BuildTubes and then evaluate videoAP.

References

Our codes refer to the work of CenterNet, MOC and TEA. We thank the authors for their structured implementations and comments!

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
data		data
experiment/result_model/ted_K5_jh_s1		experiment/result_model/ted_K5_jh_s1
images		images
src		src
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TEDdet

TEDdet Overview

TEDdet Usage

1. Installation and Dataset

2. Train

3. Evaluation

Model inference

Evaluating mAP

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TEDdet

TEDdet Overview

TEDdet Usage

1. Installation and Dataset

2. Train

3. Evaluation

Model inference

Evaluating mAP

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages