Important
The new version of BBox-Mask-Pose (BMPv2) is now available on arXiv.
BMPv2 significantly improves performance; see the quantitative results reported in the preprint.
One of the key contributions is PMPose, a new top-down pose estimation model that performs strongly both on standard benchmarks and in crowded scenes.
The code is integrated into the main branch and was published in Release 2.0.0.
Due to repository changes, version 2.0.0 is not backward compatible with previous versions.
- Mar 2026: HuggingFace Image Demo is up-to-date with BMPv2. Check out the 3D generation!
- Mar 2026: Version 2.0 with improved (1) pose and (2) SAM and (3) wiring to 3D prediction released.
- Feb 2026: SAM-pose2seg won a Best Paper Award at CVWW 2026 🎉
- Jan 2026: BMPv2 paper is available on arXiv
- Aug 2025: HuggingFace Image Demo is out! 🎮
- Jul 2025: Version 1.1 with easy-to-run image demo released
- Jun 2025: BMPv1 paper accepted to ICCV 2025! 🎉
- Dec 2024: BMPv1 code is available
- Nov 2024: The project website is online
Bounding boxes, masks, and poses capture complementary aspects of the human body. BBoxMaskPose links detection, segmentation, and pose estimation iteratively, where each prediction refines the others. PMPose combines probabilistic modeling with mask conditioning for robust pose estimation in crowds. Together, these components achieve state-of-the-art results on COCO and OCHuman, making BMP the first method to exceed 50 AP on OCHuman.
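The iterative loop can be sketched in a few lines of Python. Every name and signature below is an illustrative stand-in for exposition, not the repository's actual API:

```python
# Illustrative sketch of the BBoxMaskPose loop. All names here are
# stand-ins for exposition, not the repository's actual API.

def bmp_loop(image, detect, segment, estimate_pose, num_iters=2):
    """Iteratively refine detections, masks, and poses on one image."""
    boxes = detect(image, masked_out=[])  # initial detections
    masks, poses = [], []
    for _ in range(num_iters):
        masks = [segment(image, b) for b in boxes]        # mask per instance
        poses = [estimate_pose(image, m) for m in masks]  # mask-conditioned pose
        # Re-run detection with already-explained regions masked out, so the
        # detector can look for people hidden behind previous detections.
        boxes = boxes + detect(image, masked_out=masks)
    return boxes, masks, poses

# Toy stand-ins so the sketch runs end-to-end:
detect = lambda img, masked_out: [] if masked_out else [(0, 0, 10, 10)]
segment = lambda img, box: {"box": box}
estimate_pose = lambda img, mask: {"mask": mask}
boxes, masks, poses = bmp_loop("img.jpg", detect, segment, estimate_pose)
print(len(boxes), len(masks), len(poses))  # 1 1 1
```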
The repository is organized into two main packages with stable public APIs:
BBoxMaskPose/
├── pmpose/ # PMPose package (pose estimation)
│ └── pmpose/
│ ├── api.py # PUBLIC API: PMPose class
│ ├── mm_utils.py # Internal utilities
│ └── posevis_lite.py # Visualization
├── mmpose/ # MMPose fork with our edits
├── bboxmaskpose/ # BBoxMaskPose package (full pipeline)
│ └── bboxmaskpose/
│ ├── api.py # PUBLIC API: BBoxMaskPose class
│ ├── sam2/ # SAM2 implementation
│ ├── configs/ # BMP configurations
│ └── *_utils.py # Internal utilities
├── demos/ # Public API demos
│ ├── PMPose_demo.py # PMPose usage example
│ ├── BMP_demo.py # BBoxMaskPose usage example
│ └── quickstart.ipynb # Interactive notebook
└── demo/ # Legacy demo (still functional)
Key contributions:
- MaskPose: a pose estimation model conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
- Download pre-trained weights below
- PMPose: a pose estimation model that models the full keypoint probability distribution and is conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
- Download pre-trained weights below
- BBox-MaskPose (BMP): method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation and pose estimation
- Try the demo!
- SAM-pose2seg: fine-tuned SAM2 model for pose-guided instance segmentation
- Try the demo!
- Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
- Download pre-trained weights below
- Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
For more details, please visit our project website.
If you want to try our models without any installation, you can try the free HuggingFace demos.
BBoxMaskPose Demo showcases the whole loop, including 3D pose estimation. You can generate GIFs similar to the one at the top of this README. Due to 3D rendering, this demo takes approximately 30-60 seconds per image.
PMPose Demo showcases our family of PMPose models. It is not an iterative method but a standard feed-forward top-down 2D pose estimation method. Check it out if you're interested in fast pose estimation.
The fastest way to get started with GPU support:
# Clone and build
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git
cd BBoxMaskPose
docker-compose build
# Run the demo
docker-compose up
Requires: Docker Engine 19.03+, NVIDIA Container Toolkit, NVIDIA GPU with CUDA 12.1 support.
This project is built on top of MMPose and SAM 2.1. Please refer to the MMPose installation guide or SAM installation guide for detailed setup instructions.
Basic installation steps:
# Clone the repository
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git BBoxMaskPose/
cd BBoxMaskPose
# Install your version of torch, torchvision, OpenCV and NumPy
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip install numpy==1.25.1 opencv-python==4.9.0.80
# Install MMLibrary
pip install -U openmim
mim install mmengine "mmcv==2.1.0" "mmdet==3.3.0" "mmpretrain==1.2.0"
# Install dependencies
pip install -r requirements.txt
pip install -e .
python demos/PMPose_demo.py --image data/004806.jpg --device cuda
python demos/BMP_demo.py --image data/004806.jpg --device cuda
After running the demos, outputs are in outputs/004806/. The expected output should look like this:
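Before running the demos, a quick import check can confirm that the pinned stack resolved. The package list mirrors the install commands above; the helper itself is our illustration, not part of the repository:

```python
# Post-install sanity check: report the version of each pinned package.
import importlib

def check_packages(packages):
    """Return {package: version-or-None} without raising on missing installs."""
    report = {}
    for pkg in packages:
        try:
            mod = importlib.import_module(pkg)
            report[pkg] = getattr(mod, "__version__", "unknown")
        except ImportError:
            report[pkg] = None
    return report

for pkg, version in check_packages(
    ["torch", "torchvision", "cv2", "numpy", "mmengine", "mmcv", "mmdet", "mmpretrain"]
).items():
    print(f"{pkg}: {version or 'NOT INSTALLED'}")
```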
This demo extends BMP with SAM-3D-Body for 3D human mesh recovery:
# Basic usage (auto-downloads checkpoint from HuggingFace)
python demos/BMPv2_demo.py --image data/004806.jpg --device cuda
# With local checkpoint
python demos/BMPv2_demo.py --image data/004806.jpg --device cuda \
--sam3d_checkpoint checkpoints/sam-3d-body-dinov3/model.ckpt \
  --mhr_path checkpoints/sam-3d-body-dinov3/assets/mhr_model.pt
SAM-3D-Body Installation (Optional): BMPv2 requires SAM-3D-Body for 3D mesh recovery. Install it separately:
# 1. Install dependencies
pip install -r requirements/sam3d.txt
# 2. Install detectron2
pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' --no-build-isolation --no-deps
# 3. Install MoGe (optional, for FOV estimation)
pip install git+https://github.com/microsoft/MoGe.git
# 4. Install adapted SAM-3D-Body repository
pip install git+https://github.com/MiraPurkrabek/sam-3d-body.git
# 5. Request access to checkpoints at https://huggingface.co/facebook/sam-3d-body-dinov3
For more details, see SAM-3D-Body installation guide.
Interactive demo with both PMPose and BBoxMaskPose:
jupyter notebook demos/quickstart.ipynb
PMPose API - Pose estimation with bounding boxes:
from pmpose import PMPose
# Initialize model
pose_model = PMPose(device="cuda", from_pretrained=True)
# Run inference
keypoints, presence, visibility, heatmaps = pose_model.predict(
image="demo/data/004806.jpg",
bboxes=[[100, 100, 300, 400]], # [x1, y1, x2, y2]
)
# Visualize
vis_img = pose_model.visualize(image="demo/data/004806.jpg", keypoints=keypoints)
BBoxMaskPose API - Full detection + pose + segmentation:
from pmpose import PMPose
from bboxmaskpose import BBoxMaskPose
# Create pose model
pose_model = PMPose(device="cuda", from_pretrained=True)
# Inject into BMP
bmp_model = BBoxMaskPose(config="BMP_D3", device="cuda", pose_model=pose_model)
result = bmp_model.predict(image="demo/data/004806.jpg")
# Visualize
vis_img = bmp_model.visualize(image="demo/data/004806.jpg", result=result)
Pre-trained models are available on VRG Hugging Face 🤗. To run the demo, you don't need to download any weights manually; the detector, SAM-pose2seg, and pose estimator are downloaded at runtime.
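Note that the API examples above pass boxes as [x1, y1, x2, y2] corners, while COCO annotations store [x, y, w, h]. A small conversion helper (ours, not part of the package) bridges the two:

```python
def xywh_to_xyxy(box):
    """Convert a COCO-style [x, y, w, h] box to the [x1, y1, x2, y2]
    corner format used in the API examples above."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

print(xywh_to_xyxy([100, 100, 200, 300]))  # [100, 100, 300, 400]
```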
If you want to download the weights yourself, here are the links to our HuggingFace repositories:
- ViTPose-b trained on COCO+MPII+AIC -- download weights
- MaskPose-b -- download weights
- PMPose -- select model
- SAM-pose2seg -- download weights
- Fine-tuned RTMDet-L -- download weights
The code combines MMDetection, MMPose 2.0, ViTPose, SAM 2.1 and SAM-3D-Body.
Our visualizations integrate Distinctipy for automatic color selection.
This repository combines our work on the BBoxMaskPose project with our previous work on probabilistic modelling for 2D human pose estimation.
The code was implemented by Miroslav Purkrábek and Constantin Kolomiiets. If you use this work, kindly cite it using the references provided below.
For questions, please use Issues or Discussions.
@InProceedings{Purkrabek2025BMPv1,
author = {Purkrabek, Miroslav and Matas, Jiri},
title = {Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {9004-9013}
}
@InProceedings{Purkrabek2026BMPv2,
author = {Purkrabek, Miroslav and Kolomiiets, Constantin and Matas, Jiri},
title = {BBoxMaskPose v2: Expanding Mutual Conditioning to 3D},
booktitle = {arXiv preprint arXiv:2601.15200},
year = {2026}
}
@article{yang2025sam3dbody,
title={SAM 3D Body: Robust Full-Body Human Mesh Recovery},
author={Yang, Xitong and Kukreja, Devansh and Pinkus, Don and Sagar, Anushka and Fan, Taosha and Park, Jinhyung and Shin, Soyong and Cao, Jinkun and Liu, Jiawei and Ugrinovic, Nicolas and Feiszli, Matt and Malik, Jitendra and Dollar, Piotr and Kitani, Kris},
journal={arXiv preprint; identifier to be added},
year={2025}
}
@InProceedings{Kolomiiets2026CVWW,
author = {Kolomiiets, Constantin and Purkrabek, Miroslav and Matas, Jiri},
title = {SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds},
booktitle = {Computer Vision Winter Workshop (CVWW)},
year = {2026}
}


