Paper: AAAI 2026 AI for Medicine & Healthcare (Bridge)
Accepted to the Proceedings of the 2nd AI for Medicine and Healthcare Bridge Program at AAAI 2026
UltrasODM: A Dual-Stream Optical-Flow–Mamba Network for Trackerless 3D Freehand Ultrasound Reconstruction
Mayank Anand*, Ujair Alam, Surya Prakash, Priya Shukla, Gora Chand Nandi, Domenec Puig
2nd AI Bridge for Medicine & Healthcare (AIMedHealth), AAAI 2026 (Poster) [PMLR]
This repository contains the implementation of UltrasODM, a deep learning framework for trackerless 3D freehand ultrasound reconstruction. The framework combines video patch embedding, optical flow analysis, and bidirectional Mamba blocks to achieve sub-millimeter accuracy in ultrasound pose estimation.
- Baseline Model: EfficientNet-based architecture with optical flow integration
- Optical Flow Module: Enhanced motion dynamics extraction using Lucas-Kanade flow estimation
- Optical Flow + Mamba: Integration of selective state space models for temporal sequence modeling
- Dual Mamba Architecture: Bidirectional Mamba blocks with FPS/NPS sampling for point cloud processing
The framework consists of four main implementations:
- EfficientNet-B1 backbone for feature extraction
- Optical flow integration for motion analysis
- Multi-component loss function (MSE, correlation, velocity)
- Enhanced optical flow estimation with multi-scale feature extraction
- Motion magnitude estimation for adaptive feature fusion
- Velocity processor for temporal consistency
- Video patch embedding with adjustable window mechanisms
- Inner Mamba block for initial temporal processing
- FPS/NPS sampling for spatial attention
- Bidirectional Mamba with selective scan algorithm
- State space model (SSM) layers with discretization
- Dual-branch processing (FPS and NPS orders)
- Combined feature fusion with restored ordering
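The dual-branch processing with restored ordering can be illustrated with a small NumPy sketch. It assumes each branch sees the same tokens in a different order (one permutation standing in for the FPS order, one for the NPS order), applies an order-dependent operation, maps the outputs back to the original token order with the inverse permutation, and averages the two branches. The placeholder permutations, the running-mean stand-in for a Mamba block, and the averaging fusion are illustrative assumptions, not the repository's code.

```python
import numpy as np


def order_dependent_block(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for a sequence model: running mean over the token sequence.

    Like a causal scan, each output depends on the tokens before it,
    so the result changes when the input order changes.
    """
    counts = np.arange(1, tokens.shape[0] + 1)[:, None]
    return np.cumsum(tokens, axis=0) / counts


def process_in_order(tokens, order, block):
    """Run `block` on tokens permuted by `order`, then restore the original order."""
    out = block(tokens[order])
    inverse = np.argsort(order)  # inverse permutation: row i of the result belongs to token i
    return out[inverse]


def dual_branch_fusion(tokens, fps_order, nps_order, block=order_dependent_block):
    """Hypothetical fusion: average of the FPS-ordered and NPS-ordered branches."""
    fps_out = process_in_order(tokens, fps_order, block)
    nps_out = process_in_order(tokens, nps_order, block)
    return 0.5 * (fps_out + nps_out)


rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 3))           # 6 tokens, 3 features each
fps_order = np.array([0, 5, 2, 4, 1, 3])   # placeholder FPS visiting order
nps_order = np.array([0, 1, 3, 2, 4, 5])   # placeholder NPS visiting order
fused = dual_branch_fusion(tokens, fps_order, nps_order)
print(fused.shape)  # (6, 3)
```

Restoring the order before fusing is what lets the two branches be combined token by token, even though each branch traversed the tokens differently.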
Requirements:
- Python 3.9+
- PyTorch 2.1.0+
- CUDA 11.8+ (for GPU acceleration)
```bash
# Clone the repository
git clone https://github.com/AnandMayank/UltrasODM.git
cd UltrasODM

# Create conda environment
conda create -n ultrasom python=3.9
conda activate ultrasom

# Install dependencies
pip install -r requirements.txt
conda install pytorch3d --no-deps -c pytorch3d
```

The framework expects data in the following format:
```text
data/
├── frames_transfs/
│   ├── 000/
│   │   ├── RH_rotation.h5
│   │   └── LH_rotation.h5
│   └── ...
├── landmarks/
│   ├── landmark_000.h5
│   └── ...
└── calib_matrix.csv
```
Each scan .h5 file under `frames_transfs/` contains:
- `frames`: ultrasound frames, shape (N, H, W)
- `tforms`: transformation matrices, shape (N, 4, 4)
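Because `tforms` holds one 4 × 4 homogeneous matrix per frame, frame-to-frame targets are typically formed by expressing frame j in the coordinate system of frame i. The NumPy sketch below assumes each `tforms[k]` maps frame-k coordinates into a shared tracker/world space, so the relative transform is `inv(T_i) @ T_j`. That convention, the synthetic transforms, and the omission of `calib_matrix.csv` (which would additionally map pixels to millimetres) are assumptions to check against the repository's `utils/transform.py`.

```python
import numpy as np


def relative_transform(tforms: np.ndarray, i: int, j: int) -> np.ndarray:
    """Pose of frame j expressed in the coordinate system of frame i.

    Assumes tforms[k] (4x4) maps frame-k coordinates to a common space.
    """
    return np.linalg.inv(tforms[i]) @ tforms[j]


def make_tform(angle_rad: float, translation) -> np.ndarray:
    """Homogeneous transform: rotation about z plus a translation."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    t = np.eye(4)
    t[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    t[:3, 3] = translation
    return t


# Synthetic stand-in for an (N, 4, 4) tforms array with N = 3 frames.
tforms = np.stack([
    make_tform(0.00, [0.0, 0.0, 0.0]),
    make_tform(0.05, [1.0, 0.0, 0.0]),
    make_tform(0.10, [2.0, 0.5, 0.0]),
])
rel = relative_transform(tforms, 0, 1)
# Chaining consecutive relative transforms recovers the 0 -> 2 transform.
chain = relative_transform(tforms, 0, 1) @ relative_transform(tforms, 1, 2)
print(np.allclose(chain, relative_transform(tforms, 0, 2)))  # True
```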
Training commands for the baseline, Optical Flow + Mamba, and Dual Mamba models:

```bash
python baseline/train_baseline.py --config config/baseline_config.yaml
python optical_flow_mamba/train_optical_flow_mamba.py --config config/mamba_config.yaml
python dual_mamba/train_dual_mamba.py --config config/dual_mamba_config.yaml
```

The video patch embedding module processes video frames into patch embeddings with:
- Adjustable window size for different temporal contexts
- Enhanced temporal encoding with learnable patterns
- Causal sequence modeling for real-time processing
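A minimal sketch of the patch-embedding and causal-window ideas, assuming non-overlapping square patches, a fixed random projection in place of the learned linear layer, and a sinusoidal temporal code in place of the learnable temporal patterns. Function names and the patch size are hypothetical. Here `window_size` plays the role of the adjustable window: each window holds only the current frame and earlier ones, so no future frames are used.

```python
import numpy as np


def patchify(frames: np.ndarray, patch: int) -> np.ndarray:
    """Split (T, H, W) frames into (T, num_patches, patch*patch) flattened patches."""
    t, h, w = frames.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    x = frames.reshape(t, h // patch, patch, w // patch, patch)
    x = x.transpose(0, 1, 3, 2, 4)  # (T, rows, cols, patch, patch)
    return x.reshape(t, -1, patch * patch)


def temporal_encoding(num_frames: int, dim: int) -> np.ndarray:
    """Sinusoidal code per frame index (stand-in for learnable temporal patterns)."""
    pos = np.arange(num_frames)[:, None]
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    enc = np.zeros((num_frames, dim))
    enc[:, 0::2] = np.sin(pos * freqs)
    enc[:, 1::2] = np.cos(pos * freqs)
    return enc


def embed_video(frames, patch, embed_dim, rng):
    """Project patches to embed_dim and add a per-frame temporal code."""
    patches = patchify(frames, patch)
    proj = rng.normal(scale=0.02, size=(patch * patch, embed_dim))  # placeholder weights
    tokens = patches @ proj  # (T, P, embed_dim)
    return tokens + temporal_encoding(frames.shape[0], embed_dim)[:, None, :]


def causal_windows(num_frames: int, window_size: int):
    """Frame indices visible at each step: current frame plus up to window_size-1 past frames."""
    return [list(range(max(0, t - window_size + 1), t + 1)) for t in range(num_frames)]


rng = np.random.default_rng(0)
frames = rng.random((4, 32, 32))  # 4 frames, matching the default num_frames
tokens = embed_video(frames, patch=8, embed_dim=256, rng=rng)
print(tokens.shape)               # (4, 16, 256)
print(causal_windows(4, 2))       # [[0], [0, 1], [1, 2], [2, 3]]
```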
The optical flow module extracts motion features through:
- Multi-scale flow feature extraction
- Motion magnitude estimation
- Adaptive fusion based on motion dynamics
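To make motion magnitude and adaptive fusion concrete, here is a hedged single-scale sketch: a global Lucas-Kanade least-squares estimate of one translation between two frames, its magnitude, and a sigmoid gate that weights motion features more heavily when motion is large. The single global window, the gate's threshold and sharpness, and all function names are illustrative assumptions; the repository's module is multi-scale and learned.

```python
import numpy as np


def lucas_kanade_translation(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Least-squares solution of Ix*u + Iy*v + It = 0 over the whole frame.

    Returns (u, v): shift in pixels along x (columns) and y (rows).
    """
    avg = 0.5 * (prev + curr)      # gradients at the midpoint improve accuracy
    iy, ix = np.gradient(avg)      # np.gradient returns d/drow, d/dcol
    it = curr - prev
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    flow, *_ = np.linalg.lstsq(a, -it.ravel(), rcond=None)
    return flow


def motion_gate(flow: np.ndarray, threshold: float = 0.5, sharpness: float = 10.0) -> float:
    """Map motion magnitude to a fusion weight in (0, 1) (hypothetical parameters)."""
    magnitude = float(np.hypot(flow[0], flow[1]))
    return float(1.0 / (1.0 + np.exp(-sharpness * (magnitude - threshold))))


def fuse(appearance: np.ndarray, motion: np.ndarray, gate: float) -> np.ndarray:
    """Adaptive fusion: lean on motion features when the probe moves more."""
    return (1.0 - gate) * appearance + gate * motion


def blob(shift_x: float, shift_y: float, size: int = 64, sigma: float = 8.0) -> np.ndarray:
    """Smooth synthetic frame: a Gaussian blob whose centre is moved by (shift_x, shift_y)."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    cx, cy = size / 2 + shift_x, size / 2 + shift_y
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))


flow = lucas_kanade_translation(blob(0.0, 0.0), blob(0.5, -0.3))
print(np.round(flow, 2))  # expected to be close to the true shift (0.5, -0.3)
gate = motion_gate(flow)
```

A single global window only recovers one rigid shift; a dense or multi-scale estimator solves the same equation in many local windows.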
The bidirectional Mamba implementation provides:
- True bidirectional processing (forward and backward)
- Selective scan algorithm for efficient sequence modeling
- State space model with discretization
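The selective-scan recurrence behind these blocks can be written in a few lines. This NumPy sketch follows a Mamba-style formulation under simplifying assumptions: a diagonal state matrix A, zero-order-hold discretization Ā = exp(ΔA) with the simplified B̄ = ΔB, input-dependent Δ, B and C from random projections standing in for learned layers, and a sequential loop rather than a hardware-efficient parallel scan. The bidirectional variant runs a second, separately parameterized scan over the reversed sequence, flips its output back, and sums the two. Parameter names are hypothetical.

```python
import numpy as np


def init_params(d_model: int, d_state: int, rng) -> dict:
    """Random stand-ins for learned SSM parameters (hypothetical names)."""
    return {
        "A": -np.exp(rng.normal(size=(d_model, d_state))),  # negative entries -> decaying state
        "W_delta": rng.normal(scale=0.1, size=(d_model, d_model)),
        "W_B": rng.normal(scale=0.1, size=(d_model, d_state)),
        "W_C": rng.normal(scale=0.1, size=(d_model, d_state)),
        "D": np.ones(d_model),                               # skip connection
    }


def softplus(x):
    return np.log1p(np.exp(x))


def selective_scan(x: np.ndarray, p: dict) -> np.ndarray:
    """Causal selective scan over x of shape (L, d_model)."""
    delta = softplus(x @ p["W_delta"])  # (L, d_model): input-dependent step size
    b = x @ p["W_B"]                    # (L, d_state): input-dependent input matrix
    c = x @ p["W_C"]                    # (L, d_state): input-dependent output matrix
    h = np.zeros_like(p["A"])           # hidden state, (d_model, d_state)
    ys = []
    for t in range(x.shape[0]):
        a_bar = np.exp(delta[t][:, None] * p["A"])  # ZOH discretization of A
        b_bar = delta[t][:, None] * b[t][None, :]   # simplified discretization of B
        h = a_bar * h + b_bar * x[t][:, None]       # state update
        ys.append(h @ c[t] + p["D"] * x[t])         # readout plus skip term
    return np.stack(ys)


def bidirectional_scan(x: np.ndarray, fwd: dict, bwd: dict) -> np.ndarray:
    """Sum of a forward scan and a backward scan run on the reversed sequence."""
    return selective_scan(x, fwd) + selective_scan(x[::-1], bwd)[::-1]


rng = np.random.default_rng(0)
d_model, d_state, length = 8, 16, 10
fwd, bwd = init_params(d_model, d_state, rng), init_params(d_model, d_state, rng)
x = rng.normal(size=(length, d_model))
y = bidirectional_scan(x, fwd, bwd)
print(y.shape)  # (10, 8)
```

The forward scan alone is causal (output t sees only inputs up to t), which suits streaming; adding the backward scan gives every position context from both directions at the cost of needing the whole sequence.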
The combined sampling strategy includes:
- Farthest Point Sampling (FPS) for global coverage
- Nearest Point Sampling (NPS) for local patterns
- Spatial attention mechanism for feature selection
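Farthest point sampling has a standard greedy form: start from one point, then repeatedly add the point whose distance to the already-selected set is largest. The sketch below implements that and interprets Nearest Point Sampling as taking the k nearest neighbours of a centre point. That interpretation, the starting index, and the function names are assumptions rather than the repository's definitions.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Greedy FPS: indices of k points that spread out over the set."""
    selected = [start]
    # Squared distance from every point to its nearest selected point so far.
    min_dist = np.sum((points - points[start]) ** 2, axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.sum((points - points[nxt]) ** 2, axis=1))
    return np.array(selected)


def nearest_point_sampling(points: np.ndarray, center: int, k: int) -> np.ndarray:
    """Indices of the k points closest to points[center], including the centre itself."""
    dist = np.sum((points - points[center]) ** 2, axis=1)
    return np.argsort(dist, kind="stable")[:k]


points = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(farthest_point_sampling(points, 3))    # [0 1 2]
print(nearest_point_sampling(points, 0, 3))  # [0 4 2]
```

FPS picks the opposite corners first (global coverage), while the nearest-neighbour query around point 0 returns the centre point before the adjacent corners (local pattern).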
The framework implements multiple loss components:
- MSE Loss: Mean squared error for pose prediction
- Correlation Loss: Feature correlation for temporal consistency
- Velocity Loss: Motion velocity regularization
- Point Loss: 3D point distance for clinical accuracy
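The four loss terms can be sketched in NumPy as follows. Only the term names come from this README. The specific forms are assumptions for illustration: correlation loss as 1 − Pearson r, velocity loss as MSE between first differences along the frame axis, point loss as the mean Euclidean distance between reference points mapped by predicted and ground-truth 4 × 4 transforms. The weights in `combined_loss` are likewise placeholders.

```python
import numpy as np


def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))


def correlation_loss(pred, target, eps=1e-8):
    """1 - Pearson correlation between flattened predictions and targets."""
    p = pred.ravel() - pred.mean()
    t = target.ravel() - target.mean()
    r = np.dot(p, t) / (np.linalg.norm(p) * np.linalg.norm(t) + eps)
    return float(1.0 - r)


def velocity_loss(pred, target):
    """MSE between frame-to-frame differences (axis 0 is the frame axis)."""
    return mse_loss(np.diff(pred, axis=0), np.diff(target, axis=0))


def point_loss(pred_tforms, gt_tforms, points):
    """Mean distance between reference points mapped by predicted vs. true transforms.

    pred_tforms, gt_tforms: (N, 4, 4); points: (M, 3), e.g. in millimetres.
    """
    homog = np.hstack([points, np.ones((points.shape[0], 1))])  # (M, 4)
    pred_pts = pred_tforms @ homog.T                             # (N, 4, M)
    gt_pts = gt_tforms @ homog.T
    return float(np.mean(np.linalg.norm(pred_pts[:, :3] - gt_pts[:, :3], axis=1)))


def combined_loss(pred, target, pred_tforms, gt_tforms, points,
                  w_mse=1.0, w_corr=0.1, w_vel=0.1, w_point=1.0):
    """Weighted sum of all terms; the weights are placeholders, not tuned values."""
    return (w_mse * mse_loss(pred, target)
            + w_corr * correlation_loss(pred, target)
            + w_vel * velocity_loss(pred, target)
            + w_point * point_loss(pred_tforms, gt_tforms, points))


rng = np.random.default_rng(0)
target = rng.normal(size=(5, 6))        # 5 frames x 6 pose parameters
tforms = np.tile(np.eye(4), (5, 1, 1))  # identity poses for 5 frames
points = rng.normal(size=(4, 3))        # 4 reference points
print(combined_loss(target, target, tforms, tforms, points))  # near zero for a perfect prediction
```

The terms are complementary: a constant offset leaves the velocity and correlation terms at zero while the MSE and point terms still penalize it.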
Performance metrics on the TUS-REC2025 dataset:
| Model | Point Distance (mm) | Training Time | Parameters |
|---|---|---|---|
| Baseline | 0.45 | 8h | 12M |
| Optical Flow | 0.32 | 10h | 15M |
| Optical Flow + Mamba | 0.23 | 12h | 18M |
| Dual Mamba | 0.19 | 14h | 22M |
Note: Results are representative and may vary based on training configuration.
Model configurations are stored in `config/`:
- `baseline_config.yaml`: Baseline model settings
- `mamba_config.yaml`: Optical Flow + Mamba settings
- `dual_mamba_config.yaml`: Dual Mamba settings
Key configuration parameters:
- `num_frames`: number of input frames (default: 4)
- `embed_dim`: embedding dimension (default: 256)
- `num_fps_points`: FPS sampling points (default: 32)
- `num_nps_points`: NPS sampling points (default: 64)
- `mamba_d_state`: Mamba state dimension (default: 64)
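For quick experiments it can help to mirror these keys in Python. The dataclass below is a hypothetical convenience wrapper, not part of the repository: it carries the documented defaults and overrides them from a plain dict, such as one parsed from one of the YAML files. The nesting of keys inside those files is not documented here, so a flat mapping is assumed.

```python
from dataclasses import dataclass, fields


@dataclass
class UltrasODMConfig:
    """Hypothetical container for the documented key parameters and their defaults."""
    num_frames: int = 4        # number of input frames
    embed_dim: int = 256       # embedding dimension
    num_fps_points: int = 32   # FPS sampling points
    num_nps_points: int = 64   # NPS sampling points
    mamba_d_state: int = 64    # Mamba state dimension

    @classmethod
    def from_dict(cls, values: dict) -> "UltrasODMConfig":
        """Build a config from a flat mapping, ignoring keys this class does not define."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in values.items() if k in known})


cfg = UltrasODMConfig.from_dict({"num_frames": 8, "learning_rate": 1e-4})
print(cfg.num_frames, cfg.embed_dim)  # 8 256
```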
Repository layout:

```text
UltrasODM/
├── baseline/               # Baseline model implementation
│   ├── train_baseline.py
│   └── network_baseline.py
├── optical_flow/           # Optical flow module
│   ├── optical_flow.py
│   └── flow_losses.py
├── optical_flow_mamba/     # Optical Flow + Mamba model
│   ├── train_optical_flow_mamba.py
│   ├── network_mamba.py
│   └── video_patch_embedding.py
├── dual_mamba/             # Dual Mamba model
│   ├── train_dual_mamba.py
│   ├── dual_mamba_block.py
│   └── ssm_layer.py
├── utils/                  # Shared utilities
│   ├── loader.py
│   ├── transform.py
│   ├── metrics.py
│   └── plot_functions.py
├── config/                 # Configuration files
├── data/                   # Dataset directory
└── docs/                   # Documentation
```
If you use this code in your research, please cite:
```bibtex
@inproceedings{
anand2025ultrasodm,
title={Ultras{ODM}: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound Reconstruction},
author={Mayank Anand and Gora Chand Nandi and Surya Prakash and Ujair Alam and Priya Shukla and Domenec Puig},
booktitle={2nd AI for Medicine and Healthcare Bridge Program at AAAI26},
year={2025},
url={https://openreview.net/forum?id=dUPjABX5Qe}
}
```

This work is based on research in trackerless 3D freehand ultrasound reconstruction and builds upon advances in state space models and selective scan algorithms.
This code is released for academic research purposes only. Commercial use is prohibited.
For questions or issues, please open an issue on GitHub or contact the corresponding author through the conference portal.