AnandMayank/UltrasODM

UltrasODM: A Dual-Stream Optical-Flow–Mamba Network for 3D Freehand Ultrasound Reconstruction

📄 Paper: AAAI 2026 AI for Medicine & Healthcare (Bridge)

Accepted in the Proceedings of the 2nd AI for Medicine and Healthcare Bridge Program at AAAI 2026 🎉

UltrasODM: A Dual-Stream Optical-Flow–Mamba Network for Trackerless 3D Freehand Ultrasound Reconstruction
Mayank Anand*, Ujair Alam, Surya Prakash, Priya Shukla, Gora Chand Nandi, Domenec Puig
2nd AI Bridge for Medicine & Healthcare (AIMedHealth), AAAI 2026 (Poster) [PMLR]

Overview

This repository contains the implementation of UltrasODM, a deep learning framework for trackerless 3D freehand ultrasound reconstruction. The framework combines video patch embedding, optical flow analysis, and bidirectional Mamba blocks to achieve sub-millimeter accuracy in ultrasound pose estimation.

Key Features

  • Baseline Model: EfficientNet-based architecture with optical flow integration
  • Optical Flow Module: Enhanced motion dynamics extraction using Lucas-Kanade flow estimation
  • Optical Flow + Mamba: Integration of selective state space models for temporal sequence modeling
  • Dual Mamba Architecture: Bidirectional Mamba blocks with FPS/NPS sampling for point cloud processing

Architecture Overview

The framework consists of four main implementations:

1. Baseline Model

  • EfficientNet-B1 backbone for feature extraction
  • Optical flow integration for motion analysis
  • Multi-component loss function (MSE, correlation, velocity)

2. Optical Flow Enhanced Model

  • Enhanced optical flow estimation with multi-scale feature extraction
  • Motion magnitude estimation for adaptive feature fusion
  • Velocity processor for temporal consistency

3. Optical Flow + Mamba Model

  • Video patch embedding with adjustable window mechanisms
  • Inner Mamba block for initial temporal processing
  • FPS/NPS sampling for spatial attention
  • Bidirectional Mamba with selective scan algorithm

4. Dual Mamba Model

  • State space model (SSM) layers with discretization
  • Dual-branch processing (FPS and NPS orders)
  • Combined feature fusion with restored ordering
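
The "restored ordering" step can be pictured with a small NumPy sketch. This is an illustration under stated assumptions, not the repository's code: `process_branch` is a hypothetical stand-in for a Mamba branch, the two orders are arbitrary placeholder permutations rather than real FPS/NPS orders, and the fusion is a plain average.

```python
import numpy as np

def process_branch(tokens: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a sequence model: a causal running mean."""
    counts = np.arange(1, tokens.shape[0] + 1)[:, None]
    return np.cumsum(tokens, axis=0) / counts

def dual_branch(tokens: np.ndarray, order_a: np.ndarray, order_b: np.ndarray) -> np.ndarray:
    """Run each branch in its own token order, undo the permutation, then fuse."""
    restored = []
    for order in (order_a, order_b):
        out = process_branch(tokens[order])   # process in the branch-specific order
        inverse = np.argsort(order)           # inverse permutation of `order`
        restored.append(out[inverse])         # back to the original token order
    return 0.5 * (restored[0] + restored[1])  # assumed fusion: simple average

tokens = np.arange(12, dtype=float).reshape(6, 2)  # 6 tokens, 2 features each
order_a = np.array([0, 5, 2, 4, 1, 3])             # placeholder for an FPS order
order_b = np.array([3, 1, 4, 0, 5, 2])             # placeholder for an NPS order
fused = dual_branch(tokens, order_a, order_b)
```

Because `np.argsort(order)` is the inverse of the permutation `order`, row `i` of each restored output again corresponds to input token `i`, so the two branches can be fused element-wise.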

Installation

Requirements

  • Python 3.9+
  • PyTorch 2.1.0+
  • CUDA 11.8+ (for GPU acceleration)

Setup

# Clone the repository
git clone https://github.com/AnandMayank/UltrasODM.git
cd UltrasODM

# Create conda environment
conda create -n ultrasom python=3.9
conda activate ultrasom

# Install dependencies
pip install -r requirements.txt
conda install pytorch3d -c pytorch3d --no-deps

Dataset Structure

The framework expects data in the following format:

data/
├── frames_transfs/
│   ├── 000/
│   │   ├── RH_rotation.h5
│   │   └── LH_rotation.h5
│   └── ...
├── landmarks/
│   ├── landmark_000.h5
│   └── ...
└── calib_matrix.csv

Each .h5 file contains:

  • frames: Ultrasound frames (N, H, W)
  • tforms: Transformation matrices (N, 4, 4)
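
A minimal sketch of reading and sanity-checking one scan file, assuming `h5py` is installed and that the `frames` and `tforms` datasets sit at the top level of each file. The helper names `check_scan` and `read_scan` are hypothetical and are not taken from `utils/loader.py`.

```python
import numpy as np

def check_scan(frames: np.ndarray, tforms: np.ndarray) -> int:
    """Validate the (N, H, W) / (N, 4, 4) layout and return N."""
    if frames.ndim != 3:
        raise ValueError(f"frames must be (N, H, W), got {frames.shape}")
    if tforms.shape != (frames.shape[0], 4, 4):
        raise ValueError(f"tforms must be (N, 4, 4), got {tforms.shape}")
    return frames.shape[0]

def read_scan(path: str):
    """Read one scan file; h5py is imported lazily so check_scan works without it."""
    import h5py  # third-party dependency, assumed to be installed
    with h5py.File(path, "r") as f:
        frames = f["frames"][()]  # load the full dataset into memory
        tforms = f["tforms"][()]
    check_scan(frames, tforms)
    return frames, tforms

# Synthetic stand-in for a real scan: 5 frames of 48x64 pixels with identity poses.
frames = np.zeros((5, 48, 64), dtype=np.uint8)
tforms = np.tile(np.eye(4), (5, 1, 1))
n_frames = check_scan(frames, tforms)
```

With real data, a call such as `read_scan("data/frames_transfs/000/RH_rotation.h5")` would return the two arrays after the same shape check.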

Usage

Training Baseline Model

python baseline/train_baseline.py --config config/baseline_config.yaml

Training Optical Flow + Mamba Model

python optical_flow_mamba/train_optical_flow_mamba.py --config config/mamba_config.yaml

Training Dual Mamba Model

python dual_mamba/train_dual_mamba.py --config config/dual_mamba_config.yaml

Model Architecture Details

Video Patch Embedding

The video patch embedding module converts video frames into patch embeddings and provides:

  • Adjustable window size for different temporal contexts
  • Enhanced temporal encoding with learnable patterns
  • Causal sequence modeling for real-time processing
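
To make the idea of patch embedding concrete, here is a NumPy sketch that cuts a short clip into non-overlapping spatio-temporal patches and projects each one to an embedding. The patch sizes, the random projection, and the zero positional term are illustrative assumptions; the repository's `video_patch_embedding.py` may differ in all of them, and the windowing and causal masking listed above are not modeled here.

```python
import numpy as np

def patchify(clip: np.ndarray, t: int, p: int) -> np.ndarray:
    """Split a (T, H, W) clip into non-overlapping t x p x p patches.

    Returns an array of shape (num_patches, t * p * p).
    """
    T, H, W = clip.shape
    if T % t or H % p or W % p:
        raise ValueError("clip dimensions must be divisible by the patch size")
    x = clip.reshape(T // t, t, H // p, p, W // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # block indices first, in-patch indices last
    return x.reshape(-1, t * p * p)

def embed(patches: np.ndarray, weight: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """Linear projection plus an additive positional/temporal term."""
    return patches @ weight + pos

rng = np.random.default_rng(0)
clip = rng.random((4, 32, 32))                    # 4 frames, matching num_frames
patches = patchify(clip, t=2, p=8)                # 2 * 4 * 4 = 32 patches of 128 values
weight = rng.normal(scale=0.02, size=(128, 256))  # 256 matches the embed_dim default
pos = np.zeros((32, 256))                         # learnable in a real model, zeros here
tokens = embed(patches, weight, pos)
```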

Optical Flow Integration

The optical flow module extracts motion features through:

  • Multi-scale flow feature extraction
  • Motion magnitude estimation
  • Adaptive fusion based on motion dynamics
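
The Key Features list names Lucas-Kanade as the flow estimator. The sketch below shows the core least-squares step of that method for a single window, assuming one global displacement per window and a smooth synthetic texture in place of ultrasound frames. The function `lucas_kanade` is a hypothetical illustration, not the code in `optical_flow/optical_flow.py`.

```python
import numpy as np

def lucas_kanade(frame0: np.ndarray, frame1: np.ndarray) -> np.ndarray:
    """Estimate one (u, v) displacement in pixels for a whole window.

    Solves Ix * u + Iy * v = -It in the least-squares sense over all pixels.
    """
    iy, ix = np.gradient(0.5 * (frame0 + frame1))  # axis 0 is y, axis 1 is x
    it = frame1 - frame0                           # temporal derivative
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    flow, *_ = np.linalg.lstsq(a, -it.ravel(), rcond=None)
    return flow

def texture(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Smooth synthetic image used in place of a real ultrasound frame."""
    return np.sin(0.3 * x) + np.cos(0.2 * y)

y, x = np.mgrid[0:64, 0:64].astype(float)
frame0 = texture(x, y)
frame1 = texture(x - 0.5, y + 0.25)  # content moves +0.5 px in x and -0.25 px in y
u, v = lucas_kanade(frame0, frame1)
magnitude = float(np.hypot(u, v))    # a scalar of the kind a fusion gate could use
```

A per-window magnitude like this is one plausible input to the adaptive fusion step, although how the repository computes and uses motion magnitude is not specified here.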

Bidirectional Mamba

The bidirectional Mamba implementation provides:

  • True bidirectional processing (forward and backward)
  • Selective scan algorithm for efficient sequence modeling
  • State space model with discretization
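
The three points above can be illustrated with a toy diagonal state space model. This sketch uses zero-order-hold discretization and a plain Python loop with fixed parameters; in Mamba the step size and the input/output projections depend on the input (the "selective" part) and the scan runs as a fused kernel, neither of which is reproduced here. All names and values are illustrative assumptions.

```python
import numpy as np

def discretize(a: np.ndarray, b: np.ndarray, delta: float):
    """Zero-order-hold discretization of a diagonal continuous-time SSM."""
    a_bar = np.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def scan(x: np.ndarray, a_bar: np.ndarray, b_bar: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Recurrence h_t = a_bar * h_{t-1} + b_bar * x_t with readout y_t = c . h_t."""
    h = np.zeros_like(a_bar)
    y = np.empty_like(x)
    for t, x_t in enumerate(x):
        h = a_bar * h + b_bar * x_t
        y[t] = c @ h
    return y

def bidirectional_scan(x, a_bar, b_bar, c):
    """Sum of a forward pass and a pass over the time-reversed sequence."""
    forward = scan(x, a_bar, b_bar, c)
    backward = scan(x[::-1], a_bar, b_bar, c)[::-1]
    return forward + backward

a = -np.array([1.0, 2.0, 4.0])  # stable diagonal state matrix (negative entries)
b = np.ones(3)
c = np.ones(3) / 3.0
a_bar, b_bar = discretize(a, b, delta=0.1)
x = np.zeros(8)
x[3] = 1.0                      # unit impulse at t = 3
y = bidirectional_scan(x, a_bar, b_bar, c)
```

The forward pass alone responds only at and after the impulse, while the bidirectional output also responds before it, which is the practical difference between causal and bidirectional processing.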

FPS/NPS Sampling

The combined sampling strategy includes:

  • Farthest Point Sampling (FPS) for global coverage
  • Nearest Point Sampling (NPS) for local patterns
  • Spatial attention mechanism for feature selection
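
A short NumPy sketch of the two sampling ideas. Farthest point sampling follows the standard greedy algorithm. "Nearest point sampling" is interpreted here as picking the k points closest to a query point, which is an assumption about the intended meaning, and the toy 2D points stand in for whatever token coordinates or features the model actually samples.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    chosen = [start]
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        # Keep, for every point, its distance to the nearest chosen point.
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def nearest_point_sampling(points: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k points closest to `query`, nearest first (assumed meaning of NPS)."""
    dist = np.linalg.norm(points - query, axis=1)
    return np.argsort(dist, kind="stable")[:k]

points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
fps_idx = farthest_point_sampling(points, k=3)
nps_idx = nearest_point_sampling(points, query=points[0], k=2)
```

On this toy set FPS jumps to the distant outlier first and then fills in the largest remaining gap, while the nearest-point selection stays in the local neighborhood of the query.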

Loss Functions

The framework implements multiple loss components:

  1. MSE Loss: Mean squared error for pose prediction
  2. Correlation Loss: Feature correlation for temporal consistency
  3. Velocity Loss: Motion velocity regularization
  4. Point Loss: 3D point distance for clinical accuracy
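
One way to read the four components is sketched below in NumPy. The exact definitions, the weights, and how the point term is combined with the others are assumptions made for illustration; the repository's training scripts and `flow_losses.py` may define them differently.

```python
import numpy as np

def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))

def correlation_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """1 - Pearson correlation over time, averaged over pose parameters (columns)."""
    p = pred - pred.mean(axis=0)
    t = target - target.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0)) + eps
    return float(np.mean(1.0 - (p * t).sum(axis=0) / denom))

def velocity_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """MSE between the frame-to-frame differences of the two sequences."""
    return mse_loss(np.diff(pred, axis=0), np.diff(target, axis=0))

def point_loss(tform_pred: np.ndarray, tform_true: np.ndarray, points_h: np.ndarray) -> float:
    """Mean distance between (P, 4) homogeneous points mapped by two (N, 4, 4) transforms."""
    mapped_pred = np.einsum("nij,pj->npi", tform_pred, points_h)[..., :3]
    mapped_true = np.einsum("nij,pj->npi", tform_true, points_h)[..., :3]
    return float(np.mean(np.linalg.norm(mapped_pred - mapped_true, axis=-1)))

def total_loss(pred, target, weights=(1.0, 0.1, 0.1)) -> float:
    """Weighted sum of the pose-space terms; the weights are hypothetical."""
    w_mse, w_corr, w_vel = weights
    return (w_mse * mse_loss(pred, target)
            + w_corr * correlation_loss(pred, target)
            + w_vel * velocity_loss(pred, target))

rng = np.random.default_rng(0)
target = np.cumsum(rng.normal(size=(10, 6)), axis=0)  # 10 steps, 6 pose parameters
pred = target + 0.5                                   # constant offset error
loss = total_loss(pred, target)

tform_true = np.tile(np.eye(4), (2, 1, 1))
tform_pred = tform_true.copy()
tform_pred[:, 0, 3] += 3.0                            # 3 mm translation error along x
corners = np.array([[0.0, 0.0, 0.0, 1.0], [10.0, 5.0, 0.0, 1.0]])
dist = point_loss(tform_pred, tform_true, corners)
```

The example also shows why the terms complement each other: a constant offset is penalized by the MSE term but is invisible to the correlation and velocity terms, which only respond to errors in the shape of the trajectory.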

Results

Performance metrics on the TUS-REC2025 dataset:

| Model | Point Distance (mm) | Training Time | Parameters |
|---|---|---|---|
| Baseline | 0.45 | 8h | 12M |
| Optical Flow | 0.32 | 10h | 15M |
| Optical Flow + Mamba | 0.23 | 12h | 18M |
| Dual Mamba | 0.19 | 14h | 22M |

Note: Results are representative and may vary based on training configuration.

Configuration

Model configurations are stored in config/:

  • baseline_config.yaml: Baseline model settings
  • mamba_config.yaml: Optical Flow + Mamba settings
  • dual_mamba_config.yaml: Dual Mamba settings

Key configuration parameters:

  • num_frames: Number of input frames (default: 4)
  • embed_dim: Embedding dimension (default: 256)
  • num_fps_points: FPS sampling points (default: 32)
  • num_nps_points: NPS sampling points (default: 64)
  • mamba_d_state: Mamba state dimension (default: 64)
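
The defaults above can be mirrored in code for validation. The sketch below assumes the parameters form a flat mapping; the `ModelConfig` class and `config_from_dict` helper are hypothetical and do not describe how the training scripts actually parse their YAML files.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ModelConfig:
    """Defaults copied from the parameter list above."""
    num_frames: int = 4
    embed_dim: int = 256
    num_fps_points: int = 32
    num_nps_points: int = 64
    mamba_d_state: int = 64

    def __post_init__(self):
        # Reject non-integers, booleans, and non-positive values early.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
                raise ValueError(f"{f.name} must be a positive integer, got {value!r}")

def config_from_dict(raw: dict) -> ModelConfig:
    """Build a config from a flat mapping, rejecting unknown keys."""
    known = {f.name for f in fields(ModelConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return ModelConfig(**raw)

cfg = config_from_dict({"num_frames": 8})  # override one value, keep other defaults
```

If the YAML files are flat mappings, the dictionary passed to `config_from_dict` could come from PyYAML's `yaml.safe_load`; nested files would need an extra lookup step first.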

Code Structure

UltrasODM/
├── baseline/                 # Baseline model implementation
│   ├── train_baseline.py
│   └── network_baseline.py
├── optical_flow/            # Optical flow module
│   ├── optical_flow.py
│   └── flow_losses.py
├── optical_flow_mamba/      # Optical Flow + Mamba model
│   ├── train_optical_flow_mamba.py
│   ├── network_mamba.py
│   └── video_patch_embedding.py
├── dual_mamba/              # Dual Mamba model
│   ├── train_dual_mamba.py
│   ├── dual_mamba_block.py
│   └── ssm_layer.py
├── utils/                   # Shared utilities
│   ├── loader.py
│   ├── transform.py
│   ├── metrics.py
│   └── plot_functions.py
├── config/                  # Configuration files
├── data/                    # Dataset directory
└── docs/                    # Documentation

Citation

If you use this code in your research, please cite:

@inproceedings{
anand2025ultrasodm,
title={Ultras{ODM}: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound Reconstruction},
author={Mayank Anand and Gora Chand Nandi and Surya Prakash and Ujair Alam and Priya Shukla and Domenec Puig},
booktitle={2nd AI for Medicine and Healthcare Bridge Program at AAAI26},
year={2025},
url={https://openreview.net/forum?id=dUPjABX5Qe}
}

Acknowledgments

This work is based on research in trackerless 3D freehand ultrasound reconstruction and builds upon advances in state space models and selective scan algorithms.

License

This code is released for academic research purposes only. Commercial use is prohibited.

Contact

For questions or issues, please open an issue on GitHub or contact the corresponding author through the conference portal.


About

UltrasODM is a novel dual-stream framework for 3D freehand ultrasound reconstruction that integrates optical flow with Mamba-based temporal modeling. Unlike conventional methods, UltrasODM is robust to rapid probe movements and brightness variations, both common causes of failure in clinical settings.
