STF-Depth (Semantic and Temporal Fusion Depth Estimation) is a pipeline designed to mitigate the inaccuracies of single-image depth estimation. It leverages Temporal Fusion from the video domain and Semantic Fusion via segmentation to improve inter-frame consistency and produce more realistic depth information.
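The exact fusion rules are defined in the code; purely to illustrate the two ideas, the hypothetical sketch below smooths depth over consecutive frames (temporal) and pulls pixels of the same segment toward a common depth (semantic). It is a simplified stand-in, not the project's actual algorithm.

```python
import numpy as np

def temporal_fuse(prev_depth, cur_depth, alpha=0.8):
    """Hypothetical temporal fusion: blend the current depth map with the
    previous frame's result so depth does not flicker between frames."""
    if prev_depth is None:
        return cur_depth
    return alpha * cur_depth + (1.0 - alpha) * prev_depth

def semantic_fuse(depth, seg_labels):
    """Hypothetical semantic fusion: pull each pixel toward the median depth
    of its segmentation region so objects get internally consistent depth."""
    fused = depth.astype(np.float32).copy()
    for label in np.unique(seg_labels):
        mask = seg_labels == label
        fused[mask] = 0.5 * fused[mask] + 0.5 * np.median(fused[mask])
    return fused
```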
- Multi-Model Pipeline: Utilizes state-of-the-art deep learning models to generate depth and segmentation maps for each video frame (a minimal depth-inference sketch follows this list).
  - Depth Estimation: MiDaS (DPT-Large)
  - Semantic Segmentation: DeepLabV3
  - Panoptic Segmentation: OneFormer
- Automated Processing: Automatically handles the entire process from frame extraction to model inference and result saving for specified input video folders.
- Result Caching: Caches intermediate results (`.pkl`) for processed videos, enabling faster re-runs for visualization or further processing by skipping the inference step.
- Visualization: Saves output results from each model as image files for intuitive inspection.
- Evaluation: Includes tools for quantitative evaluation on standard datasets like NYU Depth V2 and KITTI.
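As a rough sketch of how a per-frame depth pass with `.pkl` caching could be wired up, assuming MiDaS (DPT-Large) is loaded via `torch.hub`; the cache layout and helper names here are illustrative, not the exact logic of `run.py`:

```python
import pickle
from pathlib import Path

import cv2
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to(device).eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

def depth_for_frame(frame_bgr):
    """Run MiDaS on one BGR frame and return a depth map at the frame's size."""
    img = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    batch = transform(img).to(device)
    with torch.no_grad():
        pred = midas(batch)
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    return pred.cpu().numpy()

def depth_for_video(video_path, working_dir="data/working"):
    """Estimate depth for every frame, reusing a cached .pkl when present."""
    cache = Path(working_dir) / (Path(video_path).stem + "_depth.pkl")
    if cache.exists():                     # re-runs skip inference entirely
        return pickle.loads(cache.read_bytes())

    cap = cv2.VideoCapture(str(video_path))
    depths = []
    ok, frame = cap.read()
    while ok:
        depths.append(depth_for_frame(frame))
        ok, frame = cap.read()
    cap.release()

    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_bytes(pickle.dumps(depths))
    return depths
```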
All dependencies for this project are managed via a Conda virtual environment.
Create and activate a Conda environment named stfdepth.
```bash
# Create environment from the provided yaml file
conda env create -f conda.yaml

# Activate the environment
conda activate stfdepth
```

Note: A `conda.yaml` file is provided in the repository.
The main inference script, `run.py`, processes videos or images to estimate depth maps with semantic and temporal fusion.
- Prepare Input Data: Place your video files (`.mp4`, `.avi`, etc.) or image folders inside `data/input/<dataset_name>/`.
  - The default dataset name is `vp_test`, so place files in `data/input/vp_test/`.
- Run Script:

  ```bash
  # Activate Conda environment
  conda activate stfdepth

  # Run inference
  python run.py
  ```
  - `--input_dir`: Directory containing input datasets (default: `data/input`)
  - `--output_dir`: Directory to save final results (default: `data/output`)
  - `--working_dir`: Directory for intermediate files (frames, `.pkl`, visualizations) (default: `data/working`)
  - `--datasets`: List of dataset names to process (default: `["vp_test"]`)
  - `--visualize`: Flag to enable saving visualization results (default: `False`)
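For example (the extra dataset name below is a placeholder for any folder you create under `data/input/`):

```bash
# Process the default vp_test dataset and save visualization images
python run.py --visualize

# Process a custom dataset placed in data/input/my_scenes/
python run.py --datasets my_scenes --output_dir data/output
```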
The evaluation script, `test.py`, evaluates depth estimation performance against Ground Truth (GT).
- Prepare Data: Structure your data as follows:
  - Input Images: `test/data/input/<dataset_name>/input/`
  - Ground Truth: `test/data/input/<dataset_name>/gt/`

  See the Evaluation Datasets section below for details on preparing the NYU and KITTI datasets.
- Run Script:

  ```bash
  python test.py --datasets nyu kitti
  ```
  - `--input_dir`: Root directory for test data (default: `./test/data/input`)
  - `--output_dir`: Directory to save evaluation results (default: `./test/data/output`)
  - `--datasets`: List of datasets to evaluate (default: `["nyu", "kitti"]`)
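For example (`output_eval` is just a placeholder directory):

```bash
# Evaluate only NYU Depth V2 using the default directory layout
python test.py --datasets nyu

# Evaluate both datasets and write results to a separate directory
python test.py --datasets nyu kitti --output_dir ./test/data/output_eval
```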
This project uses standard datasets for quantitative evaluation. Helper scripts are provided in the `test/` directory to convert raw datasets into the required format.
The NYU Depth V2 dataset consists of video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect.
- Download: You can download the raw dataset from the official website.
- Preparation:
  - Download the raw dataset (scene folders containing `INDEX.txt` and raw images).
  - Use the `test/convert_nyu.py` script to synchronize RGB and depth frames and convert them:

    ```bash
    # Edit 'original_dir' and 'converted_dir' in test/convert_nyu.py before running
    python test/convert_nyu.py
    ```

  - This script synchronizes frames based on timestamps, generates `.mp4` videos for input, and saves stacked depth maps (`.npy`) for ground truth.
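For reference, raw NYU frames carry their capture timestamps in the file names (RGB as `r-<timestamp>-<seq>.ppm`, depth as `d-<timestamp>-<seq>.pgm`), so pairing amounts to matching each depth frame with the RGB frame whose timestamp is closest. The sketch below illustrates that matching; it is not the exact code of `test/convert_nyu.py`.

```python
from pathlib import Path

def timestamp(path):
    # e.g. "d-1315166701.695128-2764011.pgm" -> 1315166701.695128
    return float(Path(path).name.split("-")[1])

def pair_rgb_depth(scene_dir):
    """Pair each depth frame with the RGB frame closest in time."""
    rgb = sorted(Path(scene_dir).glob("r-*.ppm"), key=timestamp)
    pairs = []
    for d in sorted(Path(scene_dir).glob("d-*.pgm"), key=timestamp):
        nearest = min(rgb, key=lambda r: abs(timestamp(r) - timestamp(d)))
        pairs.append((nearest, d))
    return pairs
```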
The KITTI dataset is a popular benchmark for autonomous driving, including depth prediction tasks.
- Download: Download the "depth completion" or "depth prediction" dataset from the KITTI Vision Benchmark Suite.
- Preparation:
  - Download the validation set (e.g., `val_selection_cropped`).
  - Use the `test/convert_kitti.py` script to format the data:

    ```bash
    # Edit 'original_dir' and 'converted_dir' in test/convert_kitti.py before running
    python test/convert_kitti.py
    ```

  - This script matches images with their corresponding ground truth depth maps and converts them to `.png` (input) and `.npy` (GT) formats.
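For context, KITTI ground-truth depth maps are 16-bit PNGs where depth in meters equals the pixel value divided by 256, and a value of 0 marks pixels without a measurement. A minimal decoding sketch (file names are placeholders, not the script's exact outputs):

```python
import numpy as np
from PIL import Image

def load_kitti_depth(png_path):
    """Decode a KITTI 16-bit depth PNG into meters, with NaN for missing pixels."""
    raw = np.asarray(Image.open(png_path), dtype=np.uint16)
    depth = raw.astype(np.float32) / 256.0
    depth[raw == 0] = np.nan
    return depth

# Example: convert one GT map to the .npy format used for evaluation
np.save("example_gt.npy", load_kitti_depth("example_gt.png"))
```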
```
.
├── data
│   ├── input/          # Input data directory
│   │   └── vp_test/    # Default dataset folder
│   ├── working/        # Intermediate results (frames, .pkl, visualizations)
│   └── output/         # Final results
├── test
│   └── data/           # Data for evaluation (input images and GT)
├── run.py              # Main inference script
├── test.py             # Evaluation script
├── conda.yaml          # Conda environment configuration
└── README.md           # Project documentation
```



