Training

This section describes how to train EmbodiedSplat on the ScanNet dataset. Training on ScanNet++ follows a similar procedure.

Dataset Preparation

Download the training split of the preprocessed ScanNet data from here and place it under the dataset/scannet folder. The folder structure should look like this:

scannet
├── test
│   └── ...
├── train                   
│   ├── scene0001_01                           
│   │   ├── color                        
│   │   │   ├── 0.jpg                    
│   │   │   ├── 1.jpg                               
│   │   │   └── ...
│   │   ├── depth                        
│   │   │   ├── 0.png                    
│   │   │   ├── 1.png                               
│   │   │   └── ...
│   │   ├── intrinsic                       
│   │   │   ├── extrinsic_color.txt                    
│   │   │   ├── extrinsic_depth.txt
│   │   │   ├── intrinsic_color.txt                                
│   │   │   └── intrinsic_depth.txt
│   │   ├── pose                       
│   │   │   ├── 0.txt                    
│   │   │   ├── 1.txt                              
│   │   │   └── ...
│   │   └── extrinsics.npy                       
│   └── ...
├── train_idx.txt
└── test_idx.txt
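Before running any preprocessing, it can help to confirm that each scene matches the layout above. The following is a minimal sketch, not part of the repository: the helper names `find_missing` and `check_split` are hypothetical, and the root path `dataset/scannet` is assumed to be relative to the repository root.

```python
from pathlib import Path

# Sub-folders and files each scene is expected to contain, per the tree above.
REQUIRED_DIRS = ("color", "depth", "intrinsic", "pose")
REQUIRED_FILES = (
    "intrinsic/extrinsic_color.txt",
    "intrinsic/extrinsic_depth.txt",
    "intrinsic/intrinsic_color.txt",
    "intrinsic/intrinsic_depth.txt",
    "extrinsics.npy",
)


def find_missing(scene_dir: Path) -> list[str]:
    """Return the relative paths that are missing from one scene folder."""
    missing = [d for d in REQUIRED_DIRS if not (scene_dir / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (scene_dir / f).is_file()]
    return missing


def check_split(root: Path, split: str = "train") -> dict[str, list[str]]:
    """Map each incomplete scene name to its missing entries."""
    split_dir = root / split
    if not split_dir.is_dir():
        return {}
    report = {}
    for scene_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        missing = find_missing(scene_dir)
        if missing:
            report[scene_dir.name] = missing
    return report


if __name__ == "__main__":
    # Assumed location: dataset/scannet relative to the repository root.
    for scene, missing in check_split(Path("dataset/scannet")).items():
        print(f"{scene}: missing {', '.join(missing)}")
```

The script prints nothing when every scene is complete. It only checks that the expected folders and files exist, not that the images, depth maps, and poses are consistent with one another.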

For training, we cache the instance masks generated by FastSAM and the corresponding instance-level features extracted by OpenSeg for every multi-view image. To generate these cached files, run the following command:

python process_scannet.py

You should now see a cache folder under each scene in the training split of ScanNet.
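To confirm that preprocessing covered every scene, a short check along the following lines may be useful. This is a sketch under assumptions: the folder is taken to be literally named `cache`, the function `scenes_without_cache` is hypothetical, and the contents of the cache are not inspected because their file format is not described here.

```python
from pathlib import Path


def scenes_without_cache(train_dir: Path, cache_name: str = "cache") -> list[str]:
    """Return names of scene folders that lack a non-empty cache sub-folder."""
    pending = []
    for scene_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        cache_dir = scene_dir / cache_name
        # Treat an absent or empty folder as "not yet processed".
        if not cache_dir.is_dir() or not any(cache_dir.iterdir()):
            pending.append(scene_dir.name)
    return pending


if __name__ == "__main__":
    # Assumed location of the training split, relative to the repository root.
    train_dir = Path("dataset/scannet/train")
    if train_dir.is_dir():
        for name in scenes_without_cache(train_dir):
            print(f"{name}: cache missing or empty")
```

Any scene it lists probably needs `process_scannet.py` to be run again before training.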

Training Stage 1

As mentioned in the Implementation Details section of the main paper, we adopt a two-stage training pipeline following EmbodiedSAM. Stage 1 performs single-view training to warm up the model. The memory adapter is excluded from training in this stage.

bash scripts/scannet/train_embodiedsplat_s1.sh --gpu 0

Training Stage 2

Stage 2 further fine-tunes the model, now with the memory adapter included, in a multi-view setting.

bash scripts/scannet/train_embodiedsplat_s2.sh --gpu 0
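Because Stage 2 builds on the Stage 1 model, the two scripts can be chained so that Stage 2 starts only if Stage 1 exits successfully. The wrapper below is a hypothetical convenience sketch, not part of the repository; it uses only the two commands shown above and assumes it is run from the repository root. How the Stage 2 script locates the Stage 1 checkpoint is not covered here, so check the scripts themselves before relying on this.

```python
import subprocess

# The two stage scripts, in the order they must run.
STAGE_SCRIPTS = (
    "scripts/scannet/train_embodiedsplat_s1.sh",
    "scripts/scannet/train_embodiedsplat_s2.sh",
)


def build_commands(gpu: int = 0) -> list[list[str]]:
    """Build one argv list per training stage, in execution order."""
    return [["bash", script, "--gpu", str(gpu)] for script in STAGE_SCRIPTS]


def run_stages(gpu: int = 0) -> None:
    """Run Stage 1 then Stage 2; raise CalledProcessError on the first failure."""
    for cmd in build_commands(gpu):
        print("Running:", " ".join(cmd))
        # check=True stops the sequence if a stage exits with a non-zero status.
        subprocess.run(cmd, check=True)
```

Calling `run_stages(gpu=0)` launches both stages back to back on GPU 0.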