# Datasets preparation

## INRIA Dataset

1. Download the INRIA Aerial Image Labeling Dataset.
2. Extract and place the aerial image tiles in the `data` directory as follows:

   ```
   data/inria_raw/
   ├── test/
   │   └── images/
   └── train/
       ├── gt/
       └── images/
   ```

3. Set the paths to the raw INRIA train tiles and ground truths in L255 & L256 of `inria_to_coco.py`.
4. Run the following command to prepare the INRIA dataset's train and validation splits in MS COCO format. The first 5 tiles of each city are kept as the validation split, as per the official recommendation.

   ```shell
   # with pix2poly_env
   python inria_to_coco.py
   ```
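For reference, the output follows the standard MS COCO instance-annotation layout. The sketch below shows the minimal structure of such a file; the field names come from the COCO format itself, while the file name and coordinate values are purely illustrative, not taken from the actual script output.

```python
import json

# Minimal MS COCO-style annotation skeleton for building footprints.
# Image entries, category ids, and coordinates are illustrative only.
coco = {
    "images": [
        {"id": 1, "file_name": "austin1_patch_0_0.png", "width": 224, "height": 224}
    ],
    "categories": [{"id": 1, "name": "building"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # Polygon as a flat [x1, y1, x2, y2, ...] list in pixel coordinates.
            "segmentation": [[10.0, 10.0, 50.0, 10.0, 50.0, 40.0, 10.0, 40.0]],
            "bbox": [10.0, 10.0, 40.0, 30.0],  # [x, y, width, height]
            "area": 1200.0,
            "iscrowd": 0,
        }
    ],
}

serialized = json.dumps(coco)  # what gets written to the annotation .json file
```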

## SpaceNet 2 Building Detection v2 Dataset (Vegas Subset)

NOTE: We only use the Vegas subset for all our experiments in the paper.

1. Download the SpaceNet 2 Building Detection v2 Dataset.
2. Extract and place the satellite image tiles for the Vegas subset in the `data` folder in the following directory structure:

   ```
   data/AOI_2_Vegas_Train/
   ├── geojson/
   │   └── buildings/
   └── RGB-PanSharpen/
       ├── gt/
       └── images/
   ```

3. Convert the pansharpened RGB image tiles from 16-bit to 8-bit with the following command:

   ```shell
   # with gdal_env
   python spacenet_convert_16bit_to_8bit.py
   ```
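The core of such a conversion is a per-band linear stretch to the 0–255 range. Below is a minimal numpy sketch of that idea; the actual script presumably reads and writes the tiles with GDAL (hence `gdal_env`), and may use a different stretch (e.g. percentile clipping), so treat this as an illustration of the concept only.

```python
import numpy as np

def rescale_16bit_to_8bit(band: np.ndarray) -> np.ndarray:
    """Linearly stretch a 16-bit band to the full 0-255 uint8 range."""
    band = band.astype(np.float64)
    lo, hi = band.min(), band.max()
    if hi == lo:  # constant band: avoid division by zero
        return np.zeros(band.shape, dtype=np.uint8)
    scaled = (band - lo) / (hi - lo) * 255.0
    return scaled.round().astype(np.uint8)

# Tiny illustrative "tile": full 16-bit range maps onto 0..255.
tile = np.array([[0, 1024], [32768, 65535]], dtype=np.uint16)
out = rescale_16bit_to_8bit(tile)
```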
4. Convert the geojson annotations from world-space coordinates to pixel-space coordinates with the following command:

   ```shell
   # with gdal_env
   python spacenet_world_to_pixel_coords.py
   ```
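The world-to-pixel mapping follows from each tile's GDAL-style affine geotransform. A sketch of the math for the common axis-aligned (no rotation) case; the helper name and example numbers are hypothetical, not taken from the script:

```python
def world_to_pixel(geotransform, x_world, y_world):
    """Map world coordinates to fractional (col, row) pixel coordinates using a
    GDAL-style geotransform (origin_x, pixel_w, 0, origin_y, 0, pixel_h),
    assuming no rotation terms."""
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col = (x_world - origin_x) / pixel_w
    row = (y_world - origin_y) / pixel_h  # pixel_h is negative for north-up rasters
    return col, row

# Example: 0.3 m/pixel north-up raster with its top-left corner at (1000, 2000).
gt = (1000.0, 0.3, 0.0, 2000.0, 0.0, -0.3)
col, row = world_to_pixel(gt, 1003.0, 1997.0)  # 3 m east, 3 m south of the origin
```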
5. Set the paths to the raw SpaceNet dataset's tiles and ground truths in L202 & L203 of `spacenet_to_coco.py`.
6. Run the following command to prepare the SpaceNet dataset's train and validation splits in MS COCO format. The first 15% of tiles are kept as the validation split.

   ```shell
   # with pix2poly_env
   python spacenet_to_coco.py
   ```
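A deterministic "first 15%" split can be sketched as below. Note this assumes the script orders tiles by sorted file name before splitting, which is a guess about its behavior, not something stated in the source:

```python
def split_train_val(tile_names, val_fraction=0.15):
    """Keep the first `val_fraction` of (sorted) tiles as the validation split."""
    tiles = sorted(tile_names)
    n_val = int(len(tiles) * val_fraction)
    return tiles[n_val:], tiles[:n_val]

# Hypothetical SpaceNet-style tile names for illustration.
tiles = [f"RGB-PanSharpen_AOI_2_Vegas_img{i}.tif" for i in range(1, 21)]
train, val = split_train_val(tiles)
```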

## WHU Buildings Dataset

1. Download the 0.2 meter split of the WHU Buildings Aerial Imagery Dataset.
2. Extract and place the aerial image tiles (512x512) in the `data` folder in the following directory structure:

   ```
   data/WHU_aerial_0.2/
   ├── test/
   │   ├── image/
   │   └── label/
   ├── train/
   │   ├── image/
   │   └── label/
   └── val/
       ├── image/
       └── label/
   ```

3. Set the paths to the raw WHU Buildings tiles and ground truths (512x512) in L263 & L264 of `whu_buildings_to_coco.py`.
4. Run the following command to prepare the WHU Buildings dataset's train, validation, and test splits in MS COCO format.

   ```shell
   # with pix2poly_env
   python whu_buildings_to_coco.py
   ```

## Massachusetts Roads Dataset

1. Download the Massachusetts Roads Dataset with the following command:

   ```shell
   ./download_mass_roads_dataset.sh
   ```

2. Download and extract the roads vector shapefile for the dataset here. Use QGIS or any preferred tool to convert the SHP file to a geojson.
3. From this vector roads geojson, generate vector annotations for each tile in the dataset by clipping the geojson to the corresponding raster extents:

   ```shell
   # using gdal_env
   python mass_roads_clip_shapefile.py
   ```
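Conceptually, clipping selects (and cuts) the road geometries that fall inside each raster's extent. The sketch below shows only the selection part, using a simple bounding-box overlap test on plain coordinate lists; the actual script presumably does true geometric clipping via GDAL/OGR, and all names here are hypothetical:

```python
def bbox_of(coords):
    """Axis-aligned bounding box (minx, miny, maxx, maxy) of a coordinate list."""
    xs = [p[0] for p in coords]
    ys = [p[1] for p in coords]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap."""
    return not (a[2] < b[0] or a[0] > b[2] or a[3] < b[1] or a[1] > b[3])

def select_features(features, extent):
    """Keep road features whose bounding box overlaps the raster extent.
    (True clipping would additionally cut lines at the extent boundary.)"""
    return [f for f in features if boxes_overlap(bbox_of(f["coords"]), extent)]

# Two hypothetical road polylines; only the first touches the 10x10 extent.
roads = [
    {"id": "a", "coords": [(0, 0), (5, 5)]},
    {"id": "b", "coords": [(100, 100), (110, 105)]},
]
kept = select_features(roads, extent=(0, 0, 10, 10))
```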
4. This results in the following directory structure containing the 1500x1500 tiles of the Massachusetts Roads Dataset:

   ```
   data/mass_roads_1500/
   ├── test/
   │   ├── map/
   │   ├── sat/
   │   └── shape/
   ├── train/
   │   ├── map/
   │   ├── sat/
   │   └── shape/
   └── valid/
       ├── map/
       ├── sat/
       └── shape/
   ```

5. Split the 1500x1500 tiles into 224x224 overlapping patches with the following command:

   ```shell
   # using gdal_env
   python mass_roads_tiles_to_patches.py
   ```
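Covering a 1500x1500 tile with 224x224 patches requires overlap, since 1500 is not a multiple of 224. One common scheme, sketched below, steps a fixed stride and snaps the last patch to the tile edge; the stride value here is hypothetical and may differ from what the script uses:

```python
def patch_origins(tile_size, patch_size, stride):
    """Top-left offsets of patches covering one axis of a tile, with the last
    patch snapped to the tile edge so the full extent is covered."""
    origins = list(range(0, tile_size - patch_size + 1, stride))
    if origins[-1] != tile_size - patch_size:
        origins.append(tile_size - patch_size)
    return origins

# Hypothetical 150 px stride: each patch origin along one axis of a 1500 px tile.
xs = patch_origins(1500, 224, 150)
```

Patches are then cut at every (x, y) pair from the per-axis origin lists.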
6. Generate vector annotation files for the patches as follows:

   ```shell
   # using gdal_env
   python mass_roads_clip_tile_vectors.py
   python mass_roads_world_to_pixel_coords.py
   ```

7. This results in the processed 224x224 patches of the Massachusetts Roads Dataset, to be used for training Pix2Poly, in the following directory structure:

   ```
   data/mass_roads_224/
   ├── test/
   │   ├── map/
   │   ├── pixel_annotations/
   │   ├── sat/
   │   └── shape/
   ├── train/
   │   ├── map/
   │   ├── pixel_annotations/
   │   ├── sat/
   │   └── shape/
   └── valid/
       ├── map/
       ├── pixel_annotations/
       ├── sat/
       └── shape/
   ```