UBC Ovarian Cancer Subtype Classification and Outlier Detection (UBC-OCEAN) - Kaggle competition entry
Navigating Ovarian Cancer: Unveiling Common Histotypes and Unearthing Rare Variants
This Kaggle competition was set up to classify ovarian cancer subtypes from histopathology images, with a focus on generalisability across medical centres. When training my deep learning model, I therefore chose data augmentations that mimic how images from other centres might look, and upsampled minority classes so that the model would not neglect them.
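As a rough illustration of upsampling minority classes, the sketch below resamples indices with replacement until every class is as large as the biggest one. The function name `upsample_minority_classes`, the toy label list, and the choice to balance classes fully are assumptions made for this example; the notebooks may balance the classes differently.

```python
import random
from collections import Counter


def upsample_minority_classes(labels, seed=0):
    """Return sample indices in which every class appears as often as the largest class."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)

    target = max(len(indices) for indices in indices_by_class.values())

    balanced = []
    for indices in indices_by_class.values():
        balanced.extend(indices)
        # Draw extra indices with replacement until the class reaches the target size.
        balanced.extend(rng.choices(indices, k=target - len(indices)))

    rng.shuffle(balanced)
    return balanced


# Toy, illustrative labels (not the real competition class distribution).
labels = ["HGSC"] * 6 + ["EC"] * 3 + ["MC"] * 1
balanced_indices = upsample_minority_classes(labels)
class_counts = Counter(labels[i] for i in balanced_indices)
```

Resampling indices rather than images keeps the original data untouched, and each repeated sample can still receive a different random augmentation when it is loaded.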
The general approach is to mask and patch the images, extract patch features using a pretrained histology model, train a simple model to classify these features, take the mode of the patch-level predictions as the slide-level prediction, and finally train an outlier detection model for use at inference.
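The slide-level step can be sketched as a majority vote over patch predictions. This is a minimal sketch: the function name `slide_prediction` and the example labels are hypothetical, and ties are broken here by whichever label appears first, which may not match the notebooks.

```python
from collections import Counter


def slide_prediction(patch_predictions):
    """Return the most common patch-level label (the mode) for one slide."""
    if not patch_predictions:
        raise ValueError("At least one patch prediction is required")
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(patch_predictions).most_common(1)[0][0]


# Hypothetical patch-level predictions for a single slide.
patch_predictions = ["HGSC", "EC", "HGSC", "HGSC", "CC"]
predicted_label = slide_prediction(patch_predictions)
```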
Due to the nature of the Kaggle challenge, I wrote Python code in self-contained Jupyter Notebooks.
The notebooks are ordered and described here:
1. Explore Metadata: looks at the labels and data provided, exploring distributions
2. Review Images: loads and considers the images, including whole slide images (WSIs) and tissue microarrays (TMAs)
3. Save as tiff: saves each png image as a tiff file to help load the large images during training
4. Mask TMAs: uses traditional imaging techniques to develop an algorithm to mask out background for the TMAs
5. Patch Images: saves the coordinates of the image patches to save time during training
6. Augment Data: visualises possible augmentations to use on the images
7. Construct Feature Model: sets up the pretrained feature encoder model
8. Train Decoder Model: trains an MLP with dropout to classify the patch features
9. Train Outlier Detection Model: tests decoder model using mode for slide-level prediction and trains a one class SVM to detect outliers
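To illustrate the idea behind saving patch coordinates ahead of training, here is a minimal sketch that tiles a binary tissue mask into non-overlapping patches and keeps those with enough tissue. The function name `tissue_patch_coordinates`, the `min_tissue_fraction` threshold, and the tiny toy mask are assumptions for this example, not values taken from the notebooks.

```python
def tissue_patch_coordinates(mask, patch_size, min_tissue_fraction=0.5):
    """Return (left, top) corners of non-overlapping patches that contain enough tissue.

    mask is a list of rows, where 1 marks tissue and 0 marks background.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    coordinates = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            # Count the tissue pixels inside this patch.
            tissue_pixels = sum(
                sum(row[left:left + patch_size])
                for row in mask[top:top + patch_size]
            )
            if tissue_pixels / (patch_size * patch_size) >= min_tissue_fraction:
                coordinates.append((left, top))
    return coordinates


# Toy 4x4 mask with tissue on the right-hand side.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
coords = tissue_patch_coordinates(mask, patch_size=2)
```

Computing the coordinates once and storing them means the training loop only has to read the selected regions, rather than re-checking every patch for tissue each epoch.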
Learnings and future work:
- I used OpenSlide initially, but pyvips worked better on Kaggle.
- For training, I pre-saved tissue patch coordinates and masked images as tiff files, loading them with OpenSlide. For inference, I sampled and checked patches in real time, loading them with pyvips.
- With more time, I would implement a more sophisticated multiple instance learning (MIL) method, e.g. an attention model.
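For context on the attention idea mentioned as future work, the sketch below shows only the pooling step of an attention-based MIL model: patch features are combined with softmax weights instead of a hard majority vote. It is not part of this entry. In a real model the attention scores would come from a small learned network, whereas here they are passed in directly, and the name `attention_pool` is hypothetical.

```python
import math


def attention_pool(patch_features, attention_scores):
    """Combine patch feature vectors into one slide vector using softmax weights."""
    if not patch_features or len(patch_features) != len(attention_scores):
        raise ValueError("Need one attention score per patch feature vector")

    # Numerically stable softmax over the attention scores.
    max_score = max(attention_scores)
    exp_scores = [math.exp(score - max_score) for score in attention_scores]
    total = sum(exp_scores)
    weights = [value / total for value in exp_scores]

    # Weighted sum of the patch features, one dimension at a time.
    dim = len(patch_features[0])
    pooled = [
        sum(weight * features[d] for weight, features in zip(weights, patch_features))
        for d in range(dim)
    ]
    return pooled, weights


# Two toy patches with equal scores: pooling reduces to a plain average.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```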
This entry received a bronze-level score in the final competition grading.
Link to Kaggle competition: https://www.kaggle.com/competitions/UBC-OCEAN/overview