Official repository for "Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models", accepted at IEEE JBHI (Special Issue on Foundation Models in Medical Imaging).
To reproduce all experiments, the following steps need to be completed first:
Our experiments use MS-CXR, a subset of the large-scale MIMIC-CXR dataset. Note that both datasets are accessible only to credentialed PhysioNet users.
Create a virtual environment using the provided requirements.txt file:
# via pip
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# via Conda
conda create --name <your_env_name> --file requirements.txt
conda activate <your_env_name>
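Once the environment is set up, it can be useful to check that the pinned versions actually resolved. The following is a hypothetical helper, not part of this repository. It assumes pip-style `name==version` pins and skips comments, unpinned lines, and other specifiers.

```python
from __future__ import annotations

from importlib import metadata
from pathlib import Path


def check_pins(requirements_path: str) -> dict[str, tuple[str, str | None]]:
    """Return {package: (pinned_version, installed_version_or_None)} for mismatches.

    Hypothetical helper: only simple 'name==version' lines are checked.
    """
    mismatches: dict[str, tuple[str, str | None]] = {}
    for raw in Path(requirements_path).read_text().splitlines():
        line = raw.split("#", 1)[0]            # drop trailing comments
        line = line.split(";", 1)[0].strip()   # drop environment markers
        if "==" not in line:
            continue  # unpinned, URL-based or other specifiers are skipped
        name, pinned = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # strip extras such as pkg[extra]
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling `check_pins("requirements.txt")` from the repository root returns an empty dict when every pinned package is installed at the expected version.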
Instructions for downloading the weights of the latent diffusion model (LDM) pre-trained on MIMIC-CXR can be found in [1] (see below). The downloaded checkpoints are expected to be in a directory called models/.
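Before running the LDM script, a quick sanity check that the checkpoints landed in models/ can save a failed run. The helper below is hypothetical. The file extensions it looks for are assumptions, since the exact file names depend on what [1] distributes.

```python
from pathlib import Path

# Assumed checkpoint extensions; adjust to match the files obtained via [1].
CHECKPOINT_SUFFIXES = {".pth", ".pt", ".ckpt", ".safetensors"}


def find_checkpoints(models_dir: str = "models") -> list:
    """Return checkpoint-like files under models_dir (recursively), sorted by path.

    Raises FileNotFoundError if the directory does not exist.
    """
    root = Path(models_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"expected checkpoints in {root.resolve()}/")
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
    )
```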
Code for instantiating both the BioViL and BioViL-T models is provided in the HI-ML Multimodal Toolbox repository. You can either install the toolbox via pip or clone the repository into a health_multimodal directory (see [2] below).
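Either way, the scripts need to import the `health_multimodal` package. The hypothetical check below reports whether it is importable. It assumes that, in the cloned case, the package directory sits at `<parent>/health_multimodal`, so `<parent>` is what must be on the import path.

```python
import importlib
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def health_multimodal_available(clone_parent: Optional[str] = None) -> bool:
    """Return True if the health_multimodal package can be found by the importer.

    If clone_parent is given, it is prepended to sys.path first, so a package
    directory located at <clone_parent>/health_multimodal is picked up.
    """
    if clone_parent is not None:
        parent = str(Path(clone_parent).resolve())
        if parent not in sys.path:
            sys.path.insert(0, parent)
        importlib.invalidate_caches()  # make sure the new path entry is rescanned
    return importlib.util.find_spec("health_multimodal") is not None
```

With a pip install, `health_multimodal_available()` should return True without arguments. With a clone, pass the directory that contains `health_multimodal/`.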
To perform phrase grounding with the pre-trained LDM, you can run the following script:
python3 eval_ldm.py
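Phrase grounding on MS-CXR is evaluated by comparing a model's heatmap for a phrase with the annotated bounding box. As an illustration only (we don't claim this mirrors what eval_ldm.py computes), the sketch below thresholds a heatmap and computes its intersection-over-union (IoU) with a box. The `(x, y, w, h)` pixel convention, the fixed threshold, and the helper names are assumptions.

```python
import numpy as np


def box_to_mask(box, shape):
    """Rasterise an (x, y, w, h) pixel box into a boolean mask of shape (H, W)."""
    x, y, w, h = (int(round(v)) for v in box)
    mask = np.zeros(shape, dtype=bool)
    mask[max(y, 0):max(y + h, 0), max(x, 0):max(x + w, 0)] = True
    return mask


def heatmap_iou(heatmap, box, threshold=0.5):
    """IoU between the thresholded heatmap and the box mask (0.0 if both are empty)."""
    pred = np.asarray(heatmap) >= threshold
    target = box_to_mask(box, pred.shape)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, target).sum() / union)
```

A fixed threshold keeps the sketch simple. In practice, the heatmap's value range differs between models, so the threshold (or its normalisation) is a design choice that affects the resulting IoU.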
To perform phrase grounding with either BioViL or BioViL-T (selected through the --model-name argument), you can run the following script:
python3 eval_biovil_t.py --model-name biovil_t
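A threshold-free measure reported for BioViL on MS-CXR is the contrast-to-noise ratio (CNR), which compares similarity scores inside and outside the annotated box: CNR = |mean_in - mean_out| / sqrt(var_in + var_out). The sketch below is our own illustration and is not taken from eval_biovil_t.py. It assumes the similarity map is already at image resolution, the box uses the `(x, y, w, h)` pixel convention, and variances are population variances.

```python
import numpy as np


def contrast_to_noise_ratio(similarity, box):
    """CNR = |mean_in - mean_out| / sqrt(var_in + var_out).

    similarity: 2-D array (H, W) of phrase-image similarity scores.
    box: (x, y, w, h) in pixels of the annotated region (assumed convention).
    """
    sim = np.asarray(similarity, dtype=float)
    x, y, w, h = (int(round(v)) for v in box)
    inside = np.zeros(sim.shape, dtype=bool)
    inside[max(y, 0):max(y + h, 0), max(x, 0):max(x + w, 0)] = True
    a, b = sim[inside], sim[~inside]
    if a.size == 0 or b.size == 0:
        raise ValueError("box must cover some, but not all, pixels")
    diff = abs(a.mean() - b.mean())
    denom = np.sqrt(a.var() + b.var())
    if denom == 0:
        # Both regions are constant: no contrast if equal, unbounded otherwise.
        return 0.0 if diff == 0 else float("inf")
    return float(diff / denom)
```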
- [1] https://github.com/Project-MONAI/GenerativeModels/tree/main/model-zoo/models/cxr_image_synthesis_latent_diffusion_model
  - Links for pre-trained model weights can be found in the large_files.yml file.
  - https://github.com/Warvito/generative_chestxray
- [2] https://github.com/microsoft/hi-ml/tree/main/hi-ml-multimodal/src/health_multimodal