Dataset Examples
We prepared two sample datasets and shared them via a Dropbox folder.
Please download the data to your local machine before running the provided I/O functions to load them. No Dropbox account is required to access the files.
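After downloading, a quick sanity check can confirm that the folders arrived intact before you call any I/O functions. The following is a minimal sketch, not part of the provided code: `summarize_download` is a hypothetical helper, and the local path you pass to it is whatever location you chose for the download.

```python
from pathlib import Path


def summarize_download(root):
    """Return {subfolder name: number of files inside it} for a local data root.

    Hypothetical helper for illustration; it only inspects the file system
    and makes no assumption about file formats.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Data root not found: {root}")

    summary = {}
    # Count files recursively inside each top-level subfolder.
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[sub.name] = sum(1 for f in sub.rglob("*") if f.is_file())
    return summary
```

Calling `summarize_download("path/to/your/download")` returns one entry per top-level folder; an entry with a count of zero suggests an incomplete download.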
The mimic-iv folder contains a subset sampled from the MIMIC-IV Demo datasets. The data modalities include: Echocardiogram (ECHO), Electrocardiogram (ECG), Chest X-Ray (CXR), Electronic Health Record (EHR) tabular data, and medical notes (text). This is the main dataset for the hackathon.
We rearranged the subject IDs so that the samples are aligned across modalities.
The molecule-protein interaction folder contains a subset for molecule-protein interaction prediction sampled from the BioSNAP dataset. This dataset is used to demonstrate how your design and implementation can be adapted to other domains.
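Because the subject IDs in the mimic-iv subset are aligned across modalities, you may want to verify that alignment after loading. The sketch below assumes you have already collected the subject IDs for each modality into a dictionary; the function names and the toy IDs in the usage note are hypothetical and do not come from the dataset.

```python
def common_subject_ids(ids_by_modality):
    """Return the sorted subject IDs that appear in every modality."""
    id_sets = [set(ids) for ids in ids_by_modality.values()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


def missing_by_modality(ids_by_modality):
    """Return, per modality, the sorted IDs seen elsewhere but absent there."""
    all_ids = set().union(*ids_by_modality.values())
    return {
        name: sorted(all_ids - set(ids))
        for name, ids in ids_by_modality.items()
    }
```

For example, with toy input `{"ecg": [1, 2, 3], "cxr": [2, 3], "ehr": [3, 2, 4]}`, `common_subject_ids` keeps only the IDs shared by all three modalities, and `missing_by_modality` lists what each modality lacks. If the data are fully aligned, every list returned by `missing_by_modality` is empty.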
The following are example data loading modules from the PyKale library that were used in the tutorials at the Open Biomedical Multimodal AI Research Workshop @ EMBC 2025. They can serve as references for the modality-specific required functions and attributes.
Gow, B., Pollard, T., Nathanson, L. A., Moody, B., Johnson, A., Moukheiber, D., Greenbaum, N., Berkowitz, S., Eslami, P., Herbst, E., Mark, R., & Horng, S. (2022). MIMIC-IV-ECG Demo - Diagnostic Electrocardiogram Matched Subset Demo (version 0.1). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/4eqn-kt76
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2023). MIMIC-IV Clinical Database Demo (version 2.2). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/dp1f-ex47
Zitnik, M., Sosič, R., Maheshwari, S. & Leskovec, J. (2018). BioSNAP datasets: Stanford biomedical network dataset collection. https://snap.stanford.edu/biodata.