Dataset Examples

shuo-zhou~ edited this page Sep 14, 2025 · 3 revisions

Sample datasets for this Hackathon

We prepared two sample datasets and shared them via a Dropbox folder.

Please download the data to your local machine before running the provided I/O functions to load them. No Dropbox account is required to access the files.

The mimic-iv folder contains a subset sampled from the MIMIC-IV Demo datasets. The data modalities include: Echocardiogram (ECHO), Electrocardiogram (ECG), Chest X-Ray (CXR), Electronic Health Record (EHR) tabular data, and medical notes (text). This is the main dataset for the hackathon.

We rearranged the subject IDs so that the samples are aligned across modalities.
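As a quick sanity check after loading, the alignment described above can be verified by comparing the subject IDs found in each modality. The following is a minimal sketch, not part of the provided I/O functions: the helper names `common_subject_ids` and `missing_subject_ids`, the modality keys, and the toy IDs are all hypothetical, and how you obtain the ID lists depends on the loader you use.

```python
from typing import Dict, Iterable, Set


def common_subject_ids(ids_by_modality: Dict[str, Iterable[int]]) -> Set[int]:
    """Return the subject IDs that appear in every modality."""
    id_sets = [set(ids) for ids in ids_by_modality.values()]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


def missing_subject_ids(ids_by_modality: Dict[str, Iterable[int]]) -> Dict[str, Set[int]]:
    """Map each modality to the subject IDs it lacks relative to the union of all IDs."""
    all_ids = set().union(*(set(ids) for ids in ids_by_modality.values()))
    return {name: all_ids - set(ids) for name, ids in ids_by_modality.items()}


if __name__ == "__main__":
    # Toy IDs for illustration only; these are NOT taken from the hackathon data.
    ids_by_modality = {
        "ecg": [1, 2, 3],
        "cxr": [1, 2, 3],
        "ehr": [1, 2, 3, 4],
    }
    print(sorted(common_subject_ids(ids_by_modality)))
    print({name: sorted(ids) for name, ids in missing_subject_ids(ids_by_modality).items()})
```

If the samples are fully aligned, every modality should report an empty set of missing IDs.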

The molecule-protein interaction folder contains a subset for molecule-protein interaction prediction sampled from the BioSNAP dataset. This dataset is used to demonstrate how your design and implementation can be adapted to other domains.

Modality-specific data loading module examples

The following example data loading modules from the PyKale library were used in the tutorials for the Open Biomedical Multimodal AI Research Workshop @EMBC 2025. They can serve as references for the modality-specific required functions and attributes.

References

Gow, B., Pollard, T., Nathanson, L. A., Moody, B., Johnson, A., Moukheiber, D., Greenbaum, N., Berkowitz, S., Eslami, P., Herbst, E., Mark, R., & Horng, S. (2022). MIMIC-IV-ECG Demo - Diagnostic Electrocardiogram Matched Subset Demo (version 0.1). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/4eqn-kt76

Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2023). MIMIC-IV Clinical Database Demo (version 2.2). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/dp1f-ex47

Zitnik, M., Sosič, R., Maheshwari, S., & Leskovec, J. (2018). BioSNAP datasets: Stanford biomedical network dataset collection. https://snap.stanford.edu/biodata
