Data Loading Modules
All dataset loaders live under mmai25_hackathon/load_data. Each module has a dedicated guide with dataset layout,
function docs, quick examples, and CLI usage. This page indexes those modules at a glance.
| Modality | Python module | Doc | Expected root | Outputs |
|---|---|---|---|---|
| Chest X‑ray (CXR) | load_data/cxr.py | Chest‑X‑Ray | CXR root with files/ | pd.DataFrame with cxr_path; PIL images |
| Echocardiogram | load_data/echo.py | Echocardiogram | ECHO root with files/, echo-record-list.csv | pd.DataFrame; (frames, metadata) |
| Electrocardiogram | load_data/ecg.py | Electrocardiogram | ECG root with files/, record_list.csv | pd.DataFrame; (signals, fields) |
| Clinical Notes | load_data/text.py | Text | Note v2.2 .../note/ with CSVs | pd.DataFrame; text extraction |
| Electronic Health Record | load_data/ehr.py | Electronic‑Health‑Record | mimic-iv-3.1/ with hosp/, icu/ | merged pd.DataFrame or dict |
| Molecule (SMILES) | load_data/molecule.py | Molecule | CSV with SMILES | pd.DataFrame; PyG graphs |
| Protein Sequence | load_data/protein.py | Protein‑Sequence | CSV with Protein | pd.DataFrame; integer encodings |
| Labels | load_data/labels.py | Labels | CSV with label column(s) | pd.DataFrame; one‑hot labels |
| Tabular Utilities | load_data/tabular.py | Tabular | Any CSVs | pd.DataFrame; merged components |
Module: mmai25_hackathon/load_data/cxr.py · Doc: Chest‑X‑Ray
Maps metadata DICOM IDs to JPGs stored under files/ and returns a DataFrame with absolute image paths (cxr_path).
Includes an image loader to open radiographs as grayscale or RGB PIL images. See the doc’s CLI Usage to preview data.
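
The ID-to-path mapping described above can be sketched roughly as follows. This is an illustration only, not the module's code: the function name `map_dicom_ids_to_jpgs` is hypothetical, and so is the assumption that each JPG's file name (without extension) equals its DICOM ID.

```python
from pathlib import Path


def map_dicom_ids_to_jpgs(cxr_root, dicom_ids):
    """Return {dicom_id: absolute JPG path} for IDs found under <cxr_root>/files.

    Hypothetical helper: assumes the JPG file stem is the DICOM ID.
    """
    files_dir = Path(cxr_root) / "files"
    # Index every JPG once by its stem so lookups do not rescan the tree.
    index = {p.stem: str(p.resolve()) for p in files_dir.rglob("*.jpg")}
    # Keep only the requested IDs that have a matching image on disk.
    return {d: index[d] for d in dicom_ids if d in index}
```

IDs without a matching file are simply left out of the result, which mirrors the idea of keeping only rows whose image exists.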
Module: mmai25_hackathon/load_data/echo.py · Doc: Echocardiogram
Resolves .dcm paths from echo-record-list.csv, filters to existing files, and loads cine sequences as (T, H, W)
NumPy arrays with metadata. See the doc’s CLI Usage for a quick run command.
Module: mmai25_hackathon/load_data/ecg.py · Doc: Electrocardiogram
Builds .hea/.dat paths from record_list.csv, ensures pairs exist, and loads signals via WFDB, returning
(signals, fields). See the doc’s CLI Usage for a quick run command.
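
The pair check described above might look like the sketch below. The function name `existing_ecg_pairs` and the idea of passing extension-less record paths relative to the ECG root are assumptions made for illustration; the column that holds those paths in record_list.csv is not specified here.

```python
from pathlib import Path


def existing_ecg_pairs(ecg_root, record_paths):
    """Return (hea_path, dat_path) pairs for records where both files exist.

    Hypothetical helper: record_paths are extension-less paths relative to ecg_root.
    """
    pairs = []
    for rel in record_paths:
        base = Path(ecg_root) / rel
        # Append the extension to the name instead of replacing a suffix,
        # so record names that contain dots stay intact.
        hea = base.with_name(base.name + ".hea")
        dat = base.with_name(base.name + ".dat")
        if hea.is_file() and dat.is_file():
            pairs.append((str(hea), str(dat)))
    return pairs
```

With the `wfdb` package, a record is typically read by passing the extension-less path to `wfdb.rdsamp`, which returns a `(signals, fields)` tuple; whether the module uses exactly that call is an assumption.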
Module: mmai25_hackathon/load_data/text.py · Doc: Text
Loads radiology or discharge notes from MIMIC‑IV Note v2.2. Optionally merges <subset>_detail.csv, trims/drops empty
text, and provides helpers to extract the note text (plus metadata). See the doc’s CLI Usage for an example.
Module: mmai25_hackathon/load_data/ehr.py · Doc: Electronic‑Health‑Record
Discovers and loads tables from hosp/ and/or icu/, with optional per‑table column selection and row filters. Can
merge tables on shared keys into a single DataFrame, or return a dict when merge=False.
Module: mmai25_hackathon/load_data/molecule.py · Doc: Molecule
Fetches SMILES strings from a CSV/DataFrame and converts them to PyTorch Geometric graphs. See the doc for examples
and CLI Usage.
Module: mmai25_hackathon/load_data/protein.py · Doc: Protein‑Sequence
Reads protein sequences from a CSV/DataFrame and encodes each into a fixed‑length integer array (A–Z, excluding J; 0=pad/unknown). See the doc for examples and CLI Usage.
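
To make the encoding concrete, here is a minimal sketch. The name `encode_protein`, the alphabetical letter-to-integer order, the truncation of long sequences, and right-padding are all assumptions; only the alphabet (A–Z without J) and the meaning of 0 come from the description above.

```python
import string

# Letters A-Z without J map to 1..25; 0 is reserved for padding and unknown symbols.
# The alphabetical ordering is an assumption made for this sketch.
ALPHABET = [c for c in string.ascii_uppercase if c != "J"]
CHAR_TO_INT = {c: i for i, c in enumerate(ALPHABET, start=1)}


def encode_protein(sequence, max_length=10):
    """Encode a sequence as a fixed-length list of integers (hypothetical helper)."""
    # Truncate to max_length and map unknown characters to 0.
    codes = [CHAR_TO_INT.get(c, 0) for c in sequence.upper()[:max_length]]
    # Right-pad with zeros so every output has exactly max_length entries.
    return codes + [0] * (max_length - len(codes))


print(encode_protein("MKT", max_length=5))  # [12, 10, 19, 0, 0]
```

Wrapping the returned list in `numpy.asarray` would give the array form mentioned above.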
Module: mmai25_hackathon/load_data/labels.py · Doc: Labels
Fetches/loads label columns (single/multi) from a CSV/DataFrame and provides one‑hot encoding helpers for categorical labels.
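
As a rough picture of what one-hot encoding produces for a single categorical label column, consider the sketch below. The `one_hot` function is hypothetical and written in plain Python; it is not the module's helper.

```python
def one_hot(labels, classes=None):
    """Return (classes, rows), where each row is a 0/1 list over classes.

    Hypothetical helper: classes default to the sorted unique labels.
    """
    if classes is None:
        classes = sorted(set(labels))
    position = {c: i for i, c in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[position[label]] = 1  # mark the column that matches this label
        rows.append(row)
    return classes, rows


classes, rows = one_hot(["b", "a", "b"])
print(classes)  # ['a', 'b']
print(rows)     # [[0, 1], [1, 0], [0, 1]]
```

For DataFrame inputs, `pd.get_dummies` produces an equivalent indicator table.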
Module: mmai25_hackathon/load_data/tabular.py · Doc: Tabular
Thin wrapper over pd.read_csv with column selection and row filtering. Includes a merging helper that forms
connected components by overlapping key columns.
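
The "connected components by overlapping key columns" idea can be sketched with a small union-find over table names. Everything here is illustrative: `group_tables_by_shared_keys` is a hypothetical name, and the sketch only groups tables; it does not perform the merge itself.

```python
def group_tables_by_shared_keys(table_columns, keys):
    """Group table names into components linked by shared key columns.

    table_columns: {table_name: iterable of column names}
    keys: candidate key column names
    """
    names = list(table_columns)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the component root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    key_set = set(keys)
    first_owner = {}  # key column -> first table seen that contains it
    for name in names:
        for key in key_set & set(table_columns[name]):
            if key in first_owner:
                # Two tables share this key, so they belong to one component.
                parent[find(name)] = find(first_owner[key])
            else:
                first_owner[key] = name

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


tables = {
    "patients": ["subject_id", "gender"],
    "admissions": ["subject_id", "hadm_id"],
    "labevents": ["hadm_id", "value"],
    "d_items": ["itemid", "label"],
}
print(group_tables_by_shared_keys(tables, ["subject_id", "hadm_id", "itemid"]))
# [['patients', 'admissions', 'labevents'], ['d_items']]
```

Tables inside one component could then be joined pairwise on their shared keys (for example with `pd.merge`), while unrelated tables stay separate.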
- Paths shown in examples are illustrative; point to your local dataset roots. In docs, MMAI25Hackathon refers to the unzipped Dropbox folder.
- Loaders validate required folders/files and raise helpful errors; you don’t need separate sanity checks.
- Use each module’s “CLI Usage” section to quickly preview data and sanity‑check paths.