Skip to content

API Standardization#4

Merged
zaRizk7 merged 46 commits intomainfrom
standardize-api
Sep 12, 2025
Merged

API Standardization#4
zaRizk7 merged 46 commits intomainfrom
standardize-api

Conversation

@zaRizk7
Copy link
Copy Markdown
Collaborator

@zaRizk7 zaRizk7 commented Sep 12, 2025

Refer to copilot post below 😉.

#4 (review)

- remove typing on var definition to reduce footprint
- remove graph conversion
 - include subset_cols in demo runtime
 - extend read_tabular functionality
 - use param validation for all functions defined
- refactor all duplicate code
- reuse tabular.py functionalities
- merge_tables and get_tabular_mimic are merged into single fetch_mimic_iv_ehr
- use pykale style pydocs for consistency
- loosen the coupling for base data path
- include param_validation for functions
- use argparse to run demo on the module
- improve function pydocs to mimic pykale
- decouple base path dependency
- use sklearn param_validation to validate args
- include examples for each function
- reuse tabular io functionality
- include module level pydoc
- include logging
- provide more explanations for the preview cli output
- include module level pydoc
- major refactor
- include param_validation
- loosen the coupling for data path finding
- use pykale style pydoc for functions
- include argparse for executable demo
- use logging to replace print
- include module level pydoc
- reuse tabular io
- use param_validation
- use pykale style pydoc for functions
- simplify metadata extraction for loading dicom
- use argparse for executable demo
- include module level pydoc
- rename function from fetch to load
- use tuple over list for available tables in mimic-iv
- provide logging to replace print
- print used only in executable demo
- update module level pydoc for consistency
- include logging and param_validation
- reuse read_tabular from tabular io module
- parser argument changed from keyword to positional for consistency
- update module level pydoc to be consistent
- include param_validation and logging
- use pos args over kwargs for argparse to be consistent
- update module pydoc to keep consistency
- use param_validation and reuse tabular io module
- update naming for argparser args from csv_path to data_path
- update module pydoc
- use logging
- separate variable assignment and return in merge_multiple_dataframes to allow logging in each step
- rename argparser args to be consistent with other modules
- include module level pydoc consistent with other io modules
- use param_validation and logging
- text extraction now takes pd.Series rather than a dataframe and note_id
- use argparse over fixed dataset path
- include check if subset_cols is None during read_tabular
- fix typing on return for extract_text_from_note
- add type: ignore on df_by_subset to bypass mypy
- include basesampler for additional options
- update module pydoc
- use dataloader instead of pygdataloader
- simplify method pydoc and reduce initial idea verbosity
- add typing to dataloader and sampler
- use notimplementederror for prepare_data method in basedataset
- standardize the runnable demo with default kwargs in argparse
- the assumption for default base data-path is ./MMAI25Hackathon
- reformat load_mimic_iv_notes function
- include pydoc for load_mimic_iv_notes
- include high-level steps on how the function works for all functions
- include module pydoc
- does not check runnable demo
- assumes MMAI25Hackathon is in cwd for testing with real data for integration testing
codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@codecov-commenter
Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

codescene-delta-analysis[bot]

This comment was marked as outdated.

@zaRizk7 zaRizk7 requested a review from Copilot September 12, 2025 13:21
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements comprehensive API standardization across the hackathon codebase. The purpose is to consolidate data loading utilities, enhance test coverage, and establish a unified interface for handling multimodal datasets (EHR, images, text, molecules, proteins, etc.).

  • Standardized data loading APIs with consistent parameter validation and error handling
  • Added comprehensive test suite covering all data loaders with both unit and integration tests
  • Unified configuration management through pyproject.toml with proper dependency separation

Reviewed Changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pyproject.toml Restructured dependencies, added dev extras, updated tooling configuration
mmai25_hackathon/load_data/*.py Standardized APIs with sklearn validation, consistent logging, unified error handling
tests/load_data/test_*.py Comprehensive test coverage for all data loading modules
tests/test_dataset.py Tests for base dataset/dataloader/sampler utilities
tests/dropbox_download.py CI integration test data download utility
mmai25_hackathon/dataset.py Enhanced base classes with better documentation and prepare_data method

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

return df[list(label_col)]

return df[label_col].to_frame("label")
return df[label_col].to_frame("label").reset_index(drop=index_col is None)

This comment was marked as resolved.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intentional to ensure the index is back to become column

- check for list tuple instead of sequence

Co-authored-by: Copilot <[email protected]>
codescene-delta-analysis[bot]

This comment was marked as outdated.

- remove sequence is string check since checking now given list and tuple
codescene-delta-analysis[bot]

This comment was marked as outdated.

- include break if the metadata_path is found
codescene-delta-analysis[bot]

This comment was marked as outdated.

- use high-level steps for load_mimic_iv_notes
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@codescene-delta-analysis codescene-delta-analysis bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Health Improved (2 files improve in Code Health)

Gates Failed
Enforce critical code health rules (1 file with Brain Method, Deep, Nested Complexity)
Enforce advisory code health rules (5 files with Complex Method, Complex Conditional)

Gates Passed
2 Quality Gates Passed

See analysis details in CodeScene

Reason for failure
Enforce critical code health rules Violations Code Health Impact
ehr.py 2 critical rules 9.54 → 7.59 Suppress
Enforce advisory code health rules Violations Code Health Impact
ehr.py 1 advisory rule 9.54 → 7.59 Suppress
tabular.py 2 advisory rules 8.61 → 7.78 Suppress
molecule.py 1 advisory rule 10.00 → 9.69 Suppress
protein.py 1 advisory rule 10.00 → 9.69 Suppress
supervised_labels.py 1 advisory rule 10.00 → 9.69 Suppress
View Improvements
File Code Health Impact Categories Improved
tabular.py 8.61 → 7.78 Complex Method
ehr.py 9.54 → 7.59 Complex Method, Bumpy Road Ahead
cxr.py 9.46 → 9.64 Complex Method, Bumpy Road Ahead
text.py 9.33 → 9.53 Complex Method, Bumpy Road Ahead

Quality Gate Profile: Clean Code Collective
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

Comment on lines +37 to +42
def fetch_smiles_from_dataframe(
df: Union[pd.DataFrame, str],
smiles_col: str,
index_col: str = None,
filter_rows: Optional[Dict[str, Union[Sequence, pd.Index]]] = None,
) -> pd.DataFrame:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ New issue: Complex Method
fetch_smiles_from_dataframe has a cyclomatic complexity of 9, threshold = 9

Suppress

Comment on lines +42 to +45
df: Union[pd.DataFrame, str],
prot_seq_col: str,
index_col: str = None,
filter_rows: Optional[Dict[str, Union[Sequence, pd.Index]]] = None,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ New issue: Complex Method
fetch_protein_sequences_from_dataframe has a cyclomatic complexity of 9, threshold = 9

Suppress

df: Union[pd.DataFrame, str],
label_col: Union[str, Sequence[str]],
index_col: str = None,
filter_rows: Optional[Dict[str, Union[Sequence, pd.Index]]] = None,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ New issue: Complex Method
fetch_supervised_labels_from_dataframe has a cyclomatic complexity of 11, threshold = 9

Suppress

@zaRizk7 zaRizk7 merged commit 8dba63e into main Sep 12, 2025
6 of 7 checks passed
@zaRizk7 zaRizk7 deleted the standardize-api branch September 12, 2025 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants