Issue 780: Adds forecast_epiautogp
#783
Codecov Report
Coverage diff for epiautogp #783 (the report shows ? in place of a comparison value on every row):
| Metric | Value |
|---|---|
| Coverage | 45.78% |
| Files | 35 |
| Lines | 3191 |
| Branches | 0 |
| Hits | 1461 |
| Misses | 1730 |
| Partials | 0 |
Pull request overview
This PR adds a comprehensive forecasting pipeline for the EpiAutoGP model, enabling it to forecast both daily and weekly ED visits (NSSP) and hospital admissions (NHSN). The implementation follows the existing pipeline patterns while introducing new utilities for data conversion, model execution, and post-processing specific to EpiAutoGP's Julia-based workflow.
Key Changes
- Introduced forecast_epiautogp.py as the main entry point for the EpiAutoGP forecasting pipeline
- Added shared pipeline utilities (ForecastPipelineContext, ModelPaths, setup_forecast_pipeline) to reduce code duplication
- Enhanced Julia EpiAutoGP model to support flexible daily/weekly forecast strides with improved parameter naming
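As a rough illustration of the context/paths pattern named in the key changes, the sketch below bundles run settings into one dataclass and derives per-model paths from it. Only the names ForecastPipelineContext, ModelPaths and setup_forecast_pipeline come from this PR; every field, the model_paths method, the directory layout and the example values are assumptions made for illustration, not the PR's actual definitions.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ModelPaths:
    """Paths for one model run (field names are hypothetical)."""

    model_output_dir: Path
    data_dir: Path


@dataclass(frozen=True)
class ForecastPipelineContext:
    """Run-level settings shared by all stages (fields are hypothetical)."""

    disease: str
    loc: str
    report_date: str
    output_dir: Path

    def model_paths(self, model_name: str) -> ModelPaths:
        # Assumed layout: <output_dir>/<loc>/<model_name>/ and a data/ subfolder.
        model_dir = self.output_dir / self.loc / model_name
        return ModelPaths(model_output_dir=model_dir, data_dir=model_dir / "data")


def setup_forecast_pipeline(
    disease: str, loc: str, report_date: str, output_dir: str
) -> ForecastPipelineContext:
    """Stand-in for the real setup step: here it only bundles its arguments."""
    return ForecastPipelineContext(disease, loc, report_date, Path(output_dir))


context = setup_forecast_pipeline("COVID-19", "CA", "2025-01-01", "output")
paths = context.model_paths("epiautogp_nhsn_epiweekly")
```

Passing one context object between stages, rather than many separate arguments, is the duplication-reducing idea the review summary describes.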
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pipelines/tests/test_prep_epiautogp_data.py | Removed (tests moved or consolidated) |
| pipelines/tests/test_forecast_utils.py | New unit tests for forecast pipeline utilities using mocks |
| pipelines/tests/test_epiautogp_prep_script.py | Removed preparation script tests |
| pipelines/tests/test_epiautogp_prep.sh | Removed shell test for data preparation |
| pipelines/tests/test_epiautogp_fit.sh | New shell script to run single EpiAutoGP forecasts |
| pipelines/tests/test_epiautogp_end_to_end.sh | New comprehensive end-to-end integration test |
| pipelines/forecast_pyrenew.py | Fixed import organization (moved to absolute imports) |
| pipelines/epiautogp/process_epiautogp_forecast.py | New post-processing utilities for EpiAutoGP outputs |
| pipelines/epiautogp/prep_epiautogp_data.py | Enhanced data conversion with context/paths pattern and ed_visit_type support |
| pipelines/epiautogp/plot_epiautogp_forecast.R | New R plotting script for EpiAutoGP-specific visualizations |
| pipelines/epiautogp/forecast_epiautogp.py | Main pipeline entry point orchestrating all steps |
| pipelines/epiautogp/epiautogp_forecast_utils.py | Shared utilities and dataclasses for pipeline stages |
| pipelines/epiautogp/__init__.py | Updated exports for new utilities |
| pipelines/epiautogp/README.md | Comprehensive documentation of pipeline architecture |
| EpiAutoGP/test/test_parse_arguments.jl | Updated test for renamed parameter |
| EpiAutoGP/test/test_output.jl | Added new required fields to test data |
| EpiAutoGP/test/test_modelling.jl | Updated tests for renamed parameter and added daily frequency test |
| EpiAutoGP/test/test_input.jl | Updated all test inputs with new required fields |
| EpiAutoGP/src/parse_arguments.jl | Renamed n-forecast-weeks to n-ahead for flexibility |
| EpiAutoGP/src/output.jl | Added PipelineOutput type and refactored output creation |
| EpiAutoGP/src/modelling.jl | Updated to support daily/weekly frequencies with time_step calculation |
| EpiAutoGP/src/input.jl | Added frequency, use_percentage, and ed_visit_type fields |
| EpiAutoGP/src/EpiAutoGP.jl | Added Parquet dependency and new constants |
| EpiAutoGP/run.jl | Switched default output type to PipelineOutput |
| EpiAutoGP/Project.toml | Added Parquet dependency and reordered authors field |
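The rename from n-forecast-weeks to n-ahead together with the new frequency field means the forecast horizon is counted in time steps rather than weeks. The sketch below shows that idea in Python, assuming a daily step of 1 day and an epiweekly step of 7 days. The Julia implementation in EpiAutoGP/src/modelling.jl is not reproduced here, and the function and constant names are hypothetical.

```python
from datetime import date, timedelta

# Assumed mapping from frequency label to step length in days.
TIME_STEP_DAYS = {"daily": 1, "epiweekly": 7}


def forecast_dates(last_observed: date, frequency: str, n_ahead: int) -> list[date]:
    """Return the n_ahead dates that follow last_observed at the given frequency."""
    if frequency not in TIME_STEP_DAYS:
        raise ValueError(f"Unknown frequency: {frequency!r}")
    step = timedelta(days=TIME_STEP_DAYS[frequency])
    # Step k lands k time steps after the last observation.
    return [last_observed + step * k for k in range(1, n_ahead + 1)]


weekly = forecast_dates(date(2025, 1, 4), "epiweekly", 2)
daily = forecast_dates(date(2025, 1, 4), "daily", 3)
```

With this framing, the same n_ahead value covers a different calendar span depending on the frequency, which is why a week-specific parameter name no longer fits.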
damonbayer
left a comment
A few open-ended questions. I feel somewhat strongly that pipelines/epiautogp/process_epiautogp_forecast.py should be implemented in hewr/R/process_loc_forecast.R. Maintaining both seems risky.
Can we make an issue to make At the moment,
@SamuelBrand1 I'll make an issue and offer my thoughts.
Actually I think #781 already covers this?
Using the S3 generic introduced in #790 I've added a method for handling the
#790 broke the full pipeline at some point in plotting... so this is reblocked.
I've rebased The way this works is that if no The idea is to stop endless creep of new model specific parameters, although I acknowledge that the auto-detect pattern is pretty bad for this. That can be fixed in a new issue. This is a PR into
Co-authored-by: Copilot <[email protected]>
Replaces the use of process_epiautogp_forecast and a custom R plotting script with plot_and_save_loc_forecast, which handles both sample processing and plotting via hewr. Updates the post_process_forecast method to streamline steps and improve maintainability.
Introduces the process_model_samples.epiautogp S3 method and interface from pipelines
Replaces the epiautogp_model_name parameter with a generic model_name in plot_and_save_loc_forecast and related calls. Updates documentation and tests to reflect the new parameter, enabling auto-detection and dispatching for different model types.
Refactor logic to determine the correct samples file based on both frequency (epiweekly or daily) and target type (NHSN or NSSP) from the model name. This makes the file selection more robust and explicit, and adds error handling for unknown target types.
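The selection logic described in this commit lives in R (process_loc_forecast.R). Purely as an illustration of the same idea, here is a Python sketch that derives a samples file name from the frequency and target type embedded in a model name, and fails loudly on an unknown target. The model-name convention, the file-name pattern and the function name are all assumptions, not the names used in hewr.

```python
def select_samples_file(model_name: str) -> str:
    """Pick a samples file from the frequency and target encoded in model_name."""
    name = model_name.lower()
    # Assumption: model names contain "epiweekly" for weekly runs, else daily.
    frequency = "epiweekly" if "epiweekly" in name else "daily"
    if "nhsn" in name:
        target = "nhsn"
    elif "nssp" in name:
        target = "nssp"
    else:
        # Explicit error instead of silently falling back to a default file.
        raise ValueError(f"Cannot determine target type from model name: {model_name!r}")
    # Hypothetical file-name pattern.
    return f"{frequency}_{target}_samples.parquet"


example_file = select_samples_file("epiautogp_nssp_daily")
```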
Introduces a new --n-threads argument to specify the number of threads used for EpiAutoGP computations, defaulting to 1. This allows users to control parallelism directly from the command line.
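A minimal sketch of such an option using Python's argparse follows. The flag name and default of 1 come from the commit description. How the value reaches Julia is an assumption: the sketch forwards it through Julia's --threads command-line flag, and the julia_command helper and script path layout are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of an EpiAutoGP forecast CLI.")
    parser.add_argument(
        "--n-threads",
        type=int,
        default=1,
        help="Number of threads for EpiAutoGP computations (default: 1).",
    )
    return parser


def julia_command(n_threads: int) -> list[str]:
    """Build a Julia invocation (assumed) that forwards the thread count."""
    if n_threads < 1:
        raise ValueError("n_threads must be a positive integer")
    # Assumption: threads are passed via Julia's own --threads flag.
    return ["julia", f"--threads={n_threads}", "EpiAutoGP/run.jl"]


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--n-threads", "4"])
command = julia_command(args.n_threads)
```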
Introduces forecast_utils.py with dataclasses and functions for setting up, preparing, and postprocessing forecast pipeline runs. Includes comprehensive unit tests for all major utilities, using mocking to isolate dependencies and verify correct logic and file structure handling.
Introduces DEFAULT_TARGET_LETTER mapping for target abbreviations and updates the Parquet output filename in create_forecast_output to use the appropriate target letter for hubverse compatibility. Also adds geo_value and disease columns to the forecast output for improved metadata.
Renamed forecast_utils.py to epiautogp_forecast_utils.py and updated all imports accordingly. Refactored the EpiAutoGP pipeline to use a context object for configuration, streamlined argument passing, and improved modularity. Added a new R plotting script (plot_epiautogp_forecast.R) for EpiAutoGP outputs. Introduced end-to-end and fit test shell scripts for automated testing. Removed obsolete prep test scripts. Updated process_epiautogp_forecast.py to simplify output processing and match R plotting expectations.
Introduces the 'ed_visit_type' parameter to allow selection between 'observed' and 'other' ED visits for NSSP targets throughout the EpiAutoGP pipeline. Updates parameter validation, data extraction, and model naming to support this distinction, and adjusts CLI and function signatures accordingly. Also ensures correct forecast sample file selection based on frequency.
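The parameter validation mentioned here could look roughly like the sketch below. The two allowed values, 'observed' and 'other', come from the commit description. The rule that non-NSSP targets only accept 'observed', the target labels and the function name are assumptions made for illustration.

```python
VALID_ED_VISIT_TYPES = ("observed", "other")


def validate_ed_visit_type(target: str, ed_visit_type: str = "observed") -> str:
    """Check ed_visit_type against the allowed values for the given target."""
    if ed_visit_type not in VALID_ED_VISIT_TYPES:
        raise ValueError(
            f"ed_visit_type must be one of {VALID_ED_VISIT_TYPES}, got {ed_visit_type!r}"
        )
    # Assumption: the distinction is only meaningful for NSSP (ED visit) targets.
    if target != "nssp" and ed_visit_type != "observed":
        raise ValueError("ed_visit_type='other' is only supported for the nssp target")
    return ed_visit_type


checked = validate_ed_visit_type("nssp", "other")
```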
Co-authored-by: Copilot <[email protected]>
damonbayer
left a comment
Thanks @SamuelBrand1!
* Add shared forecast pipeline utilities and tests
  Introduces forecast_utils.py with dataclasses and functions for setting up, preparing, and postprocessing forecast pipeline runs. Includes comprehensive unit tests for all major utilities, using mocking to isolate dependencies and verify correct logic and file structure handling.
* add Parquet dep
* reduce docstring bloat
* Add PipelineOutput support for pipeline forecasts
  Outputs to expected parquet format
* Add DEFAULT_TARGET_LETTER and update output filenames
  Introduces DEFAULT_TARGET_LETTER mapping for target abbreviations and updates the Parquet output filename in create_forecast_output to use the appropriate target letter for hubverse compatibility. Also adds geo_value and disease columns to the forecast output for improved metadata.
* move utils and rename paths dataclass
* Add use_percentage flag to EpiAutoGPInput and output logic
  Introduces a use_percentage boolean field to EpiAutoGPInput to distinguish between raw counts and percentage-based input for ED visits. Updates output logic to set the variable name and convert values to proportions when use_percentage is true for nssp targets. Test cases and input construction are updated accordingly.
* Refactor EpiAutoGP pipeline and add end-to-end tests
  Renamed forecast_utils.py to epiautogp_forecast_utils.py and updated all imports accordingly. Refactored the EpiAutoGP pipeline to use a context object for configuration, streamlined argument passing, and improved modularity. Added a new R plotting script (plot_epiautogp_forecast.R) for EpiAutoGP outputs. Introduced end-to-end and fit test shell scripts for automated testing. Removed obsolete prep test scripts. Updated process_epiautogp_forecast.py to simplify output processing and match R plotting expectations.
* Update .gitignore
* Refactor EpiAutoGP post-processing into utility function
  Consolidated forecast post-processing steps (processing outputs, creating hubverse table, and plotting) into a single post_process_forecast utility in epiautogp_forecast_utils.py. Updated imports and usage in __init__.py and forecast_epiautogp.py for improved modularity and code reuse. Added param_data_dir to ForecastPipelineContext and setup_forecast_pipeline.
* Refactor forecast utils to use context methods
  Moved prepare_model_data and post_process_forecast functions into ForecastPipelineContext as methods. Updated imports and usage in forecast_epiautogp.py and __init__.py to use the new class methods, improving encapsulation and code organization.
* Update README.md
* Add frequency to input and generalize forecast horizon
  Introduces a 'frequency' field to EpiAutoGPInput to support both daily and epiweekly data. Refactors modelling and argument parsing to use a generic 'n_ahead' parameter (number of time steps) instead of 'n_forecast_weeks', and updates all related documentation, tests, and function signatures for consistency and flexibility.
* Add ed_visit_type to input and output handling
  Introduces the ed_visit_type field to EpiAutoGPInput for specifying the type of ED visits, updates output logic to use this field for column selection, and adjusts tests and documentation accordingly. Also updates output file naming to use the frequency prefix.
* Add ed_visit_type param for NSSP/ED visit modeling
  Introduces the 'ed_visit_type' parameter to allow selection between 'observed' and 'other' ED visits for NSSP targets throughout the EpiAutoGP pipeline. Updates parameter validation, data extraction, and model naming to support this distinction, and adjusts CLI and function signatures accordingly. Also ensures correct forecast sample file selection based on frequency.
* Add daily NSSP forecast tests and support for ED visit type
  Expanded end-to-end and fit test scripts to include daily NSSP count and 'other ED visits' forecasts. Updated argument handling in test_epiautogp_fit.sh to support an optional ed_visit_type parameter and adjusted expected model counts accordingly.
* Refactor forecast utils tests and remove prep_epiautogp tests
  Updated test_forecast_utils.py to use new ForecastPipelineContext interface, updated patch paths, and migrated to context methods for prepare_model_data and post_process_forecast. Removed test_prep_epiautogp_data.py as part of test suite cleanup.
* update epiautogp docstrings
* Update prep_epiautogp_data.py
* Update output.jl
* add nhsn test coverage
* reorg unit tests
* caught anti-pattern
* Update pipelines/epiautogp/process_epiautogp_forecast.py
  Co-authored-by: Copilot <[email protected]>
* explain use of percentage
* Refactor forecast post-processing to use hewr plotting
  Replaces the use of process_epiautogp_forecast and a custom R plotting script with plot_and_save_loc_forecast, which handles both sample processing and plotting via hewr. Updates the post_process_forecast method to streamline steps and improve maintainability.
* remove redundant files
* update tests due to removed funcs
* Add EpiAutoGP model support to process_loc_forecast
  Introduces the process_model_samples.epiautogp S3 method and interface from pipelines
* Refactor to use generic model_name in forecast utilities
  Replaces the epiautogp_model_name parameter with a generic model_name in plot_and_save_loc_forecast and related calls. Updates documentation and tests to reflect the new parameter, enabling auto-detection and dispatching for different model types.
* Improve sample file selection in process_loc_forecast.R
  Refactor logic to determine the correct samples file based on both frequency (epiweekly or daily) and target type (NHSN or NSSP) from the model name. This makes the file selection more robust and explicit, and adds error handling for unknown target types.
* Add n-threads argument to CLI for EpiAutoGP
  Introduces a new --n-threads argument to specify the number of threads used for EpiAutoGP computations, defaulting to 1. This allows users to control parallelism directly from the command line.
* Add shared forecast pipeline utilities and tests
  Introduces forecast_utils.py with dataclasses and functions for setting up, preparing, and postprocessing forecast pipeline runs. Includes comprehensive unit tests for all major utilities, using mocking to isolate dependencies and verify correct logic and file structure handling.
* change to relative imports
* Add DEFAULT_TARGET_LETTER and update output filenames
  Introduces DEFAULT_TARGET_LETTER mapping for target abbreviations and updates the Parquet output filename in create_forecast_output to use the appropriate target letter for hubverse compatibility. Also adds geo_value and disease columns to the forecast output for improved metadata.
* move utils and rename paths dataclass
* Refactor EpiAutoGP pipeline and add end-to-end tests
  Renamed forecast_utils.py to epiautogp_forecast_utils.py and updated all imports accordingly. Refactored the EpiAutoGP pipeline to use a context object for configuration, streamlined argument passing, and improved modularity. Added a new R plotting script (plot_epiautogp_forecast.R) for EpiAutoGP outputs. Introduced end-to-end and fit test shell scripts for automated testing. Removed obsolete prep test scripts. Updated process_epiautogp_forecast.py to simplify output processing and match R plotting expectations.
* Add ed_visit_type param for NSSP/ED visit modeling
  Introduces the 'ed_visit_type' parameter to allow selection between 'observed' and 'other' ED visits for NSSP targets throughout the EpiAutoGP pipeline. Updates parameter validation, data extraction, and model naming to support this distinction, and adjusts CLI and function signatures accordingly. Also ensures correct forecast sample file selection based on frequency.
* update epiautogp docstrings
* caught anti-pattern
* Update pipelines/epiautogp/process_epiautogp_forecast.py
  Co-authored-by: Copilot <[email protected]>
* remove redundant files
---------
Co-authored-by: Copilot <[email protected]>
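One item in the commit list above adds a use_percentage flag so that percentage-based ED visit inputs are converted to proportions on output for nssp targets. The conversion itself is a division by 100. The Python sketch below shows it under stated assumptions: the real logic is in the Julia output code, and the function name and the "nssp" target label are hypothetical.

```python
def to_output_values(
    samples: list[float], target: str, use_percentage: bool
) -> list[float]:
    """Convert percentage samples to proportions for nssp targets when requested."""
    if target == "nssp" and use_percentage:
        # Percentages (0-100) become proportions (0-1).
        return [value / 100.0 for value in samples]
    # Counts, or any non-nssp target, pass through unchanged.
    return list(samples)


proportions = to_output_values([50.0, 12.5], "nssp", use_percentage=True)
counts = to_output_values([50.0, 12.5], "nssp", use_percentage=False)
```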
This PR closes #780 .
This pull request introduces several enhancements and refactors to:
- EpiAutoGP. These make the model flexible in doing daily or weekly forecast strides, and format output to work more smoothly with post-processing.
- pipelines/epiautogp. The Python module that integrates running EpiAutoGP with the pyrenew-hew pipeline. The entrypoint script for this is forecast_epiautogp.py, which is similar to forecast_pyrenew.py and forecast_timeseries.py.
forecast_epiautogp.py
Where possible I have reused functionality that exists in pipelines. However, to make dev easier I have introduced some classes and functions that wrap multiple steps. In forecast_epiautogp.py there are the following steps:
- setup_forecast_pipeline, which does the credential step and the existing data wrangling code and puts the pipeline information into a ForecastPipelineContext dataclass object. This reduces the number of parameters that need passing around.
- A ModelPaths dataclass object to hold the various paths that get passed around.
- EpiAutoGP data setup that creates a new JSON data file for EpiAutoGP to use.
- Running the EpiAutoGP model.
- Post-processing for EpiAutoGP, which remains based on current post-processing functions.
End to end testing
I've added the integration test pipelines/tests/test_epiautogp_end_to_end.sh, which matches the structure of the existing end to end tests but currently only for covid. Model options cover: