Issue 780: Adds `forecast_epiautogp` #783

SamuelBrand1 · 2025-12-12T17:30:02Z

This PR closes #780.

This pull request introduces several enhancements and refactors to:

EpiAutoGP. These changes make the model flexible enough to use daily or weekly forecast strides, and format its output to work more smoothly with post-processing.
pipelines/epiautogp. The Python module that integrates running EpiAutoGP with the pyrenew-hew pipeline. The entrypoint script is forecast_epiautogp.py, which is similar to forecast_pyrenew.py and forecast_timeseries.py.

`forecast_epiautogp.py`

Where possible I have reused functionality that exists in pipelines. However, to make development easier I have introduced some classes and functions that wrap multiple steps. forecast_epiautogp.py has the following steps:

A preliminary step, which generates a contextual model name and groups the parameters and exe flags into dicts.
A pipeline setup step, setup_forecast_pipeline, which runs the credential step and the existing data wrangling code, and puts the pipeline information into a ForecastPipelineContext dataclass object. This reduces the number of parameters that need passing around.
A data setup step, which calls a method on the pipeline context to set the data up for model usage and returns a ModelPaths dataclass object holding the various paths that get passed around.
An EpiAutoGP-specific data setup step, which creates a new JSON data file for EpiAutoGP to use.
A step that runs the EpiAutoGP model.
A post-processing step, which calls a method on the pipeline context. This does the output formatting, plotting and hubverse table creation. I had to write some EpiAutoGP-specific functions for this, but it remains based on the current post-processing functions.
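The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names setup_forecast_pipeline, ForecastPipelineContext, ModelPaths, prepare_model_data and post_process_forecast come from the PR, but every field, argument, return value and the model-name format are assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModelPaths:
    """Hypothetical stand-in: the paths shared between pipeline steps."""

    model_output_dir: Path
    data_dir: Path


@dataclass
class ForecastPipelineContext:
    """Hypothetical stand-in: settings that would otherwise be passed one by one."""

    disease: str
    loc: str
    output_dir: Path

    def prepare_model_data(self, model_name: str) -> ModelPaths:
        # Data setup step: derive per-model directories from the context.
        model_dir = self.output_dir / self.loc / model_name
        return ModelPaths(model_output_dir=model_dir, data_dir=model_dir / "data")

    def post_process_forecast(self, paths: ModelPaths) -> list[str]:
        # Post-processing step: this sketch only names the stages, in order.
        return ["format_output", "plot", "hubverse_table"]


def setup_forecast_pipeline(disease: str, loc: str, output_dir: Path) -> ForecastPipelineContext:
    # Pipeline setup step (credentials and data wrangling are omitted here).
    return ForecastPipelineContext(disease=disease, loc=loc, output_dir=output_dir)


def run_sketch() -> tuple[ModelPaths, list[str]]:
    # Preliminary step: a contextual model name (the format is made up).
    model_name = "epiautogp_nhsn_epiweekly"
    context = setup_forecast_pipeline("COVID-19", "CA", Path("output"))
    paths = context.prepare_model_data(model_name)
    # The EpiAutoGP JSON creation and the model run would happen here.
    stages = context.post_process_forecast(paths)
    return paths, stages
```

The point of the context object in this sketch is that each later step takes one argument (the context, or the paths it produced) instead of a long parameter list.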

End to end testing

I've added the integration test pipelines/tests/test_epiautogp_end_to_end.sh, which matches the structure of the existing end-to-end tests but currently only runs for covid. Model options cover running and forecasting on:

weekly NHSN data
weekly NSSP % ED visits
daily ED visit counts
daily other ED visit counts

codecov · 2025-12-12T17:51:19Z

Codecov Report

❌ Patch coverage is 68.15068% with 93 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (epiautogp@1b3bf63).

Files with missing lines	Patch %	Lines
pipelines/epiautogp/forecast_epiautogp.py	0.00%	41 Missing ⚠️
hewr/R/process_loc_forecast.R	0.00%	30 Missing ⚠️
pipelines/epiautogp/prep_epiautogp_data.py	4.34%	22 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             epiautogp     #783   +/-   ##
============================================
  Coverage             ?   45.78%           
============================================
  Files                ?       35           
  Lines                ?     3191           
  Branches             ?        0           
============================================
  Hits                 ?     1461           
  Misses               ?     1730           
  Partials             ?        0

Flag	Coverage Δ
hewr	`36.77% <0.00%> (?)`
pipelines	`45.48% <75.95%> (?)`
pyrenew_hew	`62.29% <ø> (?)`

Copilot

Pull request overview

This PR adds a comprehensive forecasting pipeline for the EpiAutoGP model, enabling it to forecast both daily and weekly ED visits (NSSP) and hospital admissions (NHSN). The implementation follows the existing pipeline patterns while introducing new utilities for data conversion, model execution, and post-processing specific to EpiAutoGP's Julia-based workflow.

Key Changes

Introduced forecast_epiautogp.py as the main entry point for the EpiAutoGP forecasting pipeline
Added shared pipeline utilities (ForecastPipelineContext, ModelPaths, setup_forecast_pipeline) to reduce code duplication
Enhanced Julia EpiAutoGP model to support flexible daily/weekly forecast strides with improved parameter naming

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.

File	Description
pipelines/tests/test_prep_epiautogp_data.py	Removed (tests moved or consolidated)
pipelines/tests/test_forecast_utils.py	New unit tests for forecast pipeline utilities using mocks
pipelines/tests/test_epiautogp_prep_script.py	Removed preparation script tests
pipelines/tests/test_epiautogp_prep.sh	Removed shell test for data preparation
pipelines/tests/test_epiautogp_fit.sh	New shell script to run single EpiAutoGP forecasts
pipelines/tests/test_epiautogp_end_to_end.sh	New comprehensive end-to-end integration test
pipelines/forecast_pyrenew.py	Fixed import organization (moved to absolute imports)
pipelines/epiautogp/process_epiautogp_forecast.py	New post-processing utilities for EpiAutoGP outputs
pipelines/epiautogp/prep_epiautogp_data.py	Enhanced data conversion with context/paths pattern and ed_visit_type support
pipelines/epiautogp/plot_epiautogp_forecast.R	New R plotting script for EpiAutoGP-specific visualizations
pipelines/epiautogp/forecast_epiautogp.py	Main pipeline entry point orchestrating all steps
pipelines/epiautogp/epiautogp_forecast_utils.py	Shared utilities and dataclasses for pipeline stages
pipelines/epiautogp/__init__.py	Updated exports for new utilities
pipelines/epiautogp/README.md	Comprehensive documentation of pipeline architecture
EpiAutoGP/test/test_parse_arguments.jl	Updated test for renamed parameter
EpiAutoGP/test/test_output.jl	Added new required fields to test data
EpiAutoGP/test/test_modelling.jl	Updated tests for renamed parameter and added daily frequency test
EpiAutoGP/test/test_input.jl	Updated all test inputs with new required fields
EpiAutoGP/src/parse_arguments.jl	Renamed `n-forecast-weeks` to `n-ahead` for flexibility
EpiAutoGP/src/output.jl	Added `PipelineOutput` type and refactored output creation
EpiAutoGP/src/modelling.jl	Updated to support daily/weekly frequencies with time_step calculation
EpiAutoGP/src/input.jl	Added frequency, use_percentage, and ed_visit_type fields
EpiAutoGP/src/EpiAutoGP.jl	Added Parquet dependency and new constants
EpiAutoGP/run.jl	Switched default output type to PipelineOutput
EpiAutoGP/Project.toml	Added Parquet dependency and reordered authors field

EpiAutoGP/Project.toml

pipelines/epiautogp/prep_epiautogp_data.py

pipelines/epiautogp/epiautogp_forecast_utils.py

EpiAutoGP/src/output.jl

pipelines/tests/test_epiautogp_end_to_end.sh

Copilot

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.

pipelines/epiautogp/process_epiautogp_forecast.py

pipelines/epiautogp/prep_epiautogp_data.py

pipelines/epiautogp/process_epiautogp_forecast.py

pipelines/tests/test_epiautogp_end_to_end.sh

pipelines/epiautogp/epiautogp_forecast_utils.py

damonbayer

A few open ended questions. I feel somewhat strongly that pipelines/epiautogp/process_epiautogp_forecast.py should be implemented in hewr/R/process_loc_forecast.R. Maintaining both seems risky.

SamuelBrand1 · 2025-12-15T20:08:38Z

> A few open ended questions. I feel somewhat strongly that pipelines/epiautogp/process_epiautogp_forecast.py should be implemented in hewr/R/process_loc_forecast.R. Maintaining both seems risky.

Can we make an issue to make process_loc_forecast.R more flexible? And a sub-issue to modify to use this?

At the moment, process_loc_forecast.R takes pyrenew_model_name and timeseries_model_name, with the logic depending on which argument is NA. I'd need to hack this quite a lot to make it flexible. I think the cleanest approach would be an S3 generic with a method specific to each model, so that a new model only needs a new method.
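As a rough illustration of the per-model dispatch idea, here is a Python analogue using functools.singledispatch. It is not the hewr implementation: R's S3 dispatches on an object's class attribute, whereas singledispatch dispatches on the Python type of the first argument. The function name process_model_samples echoes the generic discussed in this thread; the model classes and return values are hypothetical.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class PyrenewModel:  # hypothetical class, for illustration only
    name: str


@dataclass
class EpiAutoGPModel:  # hypothetical class, for illustration only
    name: str


@singledispatch
def process_model_samples(model) -> str:
    # Default method: fail loudly for model types with no registered method.
    raise NotImplementedError(f"No method for {type(model).__name__}")


@process_model_samples.register
def _(model: PyrenewModel) -> str:
    return f"pyrenew samples for {model.name}"


@process_model_samples.register
def _(model: EpiAutoGPModel) -> str:
    return f"epiautogp samples for {model.name}"
```

With this shape, supporting a new model means registering one more method rather than adding another argument and branch to a shared function.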

damonbayer · 2025-12-15T20:19:06Z

@SamuelBrand1 I'll make an issue and offer my thoughts.

SamuelBrand1 · 2025-12-15T20:29:54Z

> @SamuelBrand1 I'll make an issue and offer my thoughts.

Actually I think #781 already covers this?

SamuelBrand1 · 2025-12-17T19:59:08Z

> A few open ended questions. I feel somewhat strongly that pipelines/epiautogp/process_epiautogp_forecast.py should be implemented in hewr/R/process_loc_forecast.R. Maintaining both seems risky.

Using the S3 generic introduced in #790 I've added a method for handling the epiautogp output and use the hewr postprocessing.

SamuelBrand1 · 2025-12-17T21:32:49Z

#790 broke the full pipeline at some point in plotting... so this is reblocked.

SamuelBrand1 · 2025-12-18T11:55:42Z

> A few open ended questions. I feel somewhat strongly that pipelines/epiautogp/process_epiautogp_forecast.py should be implemented in hewr/R/process_loc_forecast.R. Maintaining both seems risky.

> Using the S3 generic introduced in #790 I've added a method for handling the epiautogp output and use the hewr postprocessing.

I've rebased the epiautogp branch on the fixed main branch, and added a new model_name arg to the interface to hewr from pipelines.

The way this works is that if no model_name is given, it falls back on the current patterns. If a model_name is supplied, hewr detects the model_type by pattern matching and passes it along to the appropriate post-processing method.

The idea is to stop the endless creep of new model-specific parameters, although I acknowledge that the auto-detect pattern is pretty bad for this. That can be fixed in a new issue.
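The fallback-then-detect behaviour described above could look something like the following Python sketch. The actual detection lives in hewr (R) and its rules are not shown in this thread, so the patterns, the function name detect_model_type and the returned labels are all assumptions.

```python
import re
from typing import Optional

# Hypothetical patterns; the real matching rules in hewr may differ.
MODEL_TYPE_PATTERNS = {
    "epiautogp": re.compile(r"^epiautogp"),
    "timeseries": re.compile(r"^ts_"),
    "pyrenew": re.compile(r"^pyrenew"),
}


def detect_model_type(model_name: Optional[str]) -> Optional[str]:
    """Return the detected model type, or None to signal the legacy code path."""
    if model_name is None:
        # No model_name supplied: fall back on the existing behaviour.
        return None
    for model_type, pattern in MODEL_TYPE_PATTERNS.items():
        if pattern.search(model_name):
            return model_type
    raise ValueError(f"Cannot detect model type from model_name={model_name!r}")
```

The sketch also shows why name-based detection is fragile: any model whose name does not match a known pattern fails at run time, which an explicit model_type argument would avoid.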

This is a PR into epiautogp so doesn't trigger full CI, but locally I've run the end-to-end integration for the current models.

damonbayer

Thanks @SamuelBrand1!

* Add shared forecast pipeline utilities and tests

  Introduces forecast_utils.py with dataclasses and functions for setting up, preparing, and postprocessing forecast pipeline runs. Includes comprehensive unit tests for all major utilities, using mocking to isolate dependencies and verify correct logic and file structure handling.

* add Parquet dep

* reduce docstring bloat

* Add PipelineOutput support for pipeline forecasts

  Outputs to expected parquet format

* Add DEFAULT_TARGET_LETTER and update output filenames

  Introduces DEFAULT_TARGET_LETTER mapping for target abbreviations and updates the Parquet output filename in create_forecast_output to use the appropriate target letter for hubverse compatibility. Also adds geo_value and disease columns to the forecast output for improved metadata.

* move utils and rename paths dataclass

* Add use_percentage flag to EpiAutoGPInput and output logic

  Introduces a use_percentage boolean field to EpiAutoGPInput to distinguish between raw counts and percentage-based input for ED visits. Updates output logic to set the variable name and convert values to proportions when use_percentage is true for nssp targets. Test cases and input construction are updated accordingly.

* Refactor EpiAutoGP pipeline and add end-to-end tests

  Renamed forecast_utils.py to epiautogp_forecast_utils.py and updated all imports accordingly. Refactored the EpiAutoGP pipeline to use a context object for configuration, streamlined argument passing, and improved modularity. Added a new R plotting script (plot_epiautogp_forecast.R) for EpiAutoGP outputs. Introduced end-to-end and fit test shell scripts for automated testing. Removed obsolete prep test scripts. Updated process_epiautogp_forecast.py to simplify output processing and match R plotting expectations.

* Update .gitignore

* Refactor EpiAutoGP post-processing into utility function

  Consolidated forecast post-processing steps (processing outputs, creating hubverse table, and plotting) into a single post_process_forecast utility in epiautogp_forecast_utils.py. Updated imports and usage in __init__.py and forecast_epiautogp.py for improved modularity and code reuse. Added param_data_dir to ForecastPipelineContext and setup_forecast_pipeline.

* Refactor forecast utils to use context methods

  Moved prepare_model_data and post_process_forecast functions into ForecastPipelineContext as methods. Updated imports and usage in forecast_epiautogp.py and __init__.py to use the new class methods, improving encapsulation and code organization.

* Update README.md

* Add frequency to input and generalize forecast horizon

  Introduces a 'frequency' field to EpiAutoGPInput to support both daily and epiweekly data. Refactors modelling and argument parsing to use a generic 'n_ahead' parameter (number of time steps) instead of 'n_forecast_weeks', and updates all related documentation, tests, and function signatures for consistency and flexibility.

* Add ed_visit_type to input and output handling

  Introduces the ed_visit_type field to EpiAutoGPInput for specifying the type of ED visits, updates output logic to use this field for column selection, and adjusts tests and documentation accordingly. Also updates output file naming to use the frequency prefix.

* Add ed_visit_type param for NSSP/ED visit modeling

  Introduces the 'ed_visit_type' parameter to allow selection between 'observed' and 'other' ED visits for NSSP targets throughout the EpiAutoGP pipeline. Updates parameter validation, data extraction, and model naming to support this distinction, and adjusts CLI and function signatures accordingly. Also ensures correct forecast sample file selection based on frequency.

* Add daily NSSP forecast tests and support for ED visit type

  Expanded end-to-end and fit test scripts to include daily NSSP count and 'other ED visits' forecasts. Updated argument handling in test_epiautogp_fit.sh to support an optional ed_visit_type parameter and adjusted expected model counts accordingly.

* Refactor forecast utils tests and remove prep_epiautogp tests

  Updated test_forecast_utils.py to use new ForecastPipelineContext interface, updated patch paths, and migrated to context methods for prepare_model_data and post_process_forecast. Removed test_prep_epiautogp_data.py as part of test suite cleanup.

* update epiautogp docstrings

* Update prep_epiautogp_data.py

* Update output.jl

* add nhsn test coverage

* reorg unit tests

* caught anti-pattern

* Update pipelines/epiautogp/process_epiautogp_forecast.py

  Co-authored-by: Copilot <[email protected]>

* explain use of percentage

* Refactor forecast post-processing to use hewr plotting

  Replaces the use of process_epiautogp_forecast and a custom R plotting script with plot_and_save_loc_forecast, which handles both sample processing and plotting via hewr. Updates the post_process_forecast method to streamline steps and improve maintainability.

* remove redundant files

* update tests due to removed funcs

* Add EpiAutoGP model support to process_loc_forecast

  Introduces the process_model_samples.epiautogp S3 method and interface from pipelines

* Refactor to use generic model_name in forecast utilities

  Replaces the epiautogp_model_name parameter with a generic model_name in plot_and_save_loc_forecast and related calls. Updates documentation and tests to reflect the new parameter, enabling auto-detection and dispatching for different model types.

* Improve sample file selection in process_loc_forecast.R

  Refactor logic to determine the correct samples file based on both frequency (epiweekly or daily) and target type (NHSN or NSSP) from the model name. This makes the file selection more robust and explicit, and adds error handling for unknown target types.

* Add n-threads argument to CLI for EpiAutoGP

  Introduces a new --n-threads argument to specify the number of threads used for EpiAutoGP computations, defaulting to 1. This allows users to control parallelism directly from the command line.

* Add shared forecast pipeline utilities and tests

  Introduces forecast_utils.py with dataclasses and functions for setting up, preparing, and postprocessing forecast pipeline runs. Includes comprehensive unit tests for all major utilities, using mocking to isolate dependencies and verify correct logic and file structure handling.

* change to relative imports

* Add DEFAULT_TARGET_LETTER and update output filenames

  Introduces DEFAULT_TARGET_LETTER mapping for target abbreviations and updates the Parquet output filename in create_forecast_output to use the appropriate target letter for hubverse compatibility. Also adds geo_value and disease columns to the forecast output for improved metadata.

* move utils and rename paths dataclass

* Refactor EpiAutoGP pipeline and add end-to-end tests

  Renamed forecast_utils.py to epiautogp_forecast_utils.py and updated all imports accordingly. Refactored the EpiAutoGP pipeline to use a context object for configuration, streamlined argument passing, and improved modularity. Added a new R plotting script (plot_epiautogp_forecast.R) for EpiAutoGP outputs. Introduced end-to-end and fit test shell scripts for automated testing. Removed obsolete prep test scripts. Updated process_epiautogp_forecast.py to simplify output processing and match R plotting expectations.

* Add ed_visit_type param for NSSP/ED visit modeling

  Introduces the 'ed_visit_type' parameter to allow selection between 'observed' and 'other' ED visits for NSSP targets throughout the EpiAutoGP pipeline. Updates parameter validation, data extraction, and model naming to support this distinction, and adjusts CLI and function signatures accordingly. Also ensures correct forecast sample file selection based on frequency.

* update epiautogp docstrings

* caught anti-pattern

* Update pipelines/epiautogp/process_epiautogp_forecast.py

  Co-authored-by: Copilot <[email protected]>

* remove redundant files

---------

Co-authored-by: Copilot <[email protected]>
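To make the `frequency` and `n_ahead` change in the commit messages above concrete, here is a small Python sketch of how a horizon counted in time steps maps to dates for daily versus epiweekly data. The EpiAutoGP implementation is in Julia and is not reproduced here; the function name forecast_dates and the mapping TIME_STEP_DAYS are hypothetical, and only the two frequency values are taken from the commit messages.

```python
from datetime import date, timedelta

# Days per time step for the two frequency values named in the commit messages.
TIME_STEP_DAYS = {"daily": 1, "epiweekly": 7}


def forecast_dates(last_observed: date, frequency: str, n_ahead: int) -> list[date]:
    """Dates of the n_ahead time steps that follow the last observation."""
    if frequency not in TIME_STEP_DAYS:
        raise ValueError(f"Unknown frequency: {frequency!r}")
    step = timedelta(days=TIME_STEP_DAYS[frequency])
    return [last_observed + step * k for k in range(1, n_ahead + 1)]
```

Under this reading, `n_ahead=4` means four days ahead for a daily model and four weeks ahead for an epiweekly one, which is why a parameter named in weeks no longer fits.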

SamuelBrand1 requested review from damonbayer, dylanhmorris and sbidari as code owners December 12, 2025 17:30

damonbayer requested a review from Copilot December 12, 2025 22:51

Copilot AI reviewed Dec 12, 2025

SamuelBrand1 requested a review from Copilot December 15, 2025 11:02

Copilot started reviewing on behalf of SamuelBrand1 December 15, 2025 11:02

Copilot AI reviewed Dec 15, 2025

damonbayer reviewed Dec 15, 2025

pipelines/epiautogp/process_epiautogp_forecast.py (outdated, resolved)

damonbayer reviewed Dec 15, 2025

pipelines/tests/test_epiautogp_end_to_end.sh (resolved)

damonbayer reviewed Dec 15, 2025

pipelines/epiautogp/epiautogp_forecast_utils.py (resolved)

damonbayer requested changes Dec 15, 2025

SamuelBrand1 mentioned this pull request Dec 15, 2025

Binary Model Name Parameters in process_loc_forecast.R #788

Closed

damonbayer mentioned this pull request Dec 15, 2025

Separation of concerns in process_loc_forecast.R #789

Open

SamuelBrand1 mentioned this pull request Dec 15, 2025

Issue 788: Add S3 dispatch for model sample processing #790

Merged

SamuelBrand1 force-pushed the epiautogp branch from 993f353 to fad810b December 17, 2025 17:05

SamuelBrand1 force-pushed the 780-add-forecast_epiautogp-function branch from bc5dbce to 9482d35 December 17, 2025 17:05

SamuelBrand1 force-pushed the epiautogp branch from fad810b to ce85a4c December 17, 2025 21:31

SamuelBrand1 force-pushed the 780-add-forecast_epiautogp-function branch from d89d3e4 to 60befce December 17, 2025 21:33

SamuelBrand1 force-pushed the epiautogp branch from ce85a4c to ad68f9d December 17, 2025 22:46

SamuelBrand1 force-pushed the 780-add-forecast_epiautogp-function branch from 60befce to abe63e8 December 17, 2025 22:46

SamuelBrand1 force-pushed the epiautogp branch from ad68f9d to 3951168 December 17, 2025 22:59

SamuelBrand1 force-pushed the 780-add-forecast_epiautogp-function branch from abe63e8 to 26bba0b December 17, 2025 23:00

SamuelBrand1 and others added 22 commits December 30, 2025 09:41

add nhsn test coverage

1e1bec3

reorg unit tests

0ab2318

caught anti-pattern

233e8d4

Update pipelines/epiautogp/process_epiautogp_forecast.py

c460d37

Co-authored-by: Copilot <[email protected]>

explain use of percentage

943ceca

remove redundant files

1cf1106

update tests due to removed funcs

842bfd5

Add EpiAutoGP model support to process_loc_forecast

29776ae

Introduces the process_model_samples.epiautogp S3 method and interface from pipelines

Add n-threads argument to CLI for EpiAutoGP

2880265

Introduces a new --n-threads argument to specify the number of threads used for EpiAutoGP computations, defaulting to 1. This allows users to control parallelism directly from the command line.

change to relative imports

2cd5f8c

move utils and rename paths dataclass

004e67f

update epiautogp docstrings

6c746e2

caught anti-pattern

f22e216

Update pipelines/epiautogp/process_epiautogp_forecast.py

a06e6c1

Co-authored-by: Copilot <[email protected]>

remove redundant files

abd6f33
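For the `--n-threads` commit listed above, a command-line option with a default of 1 could be declared as in the Python argparse sketch below. Whether the option is parsed in the Python entrypoint, in the Julia CLI, or in both is not stated here, so treat the parser and the function name build_parser as assumptions; only the flag name and its default come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the thread-count option."""
    parser = argparse.ArgumentParser(description="Sketch of an EpiAutoGP forecast CLI")
    parser.add_argument(
        "--n-threads",
        type=int,
        default=1,  # default of 1, as stated in the commit message
        help="Number of threads used for EpiAutoGP computations.",
    )
    return parser
```

argparse converts the dashed flag to the attribute `n_threads`, so `build_parser().parse_args(["--n-threads", "4"]).n_threads` gives the integer 4, and omitting the flag gives 1.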

SamuelBrand1 force-pushed the 780-add-forecast_epiautogp-function branch from a645bea to abd6f33 December 30, 2025 09:49

damonbayer mentioned this pull request Jan 5, 2026

Create a generic ForecastPipelineContext class #814

Open

damonbayer approved these changes Jan 5, 2026

SamuelBrand1 merged commit c52d745 into epiautogp Jan 5, 2026
8 checks passed

SamuelBrand1 deleted the 780-add-forecast_epiautogp-function branch January 5, 2026 21:55

SamuelBrand1 mentioned this pull request Jan 6, 2026

add forecast_epiautogp function #780

Closed

3 participants
