Contributor

Copilot AI commented Jan 6, 2026

Per feedback, this PR extracts a generic date range parsing utility function to pipelines/common_utils.py for reuse across multiple forecasting models. This addresses the need for date exclusion functionality mentioned in issue #409 for PyRenew models.

Changes

  • Added parse_exclude_date_ranges() to pipelines/common_utils.py
    • Generic utility function for parsing comma-separated date range strings
    • Format: "2024-01-15:2024-01-20,2024-03-01:2024-03-07"
    • Returns list[tuple[date, date]] of inclusive date ranges, or None when the input is None or empty
    • Includes comprehensive docstring with parameter descriptions, return types, exceptions, and usage examples
    • Can be used by any forecasting model (PyRenew, EpiAutoGP, timeseries, etc.) to exclude periods with known reporting problems

Usage

from pipelines.common_utils import parse_exclude_date_ranges

# Parse single range
ranges = parse_exclude_date_ranges("2024-01-15:2024-01-20")
# Returns: [(datetime.date(2024, 1, 15), datetime.date(2024, 1, 20))]

# Parse multiple ranges
ranges = parse_exclude_date_ranges("2024-01-15:2024-01-20,2024-03-01:2024-03-07")
# Returns: [(datetime.date(2024, 1, 15), datetime.date(2024, 1, 20)),
#           (datetime.date(2024, 3, 1), datetime.date(2024, 3, 7))]

# Handle None or empty input
ranges = parse_exclude_date_ranges(None)
# Returns: None

The function includes robust validation:

  • Validates date range format (must contain : separator)
  • Validates date format (must be YYYY-MM-DD)
  • Validates that start_date <= end_date
  • Provides clear error messages for invalid input
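
The function body itself is not shown in this conversation. As a rough illustration of the validation steps listed above, here is a minimal sketch that assumes only the behaviour described in this PR. The name parse_exclude_date_ranges_sketch and the error messages are hypothetical and may differ from the actual code in pipelines/common_utils.py.

```python
from datetime import date, datetime
from typing import Optional


def parse_exclude_date_ranges_sketch(
    value: Optional[str],
) -> Optional[list[tuple[date, date]]]:
    """Parse 'start:end,start:end' into inclusive (start, end) date tuples.

    Hypothetical stand-in for the utility described in this PR.
    """
    # None or blank input means "nothing to exclude".
    if value is None or not value.strip():
        return None

    ranges: list[tuple[date, date]] = []
    for chunk in value.split(","):
        chunk = chunk.strip()
        # Each range must contain the ':' separator.
        if ":" not in chunk:
            raise ValueError(
                f"Invalid date range {chunk!r}: expected 'start:end'"
            )
        start_str, end_str = chunk.split(":", 1)
        # Each side must parse as a year-month-day date.
        try:
            start = datetime.strptime(start_str.strip(), "%Y-%m-%d").date()
            end = datetime.strptime(end_str.strip(), "%Y-%m-%d").date()
        except ValueError as err:
            raise ValueError(
                f"Invalid date in {chunk!r}: expected YYYY-MM-DD"
            ) from err
        # Ranges are inclusive, so start == end is allowed.
        if start > end:
            raise ValueError(
                f"Invalid date range {chunk!r}: start is after end"
            )
        ranges.append((start, end))
    return ranges
```

Raising ValueError for every failure mode keeps the caller's error handling to a single except clause; the real function may use different exception types.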

Future Use

This utility will be used in a separate PR to implement the period exclusion functionality for EpiAutoGP (related to the original issue about extra data options for epiautogp data conversion). The generic location in common_utils.py allows any forecasting model to benefit from this functionality.
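
How the parsed ranges would be applied is left to that later PR. As one possible illustration only, the sketch below drops observations whose date falls inside any inclusive range. The helper names is_excluded and drop_excluded_dates, and the use of plain (date, value) tuples instead of the pipeline's real data structures, are assumptions made for this example.

```python
from datetime import date
from typing import Optional


def is_excluded(day: date, ranges: list[tuple[date, date]]) -> bool:
    """Return True if `day` lies inside any inclusive (start, end) range."""
    return any(start <= day <= end for start, end in ranges)


def drop_excluded_dates(
    rows: list[tuple[date, float]],
    ranges: Optional[list[tuple[date, date]]],
) -> list[tuple[date, float]]:
    """Keep only the rows whose date is outside every excluded range."""
    # None (or an empty list) means there is nothing to exclude.
    if not ranges:
        return list(rows)
    return [(day, value) for day, value in rows if not is_excluded(day, ranges)]


# Hypothetical observations and one excluded reporting period.
rows = [
    (date(2024, 1, 14), 10.0),
    (date(2024, 1, 15), 3.0),
    (date(2024, 1, 20), 4.0),
    (date(2024, 1, 21), 12.0),
]
ranges = [(date(2024, 1, 15), date(2024, 1, 20))]
kept = drop_excluded_dates(rows, ranges)
# kept contains only the 2024-01-14 and 2024-01-21 rows.
```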

Original prompt

This section details the original issue you should resolve

<issue_title>Extra data options for epiautogp data conversion</issue_title>
<issue_description>EpiAutoGP has a different scientific logic to pyrenew in that it's based on Gaussian processes (GPs) rather than models defined with sequential data conditioning (e.g. classic time series models like ARIMA or renewal epi models).

A few options that should get added to the data structure that lands with #756 in a follow up PR:

  • Weekly only data: if the target is weekly, then the signal should have the option of being weekly only. I think it's possible that EpiAutoGP could actually work better with daily data (e.g. catch mid-week reporting that shows a drop before the whole week is reported) but there should be an option to fit on weekly since the time steps are irrelevant to GP models.
  • Much longer data horizon: The point of EpiAutoGP is to try and infer an ensemble of possible temporal autocovariance kernels to explain and forecast the data; doing this on only the most recent 90/150 days of data is not optimal.
  • Period data drops: We know some periods of the data are misleading (known reporting problems) so there should be an option to drop all data from certain dates. GPs don't need regular sequential data.</issue_description>

<agent_instructions>Read the epiautogp submodule of pipelines carefully before planning</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Add extra data options for EpiAutoGP data conversion" to "Add date exclusion parameter for EpiAutoGP data conversion" Jan 6, 2026
Copilot AI requested a review from SamuelBrand1 January 6, 2026 11:08
@codecov

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 82.25806% with 22 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (epiautogp@9d54d6c). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
pipelines/epiautogp/forecast_epiautogp.py | 0.00% | 22 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             epiautogp     #820   +/-   ##
============================================
  Coverage             ?   48.64%           
============================================
  Files                ?       36           
  Lines                ?     3316           
  Branches             ?        0           
============================================
  Hits                 ?     1613           
  Misses               ?     1703           
  Partials             ?        0           
Flag | Coverage Δ
hewr | 36.84% <ø> (?)
pipelines | 50.68% <82.25%> (?)
pyrenew_hew | 62.29% <ø> (?)

@SamuelBrand1 SamuelBrand1 marked this pull request as ready for review January 6, 2026 13:57
Collaborator

@SamuelBrand1 SamuelBrand1 left a comment

Actually, this was a good first pass!

@SamuelBrand1
Collaborator

SamuelBrand1 commented Jan 6, 2026

@damonbayer @dylanhmorris I've looked at this and I think the test covers the required behaviour, and the code looks OK to me and not too bloated.

For context, I use the "exclude some periods" behaviour for EpiAutoGP, rather than excluding some of the most recent dates, because it suits the nature of the model. At the moment I do this with epifusion.DataLoader options in the cfa-mech-experiment repo, but this PR does the same via command-line parsing. I think the AI solution of using , to split the string into ranges and : to split each range into a (start, end) tuple seems good.
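
To make the separator convention concrete, here is a small sketch of how such a string might arrive on the command line. The flag name --exclude-date-ranges and the parser setup are hypothetical, since the CLI code from this PR is not shown in the conversation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single, hypothetical exclusion flag."""
    parser = argparse.ArgumentParser(description="Illustrative forecast CLI")
    parser.add_argument(
        "--exclude-date-ranges",
        type=str,
        default=None,
        help="Comma-separated inclusive ranges, each written as start:end",
    )
    return parser


# Simulate a command-line invocation with two ranges.
args = build_parser().parse_args(
    ["--exclude-date-ranges", "2024-01-15:2024-01-20,2024-03-01:2024-03-07"]
)

# ',' separates ranges; ':' separates the start and end of one range.
pairs = [tuple(chunk.split(":", 1)) for chunk in args.exclude_date_ranges.split(",")]
# pairs holds two (start, end) string tuples, still unvalidated at this point.
```

Keeping the argument as a single string lets date validation happen in one shared utility instead of being repeated in each model's CLI code.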

@damonbayer
Collaborator

We have wanted to enable something like this for some time in the PyRenew models: #409. Perhaps we should commit some of the relevant parts (like _parse_exclude_date_ranges) to a more generic location (like pipelines/common_utils.py) on main first?

@SamuelBrand1
Collaborator

@copilot Can you change the base target of this branch to main and move _parse_exclude_date_ranges and any other generic utility functions into pipelines/common_utils.py? I will do a later PR to use them in epiautogp. If you can't do this, let us know promptly.

@SamuelBrand1
Collaborator

We have wanted to enable something like this for some time in the PyRenew models: #409. Perhaps we should commit some of the relevant parts (like _parse_exclude_date_ranges) to a more generic location (like pipelines/common_utils.py) on main first?

@copilot pay attention to this comment when addressing the previous instruction.

Extracted the generic date range parsing utility function from the
epiautogp-specific implementation and moved it to pipelines/common_utils.py
where it can be reused by other forecasting models (e.g., PyRenew models
as mentioned in issue #409).

The function parses comma-separated date ranges in 'start:end' format
(e.g., "2024-01-15:2024-01-20,2024-03-01:2024-03-07") and returns a list
of (start_date, end_date) tuples. This will enable multiple models to
exclude periods with known reporting problems.

The epiautogp-specific implementation will be added in a later PR.

Co-authored-by: SamuelBrand1 <[email protected]>
Contributor Author

Copilot AI commented Jan 6, 2026

We have wanted to enable something like this for some time in the PyRenew models: #409. Perhaps we should commit some of the relevant parts (like _parse_exclude_date_ranges) to a more generic loca...

I've moved parse_exclude_date_ranges to pipelines/common_utils.py and reverted all epiautogp-specific changes as requested. The generic utility function is now available for reuse by PyRenew models (ref: #409) and will be used by EpiAutoGP in a later PR. See commit 1f81c2e.

Copilot AI changed the title from "Add date exclusion parameter for EpiAutoGP data conversion" to "Add generic date range parsing utility to common_utils" Jan 6, 2026
@SamuelBrand1
Collaborator

I'm not confident in Copilot reworking this PR, so I'm closing it in favour of #824. I will then revisit the epiautogp-specific usage if that is merged.
