Merged
42 commits
7846849 Add shared forecast pipeline utilities and tests (SamuelBrand1, Dec 11, 2025)
e74e28f add Parquet dep (SamuelBrand1, Dec 11, 2025)
a779507 reduce docstring bloat (SamuelBrand1, Dec 11, 2025)
5960f79 Add PipelineOutput support for pipeline forecasts (SamuelBrand1, Dec 11, 2025)
4d933e2 Add DEFAULT_TARGET_LETTER and update output filenames (SamuelBrand1, Dec 11, 2025)
bb40a53 move utils and rename paths dataclass (SamuelBrand1, Dec 12, 2025)
badc8a8 Add use_percentage flag to EpiAutoGPInput and output logic (SamuelBrand1, Dec 12, 2025)
4de5d71 Refactor EpiAutoGP pipeline and add end-to-end tests (SamuelBrand1, Dec 12, 2025)
8f7ebe9 Update .gitignore (SamuelBrand1, Dec 12, 2025)
f02fdd7 Refactor EpiAutoGP post-processing into utility function (SamuelBrand1, Dec 12, 2025)
31c16a8 Refactor forecast utils to use context methods (SamuelBrand1, Dec 12, 2025)
ef8d206 Update README.md (SamuelBrand1, Dec 12, 2025)
b97289a Add frequency to input and generalize forecast horizon (SamuelBrand1, Dec 12, 2025)
9310d7f Add ed_visit_type to input and output handling (SamuelBrand1, Dec 12, 2025)
d15d3d2 Add ed_visit_type param for NSSP/ED visit modeling (SamuelBrand1, Dec 12, 2025)
092f300 Add daily NSSP forecast tests and support for ED visit type (SamuelBrand1, Dec 12, 2025)
055b8d7 Refactor forecast utils tests and remove prep_epiautogp tests (SamuelBrand1, Dec 12, 2025)
c91462b update epiautogp docstrings (SamuelBrand1, Dec 15, 2025)
1ac8d74 Update prep_epiautogp_data.py (SamuelBrand1, Dec 15, 2025)
057743e Update output.jl (SamuelBrand1, Dec 15, 2025)
1e1bec3 add nhsn test coverage (SamuelBrand1, Dec 15, 2025)
0ab2318 reorg unit tests (SamuelBrand1, Dec 15, 2025)
233e8d4 caught anti-pattern (SamuelBrand1, Dec 15, 2025)
c460d37 Update pipelines/epiautogp/process_epiautogp_forecast.py (SamuelBrand1, Dec 15, 2025)
943ceca explain use of percentage (SamuelBrand1, Dec 15, 2025)
7735980 Refactor forecast post-processing to use hewr plotting (SamuelBrand1, Dec 17, 2025)
1cf1106 remove redundant files (SamuelBrand1, Dec 17, 2025)
842bfd5 update tests due to removed funcs (SamuelBrand1, Dec 17, 2025)
29776ae Add EpiAutoGP model support to process_loc_forecast (SamuelBrand1, Dec 18, 2025)
9b438cc Refactor to use generic model_name in forecast utilities (SamuelBrand1, Dec 18, 2025)
93734c7 Improve sample file selection in process_loc_forecast.R (SamuelBrand1, Dec 18, 2025)
2880265 Add n-threads argument to CLI for EpiAutoGP (SamuelBrand1, Dec 18, 2025)
6032a3e Add shared forecast pipeline utilities and tests (SamuelBrand1, Dec 11, 2025)
2cd5f8c change to relative imports (SamuelBrand1, Dec 11, 2025)
c631f06 Add DEFAULT_TARGET_LETTER and update output filenames (SamuelBrand1, Dec 11, 2025)
004e67f move utils and rename paths dataclass (SamuelBrand1, Dec 12, 2025)
081c193 Refactor EpiAutoGP pipeline and add end-to-end tests (SamuelBrand1, Dec 12, 2025)
9a74547 Add ed_visit_type param for NSSP/ED visit modeling (SamuelBrand1, Dec 12, 2025)
6c746e2 update epiautogp docstrings (SamuelBrand1, Dec 15, 2025)
f22e216 caught anti-pattern (SamuelBrand1, Dec 15, 2025)
a06e6c1 Update pipelines/epiautogp/process_epiautogp_forecast.py (SamuelBrand1, Dec 15, 2025)
abd6f33 remove redundant files (SamuelBrand1, Dec 17, 2025)
2 changes: 1 addition & 1 deletion .gitignore
@@ -395,7 +395,7 @@ private_data/*

 # Ignore end to end test output
 pipelines/tests/end_to_end_test_output/*
-pipelines/tests/epiautogp_test_output/*
+pipelines/tests/epiautogp_end_to_end_test_output/*

 # subdirs reserved for mounting blob storage containers
 config
4 changes: 3 additions & 1 deletion EpiAutoGP/Project.toml
@@ -1,7 +1,7 @@
 name = "EpiAutoGP"
 uuid = "c2940010-6b35-4be1-8bbf-9fa0d9979e50"
-authors = ["Sam Brand (USI1) <[email protected]>"]
 version = "0.1.0"
+authors = ["Sam Brand (USI1) <[email protected]>"]

 [deps]
 ArgParse = "c7e460c6-2fb9-53a9-8c5b-16f535851c63"
@@ -11,6 +11,7 @@ Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
 Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
 NowcastAutoGP = "7e9f7f4b-f590-4c14-8324-de4fcbed18f7"
+Parquet = "626c502c-15b0-58ad-a749-f091afb673ae"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 StructTypes = "856f2bd8-1eba-4b0a-8007-ebc267875bd4"
@@ -27,6 +28,7 @@ Dates = "1.11.0"
 JSON3 = "1.14.3"
 Logging = "1.11.0"
 NowcastAutoGP = "0.3.0"
+Parquet = "0.8.6"
 Random = "1.11.0"
 Statistics = "1.11.1"
 StructTypes = "1.11.0"
2 changes: 1 addition & 1 deletion EpiAutoGP/run.jl
@@ -56,7 +56,7 @@ function main()

 # Create hubverse-compatible output
 @info "Creating hubverse-compatible forecast output..."
-output_type = QuantileOutput() # Use default quantile levels
+output_type = PipelineOutput() # Use default quantile levels

 hubverse_df = create_forecast_output(
 input_data,
7 changes: 6 additions & 1 deletion EpiAutoGP/src/EpiAutoGP.jl
@@ -1,6 +1,6 @@
 module EpiAutoGP
 using NowcastAutoGP # Core modeling package
-using CSV, DataFramesMeta, Dates, JSON3, StructTypes # Data handling packages
+using CSV, DataFramesMeta, Dates, JSON3, StructTypes, Parquet # Data handling packages
 using ArgParse # Command-line argument parsing
 using Statistics # For modeling functions

@@ -23,6 +23,7 @@ export prepare_for_modelling,
 export AbstractForecastOutput,
 AbstractHubverseOutput,
 QuantileOutput,
+PipelineOutput,
 create_forecast_df,
 create_forecast_output

@@ -36,6 +37,10 @@ const DEFAULT_TARGET_DICT = Dict(
 "nhsn" => "hosp",
 "nssp" => "prop ed visits"
 )
+const DEFAULT_TARGET_LETTER = Dict(
+"nhsn" => "h",
+"nssp" => "e"
+)
 const DEFAULT_GROUP_NAME = "CFA"
 const DEFAULT_MODEL_NAME = "EpiAutoGP"
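The new `DEFAULT_TARGET_LETTER` table is keyed by the same targets as `DEFAULT_TARGET_DICT`. A minimal sketch of keeping the two tables in step, assuming the `"nhsn"` and `"nssp"` entries visible in this hunk are the complete contents of both; the names `TARGET_DICT`, `TARGET_LETTER` and `target_letter` are hypothetical and not part of EpiAutoGP:

```julia
# Local copies of the two lookup tables in the diff (hypothetical names),
# assuming the visible "nhsn" and "nssp" entries are complete.
const TARGET_DICT = Dict("nhsn" => "hosp", "nssp" => "prop ed visits")
const TARGET_LETTER = Dict("nhsn" => "h", "nssp" => "e")

# Hypothetical helper: report the valid targets instead of a bare KeyError.
function target_letter(target::AbstractString)
    haskey(TARGET_LETTER, target) || throw(ArgumentError(
        "unknown target '$target'; expected one of $(sort(collect(keys(TARGET_LETTER))))"))
    return TARGET_LETTER[target]
end

# Both tables should cover exactly the same targets.
@assert Set(keys(TARGET_DICT)) == Set(keys(TARGET_LETTER))
```

Raising an `ArgumentError` that lists the valid targets gives a clearer failure than the `KeyError` produced by indexing the `Dict` directly.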

82 changes: 7 additions & 75 deletions EpiAutoGP/src/input.jl
@@ -12,33 +12,24 @@ with nowcasting requirements and forecast parameters.
 - `reports::Vector{Real}`: Vector of case counts/measurements corresponding to each date
 - `pathogen::String`: Disease identifier (e.g., "COVID-19", "Influenza", "RSV")
 - `location::String`: Geographic location identifier (e.g., "CA", "NY", "US")
+- `target::String`: Target data type (e.g., "nssp", "nhsn")
+- `frequency::String`: Temporal frequency of data ("daily" or "epiweekly")
+- `use_percentage::Bool`: Whether data represents percentage values
+- `ed_visit_type::String`: Type of ED visits ("observed" or "other"), only applicable for NSSP target
 - `forecast_date::Date`: Reference date from which forecasting begins, often this will be a nowcast date
 - `nowcast_dates::Vector{Date}`: Dates requiring nowcasting (typically recent dates with incomplete data)
 - `nowcast_reports::Vector{Vector{Real}}`: Uncertainty bounds or samples for nowcast dates

-# Examples
-```julia
-# Create a simple input dataset
-data = EpiAutoGPInput(
-[Date("2024-01-01"), Date("2024-01-02"), Date("2024-01-03")],
-[45.0, 52.0, 38.0],
-"COVID-19",
-"CA",
-Date("2024-01-03"),
-[Date("2024-01-02"), Date("2024-01-03")],
-[[50.0, 52.0, 54.0], [36.0, 38.0, 40.0]]
-)
-
-# Validate the input
-validate_input(data) # returns true if valid
-```
 """
 struct EpiAutoGPInput
 dates::Vector{Date}
 reports::Vector{Real}
 pathogen::String
 location::String
 target::String
+frequency::String
+use_percentage::Bool
+ed_visit_type::String
 forecast_date::Date
 nowcast_dates::Vector{Date}
 nowcast_reports::Vector{Vector{Real}}
@@ -65,30 +56,6 @@ Performs comprehensive validation including:

 # Returns
 - `Bool`: Returns `true` if validation passes
-
-# Throws
-- `ArgumentError`: If any validation check fails, with descriptive error message
-
-# Examples
-```julia
-# Valid data passes validation
-valid_data = EpiAutoGPInput(
-[Date("2024-01-01"), Date("2024-01-02")],
-[45.0, 52.0],
-"COVID-19", "CA", Date("2024-01-02"),
-Date[], Vector{Real}[]
-)
-validate_input(valid_data) # returns true
-
-# Invalid data throws ArgumentError
-invalid_data = EpiAutoGPInput(
-[Date("2024-01-01")],
-[-5.0], # negative values not allowed
-"COVID-19", "CA", Date("2024-01-01"),
-Date[], Vector{Real}[]
-)
-validate_input(invalid_data) # throws ArgumentError
-```
 """
 function validate_input(data::EpiAutoGPInput; valid_targets = ["nhsn", "nssp"])
 @assert data.target in valid_targets "Target must be one of $(valid_targets), got '$(data.target)'"
@@ -221,41 +188,6 @@ end
 read_and_validate_data(path_to_json::String) -> EpiAutoGPInput

 Read epidemiological data from JSON file with automatic validation.
-
-This is the recommended function for loading input data in production workflows.
-It combines [`read_data`](@ref) and [`validate_input`](@ref) to ensure that
-loaded data is both structurally correct and passes all validation checks.
-
-# Arguments
-- `path_to_json::String`: Path to the JSON file containing input data
-
-# Returns
-- `EpiAutoGPInput`: Validated data structure ready for modeling
-
-# Throws
-- `SystemError`: If the file cannot be read
-- `JSON3.StructuralError`: If JSON structure is invalid
-- `ArgumentError`: If data fails validation checks
-
-# Examples
-```julia
-# Load and validate data in one step
-data = read_and_validate_data("epidata.json")
-
-# This is equivalent to:
-data = read_data("epidata.json")
-validate_input(data)
-
-# Use in a try-catch block for error handling
-try
-data = read_and_validate_data("uncertain_data.json")
-println("Data loaded successfully")
-catch e
-@error "Failed to load data" exception=e
-end
-```
-
-See also: [`read_data`](@ref), [`validate_input`](@ref), [`EpiAutoGPInput`](@ref)
 """
 function read_and_validate_data(path_to_json::String)
 data = read_data(path_to_json)
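With four fields added between `location` and `forecast_date`, the positional calls in the removed docstring examples no longer match the struct. The sketch below builds a daily NSSP input in the new field order. It uses a local mirror type, `EpiAutoGPInputSketch` (a hypothetical name), so it runs without the package; it assumes the eleven fields visible in the diff are the complete struct, and the report values are made up for illustration.

```julia
using Dates

# Local mirror of the EpiAutoGPInput field order shown in the diff.
# Assumes these eleven fields are the complete struct; not the package type.
struct EpiAutoGPInputSketch
    dates::Vector{Date}
    reports::Vector{Real}
    pathogen::String
    location::String
    target::String
    frequency::String
    use_percentage::Bool
    ed_visit_type::String
    forecast_date::Date
    nowcast_dates::Vector{Date}
    nowcast_reports::Vector{Vector{Real}}
end

# Daily NSSP example; the default constructor converts the Float64
# vectors to the declared Vector{Real} field types.
example_input = EpiAutoGPInputSketch(
    [Date("2024-01-01"), Date("2024-01-02"), Date("2024-01-03")],  # dates
    [4.5, 5.2, 3.8],                            # reports (illustrative)
    "COVID-19",                                 # pathogen
    "CA",                                       # location
    "nssp",                                     # target
    "daily",                                    # frequency
    true,                                       # use_percentage
    "observed",                                 # ed_visit_type
    Date("2024-01-03"),                         # forecast_date
    [Date("2024-01-02"), Date("2024-01-03")],   # nowcast_dates
    [[5.0, 5.2, 5.4], [3.6, 3.8, 4.0]],         # nowcast_reports
)
```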
61 changes: 13 additions & 48 deletions EpiAutoGP/src/modelling.jl
@@ -1,5 +1,5 @@
 """
-prepare_for_modelling(input::EpiAutoGPInput, transformation_name::String, n_forecast_weeks::Int, n_forecasts::Int) -> NamedTuple
+prepare_for_modelling(input::EpiAutoGPInput, transformation_name::String, n_ahead::Int, n_forecasts::Int) -> NamedTuple

 Prepare all data and configuration needed for NowcastAutoGP modeling.

@@ -9,7 +9,7 @@ formats nowcast data for the modeling pipeline, and calculates forecast dates an
 # Arguments
 - `input::EpiAutoGPInput`: The input data structure containing dates, reports, and nowcast information
 - `transformation_name::String`: Name of transformation to apply ("boxcox", "positive", "percentage")
-- `n_forecast_weeks::Int`: Number of weeks to forecast into the future
+- `n_ahead::Int`: Number of time steps (days or epiweeks) to forecast into the future
 - `n_forecasts::Int`: Total number of forecast samples desired

 # Returns
@@ -21,15 +21,9 @@ A NamedTuple containing:
 - `n_forecasts_per_nowcast::Int`: Number of forecast samples per nowcast scenario
 - `transformation::Function`: Forward transformation function
 - `inv_transformation::Function`: Inverse transformation function
-
-# Examples
-```julia
-input = EpiAutoGPInput(...)
-model_setup = prepare_for_modelling(input, "boxcox", 4, 1000)
-```
 """
 function prepare_for_modelling(input::EpiAutoGPInput, transformation_name::String,
-n_forecast_weeks::Int, n_forecasts::Int)
+n_ahead::Int, n_forecasts::Int)
 # Extract stable confirmed data, excluding recent uncertain dates with nowcasts
 stable_data_idxs = findall(d -> !(d in input.nowcast_dates), input.dates)
 stable_data_dates = input.dates[stable_data_idxs]
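The context lines above split the series into confirmed data and dates still covered by nowcasts. A small self-contained illustration of the same `findall` pattern on made-up daily data (all variable names here are hypothetical):

```julia
using Dates

# Made-up daily series; the last two dates are still being nowcast.
dates = collect(Date("2024-01-01"):Day(1):Date("2024-01-05"))
reports = [10.0, 12.0, 11.0, 9.0, 4.0]
nowcast_dates = [Date("2024-01-04"), Date("2024-01-05")]

# Same pattern as the diff: keep the indices of dates with no nowcast.
stable_idxs = findall(d -> !(d in nowcast_dates), dates)
stable_dates = dates[stable_idxs]
stable_reports = reports[stable_idxs]
```

`d in nowcast_dates` scans the vector once per date, which is cheap for the handful of nowcast dates typical here; for long lists, converting `nowcast_dates` to a `Set` first would make each membership test constant time.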
@@ -47,8 +41,10 @@ function prepare_for_modelling(input::EpiAutoGPInput, transformation_name::Strin
 # Create nowcast data structure when nowcasts exist
 create_nowcast_data(input.nowcast_reports, input.nowcast_dates; transformation)

-# Calculate forecasting dates (starting from forecast_date and going up to n_forecast_weeks ahead)
-forecast_dates = [input.forecast_date + Week(i) for i in 0:n_forecast_weeks]
+# Calculate forecasting dates (starting from forecast_date and going n_ahead time steps forward)
+# Use Day or Week based on frequency
+time_step = input.frequency == "epiweekly" ? Week(1) : Day(1)
+forecast_dates = [input.forecast_date + i * time_step for i in 0:n_ahead]

 # Calculate number of forecasts per nowcast sample
 n_forecasts_per_nowcast = isnothing(nowcast_data) ?
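The new date logic steps by `Day(1)` or `Week(1)` depending on `input.frequency`, and because the range is `0:n_ahead` the result includes `forecast_date` itself, so it has `n_ahead + 1` entries. A sketch of the same computation as a standalone function (the name `forecast_dates_for` is hypothetical):

```julia
using Dates

# Hypothetical helper mirroring the forecast-date logic in the diff.
function forecast_dates_for(forecast_date::Date, frequency::String, n_ahead::Int)
    # Anything other than "epiweekly" is treated as daily, as in the diff.
    time_step = frequency == "epiweekly" ? Week(1) : Day(1)
    # 0:n_ahead includes forecast_date itself, giving n_ahead + 1 dates.
    return [forecast_date + i * time_step for i in 0:n_ahead]
end

daily_dates = forecast_dates_for(Date("2024-01-03"), "daily", 3)
weekly_dates = forecast_dates_for(Date("2024-01-06"), "epiweekly", 2)
```

Here `daily_dates` runs from 2024-01-03 to 2024-01-06 and `weekly_dates` is 2024-01-06, 2024-01-13, 2024-01-20. Because the check is `frequency == "epiweekly"`, any other string, including a misspelling such as `"weekly"`, silently falls back to daily steps, so validating `frequency` upstream matters.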
@@ -84,14 +80,6 @@ combination with nowcast scenarios.

 # Returns
 - Fitted AutoGP model ready for forecasting
-
-# Examples
-```julia
-dates = [Date(2024,1,1), Date(2024,1,8), Date(2024,1,15)]
-values = [100.0, 120.0, 95.0]
-transform_func, _ = get_transformations("boxcox", values)
-model = fit_base_model(dates, values; transformation=transform_func)
-```
 """
 function fit_base_model(dates::Vector{Date}, values::Vector{<:Real};
 transformation::Function,
@@ -136,7 +124,7 @@ end

 """
 forecast_with_epiautogp(input::EpiAutoGPInput;
-n_forecast_weeks::Int=8,
+n_ahead::Int=8,
 n_forecasts::Int=20,
 transformation_name::String="boxcox",
 n_particles::Int=24,
@@ -153,7 +141,7 @@ This function implements the complete nowcasting and forecasting workflow:

 # Arguments
 - `input::EpiAutoGPInput`: The input data structure with dates, reports, and nowcast information
-- `n_forecast_weeks::Int=8`: Number of weeks to forecast ahead from forecast_date
+- `n_ahead::Int=8`: Number of time steps (days or epiweeks) to forecast ahead from forecast_date
 - `n_forecasts::Int=20`: Total number of forecast samples to generate
 - `transformation_name::String="boxcox"`: Data transformation type ("boxcox", "positive", "percentage")
 - `n_particles::Int=24`: Number of SMC particles for GP model fitting
@@ -168,23 +156,9 @@ A NamedTuple containing:
 - `forecast_date::Date`: The reference date for forecasting (from input.forecast_date)
 - `location::String`: The location identifier (from input.location)
 - `disease::String`: The disease name (from input.disease)
-
-# Examples
-```julia
-# Basic forecasting
-input = EpiAutoGPInput(...)
-results = forecast_with_epiautogp(input)
-forecast_dates, forecasts = results.forecast_dates, results.forecasts
-
-# Custom parameters
-results = forecast_with_epiautogp(input;
-n_forecast_weeks=4,
-n_forecasts=1000,
-transformation_name="positive")
-```
 """
 function forecast_with_epiautogp(input::EpiAutoGPInput;
-n_forecast_weeks::Int = 8,
+n_ahead::Int = 8,
 n_forecasts::Int = 20,
 transformation_name::String = "boxcox",
 n_particles::Int = 24,
@@ -193,7 +167,7 @@ function forecast_with_epiautogp(input::EpiAutoGPInput;
 n_hmc::Int = 50)

 # Prepare training data, nowcasting data and forecasting dates
-model_info = prepare_for_modelling(input, transformation_name, n_forecast_weeks, n_forecasts)
+model_info = prepare_for_modelling(input, transformation_name, n_ahead, n_forecasts)

 # Fit base model on confirmed/stable data
 base_model = fit_base_model(
@@ -234,26 +208,17 @@ with parsed command-line arguments to execute the full nowcasting and forecastin
 - Same as forecast_with_epiautogp(): NamedTuple with forecast results and metadata

 # Expected command-line arguments
-- `"n-forecast-weeks"`: Number of weeks to forecast
+- `"n-ahead"`: Number of time steps (days or epiweeks) to forecast
 - `"n-forecast-draws"`: Total number of forecast samples
 - `"transformation"`: Data transformation type
 - `"n-particles"`: Number of SMC particles
 - `"smc-data-proportion"`: SMC data proportion
 - `"n-mcmc"`: Number of MCMC samples
 - `"n-hmc"`: Number of HMC samples
-
-# Examples
-```julia
-# Typical usage pattern
-args = parse_arguments()
-input_data = read_and_validate_data(args["json-input"])
-results = forecast_with_epiautogp(input_data, args)
-forecast_dates, forecasts = results.forecast_dates, results.forecasts
-```
 """
 function forecast_with_epiautogp(input::EpiAutoGPInput, args::Dict{String, Any})
 return forecast_with_epiautogp(input;
-n_forecast_weeks = args["n-forecast-weeks"],
+n_ahead = args["n-ahead"],
 n_forecasts = args["n-forecast-draws"],
 transformation_name = args["transformation"],
 n_particles = args["n-particles"],
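The `Dict`-based wrapper now reads `args["n-ahead"]`, so an argument `Dict` that still carries the old `"n-forecast-weeks"` key fails at lookup. A minimal sketch of that behaviour with a hypothetical stand-in, `toy_forecast`, which keeps only two of the keyword arguments and is not part of EpiAutoGP:

```julia
# Hypothetical stand-in for forecast_with_epiautogp with two keywords.
toy_forecast(; n_ahead::Int = 8, n_forecasts::Int = 20) =
    (n_ahead = n_ahead, n_forecasts = n_forecasts)

# Same pattern as the diff: CLI flag names are Dict keys, forwarded as keywords.
toy_forecast(args::Dict{String, Any}) =
    toy_forecast(; n_ahead = args["n-ahead"], n_forecasts = args["n-forecast-draws"])

new_args = Dict{String, Any}("n-ahead" => 14, "n-forecast-draws" => 100)
old_args = Dict{String, Any}("n-forecast-weeks" => 4, "n-forecast-draws" => 100)

result = toy_forecast(new_args)    # (n_ahead = 14, n_forecasts = 100)

# The old key is no longer read, so the lookup throws a KeyError.
old_key_rejected = try
    toy_forecast(old_args)
    false
catch err
    err isa KeyError
end
```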