DTC Sandpit Challenge: methods for estimating the probability of a major outbreak
Table of Contents
- Simulate – done
- Analytic – upload the cell from the rough-work Colab notebook to GitHub without the sliders (input params)
- ML – write code that uses the simulated data and include plots (train 4 separate classifiers and save the models)
- Write unit tests (ask Matthew) to cover as many lines as you can
- Create CI workflow in `.github/workflows/ci.yml`
  - GitHub Actions
  - Code Coverage
- Create tests in `tests/test_name.py`
  - Test all `.py` files (test each Method separately + IO)
  - Use PyTest
- Add Read the Docs documentation: https://docs.readthedocs.com/platform/stable/intro/add-project.html#manually-import-your-docs
Input: first k weeks of infectious cases, e.g. k[0:3] of k = [1,2,6,8,...].
Output: a CSV file simulated_cases.csv with case number entries; columns are days, e.g. day_1, day_2, day_3, ...
- Consider using the `tempfile` module rather than saving to the user directory every time.
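A minimal sketch of the `tempfile` suggestion, assuming trajectories are held as lists of daily case counts. The helper name `write_simulated_cases` and the example trajectories are hypothetical and only illustrate the `day_1, day_2, ...` column layout described above.

```python
import csv
import tempfile
from pathlib import Path


def write_simulated_cases(trajectories, directory):
    """Write one trajectory per row with columns day_1, day_2, ..."""
    n_days = max(len(t) for t in trajectories)
    path = Path(directory) / "simulated_cases.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([f"day_{d}" for d in range(1, n_days + 1)])
        writer.writerows(trajectories)
    return path


# Hypothetical trajectories, used only to show the file layout.
example = [[1, 2, 6, 8], [1, 0, 0, 0]]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = write_simulated_cases(example, tmp_dir)
    with csv_path.open(newline="") as handle:
        rows = list(csv.reader(handle))
# The temporary directory and the CSV inside it are removed here.
```

Anything that needs the file (for example, a unit test of the IO code) has to run inside the `with` block, since the directory is deleted on exit.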
Input:
- the first `k` days' worth of simulated infection data from `simulated_cases.csv`
- estimated range for the reproduction number
Output:
- The conditional probability `P([I1,I2,I3] | R)`
- Outbreak probability given first three cases `P(PMO | [I1,I2,I3])`
- Outbreak probability given reproduction number `P(PMO | R)`
- Overall outbreak probability: (conditional probability) × (outbreak probability given reproduction number)
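One possible reading of the product in the last bullet, assuming a uniform prior over the estimated range of R, is a likelihood-weighted average of P(PMO | R) over an evenly spaced grid of R values. The sketch below is illustrative only: the grid, the placeholder likelihood values, and the form P(PMO | R) = 1 − 1/R (the textbook branching-process result for a single initial case with geometrically distributed offspring) are all assumptions, not taken from the project code.

```python
def pmo_given_r(r):
    """P(PMO | R) for one initial case. Assumed form: 1 - 1/R, and 0 for R <= 1."""
    return 1.0 - 1.0 / r if r > 1.0 else 0.0


def overall_outbreak_probability(r_grid, likelihoods):
    """Likelihood-weighted average of P(PMO | R) over an evenly spaced R grid."""
    if len(r_grid) != len(likelihoods):
        raise ValueError("r_grid and likelihoods must have the same length")
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("likelihoods must have a positive sum")
    weighted = sum(lik * pmo_given_r(r) for r, lik in zip(r_grid, likelihoods))
    return weighted / total


# Placeholder numbers standing in for P([I1,I2,I3] | R) on each grid point.
r_grid = [0.5, 1.0, 1.5, 2.0]
likelihoods = [0.1, 0.2, 0.4, 0.3]
p_overall = overall_outbreak_probability(r_grid, likelihoods)
```

Dividing by the summed likelihoods turns the product into a conditional probability given the observed cases; without that normalisation the result depends on the scale of the likelihood values.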
What to do:
- Numerically compute the integral for the serial interval distribution
- Compute the expected number of new cases
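The two steps above can be sketched as follows. This assumes a gamma-distributed serial interval (the shape and scale values are placeholders), a daily discretisation, and the standard renewal equation E[I_t] = R × Σ_s w_s I_{t−s}; none of these choices are confirmed by the project code, and Simpson's rule stands in for whatever integrator the project uses.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density; returns 0 at x <= 0, so this sketch needs shape >= 1."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def integrate(f, a, b, n=200):
    """Composite Simpson's rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def serial_interval_weights(max_days, shape=2.0, scale=2.5):
    """w[s-1] = serial-interval mass on (s-1, s], renormalised to sum to 1."""
    raw = [integrate(lambda x: gamma_pdf(x, shape, scale), s - 1, s)
           for s in range(1, max_days + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def expected_new_cases(cases, r, weights):
    """Renewal equation: R times the weighted sum of past cases, most recent first."""
    recent_first = reversed(cases)  # I_{t-1}, I_{t-2}, ...
    return r * sum(w * i for w, i in zip(weights, recent_first))


weights = serial_interval_weights(max_days=14)
expected = expected_new_cases([1, 2, 6], r=2.0, weights=weights)
```

The weights are renormalised after truncation at `max_days` so that one case is expected to generate R secondary cases in total.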
Input:
- Sequence of case counts
Output:
- All trajectories of cases where the first `k` days of simulated data match the observed sequence
- Outbreak probability: fraction of those trajectories classified as major outbreaks
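A minimal sketch of the matching step, assuming each simulated trajectory is a list of daily case counts. The rule used to classify a major outbreak (total cases at or above a threshold) and the example numbers are assumptions made for illustration; the project may define a major outbreak differently.

```python
def outbreak_probability_by_matching(trajectories, observed, threshold):
    """Return (matching trajectories, fraction of them that are major outbreaks).

    The fraction is None when no simulated trajectory matches the observed prefix.
    """
    k = len(observed)
    matches = [t for t in trajectories if list(t[:k]) == list(observed)]
    if not matches:
        return matches, None
    # Assumed classification rule: cumulative cases reach the threshold.
    n_major = sum(1 for t in matches if sum(t) >= threshold)
    return matches, n_major / len(matches)


# Hypothetical simulated trajectories (one list per simulation run).
simulated = [
    [1, 2, 6, 8, 15, 30],
    [1, 2, 6, 3, 1, 0],
    [1, 2, 6, 9, 20, 41],
    [1, 0, 0, 0, 0, 0],
]
matches, probability = outbreak_probability_by_matching(
    simulated, observed=[1, 2, 6], threshold=50
)
```

Exact matching becomes sparse as `k` or the case counts grow, so the estimate is only as reliable as the number of matching trajectories.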
Input:
- an observed input sequence of early case counts, e.g. `data = [1,2,6] = k[0:3]`
- ML model(s) trained on simulated trajectories
Output:
- predicted outbreak probability (and model metrics); saved model files (TBD)
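To illustrate the input/output contract above, here is a small stand-in classifier: a logistic regression trained by plain gradient descent, written with the standard library only. It is not one of the project's four classifiers. The training sequences, labels, learning rate, and epoch count are hypothetical; in the project the training data would come from the simulated trajectories.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.05, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_probability(model, sequence):
    """Predicted probability that the sequence leads to a major outbreak."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, sequence)))


# Hypothetical training set: first three days of cases, label 1 = major outbreak.
X = [[1, 2, 6], [1, 3, 7], [1, 2, 5], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = train_logistic(X, y)

p_growing = predict_probability(model, [1, 2, 6])
p_fading = predict_probability(model, [1, 0, 0])
```

The fitted `model` is a plain tuple of weights and bias, so it could be saved with `json` or `pickle`; the saved-model format is still marked TBD above.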