# Analyzing forecasts {#sec-analysis}
The scores for the forecasts generated by this book can be found in the [catalog](https://radiantearth.github.io/stac-browser/#/external/raw.githubusercontent.com/eco4cast/neon4cast-ci/main/catalog/scores/models/model_items/bookcast_forest.json). The catalog provides metadata and code for accessing the data.
## Analyzing submitted forecasts
```{r}
#| echo: false
#| message: false
library(tidyverse)
library(lubridate)
```
The code to download the scores for all forecasts generated by the model used in this book is:
```{r}
my_results <- arrow::open_dataset("s3://anonymous@bio230014-bucket01/challenges/scores/bundled-parquet/project_id=neon4cast/duration=P1D/variable=nee/model_id=bookcast_forest?endpoint_override=sdsc.osn.xsede.org")
df <- my_results |>
  filter(site_id == "OSBS",
         reference_datetime > as_date("2024-03-15")) |>
  collect()
```
The code above can be found in the catalog link at the top of the page.
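Before aggregating, it can help to confirm what the downloaded table contains; a quick look at the columns of `df` (an optional check, not required for the analyses below) is:
```{r}
# Inspect the columns and types of the downloaded scores table
glimpse(df)
```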
## Mean score
We can look at the mean score for the process model, but this provides very little context about the quality of the forecast. It is more informative to compare the score to the scores from other models.
```{r}
df |>
  summarise(mean_crps = mean(crps, na.rm = TRUE))
```
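As a reminder, the continuous ranked probability score (CRPS) compares the full forecast distribution to the observation. For a forecast with cumulative distribution function $F$ and observation $y$,

$$
\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \left(F(x) - \mathbb{1}\{x \ge y\}\right)^2 \, dx,
$$

so lower values are better and the score is expressed in the same units as the variable (here, NEE). Because CRPS has no absolute scale, a mean value on its own is hard to judge without a point of comparison.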
## Comparing to baselines
We will benchmark our process model forecast against two "naive" baselines: `climatology` and `persistenceRW`.
```{r}
all_results <- arrow::open_dataset("s3://anonymous@bio230014-bucket01/challenges/scores/bundled-parquet/project_id=neon4cast/duration=P1D/variable=nee?endpoint_override=sdsc.osn.xsede.org")
df_with_baselines <- all_results |>
  filter(site_id == "OSBS",
         reference_datetime > as_date("2024-03-15"),
         model_id %in% c("bookcast_forest", "climatology", "persistenceRW")) |>
  collect()
```
## Visualization
How do the forecasts look for a single `reference_datetime`?
```{r}
#| warning: false
df_with_baselines |>
  filter(as_date(reference_datetime) == as_date("2024-10-01")) |>
  ggplot(aes(x = datetime)) +
  geom_ribbon(aes(ymin = quantile02.5, ymax = quantile97.5, fill = model_id), alpha = 0.3) +
  geom_line(aes(y = median, color = model_id)) +
  geom_point(aes(y = observation)) +
  labs(y = "forecast") +
  theme_bw()
```
## Aggregated scores
We can first look at the aggregated scores (all `reference_datetime` and `datetime` combinations). Importantly, the code below uses `pivot_wider` and `pivot_longer` to ensure we only include `datetime` values where all three models provided forecasts. Otherwise, the three models would be aggregated over different periods.
```{r}
df_with_baselines |>
  select(model_id, crps, datetime, reference_datetime) |>
  group_by(model_id, datetime, reference_datetime) |>
  slice(1) |>
  ungroup() |>
  pivot_wider(names_from = model_id, values_from = crps) |>
  na.omit() |>
  pivot_longer(-c(datetime, reference_datetime), names_to = "model_id", values_to = "crps") |>
  summarise(mean_crps = mean(crps), .by = c("model_id")) |>
  ggplot(aes(x = model_id, y = mean_crps)) +
  geom_bar(stat = "identity") +
  labs(y = "mean CRPS") +
  theme_bw()
```
## By horizon
How does forecast performance change as forecasts extend farther in the future (increasing horizon), regardless of when the forecast was produced?
```{r}
df_with_baselines |>
  group_by(model_id, datetime, reference_datetime) |>
  slice(1) |>
  ungroup() |>
  # Horizon in days between the forecasted time and the time the forecast was issued
  mutate(horizon = as.numeric(difftime(datetime, reference_datetime, units = "days"))) |>
  select(model_id, horizon, datetime, reference_datetime, crps) |>
  pivot_wider(names_from = model_id, values_from = crps) |>
  na.omit() |>
  pivot_longer(-c(horizon, datetime, reference_datetime), names_to = "model_id", values_to = "crps") |>
  summarize(mean_crps = mean(crps), .by = c("model_id", "horizon")) |>
  ggplot(aes(x = horizon, y = mean_crps, color = model_id)) +
  geom_line() +
  labs(y = "mean CRPS") +
  theme_bw()
```
## By reference datetime
How does forecast performance vary across the dates that the forecasts are generated, regardless of horizon?
```{r}
df_with_baselines |>
  select(model_id, datetime, reference_datetime, crps) |>
  group_by(model_id, datetime, reference_datetime) |>
  slice(1) |>
  ungroup() |>
  pivot_wider(names_from = model_id, values_from = crps) |>
  na.omit() |>
  pivot_longer(-c(datetime, reference_datetime), names_to = "model_id", values_to = "crps") |>
  summarize(mean_crps = mean(crps), .by = c("model_id", "reference_datetime")) |>
  ggplot(aes(x = reference_datetime, y = mean_crps, color = model_id)) +
  geom_line() +
  labs(y = "mean CRPS") +
  theme_bw()
```
## Additional comparisons
Forecasts can also be compared across `site_id` (aggregating across all `reference_datetime` and `horizon`) if there are multiple sites, and across `datetime` (aggregating across all `horizon`). Since CRPS is in the native units of the variable, it cannot be compared across variables.
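As a sketch of one such comparison, the same pattern used above can be grouped by `datetime` instead of `horizon` or `reference_datetime` (this reuses the `df_with_baselines` table and the same filtering to datetimes scored by all three models):
```{r}
df_with_baselines |>
  select(model_id, datetime, reference_datetime, crps) |>
  group_by(model_id, datetime, reference_datetime) |>
  slice(1) |>
  ungroup() |>
  # Keep only datetimes where all three models were scored
  pivot_wider(names_from = model_id, values_from = crps) |>
  na.omit() |>
  pivot_longer(-c(datetime, reference_datetime), names_to = "model_id", values_to = "crps") |>
  # Aggregate across all horizons and reference_datetimes for each datetime
  summarize(mean_crps = mean(crps), .by = c("model_id", "datetime")) |>
  ggplot(aes(x = datetime, y = mean_crps, color = model_id)) +
  geom_line() +
  labs(y = "mean CRPS") +
  theme_bw()
```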
## Reading
Lewis, A. S. L., Woelmer, W. M., Wander, H. L., Howard, D. W., Smith, J. W., McClure, R. P., et al. (2022). Increased adoption of best practices in ecological forecasting enables comparisons of forecastability. Ecological Applications, 32(2), e02500. <https://doi.org/10.1002/eap.2500>