---
title: "Carbon Forecast"
author: "Flux Course 2023"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(compiler)
library(tidyverse)
library(mvtnorm)
library(EML)
#remotes::install_github("eco4cast/EFIstandards")
#library(EFIstandards)
source("R/functions.R")
source("R/utils.R")
## configuration
#### SET THE ENSEMBLE SIZE
ne = 500 ## production run should be 200 - 5000, depending on what your computer can handle
timestep = 1800 #seconds
start_date = as.Date("2015-07-01")
horiz_calib = 1 #day forecast horizon during calibration
horiz = 35 #days, forecast horizon during forecast
outdir = "./" ## output directory for saving files
```
# Why Forecast
* Accelerate Science
* Improve Decision Making
# What: NEON Forecast Challenge Flux Forecast
* 35 days ahead (NOAA GEFS ensemble weather forecast)
* subdaily timestep
* NEON fluxes updated daily (5 day latency)
* Able to submit daily
More info is available at https://ecoforecast.org/efi-rcn-forecast-challenges/
# How: Forecast Analysis Cycle
* Forecast step -> Uncertainty propagation
* Analysis step -> Bayes' theorem
If you liked the Forecast-Analysis Cycle you may also like...
* Dietze M. 2017. Ecological Forecasting. Princeton University Press
* Dietze et al 2018. Iterative near-term ecological forecasting: Needs, opportunities, and challenges. PNAS 115 (7) 1424-1432 https://doi.org/10.1073/pnas.1710231115
# Super Simple Ecosystem Model
Let's begin by defining our model itself, as well as a number of ancillary
functions that will be useful in simulation and analysis. The model below is
very simple but is complex enough to have some chance at capturing observed
variability. In addition, unlike most ecosystem models, it explicitly contains
process error. The model has three state variables (X) that are all expressed in
terms of carbon (Mg/ha): Leaf Biomass, Structural Biomass (wood, roots, etc),
and soil organic matter (SOM). The model also has only two drivers:
photosynthetically active radiation (PAR), and air temperature. Within the model we first
estimate LAI from Leaf Biomass and SLA. Using LAI and light we estimate GPP
using a simple light use efficiency approach. GPP is then allocated to
autotrophic respiration (Ra), leaf NPP, and woody NPP. The leaf and wood
biomass pools can then turn over into SOM via litterfall and coarse woody debris / mortality.
Heterotrophic respiration is assumed to follow a standard Q10 temperature
sensitivity. Finally, normally distributed process error is added to each state variable in X.
```{r}
SSEM.orig
```
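The chunk above prints the actual implementation (`SSEM.orig`, sourced from R/functions.R). As a study aid, here is a minimal sketch of one timestep that mirrors the description; the functional forms and names are assumptions (e.g., the Beer's-law light term, and `falloc.2`/`falloc.3` as the wood and leaf allocation columns), not the authoritative model.
```{r}
## SKETCH ONLY: one timestep of an SSEM-like model, assuming the structure
## described above. X = [Bleaf, Bwood, SOM] in Mg C/ha; PAR in umol/m2/s;
## temp in Celsius; params is a one-row draw from the parameter data frame.
ssem_step_sketch <- function(X, params, PAR, temp, dt = 1800){
  k    <- 1e-6 * 12 * 1e-6 * 1e4 * dt                   ## umol CO2 m-2 s-1 -> Mg C ha-1 per timestep
  LAI  <- X[1] * 0.1 * params$SLA                       ## Mg/ha -> kg/m2, times SLA (m2/kg)
  GPP  <- params$alpha * PAR * (1 - exp(-0.5 * LAI))    ## light-use-efficiency GPP (umol/m2/s); Beer's-law term assumed
  Rh   <- params$Rbasal * params$Q10^(temp / 10) * X[3] ## Q10 heterotrophic respiration (umol/m2/s)
  litter <- X[1] * params$litterfall                    ## leaf turnover (fraction per timestep)
  CWD    <- X[2] * params$mortality                     ## wood mortality (fraction per timestep)
  ## update states with Normal process error; falloc columns are [Ra, wood, leaf]
  c(rnorm(1, X[1] + GPP * k * params$falloc.3 - litter, params$sigma.leaf),
    rnorm(1, X[2] + GPP * k * params$falloc.2 - CWD,    params$sigma.stem),
    rnorm(1, X[3] + litter + CWD - Rh * k,              params$sigma.soil))
}
```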
# Initial Conditions
Having defined our model, the next step is to define the ensemble size and
generate an ensemble estimate of the initial state variables. To do so we'll use
the estimates that are reported in the Ameriflux BADM Meta-data files for the
site. Since we're only relying on two different estimates of pool size to
calculate our mean and standard deviation, and neither estimate has a reported
error, these should be taken as "demonstration only" rather than as
"best practices". In a real application one would want to account for the
sampling error associated with the number of vegetation plots or soil cores
measured, the measurement error in the soil C and tree DBH, and the allometric
uncertainty in converting from DBH to leaf and stem biomass. In other words, our
pool sizes are likely a lot less certain than what we take them to be in this
exercise.
There are a few assumptions in developing the initial conditions:
- The mean and standard deviation are only calculated using two observations.
As stated above, this is not best practice.
- The wood state is the combination of stems, coarse roots, and fine roots
- The SOM (soil) state is the combination of litter, downed coarse woody debris,
and soil.
- All values are converted from g/m2 to Mg/ha
```{r}
### Initial State (Mg/ha)
### These specific data points were extracted from the US-NR1 Ameriflux BADM files
### https://ameriflux.lbl.gov/sites/siteinfo/US-NR1
### METADATA: https://ameriflux.lbl.gov/data/badm/badm-standards/
### data is also available at https://bit.ly/3POkhq9
#library(readxl)
#badm = readxl::read_xlsx(file.path("AMF_AA-Flx_FLUXNET-BIF_CCBY4_20220606.xlsx")) %>% filter(SITE_ID == "US-NR1")
#write_csv(x=badm,file="data/US-NR1_BADM.csv")
#Bwood = badm %>% filter(VARIABLE == "AG_BIOMASS_TREE") %>% select(DATAVALUE) %>% type_convert() ## unsuccessful crack at automating extraction
Bwood = 14500 * 1e-6 * 10000 ## convert g/m2 -> Mg/ha
Bleaf = 2950 * 0.01 ## generates a LAI that is too high ***
SOM = (c(1.57, 1.58) + c(0.49, 1.39) + c(2.06, 2.59)) * 1e-3 * 10000 ## sum up litter, CWD, and soil, then change units (US-ME2)
X = as.matrix(c(mean(Bleaf), mean(Bwood), mean(SOM)))
### sample initial condition ensemble members
if(ne > 1){
  X = as.matrix(cbind(
    rnorm(ne, X[1], Bleaf*0.1), ## no variability in data, assume 10%
    rnorm(ne, X[2], Bwood*0.1), ## no variability in data, assume 10%
    rnorm(ne, X[3], sd(SOM))))
}
X.orig = X ## make a copy so that we can run different experiments
## visualize initial condition priors
pool.lab = c("leaf", "wood", "SOM")
for(i in 1:3){
  hist(X[, i], main = pool.lab[i])
}
```
# Drivers
```{r}
## data downloaded from FLUXNET2015 "fullset" product, subset to 2015 for this analysis
# flux <- read_csv("AMF_US-NR1_FLUXNET_FULLSET_HH_1998-2016_3-5.csv")
# flux <- flux %>% filter(date > as.Date("2015-01-01"))
# write_csv(x=flux,file= "data/US-NR1_Flux2015.csv")
flux <- read_csv("data/US-NR1_Flux2015.csv")
date = strptime(flux$TIMESTAMP_START,format="%Y%m%d%H%M")
inputs <- data.frame(
  date = date,
  temp = flux$TA_ERA, ## gap-filled air temperature (Celsius)
  PAR = flux$SW_IN_ERA / 0.486 ## gap-filled PAR; conversion from Campbell and Norman p151
)
plot(inputs$date,inputs$PAR,type='l')
plot(inputs$date,inputs$temp,type='l')
```
# Parameters: Priors & Direct constraints (trait data)
```{r}
## ancillary data from Ameriflux BADM metadata
SLA = 1e3/c(193.7,205.1,237.7) ## m2/kg
litterfall = c(215.8)*0.01*3 ## gC/m2/yr->Mg/ha/yr
### initial params
params = list()
## univariate priors: expert opinion
params$alpha = rlnorm(ne, log(0.02), 0.05) ## light use efficiency
params$Q10 = rnorm(ne, 2.1, 0.1) ## soil respiration Q10
params$Rbasal = rlnorm(ne, log(0.2), 1) / (params$Q10^2.5) ## Soil basal respiration (umol/m2/sec per Mg/ha of SOM)
## multivariate prior on allocation parameters
## assume that NPP is ~50% of GPP on average (Litton et al 2007)
Ra = 0.5
## prior mean on allocation, assume leaf NPP is 31.5% of total (Quaife et al 2008)
alloc = matrix(c(Ra, (1 - 0.315) * (1 - Ra), 0.315 * (1 - Ra)), 1)
## draw effective sample size to add stochasticity to prior
Neff = matrix(rpois(ne, 100), ne)
rdirichlet <- cmpfun(rdirichlet.orig) ## byte compile to speed up
## prior on fractional allocation to [Ra, wood, leaf] (each draw must sum to 1)
params$falloc = rdirichlet(ne,Neff%*%alloc)
## Process error: expert opinion
## convert gamma priors on precision into standard deviations
params$sigma.leaf = 1 / sqrt(rgamma(ne, 10, 10 * 0.01 ^ 2)) ## leaf biomass
params$sigma.stem = 1 / sqrt(rgamma(ne, 10, 10 * 0.01 ^ 2)) ## wood biomass
params$sigma.soil = 1 / sqrt(rgamma(ne, 10, 10 * 0.01 ^ 2)) ## soil carbon
## Specific leaf area
params$SLA = rnorm(ne, mean(SLA), sd(SLA))
## simulate litterfall turnover based on observed
## litterfall rate and Bleaf prior (initial condition)
if(length(litterfall) > 1){
  sdlitterfall = sd(litterfall)
} else {
  sdlitterfall = 0.1*litterfall ## wild assumption
}
lit = rnorm(10000,mean(litterfall),sdlitterfall/sqrt(2))/
  rnorm(10000,mean(X[,1]),sd(X[,1])/sqrt(2)) ## X1 = leaf biomass
## draw prior mean and sd; convert turnover per year -> turnover per timestep
lit.mu = rnorm(ne,mean(lit),sd(lit))*timestep/86400/365
lit.sd = 1/sqrt(rgamma(ne,10,10*var(lit)))*timestep/86400/365
litterfall.param = beta.match(lit.mu,lit.sd^2)
## match moments and draw litterfall prior
params$litterfall = rbeta(ne,litterfall.param$a,litterfall.param$b)
## draw prior mean based on background tree mortality rate of 1/142 per year (Dietze et al 2011)
mortality.mu = 1/rpois(ne,142)*timestep/86400/365
## draw prior sd assuming a 50% CV
mortality.sd = rbeta(ne,4,4)*mortality.mu ## ~50% CV; mortality.mu is already per-timestep, so no second unit conversion
## match moments and draw mortality prior
mortality.param = beta.match(mortality.mu,mortality.sd^2)
params$mortality = rbeta(ne,mortality.param$a,mortality.param$b)
## flatten to a data frame
params = as.data.frame(params)
```
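The turnover priors above use `beta.match()` from R/utils.R to convert a mean and variance into Beta shape parameters. Assuming it performs standard moment matching, the calculation looks like this:
```{r}
## SKETCH ONLY: standard Beta moment matching, assumed to be what
## beta.match() (R/utils.R) implements. For mean mu and variance v,
## a = mu*s and b = (1-mu)*s where s = mu*(1-mu)/v - 1.
beta.match.sketch <- function(mu, var){
  s <- mu * (1 - mu) / var - 1   ## plays the role of an effective sample size
  list(a = mu * s, b = (1 - mu) * s)
}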
### plot parameter values
```{r, fig.asp=1}
plot_params(params)
```
# Forecast Step: Monte Carlo Error Propagation
*Ensemble forecast function*
```{r}
ensemble_forecast
```
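`ensemble_forecast` (printed above, from R/functions.R) implements Monte Carlo error propagation: every ensemble member is run forward with its own initial conditions and its own parameter draw, so uncertainty is carried by the spread of the ensemble rather than by any analytical approximation. A minimal sketch of that idea, reusing `ssem_step_sketch` from earlier (the output layout is illustrative; the real function also tracks flux diagnostics such as GPP, Ra, Rh, and NEP):
```{r}
## SKETCH ONLY: Monte Carlo propagation of an ensemble through time.
ensemble_forecast_sketch <- function(X, params, inputs){
  nt  <- nrow(inputs)
  out <- array(NA, c(nt, nrow(X), ncol(X)))   ## [time, ensemble, state]
  for(t in seq_len(nt)){
    for(i in seq_len(nrow(X))){               ## each member keeps its own parameter draw
      X[i, ] <- ssem_step_sketch(X[i, ], params[i, ], inputs$PAR[t], inputs$temp[t])
    }
    out[t, , ] <- X
  }
  out
}
```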
*Run first forecast*
```{r}
dayInputs = inputs %>% dplyr::filter(date >= start_date, date < start_date + lubridate::days(horiz_calib))
output.ensemble = ensemble_forecast(X = X.orig,
                                    params = params,
                                    inputs = dayInputs[,2:3]) ## don't feed the date column to the model
```
*Plot timeseries*
```{r}
plot_forecast(output.ensemble,sample=5)
```
If you like Ensemble Forecasting, you may also like...
* Alternative approaches to error propagation
* Analytical distributional transformations
* Analytical Moments
* Linear Tangent approximations
* Uncertainty partitioning
* Predictability analyses (e.g. error vs lead time)
* Dietze 2017 Prediction in ecology: a first-principles framework. Ecological Applications 27: 2048–2060. DOI: 10.1002/eap.1589
# Assessment
At the half-hourly or hourly resolution, the _QC variable indicates whether the corresponding record is a measured value (_QC = 0) or a gap-filled value, with higher values indicating lower gap-filling quality (_QC = 1 best, _QC = 3 worst).
```{r}
## model
ci = apply(output.ensemble[, ,6], 1, quantile, c(0.025, 0.5, 0.975)) ## forecast confidence interval
## variable 6 = NEP
## observations
today = which(date >= start_date & date < start_date + horiz_calib)
nep = -flux$NEE_VUT_REF[today] ## change sign convention
nep.qc = flux$NEE_VUT_REF_QC[today]
nep.unc = flux$NEE_VUT_REF_JOINTUNC[today]
## plot timeseries
ylim = range(c(range(ci),range(nep,na.rm=TRUE)))
plot(dayInputs$date,ci[2,],ylim=ylim,col=1,pch=18)
ciEnvelope(dayInputs$date, ci[1, ], ci[3, ], col = col.alpha("lightGrey", 0.5))
points(dayInputs$date,nep,col=nep.qc+1)
legend("topright",legend=c("model",paste("QC",0:2)),
pch = c(18,rep(1,3)),
col= c(1,1:3))
## predicted/observed plot
plot(ci[2,],nep,col=nep.qc+1)
abline(0,1,lty=2)
```
If you liked model assessment, you might also like...
* Model selection (AIC, DIC, WAIC, predictive loss)
* Model averaging
* Quantile-based model diagnostics & Continuous Rank Probability Score (CRPS)
# Analysis Step: Particle Filter
* What is a Likelihood?
* What is Bayes Theorem?
* How does it allow us to update model parameters and states?
* How does a Particle Filter work?
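In a particle filter, each ensemble member is a sample ("particle") from the forecast distribution, and Bayes' theorem is applied by reweighting: each member's weight is multiplied by the likelihood of the new observations given that member's predictions,

$$w_{i,t} \propto w_{i,t-1}\, p(y_t \mid x_{i,t})$$

so members that predict the data well gain influence in the posterior while members that don't fade away. When the weights become too uneven, the ensemble is resampled.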
```{r}
ParticleFilter
```
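`ParticleFilter` (printed above, from R/functions.R) is the authoritative implementation. As a sketch of the core idea, assuming the Gaussian NEP likelihood described above (resampling of the states and parameters themselves is simplified away here):
```{r}
## SKETCH ONLY: one particle filter analysis step.
## nep.pred: [time, ensemble] matrix of predicted NEP; dat: observed nep and
## sigma.nep; wt: each member's prior weight.
particle_filter_sketch <- function(nep.pred, dat, wt){
  for(t in which(!is.na(dat$nep))){        ## skip missing/gap-filled records
    wt <- wt * dnorm(dat$nep[t], nep.pred[t, ], dat$sigma.nep[t]) ## Bayes: weight x likelihood
  }
  wtn  <- wt / sum(wt)
  Neff <- 1 / sum(wtn^2)                   ## effective sample size
  if(Neff < length(wt) / 2){               ## resample if weights degenerate
    index <- sample(seq_along(wt), length(wt), replace = TRUE, prob = wtn)
    return(list(index = index, wt = rep(1, length(wt)))) ## weights reset to uniform
  }
  list(index = seq_along(wt), wt = wt)
}
```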
```{r}
Analysis = list() ## storage for saving PF output
## data constraints, day 1
nep[nep.qc>0] = NA ## only use true observations, not gap-filled values
dat = data.frame(nep = nep,
                 sigma.nep = nep.unc)
## run PF
Analysis[[1]] = ParticleFilter(output.ensemble,params,dat,wt = rep(1,ne))
# Save the Analysis
dir.create("demo_analysis",showWarnings = FALSE)
saveRDS(Analysis[[1]],file=file.path("demo_analysis",paste0(start_date,".RDS")))
```
If you liked the Particle Filter, you may also like...
* Markov Chain Monte Carlo (MCMC)
* JAGS, NIMBLE
* R's BayesianTools
* Hamiltonian Monte Carlo & Stan
* State Space Models
* Kalman Filter
* Ensemble Kalman Filter
* Using uncertainty analysis (Sobol, Monte Carlo, etc) to reduce the number of parameters considered
# Workflow
Now let's put the Forecast and Analysis steps into an iterative cycle so that we can calibrate the model
```{r}
ndays = 30 ## number of days to run the forecast
t0 = which(date == start_date) ## row number of time 0
forecast <- array(NA,c(86400/timestep*(ndays+1),ne,12)) ## output storage [time, ensemble, state]
forecast[1:dim(output.ensemble)[1],,] = output.ensemble ## store first forecast
for(t in 1:ndays){
  today = which(date >= (start_date + t) & date < (start_date + horiz_calib + t))
  # Today's forecast
  out = ensemble_forecast(X = Analysis[[t]]$X,           ## initial conditions = yesterday's Analysis
                          params = Analysis[[t]]$params, ## today's parameters = yesterday's Analysis
                          inputs = inputs[today,2:3])    ## today's subset of meteorology
  forecast[today-t0+1,,] = out ## store today's forecast in overall array
  # Today's data constraints
  nep = -flux$NEE_VUT_REF[today]
  nep.qc = flux$NEE_VUT_REF_QC[today]
  nep[nep.qc>0] = NA
  dat = data.frame(nep = nep,
                   sigma.nep = flux$NEE_VUT_REF_JOINTUNC[today])
  # Today's analysis
  Analysis[[t+1]] = ParticleFilter(out, Analysis[[t]]$params, dat, wt = Analysis[[t]]$wt)
  # Save the Analysis
  saveRDS(Analysis[[t+1]], file = file.path("demo_analysis", paste0(start_date+t+1, ".RDS")))
  # counter to help us know things are still running
  print(t)
}
# Save the Analysis List
saveRDS(Analysis,file="AnalysisList.RDS")
## effective sample size over time
weights = sapply(Analysis, function(x){x$wt})
Neff = apply(weights, 2, function(x){
  wtn = x/sum(x)
  return(1/sum(wtn^2))
})
plot(Neff, type='l')
abline(v = which(Neff > 0.999*ne), lty = 3) ## mark analyses where weights are ~uniform (i.e., just after a resampling)
```
# Forecast Visualizations
Timeseries for all variables
```{r}
plot_forecast(forecast)
```
Validation against NEP
```{r}
## Timeseries
days = which(date >= start_date & date < start_date + horiz_calib + ndays)
nep = -flux$NEE_VUT_REF[days]
nep.qc = flux$NEE_VUT_REF_QC[days]
ci = apply(forecast[, ,6], 1, quantile, c(0.025, 0.5, 0.975))
ylim = range(c(range(ci),range(nep,na.rm=TRUE)))
plot(date[days],ci[2,],ylim=ylim,col=1, pch=18,ylab="NEP",type="b")
#ciEnvelope(date[days], ci[1, ], ci[3, ], col = col.alpha("lightGrey", 0.5))
points(date[days],nep,col=nep.qc+1)
legend("topright",legend=c("model",paste("QC",0:2)),
pch = c(18,rep(1,3)),
col= c(1,1:3))
## predicted/observed
plot(ci[2,],nep,col=nep.qc+1,xlab="model")
abline(0,1,lty=2)
```
Compare initial and final parameters
```{r,fig.asp=1}
plot_params(Analysis[[t+1]]$params,params)
```
How did parameters evolve over time?
```{r}
## parameter timeseries
for(i in 1:12){
  p = sapply(Analysis, function(x){
    wtd.quantile(x$params[,i], x$wt, c(0.025,0.5,0.975))
  })
  time = start_date + 0:ndays
  plot(time, p[2,], ylim = range(p), ylab = names(params)[i])
  ciEnvelope(time, p[1,], p[3,], col = col.alpha("lightblue",0.5))
}
```
# EFI Standard: output and metadata
To be able to submit the forecast to the NEON challenge, we need to reorganize the output into the community standard format (a long data frame) and generate metadata describing the forecast.
```{r}
## reorganize data in long format
dimnames(output.ensemble) <- list(as.character(dayInputs$date), as.character(1:ne), varnames) ## label array dimensions; dates must match this (first-day) forecast
fx = as.data.frame.table(output.ensemble) ## reorganize into long format
colnames(fx) = c("datetime","ensemble","variable","prediction") ## label columns
fx_file = file.path(outdir,paste0("terrestrial_30min-",start_date,"-SSEM.csv")) ## output filename
write_csv(fx,fx_file)
head(fx)
## metadata
## define variable names, units, etc
attributes <- tibble::tribble(
  ~attributeName, ~attributeDefinition,                    ~unit,           ~formatString,         ~numberType,  ~definition,
  "time",         "[dimension]{time}",                     "year",          "YYYY-MM-DD HH:MM:SS", "numberType", NA,
  "ensemble",     "[dimension]{index of ensemble member}", "dimensionless", NA,                    "integer",    NA,
)
variables = cbind(varnames, rep("[variable]",12), units, rep(NA,12), rep("real",12), rep(NA,12))
colnames(variables) = colnames(attributes)
attributes <- rbind(attributes, variables)
attributes
attrList <- set_attributes(attributes,
                           col_classes = c("Date", "numeric", rep("numeric", nrow(variables))))
## sets metadata about the file itself (name, file type, size, MD5, etc)
physical <- set_physical(fx_file, recordDelimiter='\n')
## set metadata for the file as a whole
dataTable <- eml$dataTable(
  entityName = "forecast", ## this is a standard name
  entityDescription = "Carbon cycle forecast for US-NR1",
  physical = physical,
  attributeList = attrList)
me <- list(individualName = list(givenName = "Mike",
                                 surName = "Dietze"),
           electronicMailAddress = "[email protected]",
           id = "https://orcid.org/0000-0002-2324-2518")
coverage <- set_coverage(begin = first(date[days]),
                         end = last(date[days]),
                         geographicDescription = "Niwot Ridge, CO, USA",
                         west = -105.5464, east = -105.5464,
                         north = 40.0329, south = 40.0329)
dataset = eml$dataset(
  title = "A simple carbon cycle forecast",
  creator = me,
  contact = list(references = "https://orcid.org/0000-0002-2324-2518"),
  pubDate = start_date,
  intellectualRights = "",
  abstract = "An illustration of how we might use a particle filter to constrain a simple carbon forecast",
  dataTable = dataTable,
  coverage = coverage
)
## EFI specific forecast metadata
additionalMetadata <- eml$additionalMetadata(
  metadata = list(
    forecast = list(
      ## Basic elements
      timestep = paste(timestep,"seconds"),
      forecast_horizon = paste(horiz_calib,"days"), ## this file holds the one-day calibration forecast
      start_time = start_date,
      iteration_id = Sys.time(),
      project_id = "Flux Course 2023",
      metadata_standard_version = "0.4",
      model_description = list(
        model_id = "SSEM",
        name = "Super Simple Ecosystem Model",
        type = "process-based",
        repository = "https://github.com/ecoforecast/EFActivites"
      ),
      ## MODEL STRUCTURE & UNCERTAINTY CLASSES
      initial_conditions = list(
        # Possible values: absent, present, data_driven, propagates, assimilates
        status = "assimilates",
        # Number of parameters / dimensionality
        complexity = 3
      ),
      drivers = list(
        status = "data_driven",
        complexity = 2
      ),
      parameters = list(
        status = "assimilates",
        complexity = 9
      ),
      random_effects = list(
        status = "absent"
      ),
      process_error = list(
        status = "assimilates",
        propagation = list(
          type = "ensemble", # ensemble vs analytic
          size = ne          # required if ensemble
        ),
        complexity = 3,
        covariance = FALSE
      ),
      obs_error = list(
        status = "data_driven",
        complexity = 1,
        covariance = FALSE
      )
    ) # forecast
  ) # metadata
) # eml$additionalMetadata
my_eml <- eml$eml(dataset = dataset,
                  additionalMetadata = additionalMetadata,
                  packageId = Sys.time(),
                  system = "datetime" ## system used to generate packageId
)
## check that the EML is also a valid EFI forecast
#EFIstandards::forecast_validator(my_eml)
meta_file = file.path(outdir,paste0("terrestrial_30min_",start_date,"_SSEM.xml"))
write_eml(my_eml, meta_file)
```
# Submission
An example of how you would submit the forecast to the EFI NEON challenge (by default, this chunk doesn't actually run):
```
neon4cast::submit(forecast_file = fx_file, metadata = NULL, ask = FALSE)
```
# Run an actual forecast!
* The script forecast.R runs the forecast into the future using forecast meteorology (EFI provides the NOAA GEFS 35-day, 31-member ensemble)
* To set up, copy the date-stamped Analysis RDS file that you want to start from (the state and parameter initial conditions) into the `analysis` folder and change the date to the date you want to start forecasting from
* Met variables are now matrices of ensemble members that need to be sampled over (see the sketch after this list)
* Every day we predict 35 days into the future, so we end up forecasting every date 35 times (with different lead times) rather than just once as we did above.
* Because of the latency in the NEON data we also end up "reforecasting" the past few days to catch and assimilate any new observations that may have shown up since our last forecast
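A minimal sketch of that met sampling, assuming hypothetical `temp.ens` and `PAR.ens` matrices with one column per GEFS ensemble member (the working version of this logic lives in forecast.R):
```{r}
## SKETCH ONLY: pair each model ensemble member with a randomly drawn NOAA
## GEFS met member. temp.ens and PAR.ens are assumed [time x 31] matrices.
sample_met <- function(temp.ens, PAR.ens, ne){
  m <- sample(ncol(PAR.ens), ne, replace = TRUE)  ## one GEFS member per model member
  lapply(seq_len(ne), function(i){
    data.frame(temp = temp.ens[, m[i]], PAR = PAR.ens[, m[i]])
  })
}
```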
# Workflow automation
The first part of this file (`on:`) uses cron to schedule the workflow. The forecast in this repository is designed to run daily at 20:00 UTC. The execution of the forecast occurs on GitHub's servers, so your local computer does not need to be turned on. In `.github/workflows/do_prediction.yml`, the line `- cron: "0 20 * * *"` defines the time that the forecast is run, in this case each day at 20:00:00 UTC (note that all GitHub timings are in UTC). You can update this to run on a different schedule based on the timing codes found at https://crontab.guru
The second part of this file (`jobs:`) grabs a copy of a Docker container image, eco4cast/rocker-neon4cast, that has R and a large number of R packages pre-installed, including the NEON forecast challenge packages.
The final part of this file (`run:`) tells GitHub Actions which R script it should run.
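Putting those pieces together, a sketch of what `.github/workflows/do_prediction.yml` might look like (the job and step names here are illustrative, not copied from the repository):
```
on:
  workflow_dispatch:
  schedule:
    - cron: "0 20 * * *"   # every day at 20:00 UTC
jobs:
  forecast:
    runs-on: ubuntu-latest
    container: eco4cast/rocker-neon4cast   # R + NEON forecast challenge packages pre-installed
    steps:
      - uses: actions/checkout@v3
      - name: Run the forecast
        run: Rscript forecast.R
```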
* Automate to run every day!
See https://github.com/eco4cast/neon4cast-example for an example. To do so, you'd want to save the calibration we've done here and set up an R script that just runs the Forecast-Analysis cycle for a specific start date.
A video providing more information about how to use GitHub Actions for automated forecast generation can be found here: https://youtu.be/dMrUlXi4_Bo
# Next steps
* Additional sites: EFI provides met and targets for all 47 NEON tower sites. Extending beyond these is possible but requires nontrivial time spent on data processing.
* Additional data constraints to the Particle Filter's Likelihood: Things that work particularly well are sensor-based observations that become available frequently (e.g., MODIS LAI, SMAP, soil respiration)
* Modify/improve the model: Lots of assumptions in SSEM are known to be false!
* Longer calibration (do parameters vary by season? by year?)