diff --git a/NEWS.md b/NEWS.md
index 935c955b..2a6a8cd9 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -15,6 +15,7 @@
* Arguments `use_stan`, `jags_path`, `data_train`, `data_test`, `adapt_delta`, `max_treedepth` and `drift` have been removed from primary functions to streamline documentation and reflect the package's mission to deprecate 'JAGS' as a suitable backend. Both `adapt_delta` and `max_treedepth` should now be supplied in a named `list()` to the new argument `control`
## Bug fixes
+* Updates to ensure `ensemble` provides appropriate weighting of forecast draws (#98)
* Not necessarily a "bug fix", but this update removes several dependencies to lighten installation and improve efficiency of the workflow (#93)
* Fixed a minor bug in the way `trend_map` recognises levels of the `series` factor
* Bug fix to ensure `lfo_cv` recognises the actual times in `time`, just in case the user supplies data that doesn't start at `t = 1`. Also updated documentation to better reflect this
diff --git a/R/ensemble.R b/R/ensemble.R
index 84381e9b..3e1dd832 100644
--- a/R/ensemble.R
+++ b/R/ensemble.R
@@ -102,31 +102,56 @@ ensemble.mvgam_forecast <- function(object, ..., ndraws = 5000){
# End of checks; now proceed with ensembling
n_series <- length(models[[1]]$series_names)
- # Function to random sample rows of a matrix with
- # replacement (in case some forecasts contain fewer draws than others)
- subsamp <- function(x, nsamps){
- if(NROW(x) < nsamps){
- sampinds <- sample(1:NROW(x), nsamps, replace = TRUE)
- } else {
- sampinds <- sample(1:NROW(x), nsamps, replace = FALSE)
- }
-
- x[sampinds, ]
- }
+ # Calculate total number of forecast draws to sample from for each model
+ n_mod_draws <- lapply(seq_len(n_models), function(x){
+ NROW(models[[x]]$forecasts[[1]])
+ })
- # Create evenly weighted ensemble hindcasts and forecasts
+ # Calculate model weights (only option at the moment is even weighting,
+ # but this may be relaxed in future)
+ mod_weights <- data.frame(mod = paste0('mod', 1:n_models),
+ orig_weight = 1 / n_models,
+ ndraws = unlist(n_mod_draws,
+ use.names = FALSE)) %>%
+ # Adjust weights by the number of draws available per
+ # forecast, ensuring that models with fewer draws aren't
+ # under-represented in the final weighted ensemble
+ dplyr::mutate(weight = (orig_weight / ndraws) * 100) %>%
+ dplyr::mutate(mod = as.factor(mod)) %>%
+ dplyr::select(mod, weight)
+
+ # Create draw indices
+ mod_inds <- as.factor(unlist(lapply(seq_len(n_models), function(x){
+ rep(paste0('mod', x), NROW(models[[x]]$forecasts[[1]]))
+ }), use.names = FALSE))
+ all_draw_inds <- 1:sum(unlist(n_mod_draws,
+ use.names = FALSE))
+ mod_inds_draws <- split(all_draw_inds, mod_inds)
+
+ # Add model-specific weights to the draw indices
+ draw_weights <- data.frame(draw = all_draw_inds,
+ mod = mod_inds) %>%
+ dplyr::left_join(mod_weights, by = 'mod')
+
+ # Perform multinomial sampling using draw-specific weights
+ fc_draws <- sample(all_draw_inds,
+ size = ndraws,
+ replace = max(all_draw_inds) < ndraws,
+ prob = draw_weights$weight)
+
+ # Create weighted ensemble hindcasts and forecasts
ens_hcs <- lapply(seq_len(n_series), function(series){
all_hcs <- do.call(rbind,
lapply(models,
function(x) x$hindcasts[[series]]))
- subsamp(all_hcs, ndraws)
+ all_hcs[fc_draws, ]
})
ens_fcs <- lapply(seq_len(n_series), function(series){
all_fcs <- do.call(rbind,
lapply(models,
function(x) x$forecasts[[series]]))
- subsamp(all_fcs, ndraws)
+ all_fcs[fc_draws, ]
})
# Initiate the ensemble forecast object
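
The draw-weighting logic added to `R/ensemble.R` above can be illustrated with a small standalone sketch. This is not the package's code: it uses base R only, and the draw counts (1000 and 250) and the `model_share` summary are hypothetical values chosen for illustration. It assumes, as the diff does, that each model gets an even share `1 / n_models`, which is then divided by that model's number of draws so that a model with fewer draws is not under-represented.

```r
# Hypothetical draw counts for two models (illustrative, not from the package)
set.seed(1)
n_mod_draws <- c(mod1 = 1000, mod2 = 250)
n_models <- length(n_mod_draws)

# Label every row of the stacked draw matrix with its source model
mod_inds <- rep(names(n_mod_draws), times = n_mod_draws)
all_draw_inds <- seq_along(mod_inds)

# Per-draw weight: each model's even share divided by its draw count,
# so the weights within each model sum to 1 / n_models
draw_weights <- (1 / n_models) / n_mod_draws[mod_inds]

# Sample draw indices; replacement is only used when more draws are
# requested than are available
ndraws <- 5000
fc_draws <- sample(all_draw_inds,
                   size = ndraws,
                   replace = length(all_draw_inds) < ndraws,
                   prob = draw_weights)

# Share of sampled draws contributed by each model
model_share <- prop.table(table(mod_inds[fc_draws]))
print(round(model_share, 2))
```

With these settings both models should contribute roughly equal shares of the sampled draws, even though `mod2` supplies a quarter as many rows. When `replace = FALSE`, base R applies the probabilities sequentially to the remaining items, so the realised shares may drift from the nominal weights.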
diff --git a/R/globals.R b/R/globals.R
index 26aa793c..92f2a0d5 100644
--- a/R/globals.R
+++ b/R/globals.R
@@ -26,4 +26,4 @@ utils::globalVariables(c("y", "year", "smooth_vals", "smooth_num",
"value", "threshold", "colour", "resids",
"c_dark", "eval_timepoints", "yqlow",
"ymidlow", "ymidhigh", "yqhigh", "preds",
- "yhigh", "ylow"))
+ "yhigh", "ylow", "weight", "orig_weight"))
diff --git a/README.Rmd b/README.Rmd
index 940dff5a..b3007770 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -34,8 +34,9 @@ knitr::opts_chunk$set(
The goal of `mvgam` is to fit Bayesian (Dynamic) Generalized Additive Models. This package constructs State-Space models that can include highly flexible nonlinear predictor effects for both process and observation components by leveraging functionalities from the impressive [`brms`](https://paulbuerkner.com/brms/){target="_blank"} and [`mgcv`](https://cran.r-project.org/web/packages/mgcv/index.html){target="_blank"} packages. This allows `mvgam` to fit a wide range of models, including hierarchical ecological models such as N-mixture or Joint Species Distribution models, as well as univariate and multivariate time series models with imperfect detection. The original motivation for the package is described in [Clark & Wells 2022](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.13974){target="_blank"} (published in *Methods in Ecology and Evolution*), with additional inspiration on the use of Bayesian probabilistic modelling coming from [Michael Betancourt](https://betanalpha.github.io/writing/){target="_blank"}, [Michael Dietze](https://www.bu.edu/earth/profiles/michael-dietze/){target="_blank"} and [Sarah Heaps](https://www.durham.ac.uk/staff/sarah-e-heaps/){target="_blank"}, among many others.
## Resources
-A series of [vignettes cover data formatting, forecasting and several extended case studies of DGAMs](https://nicholasjclark.github.io/mvgam/){target="_blank"}. A number of other examples have also been compiled:
+A series of [vignettes cover data formatting, forecasting and several extended case studies of DGAMs](https://nicholasjclark.github.io/mvgam/){target="_blank"}. A number of other examples, including some step-by-step introductory webinars, have also been compiled:
+* [Time series in R and Stan using the `mvgam` package](https://www.youtube.com/playlist?list=PLzFHNoUxkCvsFIg6zqogylUfPpaxau_a3){target="_blank"}
* [Ecological Forecasting with Dynamic Generalized Additive Models](https://www.youtube.com/watch?v=0zZopLlomsQ){target="_blank"}
* [Distributed lags (and hierarchical distributed lags) using `mgcv` and `mvgam`](https://ecogambler.netlify.app/blog/distributed-lags-mgcv/){target="_blank"}
* [State-Space Vector Autoregressions in `mvgam`](https://ecogambler.netlify.app/blog/vector-autoregressions/){target="_blank"}
diff --git a/README.md b/README.md
index 0fc4ffc2..65fdc807 100644
--- a/README.md
+++ b/README.md
@@ -43,9 +43,14 @@ target="_blank">Sarah Heaps, among many others.
A series of vignettes cover data formatting, forecasting and several
-extended case studies of DGAMs. A number of other examples have also
-been compiled:
-
+extended case studies of DGAMs. A number of other examples,
+including some step-by-step introductory webinars, have also been
+compiled:
+
+- Time series in R and Stan using the mvgam
+ package
- Ecological Forecasting with Dynamic Generalized Additive
Models
@@ -246,38 +251,37 @@ summary(lynx_mvgam)
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
-#> (Intercept) 6.400 6.60 6.900 1.01 926
-#> s(season).1 -0.630 -0.13 0.340 1.00 1365
-#> s(season).2 0.730 1.30 1.900 1.00 1060
-#> s(season).3 1.200 1.90 2.600 1.00 993
-#> s(season).4 -0.087 0.55 1.200 1.00 975
-#> s(season).5 -1.300 -0.70 -0.074 1.00 968
-#> s(season).6 -1.300 -0.56 0.120 1.00 1252
-#> s(season).7 0.032 0.73 1.400 1.00 1259
-#> s(season).8 0.610 1.40 2.100 1.00 729
-#> s(season).9 -0.370 0.23 0.890 1.00 829
-#> s(season).10 -1.400 -0.86 -0.360 1.00 1233
+#> (Intercept) 6.400 6.60 6.900 1.00 837
+#> s(season).1 -0.620 -0.14 0.390 1.01 729
+#> s(season).2 0.740 1.30 1.900 1.00 902
+#> s(season).3 1.300 1.90 2.600 1.00 734
+#> s(season).4 -0.046 0.53 1.100 1.00 945
+#> s(season).5 -1.300 -0.70 -0.053 1.00 730
+#> s(season).6 -1.200 -0.57 0.160 1.00 876
+#> s(season).7 0.051 0.73 1.400 1.00 917
+#> s(season).8 0.610 1.40 2.100 1.00 753
+#> s(season).9 -0.380 0.22 0.840 1.00 717
+#> s(season).10 -1.400 -0.88 -0.390 1.00 985
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
-#> s(season) 9.97 10 48.6 <2e-16 ***
+#> s(season) 9.98 10 49.1 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Latent trend parameter AR estimates:
#> 2.5% 50% 97.5% Rhat n_eff
-#> ar1[1] 0.60 0.83 0.98 1.01 671
-#> sigma[1] 0.38 0.47 0.60 1.00 750
+#> ar1[1] 0.60 0.83 0.98 1 625
+#> sigma[1] 0.38 0.48 0.60 1 787
#>
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
-#> 3 of 2000 iterations saturated the maximum tree depth of 10 (0.15%)
-#> *Run with max_treedepth set to a larger value to avoid saturation
+#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#>
-#> Samples were drawn using NUTS(diag_e) at Tue Dec 03 9:38:07 AM 2024.
+#> Samples were drawn using NUTS(diag_e) at Mon Dec 16 10:06:22 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
@@ -420,8 +424,8 @@ plot(lynx_mvgam, type = 'forecast', newdata = lynx_test)
- #> Out of sample CRPS:
- #> 2453.7903515
+ #> Out of sample DRPS:
+ #> 2412.582034
And the estimated latent trend component, again using the more flexible
`plot_mvgam_...()` option to show first derivatives of the estimated
@@ -475,31 +479,32 @@ description
```
#> Methods text skeleton
- #> We used the R package mvgam (version 1.1.4; Clark & Wells, 2023) to construct, fit and interrogate the mo
- #> del. mvgam fits Bayesian State-Space models that can include flexible predictor effects in both the proce
- #> ss and observation components by incorporating functionalities from the brms (Bürkner 2017), mgcv (Wood 2
- #> 017) and splines2 (Wang & Yan, 2023) packages. The mvgam-constructed model and observed data were passed
- #> to the probabilistic programming environment Stan (version 2.34.1; Carpenter et al. 2017, Stan Developmen
- #> t Team 2024), specifically through the cmdstanr interface (Gabry & Češnovar, 2021). We ran 4 Hamiltonian
- #> Monte Carlo chains for 500 warmup iterations and 500 sampling iterations for joint posterior estimation.
- #> Rank normalized split Rhat (Vehtari et al. 2021) and effective sample sizes were used to monitor converge
- #> nce.
+ #> We used the R package mvgam (version 1.1.4; Clark & Wells, 2023) to construct, fit and int
+ #> errogate the model. mvgam fits Bayesian State-Space models that can include flexible predi
+ #> ctor effects in both the process and observation components by incorporating functionaliti
+ #> es from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan, 2023) packages
+ #> . The mvgam-constructed model and observed data were passed to the probabilistic programmi
+ #> ng environment Stan (version 2.34.1; Carpenter et al. 2017, Stan Development Team 2024), s
+ #> pecifically through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 4 Hamiltonian
+ #> Monte Carlo chains for 500 warmup iterations and 500 sampling iterations for joint posteri
+ #> or estimation. Rank normalized split Rhat (Vehtari et al. 2021) and effective sample sizes
+ #> were used to monitor convergence.
#>
#> Primary references
#> Clark, NJ and Wells K (2022). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974
- #> Bürkner, PC (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1-28. doi:10.18637/jss.v080.i01
+ #> Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1-28. doi:10.18637/jss.v080.i01
#> Wood, SN (2017). Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC.
- #> Wang W and Yan J (2021). Shape-Restricted Regression Splines with R Package splines2. Journal of Data Science, 19(3), 498-517. doi:10.6339/21-JDS1020 .
+ #> Wang W and Yan J (2021). Shape-Restricted Regression Splines with R Package splines2. Journal of Data Science, 19(3), 498-517. doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020.
#> Carpenter, B, Gelman, A, Hoffman, MD, Lee, D, Goodrich, B, Betancourt, M, Brubaker, M, Guo, J, Li, P and Riddell, A (2017). Stan: A probabilistic programming language. Journal of Statistical Software 76.
- #> Gabry J, Češnovar R, Johnson A, and Bronder S (2024). cmdstanr: R Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/, https://discourse.mc-stan.org.
- #> Vehtari A, Gelman A, Simpson D, Carpenter B, and Bürkner P (2021). “Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC (with discussion).” Bayesian Analysis 16(2) 667-718. https://doi.org/10.1214/20-BA1221.
+ #> Gabry J, Cesnovar R, Johnson A, and Bronder S (2024). cmdstanr: R Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/, https://discourse.mc-stan.org.
+ #> Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021). Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2) 667-718. https://doi.org/10.1214/20-BA1221.
#>
#> Other useful references
- #> Arel-Bundock V (2024). marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests. R package version 0.19.0.4, https://marginaleffects.com/.
- #> Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019). “Visualization in Bayesian workflow.” Journal of the Royal Statatistical Society A, 182, 389-402. doi:10.1111/rssa.12378.
+ #> Arel-Bundock, V, Greifer, N, and Heiss, A (2024). How to interpret statistical models using marginaleffects for R and Python. Journal of Statistical Software, 111(9), 1-32. https://doi.org/10.18637/jss.v111.i09
+ #> Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019). Visualization in Bayesian workflow. Journal of the Royal Statatistical Society A, 182, 389-402. doi:10.1111/rssa.12378.
#> Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4.
- #> Bürkner, PC, Gabry, J, and Vehtari, A. (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90(14), 2499–2523. https://doi.org/10.1080/00949655.2020.1783262
+ #> Burkner, PC, Gabry, J, and Vehtari, A. (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90(14), 2499-2523. https://doi.org/10.1080/00949655.2020.1783262
## Extended observation families
@@ -618,7 +623,7 @@ summary(mod, include_betas = FALSE)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#>
-#> Samples were drawn using NUTS(diag_e) at Tue Dec 03 9:39:28 AM 2024.
+#> Samples were drawn using NUTS(diag_e) at Mon Dec 16 10:07:43 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
diff --git a/docs/articles/data_in_mvgam.html b/docs/articles/data_in_mvgam.html
index 9a7d9d95..591b613e 100644
--- a/docs/articles/data_in_mvgam.html
+++ b/docs/articles/data_in_mvgam.html
@@ -13,6 +13,7 @@
+