diff --git a/DESCRIPTION b/DESCRIPTION index 6dba3629..0ef9d298 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Package: mvgam Title: Multivariate Bayesian Generalized Additive Models for discrete time series -Version: 1.0.3 -Date: 2022-08-03 +Version: 1.0.4 +Date: 2023-03-29 Authors@R: person(given = "Nicholas", family = "Clark", diff --git a/NAMESPACE b/NAMESPACE index 3052bba4..e94b472b 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -9,10 +9,6 @@ S3method(print,mvgam_prefit) S3method(summary,mvgam) S3method(summary,mvgam_prefit) export("%>%") -export(add_base_dgam_lines) -export(add_poisson_lines) -export(add_stan_data) -export(add_trend_lines) export(add_tweedie_lines) export(code) export(compare_mvgams) @@ -40,5 +36,4 @@ export(roll_eval_mvgam) export(series_to_mvgam) export(sim_mvgam) export(update_priors) -export(vectorise_stan_lik) importFrom(magrittr,"%>%") diff --git a/R/add_base_dgam_lines.R b/R/add_base_dgam_lines.R index 3d4c2d85..1dc08a75 100644 --- a/R/add_base_dgam_lines.R +++ b/R/add_base_dgam_lines.R @@ -1,7 +1,7 @@ #' Dynamic GAM model file additions #' #' -#' @export +#' @noRd #' @param use_lv Logical (use latent variables or not?) #' @param stan Logical (convert existing model to a Stan model?) #' @param offset Logical (include an offset in the linear predictor?) diff --git a/R/add_poisson_lines.R b/R/add_poisson_lines.R index cfafcea6..dce366e7 100644 --- a/R/add_poisson_lines.R +++ b/R/add_poisson_lines.R @@ -1,7 +1,7 @@ #' Poisson JAGS modifications #' #' -#' @export +#' @noRd #' @param model_file A template `JAGS` model file to be modified #' @param upper_bounds Optional upper bounds for the truncated observation likelihood #' @return A modified `JAGS` model file diff --git a/R/add_stan_data.R b/R/add_stan_data.R index 510b0d04..36d38c63 100644 --- a/R/add_stan_data.R +++ b/R/add_stan_data.R @@ -1,7 +1,7 @@ #' Add remaining data, model and parameter blocks to a Stan model #' #' -#' @export +#' @noRd #' @param jags_file Prepared JAGS mvgam model file #' @param stan_file Incomplete Stan model file to be edited #' @param use_lv logical diff --git a/R/add_trend_lines.R b/R/add_trend_lines.R index 61142914..efe36d1d 100644 --- a/R/add_trend_lines.R +++ b/R/add_trend_lines.R @@ -1,7 +1,7 @@ #' Latent trend model file modifications #' #' -#' @export +#' @noRd #' @param model_file A template `JAGS` or `Stan` model file to be modified #' @param stan Logical (convert existing `JAGS` model to a `Stan` model?) #' @param use_lv Logical (use latent variable trends or not) diff --git a/R/extract_cmdstanr.R b/R/extract_cmdstanr.R index 29370cff..c8f308ef 100644 --- a/R/extract_cmdstanr.R +++ b/R/extract_cmdstanr.R @@ -1,6 +1,8 @@ -# Helper functions for converting cmdstanr objects to stanfit -# All functions were directly copied from brms and so all credit must -# go to the brms development team +#' Helper functions for converting cmdstanr objects to stanfit. +#' All functions were directly copied from `brms` and so all credit must +#' go to the `brms` development team +#' @noRd +#' repair_variable_names <- function(x) { x <- sub("\\.", "[", x) x <- gsub("\\.", ",", x) diff --git a/R/mvgam.R b/R/mvgam.R index 07eeaba8..b07f122e 100644 --- a/R/mvgam.R +++ b/R/mvgam.R @@ -76,10 +76,8 @@ #'@param use_stan Logical. If \code{TRUE} and if \code{rstan} is installed, the model will be compiled and sampled using #'the Hamiltonian Monte Carlo with a call to \code{\link[cmdstanr]{cmdstan_model}} or, if `cmdstanr` is not available, #'a call to \code{\link[rstan]{stan}}. Note that -#'not all options that are available in `JAGS` can be used, including no option for a Tweedie family. -#'However, as `Stan` can estimate Hilbert base approximate gaussian processes, which -#'are much more computationally tractable than full GPs for time series with `>100` observations, estimation -#'in `Stan` can support latent GP trends while estimation in \code{JAGS} cannot +#'there are many more options when using `Stan` vs `JAGS` (the only "advantage" of `JAGS` is the ability +#'to use a Tweedie family). #'@param max_treedepth positive integer placing a cap on the number of simulation steps evaluated during each iteration when #'`use_stan == TRUE`. Default is `12`. Increasing this value can sometimes help with exploration of complex #'posterior geometries, but it is rarely fruitful to go above a `max_treedepth` of `14` @@ -141,17 +139,16 @@ #'draws from the model's posterior distribution #'\cr #'\cr -#'*Using Stan*: A useful feature of `mvgam` is the ability to use Hamiltonian Monte Carlo for parameter estimation -#'via the software `Stan` (using either the `cmdstanr` or `rstan` interface). Note that the `rstan` library is -#'currently required for this option to work, even if using `cmdstanr` as the backend. This is because `rstan`'s functions -#'are needed to arrange the posterior samples into the correct format for all of \code{mvgam}'s other functions to work. -#'Also note that currently there is no support for -#'fitting `Tweedie` responses in `Stan`. -#'However there are great advantages when using `Stan`, which includes the option to estimate smooth latent trends +#'*Using Stan*: A feature of `mvgam` is the ability to use Hamiltonian Monte Carlo for parameter estimation +#'via the software `Stan` (using either the `cmdstanr` or `rstan` interface). +#'There are great advantages when using `Stan`, which includes the option to estimate smooth latent trends #'via [Hilbert space approximate Gaussian Processes](https://arxiv.org/abs/2004.11408). This often makes sense for #'ecological series, which we expect to change smoothly. In `mvgam`, latent squared exponential GP trends are approximated using #'by default \code{40} basis functions, which saves computational costs compared to fitting full GPs while adequately estimating -#'GP \code{alpha} and \code{rho} parameters +#'GP \code{alpha} and \code{rho} parameters. Because of the many advantages of `Stan` over `JAGS`, further development +#'of the package will only be applied to `Stan`. This includes the planned addition of more response distributions, +#'plans to handle zero-inflation, and plans to incorporate a greater variety of trend models. Users are strongly encouraged +#'to opt for `Stan` over `JAGS` in any workflows #'@author Nicholas J Clark #' #'@seealso \code{\link[mcgv]{jagam}}, \code{\link[mcgv]{gam}} @@ -326,7 +323,7 @@ mvgam = function(formula, threads = 1, priors, upper_bounds, - use_stan = FALSE, + use_stan = TRUE, max_treedepth, adapt_delta, jags_path){ @@ -394,28 +391,33 @@ mvgam = function(formula, # If Stan is to be used, make sure it is installed if(use_stan & run_model){ - if(!require(rstan)){ - warning('rstan library is required but not found; setting run_model = FALSE') - run_model <- FALSE + if(!requireNamespace('rstan', quietly = TRUE)){ + warning('rstan library not found; checking for cmdstanr library') + + if(!requireNamespace('cmdstanr', quietly = TRUE)){ + warning('cmdstanr library not found; setting run_model = FALSE') + run_model <- FALSE + } + } + } - } # JAGS cannot support latent GP trends as there is no easy way to use Hilbert base # approximation to reduce the computational demands if(!use_stan & trend_model == 'GP'){ - warning('gaussian process trends not yet supported for JAGS; reverting to Stan') + warning('gaussian process trends not supported for JAGS; reverting to Stan') use_stan <- TRUE } if(use_stan & family == 'tw'){ - warning('Tweedie family not yet supported for stan; reverting to JAGS') + warning('Tweedie family not supported for stan; reverting to JAGS') use_stan <- FALSE } # If the model is to be run in JAGS, make sure the JAGS software can be located if(!use_stan){ if(run_model){ - if(!require(runjags)){ + if(!requireNamespace('runjags', quietly = TRUE)){ warning('runjags library is required but not found; setting run_model = FALSE') run_model <- FALSE } @@ -425,6 +427,7 @@ mvgam = function(formula, if(!use_stan){ if(run_model){ if(missing(jags_path)){ + require(runjags) jags_path <- runjags::findjags() } @@ -1248,14 +1251,6 @@ mvgam = function(formula, family = family, upper_bounds = upper_bounds) - # Remove data likelihood and posterior predictions if this is a prior sampling run - if(prior_simulation){ - stan_objects$stan_file <- stan_objects$stan_file[-c(grep('// likelihood functions', - stan_objects$stan_file): - (grep('// likelihood functions', - stan_objects$stan_file) + 6))] - } - model_data <- stan_objects$model_data if(use_lv){ model_data$n_lv <- n_lv @@ -1339,6 +1334,17 @@ mvgam = function(formula, threads = threads, trend_model = trend_model, offset = offset) + + # Remove data likelihood if this is a prior sampling run + if(prior_simulation){ + vectorised$model_file <- vectorised$model_file[-c((grep('// likelihood functions', + vectorised$model_file, + fixed = TRUE) - 1): + (grep('generated quantities {', + vectorised$model_file, + fixed = TRUE) - 4))] + } + model_data <- vectorised$model_data # Check if cmdstan is accessible; if not, use rstan @@ -1393,7 +1399,7 @@ mvgam = function(formula, init = inits, max_treedepth = 12, adapt_delta = 0.8, - iter_sampling = 600, + iter_sampling = samples, iter_warmup = 200, show_messages = FALSE, diagnostics = NULL) @@ -1415,16 +1421,6 @@ mvgam = function(formula, variables = param) out_gam_mod <- repair_stanfit(out_gam_mod) - # stanfit <- rstan::read_stan_csv(fit1$output_files(), col_major = TRUE) - # param_names <- row.names(rstan::summary(stanfit)$summary) - # stanfit@sim$samples <- lapply(seq_along(stanfit@sim$samples), function(x){ - # samps <- as.list(stanfit@sim$samples[[x]]) - # names(samps) <- param_names - # samps - # }) - # - # out_gam_mod <- stanfit - } else { require(rstan) message('Using rstan as the backend') diff --git a/R/vectorise_stan_lik.R b/R/vectorise_stan_lik.R index 10352b6b..27ffd1cc 100644 --- a/R/vectorise_stan_lik.R +++ b/R/vectorise_stan_lik.R @@ -1,7 +1,7 @@ #' Vectorise a stan model's likelihood for quicker computation #' #' -#' @export +#' @noRd #' @param model_file Stan model file to be edited #' @param model_data Prepared mvgam data for Stan modelling #' @param family \code{character}. Must be either 'Negative Binomial' or 'Poisson' diff --git a/docs/README-unnamed-chunk-11-1.png b/docs/README-unnamed-chunk-11-1.png index 27db2d30..db9c031d 100644 Binary files a/docs/README-unnamed-chunk-11-1.png and b/docs/README-unnamed-chunk-11-1.png differ diff --git a/docs/README-unnamed-chunk-13-1.png b/docs/README-unnamed-chunk-13-1.png index 51bd3ca4..fecab825 100644 Binary files a/docs/README-unnamed-chunk-13-1.png and b/docs/README-unnamed-chunk-13-1.png differ diff --git a/docs/README-unnamed-chunk-15-1.png b/docs/README-unnamed-chunk-15-1.png index 249a7713..9e431c14 100644 Binary files a/docs/README-unnamed-chunk-15-1.png and b/docs/README-unnamed-chunk-15-1.png differ diff --git a/docs/README-unnamed-chunk-16-1.png b/docs/README-unnamed-chunk-16-1.png index 7ce23e34..0d6d918e 100644 Binary files a/docs/README-unnamed-chunk-16-1.png and b/docs/README-unnamed-chunk-16-1.png differ diff --git a/docs/README-unnamed-chunk-17-1.png b/docs/README-unnamed-chunk-17-1.png index 8bf753d8..c5ad2025 100644 Binary files a/docs/README-unnamed-chunk-17-1.png and b/docs/README-unnamed-chunk-17-1.png differ diff --git a/docs/README-unnamed-chunk-18-1.png b/docs/README-unnamed-chunk-18-1.png index 505dcd9e..f654fdbd 100644 Binary files a/docs/README-unnamed-chunk-18-1.png and b/docs/README-unnamed-chunk-18-1.png differ diff --git a/docs/README-unnamed-chunk-19-1.png b/docs/README-unnamed-chunk-19-1.png index ca4fecaf..06c7f4d5 100644 Binary files a/docs/README-unnamed-chunk-19-1.png and b/docs/README-unnamed-chunk-19-1.png differ diff --git a/docs/README-unnamed-chunk-20-1.png b/docs/README-unnamed-chunk-20-1.png index df3a8314..d7915432 100644 Binary files a/docs/README-unnamed-chunk-20-1.png and b/docs/README-unnamed-chunk-20-1.png differ diff --git a/docs/README-unnamed-chunk-21-1.png b/docs/README-unnamed-chunk-21-1.png index 328589ee..c3b32e53 100644 Binary files a/docs/README-unnamed-chunk-21-1.png and b/docs/README-unnamed-chunk-21-1.png differ diff --git a/docs/README-unnamed-chunk-22-1.png b/docs/README-unnamed-chunk-22-1.png index 3fd3d649..8617b579 100644 Binary files a/docs/README-unnamed-chunk-22-1.png and b/docs/README-unnamed-chunk-22-1.png differ diff --git a/docs/README-unnamed-chunk-23-1.png b/docs/README-unnamed-chunk-23-1.png index 77a7137e..e583dfbe 100644 Binary files a/docs/README-unnamed-chunk-23-1.png and b/docs/README-unnamed-chunk-23-1.png differ diff --git a/docs/README-unnamed-chunk-24-1.png b/docs/README-unnamed-chunk-24-1.png index a6ef58dd..1a61b6cb 100644 Binary files a/docs/README-unnamed-chunk-24-1.png and b/docs/README-unnamed-chunk-24-1.png differ diff --git a/docs/README-unnamed-chunk-24-2.png b/docs/README-unnamed-chunk-24-2.png index 7c8a4f6d..9324eaf2 100644 Binary files a/docs/README-unnamed-chunk-24-2.png and b/docs/README-unnamed-chunk-24-2.png differ diff --git a/docs/README-unnamed-chunk-25-1.png b/docs/README-unnamed-chunk-25-1.png index 045d52ea..67789128 100644 Binary files a/docs/README-unnamed-chunk-25-1.png and b/docs/README-unnamed-chunk-25-1.png differ diff --git a/docs/README-unnamed-chunk-26-1.png b/docs/README-unnamed-chunk-26-1.png index da882f9c..4de8107f 100644 Binary files a/docs/README-unnamed-chunk-26-1.png and b/docs/README-unnamed-chunk-26-1.png differ diff --git a/docs/README-unnamed-chunk-26-2.png b/docs/README-unnamed-chunk-26-2.png new file mode 100644 index 00000000..06cc9b79 Binary files /dev/null and b/docs/README-unnamed-chunk-26-2.png differ diff --git a/docs/README-unnamed-chunk-27-1.png b/docs/README-unnamed-chunk-27-1.png index ba5262ee..c69447c9 100644 Binary files a/docs/README-unnamed-chunk-27-1.png and b/docs/README-unnamed-chunk-27-1.png differ diff --git a/docs/README-unnamed-chunk-28-1.png b/docs/README-unnamed-chunk-28-1.png index b0986dff..3fa2d31f 100644 Binary files a/docs/README-unnamed-chunk-28-1.png and b/docs/README-unnamed-chunk-28-1.png differ diff --git a/docs/README-unnamed-chunk-29-1.png b/docs/README-unnamed-chunk-29-1.png index 155484f6..7e021f3a 100644 Binary files a/docs/README-unnamed-chunk-29-1.png and b/docs/README-unnamed-chunk-29-1.png differ diff --git a/docs/README-unnamed-chunk-30-1.png b/docs/README-unnamed-chunk-30-1.png index c0a12581..aa9937d7 100644 Binary files a/docs/README-unnamed-chunk-30-1.png and b/docs/README-unnamed-chunk-30-1.png differ diff --git a/docs/README-unnamed-chunk-31-1.png b/docs/README-unnamed-chunk-31-1.png index c44d58d9..a3b04c31 100644 Binary files a/docs/README-unnamed-chunk-31-1.png and b/docs/README-unnamed-chunk-31-1.png differ diff --git a/docs/README-unnamed-chunk-32-1.png b/docs/README-unnamed-chunk-32-1.png index dc9186a0..a9aeb2cf 100644 Binary files a/docs/README-unnamed-chunk-32-1.png and b/docs/README-unnamed-chunk-32-1.png differ diff --git a/docs/README-unnamed-chunk-33-1.png b/docs/README-unnamed-chunk-33-1.png index b2ff1a4a..1a4f0ce7 100644 Binary files a/docs/README-unnamed-chunk-33-1.png and b/docs/README-unnamed-chunk-33-1.png differ diff --git a/docs/README-unnamed-chunk-34-1.png b/docs/README-unnamed-chunk-34-1.png new file mode 100644 index 00000000..e4fca89c Binary files /dev/null and b/docs/README-unnamed-chunk-34-1.png differ diff --git a/docs/README-unnamed-chunk-35-1.png b/docs/README-unnamed-chunk-35-1.png new file mode 100644 index 00000000..d60792be Binary files /dev/null and b/docs/README-unnamed-chunk-35-1.png differ diff --git a/docs/README-unnamed-chunk-8-1.png b/docs/README-unnamed-chunk-8-1.png index 5d2402c8..294a1a41 100644 Binary files a/docs/README-unnamed-chunk-8-1.png and b/docs/README-unnamed-chunk-8-1.png differ diff --git a/docs/README-unnamed-chunk-9-1.png b/docs/README-unnamed-chunk-9-1.png index c2294f43..f7ba4065 100644 Binary files a/docs/README-unnamed-chunk-9-1.png and b/docs/README-unnamed-chunk-9-1.png differ diff --git a/docs/README.Rmd b/docs/README.Rmd index cbbb2d70..d0a9e25d 100644 --- a/docs/README.Rmd +++ b/docs/README.Rmd @@ -40,7 +40,7 @@ The package can also be used to generate all necessary data structures, initial ## Installation Install the development version from `GitHub` using: -`devtools::install_github("nicholasjclark/mvgam")`. Note that to actually condition models with MCMC sampling, either the `JAGS` software must be installed (along with the `R` packages `rjags` and `runjags`) or the `Stan` software must be installed (along with the package `rstan` and, optionally, the `cmdstanr` package). These are not listed as dependencies of `mvgam` to ensure that installation is less difficult. If users wish to fit the models using `mvgam`, please refer to installation links for `JAGS` [here](https://sourceforge.net/projects/mcmc-jags/files/) or for `Stan` and `rstan` [here](https://mc-stan.org/users/interfaces/rstan)). +`devtools::install_github("nicholasjclark/mvgam")`. Note that to actually condition models with MCMC sampling, either the `JAGS` software must be installed (along with the `R` packages `rjags` and `runjags`) or the `Stan` software must be installed (along with either `rstan` and/or `cmdstanr`). These are not listed as dependencies of `mvgam` to ensure that installation is less difficult. If users wish to fit the models using `mvgam`, please refer to installation links for `JAGS` [here](https://sourceforge.net/projects/mcmc-jags/files/), for `Stan` with `rstan` [here](https://mc-stan.org/users/interfaces/rstan), or for `Stan` with `cmdstandr` [here](https://mc-stan.org/cmdstanr/). ## Citing mvgam and related software When using open source software (or software in general), please make sure to appropriately acknowledge the hard work that developers and maintainers put into making these packages available. Citations are currently the best way to formally acknowledge this work, so we highly encourage you to cite any packages that you rely on for your research. @@ -102,13 +102,23 @@ plot_mvgam_series(data = lynx_train, y = 'population') ``` Now we will formulate an `mvgam` model; this model fits a GAM in which a cyclic smooth function for `season` is estimated jointly with a full time series model for the temporal process (in this case an `AR3` process), rather than relying on smoothing splines that do not incorporate a concept of the future. We assume the outcome follows a Poisson distribution. But before conditioning the model on observed data, a check of prior smooth function realisations is useful to ensure we are allowing enough flexibility to capture the types of functional behaviours we think are reasonable without allowing outrageous behaviours. First we follow conventional recommendations to set `k` for the smooth term to be large, which would allow maximum flexibility in functional behaviours -```{r, message=FALSE, warning=FALSE} +```{r, eval=FALSE} +lynx_mvgam_prior <- mvgam(data = lynx_train, + formula = population ~ s(season, bs = 'cc', k = 19), + knots = list(season = c(0.5, 19.5)), + family = 'poisson', + trend_model = 'AR3', + chains = 2, + prior_simulation = TRUE) +``` + +```{r, include=FALSE} lynx_mvgam_prior <- mvgam(data = lynx_train, formula = population ~ s(season, bs = 'cc', k = 19), knots = list(season = c(0.5, 19.5)), family = 'poisson', trend_model = 'AR3', - chains = 1, + chains = 2, prior_simulation = TRUE) ``` @@ -118,18 +128,28 @@ plot(lynx_mvgam_prior, type = 'smooths', realisations = TRUE) ``` These functions are showing the marginal contribution of the seasonal smooth function to the linear predictor (on the log scale), and they are clearly allowed to move into ridiculous spaces that we should give very little prior plausibility to: ```{r} -exp(-25) -exp(25) +exp(-17) +exp(17) ``` Setting `k` to a smaller value results in less flexibility. This is because number of basis functions that contribute to functional behaviour is reduced -```{r, message=FALSE, warning=FALSE} +```{r, include = FALSE} +lynx_mvgam_prior <- mvgam(data = lynx_train, + formula = population ~ s(season, bs = 'cc', k = 12), + knots = list(season = c(0.5, 19.5)), + family = 'poisson', + trend_model = 'AR3', + chains = 2, + prior_simulation = TRUE) +``` + +```{r, eval = FALSE} lynx_mvgam_prior <- mvgam(data = lynx_train, formula = population ~ s(season, bs = 'cc', k = 12), knots = list(season = c(0.5, 19.5)), family = 'poisson', trend_model = 'AR3', - chains = 1, + chains = 2, prior_simulation = TRUE) ``` @@ -198,6 +218,7 @@ summary(lynx_mvgam) ``` As with any `MCMC` based software, we can inspect traceplots. Here for the `GAM` component smoothing parameters. +There is no requirement for `rstan` to be installed as a dependency, but we can still use it if available to examine traceplots ```{r} rstan::stan_trace(lynx_mvgam$model_output, 'rho') ``` diff --git a/docs/README.md b/docs/README.md index 7336bda4..4e88a609 100644 --- a/docs/README.md +++ b/docs/README.md @@ -50,13 +50,13 @@ Install the development version from `GitHub` using: `devtools::install_github("nicholasjclark/mvgam")`. Note that to actually condition models with MCMC sampling, either the `JAGS` software must be installed (along with the `R` packages `rjags` and `runjags`) or -the `Stan` software must be installed (along with the package `rstan` -and, optionally, the `cmdstanr` package). These are not listed as -dependencies of `mvgam` to ensure that installation is less difficult. -If users wish to fit the models using `mvgam`, please refer to -installation links for `JAGS` -[here](https://sourceforge.net/projects/mcmc-jags/files/) or for `Stan` -and `rstan` [here](https://mc-stan.org/users/interfaces/rstan)). +the `Stan` software must be installed (along with either `rstan` and/or +`cmdstanr`). These are not listed as dependencies of `mvgam` to ensure +that installation is less difficult. If users wish to fit the models +using `mvgam`, please refer to installation links for `JAGS` +[here](https://sourceforge.net/projects/mcmc-jags/files/), for `Stan` +with `rstan` [here](https://mc-stan.org/users/interfaces/rstan), or for +`Stan` with `cmdstandr` [here](https://mc-stan.org/cmdstanr/). ## Citing mvgam and related software @@ -189,7 +189,7 @@ lynx_mvgam_prior <- mvgam(data = lynx_train, knots = list(season = c(0.5, 19.5)), family = 'poisson', trend_model = 'AR3', - chains = 1, + chains = 2, prior_simulation = TRUE) ``` @@ -199,17 +199,17 @@ Plot a set of realisations from the prior seasonal smooth function plot(lynx_mvgam_prior, type = 'smooths', realisations = TRUE) ``` - + These functions are showing the marginal contribution of the seasonal smooth function to the linear predictor (on the log scale), and they are clearly allowed to move into ridiculous spaces that we should give very little prior plausibility to: ``` r -exp(-25) -#> [1] 1.388794e-11 -exp(25) -#> [1] 72004899337 +exp(-17) +#> [1] 4.139938e-08 +exp(17) +#> [1] 24154953 ``` Setting `k` to a smaller value results in less flexibility. This is @@ -222,7 +222,7 @@ lynx_mvgam_prior <- mvgam(data = lynx_train, knots = list(season = c(0.5, 19.5)), family = 'poisson', trend_model = 'AR3', - chains = 1, + chains = 2, prior_simulation = TRUE) ``` @@ -234,7 +234,7 @@ range of functional shapes. plot(lynx_mvgam_prior, type = 'smooths', realisations = TRUE) ``` - + In practice, imparting domain knowledge into prior specifications for penalised smooth functions is challenging, as these behaviours are often @@ -267,12 +267,12 @@ test_priors #> 3 ar2 1 trend AR2 coefficient #> 4 ar3 1 trend AR3 coefficient #> 5 sigma 1 trend sd -#> prior example_change -#> 1 lambda ~ normal(10, 25); lambda ~ exponential(0.87); -#> 2 ar1 ~ std_normal(); ar1 ~ normal(0.67, 0.17); -#> 3 ar2 ~ std_normal(); ar2 ~ normal(0.67, 0.96); -#> 4 ar3 ~ std_normal(); ar3 ~ normal(-0.25, 0.17); -#> 5 sigma ~ exponential(2); sigma ~ exponential(0.74); +#> prior example_change +#> 1 lambda ~ normal(10, 25); lambda ~ exponential(0.2); +#> 2 ar1 ~ std_normal(); ar1 ~ normal(-0.29, 0.72); +#> 3 ar2 ~ std_normal(); ar2 ~ normal(0.32, 0.62); +#> 4 ar3 ~ std_normal(); ar3 ~ normal(-0.54, 0.4); +#> 5 sigma ~ exponential(2); sigma ~ exponential(0.97); ``` Any of the above priors can be changed by modifying the `prior` column @@ -297,58 +297,58 @@ lynx_mvgam <- mvgam(data = lynx_train, #> Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup) #> Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup) #> Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup) -#> Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup) #> Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup) #> Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup) +#> Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup) #> Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup) -#> Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup) #> Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup) -#> Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup) #> Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup) -#> Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup) +#> Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup) +#> Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup) +#> Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup) #> Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup) +#> Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup) #> Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup) -#> Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup) +#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup) #> Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup) #> Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup) #> Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup) -#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup) -#> Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup) -#> Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling) -#> Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup) -#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup) -#> Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling) -#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling) #> Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup) #> Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling) -#> Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling) -#> Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling) -#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling) +#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup) +#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling) +#> Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup) +#> Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup) +#> Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling) +#> Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling) #> Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling) -#> Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling) -#> Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling) -#> Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling) +#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling) +#> Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling) +#> Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling) #> Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling) -#> Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling) +#> Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling) +#> Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling) +#> Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling) +#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling) #> Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling) +#> Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling) #> Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling) -#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling) -#> Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling) -#> Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling) #> Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling) +#> Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling) +#> Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling) #> Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling) -#> Chain 4 Iteration: 1000 / 1000 [100%] (Sampling) -#> Chain 4 finished in 28.8 seconds. -#> Chain 3 Iteration: 1000 / 1000 [100%] (Sampling) -#> Chain 3 finished in 28.9 seconds. #> Chain 2 Iteration: 1000 / 1000 [100%] (Sampling) -#> Chain 2 finished in 31.3 seconds. +#> Chain 2 finished in 29.8 seconds. +#> Chain 3 Iteration: 1000 / 1000 [100%] (Sampling) +#> Chain 4 Iteration: 1000 / 1000 [100%] (Sampling) +#> Chain 3 finished in 30.1 seconds. +#> Chain 4 finished in 30.2 seconds. #> Chain 1 Iteration: 1000 / 1000 [100%] (Sampling) -#> Chain 1 finished in 31.7 seconds. +#> Chain 1 finished in 31.2 seconds. #> #> All 4 chains finished successfully. -#> Mean chain execution time: 30.2 seconds. -#> Total execution time: 31.8 seconds. +#> Mean chain execution time: 30.3 seconds. +#> Total execution time: 31.3 seconds. ``` Inspect the resulting model file, which is written in the `Stan` @@ -453,7 +453,7 @@ and compare to the histogram of the observations (`y`) ppc(lynx_mvgam, series = 1, type = 'hist') ``` - + Now plot the distribution of predicted means compared to the observed mean @@ -462,7 +462,7 @@ mean ppc(lynx_mvgam, series = 1, type = 'mean') ``` - + Next examine simulated empirical Cumulative Distribution Functions (CDF) for posterior retrodictions (`yhat`) and compare to the CDF of the @@ -472,7 +472,7 @@ observations (`y`) ppc(lynx_mvgam, series = 1, type = 'cdf') ``` - + Rootograms are becoming [popular graphical tools for checking a discrete model’s ability to capture dispersion properties of the response @@ -493,7 +493,7 @@ generally near zero ppc(lynx_mvgam, series = 1, type = 'rootogram', n_bins = 25) ``` - + Finally look for any biases in predictions by examining a Probability Integral Transform (PIT) histogram. If our predictions are not biased @@ -504,7 +504,7 @@ this histogram should look roughly uniform ppc(lynx_mvgam, series = 1, type = 'pit') ``` - + All of these plots indicate the model is well calibrated against the training data, with no apparent pathological behaviors exhibited. Have a @@ -537,28 +537,28 @@ summary(lynx_mvgam) #> #> GAM coefficient (beta) estimates: #> 2.5% 50% 97.5% Rhat n.eff -#> (Intercept) 6.49464200 6.71836000 6.95850925 1.00 1798 -#> s(season).1 -0.57933102 0.05280635 0.69412617 1.01 729 -#> s(season).2 -0.17771063 0.83898350 1.78268900 1.01 335 -#> s(season).3 -0.06180266 1.24715000 2.45915425 1.01 313 -#> s(season).4 -0.46297547 0.43705650 1.33236650 1.00 690 -#> s(season).5 -1.11386325 -0.15558450 0.91460440 1.00 394 -#> s(season).6 -1.01934825 -0.02110400 1.05492275 1.00 544 -#> s(season).7 -0.61721910 0.37137650 1.35038950 1.00 575 -#> s(season).8 -0.99024972 0.24853500 1.73782475 1.01 279 -#> s(season).9 -1.18000250 -0.31756100 0.62468777 1.01 351 -#> s(season).10 -1.35713100 -0.69993050 -0.03220405 1.00 577 +#> (Intercept) 6.50976700 6.73163000 6.97094975 1.00 1651 +#> s(season).1 -0.52701220 0.05518615 0.74531672 1.00 946 +#> s(season).2 -0.24032817 0.81511400 1.75200850 1.01 388 +#> s(season).3 -0.05913285 1.20650500 2.39905900 1.01 354 +#> s(season).4 -0.58263707 0.43134700 1.36017225 1.00 803 +#> s(season).5 -1.15155875 -0.15124250 0.93753662 1.00 454 +#> s(season).6 -1.02573100 -0.01850930 1.08575550 1.00 514 +#> s(season).7 -0.71321785 0.36646800 1.36718550 1.00 640 +#> s(season).8 -1.00653600 0.21252950 1.75936375 1.00 306 +#> s(season).9 -1.12760900 -0.34532400 0.68948080 1.00 366 +#> s(season).10 -1.36529700 -0.69741100 -0.02544059 1.00 574 #> #> GAM smoothing parameter (rho) estimates: -#> 2.5% 50% 97.5% Rhat n.eff -#> s(season) 2.082473 3.39916 4.244269 1 499 +#> 2.5% 50% 97.5% Rhat n.eff +#> s(season) 2.096422 3.4179 4.25348 1 466 #> #> Latent trend parameter estimates: -#> 2.5% 50% 97.5% Rhat n.eff -#> ar1[1] 0.7075472 1.1172950 1.42644350 1 509 -#> ar2[1] -0.8062831 -0.3954970 0.03302338 1 1725 -#> ar3[1] -0.4639477 -0.1326695 0.27083892 1 480 -#> sigma[1] 0.3984874 0.4902760 0.63022597 1 968 +#> 2.5% 50% 97.5% Rhat n.eff +#> ar1[1] 0.6731276 1.1223750 1.4346067 1 427 +#> ar2[1] -0.8113491 -0.4081520 0.0619227 1 1136 +#> ar3[1] -0.4765546 -0.1320530 0.3105695 1 430 +#> sigma[1] 0.3968855 0.4907045 0.6215614 1 1376 #> #> Stan MCMC diagnostics #> n_eff / iter looks reasonable for all parameters @@ -570,13 +570,15 @@ summary(lynx_mvgam) ``` As with any `MCMC` based software, we can inspect traceplots. Here for -the `GAM` component smoothing parameters. +the `GAM` component smoothing parameters. There is no requirement for +`rstan` to be installed as a dependency, but we can still use it if +available to examine traceplots ``` r rstan::stan_trace(lynx_mvgam$model_output, 'rho') ``` - + and for the latent trend component parameters @@ -584,7 +586,7 @@ and for the latent trend component parameters rstan::stan_trace(lynx_mvgam$model_output, c('ar1', 'ar2', 'ar3', 'sigma')) ``` - + Inspect the model’s estimated smooth for the 19-year cyclic pattern, which is shown as a ribbon plot of posterior empirical quantiles. We can @@ -603,7 +605,7 @@ important in the model plot(lynx_mvgam, type = 'smooths', residuals = TRUE) ``` - + It is often also useful to compare prior to posterior function realisations to understand how informative the observed data have been @@ -614,13 +616,13 @@ layout(matrix(1:2, nrow = 2)) plot(lynx_mvgam_prior, type = 'smooths', realisations = TRUE) ``` - + ``` r plot(lynx_mvgam, type = 'smooths', realisations = TRUE) ``` - + ``` r layout(1) @@ -634,7 +636,7 @@ the more flexible `plot_mvgam_smooth()` function plot_mvgam_smooth(lynx_mvgam, 1, 'season', derivatives = TRUE) ``` - + We can also view the mvgam’s posterior retrodictions and predictions for the entire series (testing and training) @@ -642,11 +644,11 @@ the entire series (testing and training) ``` r plot(lynx_mvgam, type = 'forecast', newdata = lynx_test) #> Out of sample DRPS: -#> [1] 1078.426 +#> [1] 1061.58 #> ``` - + And the estimated latent trend component, again using the more flexible `plot_mvgam_...()` option to show first derivatives of the estimated @@ -656,7 +658,7 @@ trend plot_mvgam_trend(lynx_mvgam, newdata = lynx_test, derivatives = TRUE) ``` - + We can also re-do the posterior predictive checks, but this time focusing only on the out of sample period. This will give us better @@ -667,25 +669,25 @@ simulate realistic and unbiased future values ppc(lynx_mvgam, series = 1, type = 'rootogram', newdata = lynx_test) ``` - + ``` r ppc(lynx_mvgam, series = 1, type = 'mean', newdata = lynx_test) ``` - + ``` r ppc(lynx_mvgam, series = 1, type = 'cdf', newdata = lynx_test) ``` - + ``` r ppc(lynx_mvgam, series = 1, type = 'pit', newdata = lynx_test) ``` - + A key aspect of ecological forecasting is to understand [how different components of a model contribute to forecast @@ -701,7 +703,7 @@ text(1, 0.8, cex = 1.5, label="Trend component", pos = 4, col="#7C0000", family = 'serif') ``` - + Both components contribute to forecast uncertainty, suggesting we would still need some more work to learn about factors driving the dynamics of @@ -716,7 +718,7 @@ would suggest our AR3 model is appropriate for the latent trend plot(lynx_mvgam, type = 'residuals') ``` - + Another useful utility of `mvgam` is the ability to use rolling window forecasts to evaluate competing models that may represent different @@ -742,55 +744,55 @@ lynx_mvgam_poor <- mvgam(data = lynx_train, #> Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup) #> Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup) #> Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup) -#> Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup) #> Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup) +#> Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup) #> Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup) -#> Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup) -#> Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup) #> Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup) +#> Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup) #> Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup) +#> Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup) +#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup) #> Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup) -#> Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup) #> Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup) -#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup) +#> Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup) +#> Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup) +#> Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling) #> Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup) +#> Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling) #> Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup) #> Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup) -#> Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup) #> Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling) -#> Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling) -#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup) #> Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup) -#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling) +#> Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling) #> Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling) #> Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling) -#> Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling) -#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling) +#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup) +#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling) +#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling) #> Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling) +#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling) #> Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling) -#> Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling) +#> Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling) +#> Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling) #> Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling) #> Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling) -#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling) +#> Chain 2 Iteration: 1000 / 1000 [100%] (Sampling) +#> Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling) +#> Chain 2 finished in 7.1 seconds. #> Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling) -#> Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling) #> Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling) +#> Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling) #> Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling) -#> Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling) -#> Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling) -#> Chain 1 Iteration: 1000 / 1000 [100%] (Sampling) #> Chain 4 Iteration: 1000 / 1000 [100%] (Sampling) -#> Chain 1 finished in 7.7 seconds. -#> Chain 4 finished in 7.7 seconds. -#> Chain 2 Iteration: 1000 / 1000 [100%] (Sampling) -#> Chain 2 finished in 7.8 seconds. -#> Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling) +#> Chain 4 finished in 8.1 seconds. +#> Chain 1 Iteration: 1000 / 1000 [100%] (Sampling) #> Chain 3 Iteration: 1000 / 1000 [100%] (Sampling) -#> Chain 3 finished in 9.0 seconds. +#> Chain 1 finished in 8.6 seconds. +#> Chain 3 finished in 8.5 seconds. #> #> All 4 chains finished successfully. -#> Mean chain execution time: 8.0 seconds. -#> Total execution time: 9.1 seconds. +#> Mean chain execution time: 8.1 seconds. +#> Total execution time: 8.7 seconds. ``` We choose a set of timepoints within the training data to forecast from, @@ -812,10 +814,10 @@ markedly better (far lower DRPS) for this evaluation timepoint ``` r summary(mod1_eval$series1$drps) #> Min. 1st Qu. Median Mean 3rd Qu. Max. -#> 58.94 91.37 162.65 164.15 188.98 320.40 +#> 56.23 86.39 172.02 167.16 195.51 328.98 summary(mod2_eval$series1$drps) #> Min. 1st Qu. Median Mean 3rd Qu. Max. -#> 54.24 60.54 266.41 265.25 398.39 624.62 +#> 60.94 64.94 262.89 263.59 386.05 613.20 ``` Nominal coverages for both models’ 90% prediction intervals diff --git a/man/add_base_dgam_lines.Rd b/man/add_base_dgam_lines.Rd deleted file mode 100644 index af66483d..00000000 --- a/man/add_base_dgam_lines.Rd +++ /dev/null @@ -1,21 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/add_base_dgam_lines.R -\name{add_base_dgam_lines} -\alias{add_base_dgam_lines} -\title{Dynamic GAM model file additions} -\usage{ -add_base_dgam_lines(use_lv, stan = FALSE, offset = FALSE) -} -\arguments{ -\item{use_lv}{Logical (use latent variables or not?)} - -\item{stan}{Logical (convert existing model to a Stan model?)} - -\item{offset}{Logical (include an offset in the linear predictor?)} -} -\value{ -A character string to add to the mgcv jagam model file -} -\description{ -Dynamic GAM model file additions -} diff --git a/man/add_poisson_lines.Rd b/man/add_poisson_lines.Rd deleted file mode 100644 index 1ca7e095..00000000 --- a/man/add_poisson_lines.Rd +++ /dev/null @@ -1,19 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/add_poisson_lines.R -\name{add_poisson_lines} -\alias{add_poisson_lines} -\title{Poisson JAGS modifications} -\usage{ -add_poisson_lines(model_file, upper_bounds) -} -\arguments{ -\item{model_file}{A template \code{JAGS} model file to be modified} - -\item{upper_bounds}{Optional upper bounds for the truncated observation likelihood} -} -\value{ -A modified \code{JAGS} model file -} -\description{ -Poisson JAGS modifications -} diff --git a/man/add_stan_data.Rd b/man/add_stan_data.Rd deleted file mode 100644 index 4b0f9df1..00000000 --- a/man/add_stan_data.Rd +++ /dev/null @@ -1,40 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/add_stan_data.R -\name{add_stan_data} -\alias{add_stan_data} -\title{Add remaining data, model and parameter blocks to a Stan model} -\usage{ -add_stan_data( - jags_file, - stan_file, - use_lv = FALSE, - n_lv, - jags_data, - family = "poisson", - upper_bounds -) -} -\arguments{ -\item{jags_file}{Prepared JAGS mvgam model file} - -\item{stan_file}{Incomplete Stan model file to be edited} - -\item{use_lv}{logical} - -\item{n_lv}{\code{integer} number of latent dynamic factors (if \code{use_lv = TRUE})} - -\item{jags_data}{Prepared mvgam data for JAGS modelling} - -\item{family}{\code{character}. Must be either 'nb' (for Negative Binomial), 'tw' (for Tweedie) or 'poisson'} - -\item{upper_bounds}{Optional \code{vector} of \code{integer} values specifying upper limits for each series. If supplied, -this generates a modified likelihood where values above the bound are given a likelihood of zero. Note this modification -is computationally expensive in \code{JAGS} but can lead to better estimates when true bounds exist. Default is to remove -truncation entirely (i.e. there is no upper bound for each series)} -} -\value{ -A \code{list} containing the updated Stan model and model data -} -\description{ -Add remaining data, model and parameter blocks to a Stan model -} diff --git a/man/add_trend_lines.Rd b/man/add_trend_lines.Rd deleted file mode 100644 index ff621598..00000000 --- a/man/add_trend_lines.Rd +++ /dev/null @@ -1,25 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/add_trend_lines.R -\name{add_trend_lines} -\alias{add_trend_lines} -\title{Latent trend model file modifications} -\usage{ -add_trend_lines(model_file, stan = FALSE, use_lv, trend_model, drift) -} -\arguments{ -\item{model_file}{A template \code{JAGS} or \code{Stan} model file to be modified} - -\item{stan}{Logical (convert existing \code{JAGS} model to a \code{Stan} model?)} - -\item{use_lv}{Logical (use latent variable trends or not)} - -\item{trend_model}{The type of trend model to be added to the model file} - -\item{drift}{Logical (add drift or not)} -} -\value{ -A modified \code{JAGS} or \code{Stan} model file -} -\description{ -Latent trend model file modifications -} diff --git a/man/mvgam.Rd b/man/mvgam.Rd index 3a4f3420..520ce25a 100644 --- a/man/mvgam.Rd +++ b/man/mvgam.Rd @@ -27,7 +27,7 @@ mvgam( threads = 1, priors, upper_bounds, - use_stan = FALSE, + use_stan = TRUE, max_treedepth, adapt_delta, jags_path @@ -126,10 +126,8 @@ truncation entirely (i.e. there is no upper bound for each series)} \item{use_stan}{Logical. If \code{TRUE} and if \code{rstan} is installed, the model will be compiled and sampled using the Hamiltonian Monte Carlo with a call to \code{\link[cmdstanr]{cmdstan_model}} or, if \code{cmdstanr} is not available, a call to \code{\link[rstan]{stan}}. Note that -not all options that are available in \code{JAGS} can be used, including no option for a Tweedie family. -However, as \code{Stan} can estimate Hilbert base approximate gaussian processes, which -are much more computationally tractable than full GPs for time series with \verb{>100} observations, estimation -in \code{Stan} can support latent GP trends while estimation in \code{JAGS} cannot} +there are many more options when using \code{Stan} vs \code{JAGS} (the only "advantage" of \code{JAGS} is the ability +to use a Tweedie family).} \item{max_treedepth}{positive integer placing a cap on the number of simulation steps evaluated during each iteration when \code{use_stan == TRUE}. Default is \code{12}. Increasing this value can sometimes help with exploration of complex @@ -208,7 +206,7 @@ autocorrelation will be evident. When a particular observation is missing, the r draws from the model's posterior distribution \cr \cr -\emph{Using Stan}: A useful feature of \code{mvgam} is the ability to use Hamiltonian Monte Carlo for parameter estimation +\emph{Using Stan}: A feature of \code{mvgam} is the ability to use Hamiltonian Monte Carlo for parameter estimation via the software \code{Stan} (using either the \code{cmdstanr} or \code{rstan} interface). Note that the \code{rstan} library is currently required for this option to work, even if using \code{cmdstanr} as the backend. This is because \code{rstan}'s functions are needed to arrange the posterior samples into the correct format for all of \code{mvgam}'s other functions to work. @@ -218,7 +216,10 @@ However there are great advantages when using \code{Stan}, which includes the op via \href{https://arxiv.org/abs/2004.11408}{Hilbert space approximate Gaussian Processes}. This often makes sense for ecological series, which we expect to change smoothly. In \code{mvgam}, latent squared exponential GP trends are approximated using by default \code{40} basis functions, which saves computational costs compared to fitting full GPs while adequately estimating -GP \code{alpha} and \code{rho} parameters +GP \code{alpha} and \code{rho} parameters. Because of the many advantages of \code{Stan} over \code{JAGS}, further development +of the package will only be applied to \code{Stan}. This includes the planned addition of more response distributions, +plans to handle zero-inflation, and plans to incorporate a greater variety of trend models. Users are strongly encouraged +to opt for \code{Stan} over \code{JAGS} in any workflows } \examples{ \donttest{ diff --git a/man/vectorise_stan_lik.Rd b/man/vectorise_stan_lik.Rd deleted file mode 100644 index 48d23d15..00000000 --- a/man/vectorise_stan_lik.Rd +++ /dev/null @@ -1,45 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/vectorise_stan_lik.R -\name{vectorise_stan_lik} -\alias{vectorise_stan_lik} -\title{Vectorise a stan model's likelihood for quicker computation} -\usage{ -vectorise_stan_lik( - model_file, - model_data, - family = "Poisson", - trend_model = "None", - offset = FALSE, - threads = 1 -) -} -\arguments{ -\item{model_file}{Stan model file to be edited} - -\item{model_data}{Prepared mvgam data for Stan modelling} - -\item{family}{\code{character}. Must be either 'Negative Binomial' or 'Poisson'} - -\item{trend_model}{\code{character} specifying the time series dynamics for the latent trend. Options are: -'None' (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, -and the observation process is the only source of error; similarly to what is estimated by \code{\link[mcgv]{gam}}), -'RW' (random walk with possible drift), -'AR1' (AR1 model with intercept), -'AR2' (AR2 model with intercept) or -'AR3' (AR3 model with intercept) or -'GP' (Gaussian process with squared exponential kernel; currently under development and -only available for estimation in \code{stan})} - -\item{offset}{\code{logical}} - -\item{threads}{\code{integer} Experimental option to use multithreading for within-chain -parallelisation in \code{Stan}. We recommend its use only if you are experienced with -\code{Stan}'s \code{reduce_sum} function and have a slow running model that cannot be sped -up by any other means} -} -\value{ -A \code{list} containing the updated Stan model and model data -} -\description{ -Vectorise a stan model's likelihood for quicker computation -} diff --git a/manuscript_analysis/Case studies/mvgam_case_study2.Rmd b/manuscript_analysis/Case studies/mvgam_case_study2.Rmd index 4c937a4e..e5e23fd3 100644 --- a/manuscript_analysis/Case studies/mvgam_case_study2.Rmd +++ b/manuscript_analysis/Case studies/mvgam_case_study2.Rmd @@ -2,10 +2,10 @@ title: 'mvgam case study 2: multivariate models' author: "Nicholas Clark (n.clark@uq.edu.au)" output: - pdf_document: - highlight: zenburn html_document: df_print: paged + pdf_document: + highlight: zenburn word_document: default urlcolor: blue --- diff --git a/manuscript_analysis/Case studies/mvgam_case_study2.html b/manuscript_analysis/Case studies/mvgam_case_study2.html index 812d0adf..59038697 100644 --- a/manuscript_analysis/Case studies/mvgam_case_study2.html +++ b/manuscript_analysis/Case studies/mvgam_case_study2.html @@ -1728,10 +1728,10 @@

Nicholas Clark (Nicholas Clark (plot_mvgam_smooth(object = mod1, series = 1, smooth = 'season') -

+

And the true seasonal function in the simulation

plot(dat$global_seasonality[1:12], type = 'l', bty = 'L', ylab = 'True function', xlab = 'season')

@@ -1776,10 +1776,10 @@

Nicholas Clark (plot_mvgam_factors(mod1) - +

Now we fit the same model but assume that we no nothing about how @@ -1821,34 +1821,34 @@

Nicholas Clark (plot_mvgam_smooth(object = mod2, series = 1, smooth = 'season') -

+

plot(dat$global_seasonality[1:12], type = 'l', bty = 'L', ylab = 'True function', xlab = 'season')

Examining the factor contributions gives us some insight into whether @@ -1857,10 +1857,10 @@

Nicholas Clark (plot_mvgam_factors(mod2) - +

The very weak contributions by some of the factors are a result of @@ -1898,7 +1898,7 @@

Nicholas Clark (plot(series, legend.loc = 'topleft') -

+

Now we will fit an mvgam model with shared seasonality and random intercepts per series. Our first attempt will ignore any temporal component in the residuals so that we can identidy which GAM @@ -1933,45 +1933,45 @@

Nicholas Clark (Nicholas Clark (Nicholas Clark (
AIC(trends_mod1$mgcv_model,
       trends_mod2$mgcv_model)
summary(trends_mod1$mgcv_model)
## 
-## Family: poisson 
+## Family: Negative Binomial(196318.42) 
 ## Link function: log 
 ## 
 ## Formula:
@@ -2068,16 +2068,16 @@ 

Nicholas Clark (summary(trends_mod2$mgcv_model)

## 
-## Family: poisson 
+## Family: Negative Binomial(547945.816) 
 ## Link function: log 
 ## 
 ## Formula:
@@ -2086,19 +2086,19 @@ 

Nicholas Clark (Nicholas Clark (cat(c("```stan", trends_mod3$model_file, "```"), sep = "\n")

@@ -2181,7 +2181,7 @@

Nicholas Clark (Nicholas Clark (Nicholas Clark (Nicholas Clark (Nicholas Clark (Nicholas Clark (Nicholas Clark (pairs(trends_mod3$model_output, pars = c('lv_coefs','rho_gp')) -

+

There are some nasty funnel shapes in this plot, primarily occurring when some of the loadings go to 0 and the length scales approach small values, suggesting that the latent length scales are @@ -2499,45 +2499,45 @@

Nicholas Clark (Nicholas Clark (Nicholas Clark (pairs(trends_mod3$model_output, pars = c('lv_coefs','rho_gp')) -

+

Inspection of the dynamic factors and their relative contributions indicates that both factors are contributing the the series’ latent trends

plot_mvgam_factors(trends_mod3)
- +

Look at Dunn-Smyth residuals for some series from this preferred model to ensure that our dynamic factor process has captured most of the temporal dependencies in the observations

plot_mvgam_resids(trends_mod3, series = 1)
-

+

plot_mvgam_resids(trends_mod3, series = 2)
-

+

plot_mvgam_resids(trends_mod3, series = 3)
-

+

plot_mvgam_resids(trends_mod3, series = 4)
-

+

Perform posterior predictive checks to see if the model is able to simulate data that looks realistic and unbiased by examining simulated kernel densities for posterior predictions (yhat) compared @@ -2642,49 +2642,49 @@

Nicholas Clark (ppc(trends_mod3, series = 1, type = 'hist') -

+

ppc(trends_mod3, series = 2, type = 'hist')
-

+

ppc(trends_mod3, series = 3, type = 'hist')
-

+

ppc(trends_mod3, series = 4, type = 'hist')
-

+

Look at traceplots for the smoothing parameters (rho)

rstan::stan_trace(trends_mod3$model_output, pars = 'rho')
-

+

Plot posterior predictive distributions for the training and testing periods for each series

plot_mvgam_fc(object = trends_mod3, series = 1, newdata = trends_data$data_test)
## Out of sample DRPS:
-
## [1] 29.99636
+
## [1] 21.9474
## 
-

+

plot_mvgam_fc(object = trends_mod3, series = 2, newdata = trends_data$data_test)
## Out of sample DRPS:
-
## [1] 21.21557
+
## [1] 8.076631
## 
-

+

plot_mvgam_fc(object = trends_mod3, series = 3, newdata = trends_data$data_test)
## Out of sample DRPS:
-
## [1] 23.52815
+
## [1] 10.78298
## 
-

+

plot_mvgam_fc(object = trends_mod3, series = 4, newdata = trends_data$data_test)
## Out of sample DRPS:
-
## [1] 26.99851
+
## [1] 16.09829
## 
-

+

Plot posterior distributions for the latent trend estimates, again for the training and testing periods

plot_mvgam_trend(object = trends_mod3, series = 1, newdata = trends_data$data_test)
-

+

plot_mvgam_trend(object = trends_mod3, series = 2, newdata = trends_data$data_test)
-

+

plot_mvgam_trend(object = trends_mod3, series = 3, newdata = trends_data$data_test)
-

+

plot_mvgam_trend(object = trends_mod3, series = 4, newdata = trends_data$data_test)
-

+

Given that we fit a model with hierarchical seasonality, the seasonal smooths are able to deviate from one another (though they share the same wiggliness and all deviate from a common ‘global’ seasonal function). @@ -2697,28 +2697,28 @@

Nicholas Clark (

+

newdat <- data.frame(season = seq(1, 12, length.out = 100),
                      series = levels(trends_data$data_train$series)[2])
   
 plot_mvgam_smooth(object = trends_mod3, series = 2, 
                   smooth = 'season',
                   newdata = newdat)
-

+

newdat <- data.frame(season = seq(1, 12, length.out = 100),
                      series = levels(trends_data$data_train$series)[3])
   
 plot_mvgam_smooth(object = trends_mod3, series = 3, 
                   smooth = 'season',
                   newdata = newdat)
-

+

newdat <- data.frame(season = seq(1, 12, length.out = 100),
                      series = levels(trends_data$data_train$series)[4])
   
 plot_mvgam_smooth(object = trends_mod3, series = 4, 
                   smooth = 'season',
                   newdata = newdat)
-

+

Plot posterior mean estimates of latent trend correlations. These correlations are more useful than looking at latent factor loadings, for example to inspect ordinations. This is because the orders of the @@ -2739,7 +2739,7 @@

Nicholas Clark (

+

There is certainly some evidence of positive trend correlations for a few of these search terms, which is not surprising given how similar some of them are and how closely linked they should be to interest about @@ -2752,21 +2752,21 @@

Nicholas Clark (ppc(trends_mod3, series = 1, type = 'hist', newdata = trends_data$data_test) -

+

ppc(trends_mod3, series = 1, type = 'mean', newdata = trends_data$data_test)
-

+

ppc(trends_mod3, series = 2, type = 'hist', newdata = trends_data$data_test)
-

+

ppc(trends_mod3, series = 2, type = 'mean', newdata = trends_data$data_test)
-

+

ppc(trends_mod3, series = 3, type = 'hist', newdata = trends_data$data_test)
-

+

ppc(trends_mod3, series = 3, type = 'mean', newdata = trends_data$data_test)
-

+

ppc(trends_mod3, series = 4, type = 'hist', newdata = trends_data$data_test)
-

+

ppc(trends_mod3, series = 4, type = 'mean', newdata = trends_data$data_test)
-

+

Other next steps could involve devising a more goal-specific set of posterior predictive checks (see this paper by Gelman et al and relevant diff --git a/manuscript_analysis/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/rpubs/Publish Document.dcf b/manuscript_analysis/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/rpubs/Publish Document.dcf index 90b54582..13ca3ce9 100644 --- a/manuscript_analysis/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/rpubs/Publish Document.dcf +++ b/manuscript_analysis/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/rpubs/Publish Document.dcf @@ -7,5 +7,5 @@ hostUrl: rpubs.com appId: https://api.rpubs.com/api/v1/document/856733/5be2f57c523545a694d50c31f9a87205 bundleId: https://api.rpubs.com/api/v1/document/856733/5be2f57c523545a694d50c31f9a87205 url: http://rpubs.com/NickClark47/856733 -when: 1678524251.11056 -lastSyncTime: 1678524251.11057 +when: 1680038729.36083 +lastSyncTime: 1680038729.36084