MultiVariate (Dynamic) Generalized Additive Models
The goal of mvgam is to fit Bayesian (Dynamic) Generalized Additive
Models. This package constructs State-Space models that can include
highly flexible nonlinear predictor effects for both process and
observation components by leveraging functionalities from the impressive
brms and mgcv packages. This allows mvgam to fit a wide range of models,
including hierarchical ecological models such as N-mixture or Joint
Species Distribution models, as well as univariate and multivariate time
series models with imperfect detection.
The original motivation for the package is described in Clark & Wells
2022 (published in Methods in Ecology and Evolution), with additional
inspiration on the use of Bayesian probabilistic modelling coming from
Michael Betancourt, Michael Dietze and Sarah Heaps, among many others.
Install the stable version from CRAN using install.packages('mvgam'), or
install the development version from GitHub using
devtools::install_github("nicholasjclark/mvgam"). Note that to actually
condition models with MCMC sampling, the Stan software must be installed
(along with rstan, cmdstanr, or both). Only rstan is listed as a
dependency of mvgam to ensure that installation is less difficult. If
users wish to fit the models using mvgam, please refer to installation
links for Stan with rstan here, or for Stan with cmdstanr here. You will
need a fairly recent version of Stan (preferably 2.29 or above) to ensure
all the model syntax is recognized. We highly recommend you use CmdStan
through the cmdstanr interface as the backend. This is because CmdStan is
easier to install, is more up to date with new features, and uses less
memory than rstan. See this documentation from the CmdStan team for more
information.
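As a rough sketch of the version requirement described above, the snippet below reports which Stan interface is available and whether its version meets the suggested 2.29 minimum. The helper stan_is_recent() is a hypothetical name introduced here for illustration and is not part of mvgam; cmdstanr::cmdstan_version() and rstan::stan_version() are the version queries provided by those interfaces.

```r
# Hypothetical helper (not part of mvgam): TRUE when a Stan version string
# meets the minimum suggested in the text above
stan_is_recent <- function(version, minimum = "2.29.0") {
  !is.na(version) && package_version(version) >= package_version(minimum)
}

# Prefer CmdStan via cmdstanr when it is installed, else fall back to rstan
if (requireNamespace("cmdstanr", quietly = TRUE)) {
  # cmdstan_version() errors if CmdStan itself is not installed yet
  v <- tryCatch(cmdstanr::cmdstan_version(), error = function(e) NA_character_)
  message("CmdStan version: ", v, "; recent enough: ", stan_is_recent(v))
} else if (requireNamespace("rstan", quietly = TRUE)) {
  v <- rstan::stan_version()
  message("rstan's Stan version: ", v, "; recent enough: ", stan_is_recent(v))
} else {
  message("No Stan interface found; install cmdstanr or rstan first")
}
```

If CmdStan is available, passing backend = 'cmdstanr' to mvgam() should select it for sampling.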
mvgam was originally designed to analyse and forecast non-negative
integer-valued data (counts). These data are traditionally challenging
to analyse with existing time-series analysis packages. But further
development of mvgam has resulted in support for a growing number of
observation families that extend to other types of data. Currently, the
package can handle data for the following families:
- gaussian() for real-valued data
- student_t() for heavy-tailed real-valued data
- lognormal() for non-negative real-valued data
- Gamma() for non-negative real-valued data
- betar() for proportional data on (0,1)
- bernoulli() for binary data
- poisson() for count data
- nb() for overdispersed count data
- binomial() for count data with known number of trials
- beta_binomial() for overdispersed count data with known number of trials
- nmix() for count data with imperfect detection (unknown number of trials)
See ??mvgam_families for more information. Below is a simple example
for simulating and modelling proportional data with Beta observations
over a set of seasonal series with independent Gaussian Process dynamic
trends:
set.seed(100)
data <- sim_mvgam(family = betar(),
                  T = 80,
                  trend_model = GP(),
                  prop_trend = 0.5,
                  seasonality = 'shared')
Plot the series to see how they evolve over time
plot_mvgam_series(data = data$data_train, series = 'all')
Fit a State-Space GAM to these series that uses a hierarchical cyclic seasonal smooth term to capture variation in seasonality among series. The model also includes series-specific latent Gaussian Processes with squared exponential covariance functions to capture temporal dynamics
mod <- mvgam(y ~ s(season, bs = 'cc', k = 7) +
               s(season, by = series, m = 1, k = 5),
             trend_model = GP(),
             data = data$data_train,
             newdata = data$data_test,
             family = betar())
Plot the estimated posterior hindcast and forecast distributions for each series
layout(matrix(1:4, nrow = 2, byrow = TRUE))
for(i in 1:3){
  plot(mod, type = 'forecast', series = i)
}
Various S3 functions can be used to inspect parameter estimates, plot
smooth functions and residuals, and evaluate models through posterior
predictive checks or forecast comparisons. Please see the package
documentation for more detailed examples.
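A minimal sketch of how these inspection steps might look for a fitted model is given below. The wrapper inspect_fit() is a hypothetical name introduced here for illustration, not an mvgam function; it assumes its argument is a model returned by mvgam() that was fitted with newdata supplied, so that forecasts are already available.

```r
# Hypothetical wrapper (not part of mvgam) that runs common inspection steps
# on a fitted mvgam model and returns the forecast scores invisibly
inspect_fit <- function(mod) {
  stopifnot(inherits(mod, "mvgam"))
  stopifnot(requireNamespace("mvgam", quietly = TRUE))

  # Parameter estimates and sampler diagnostics
  summary(mod)

  # Estimated smooth functions, then residual diagnostics for the first series
  plot(mod, type = "smooths")
  plot(mod, type = "residuals", series = 1)

  # Posterior predictive check (returns a ggplot, so print it explicitly)
  print(mvgam::pp_check(mod))

  # Forecasts for the newdata supplied at fitting time, scored with CRPS
  fc <- mvgam::forecast(mod)
  invisible(mvgam::score(fc, score = "crps"))
}
```

With the model fitted earlier, this would be called as scores <- inspect_fit(mod).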
You can set build_vignettes = TRUE when installing, but be aware this
will slow down the installation drastically. Instead, you can always
access the HTML vignettes online at
https://nicholasjclark.github.io/mvgam/articles/
A number of case studies and step-by-step webinars have been compiled to highlight how GAMs and DGAMs can be useful for analysing multivariate data:
- Time series in R and Stan using the mvgam package
- Ecological Forecasting with Dynamic Generalized Additive Models
- State-Space Vector Autoregressions in mvgam
- How to interpret and report nonlinear effects from Generalized Additive Models
- Phylogenetic smoothing using mgcv
- Distributed lags (and hierarchical distributed lags) using mgcv and mvgam
- Incorporating time-varying seasonality in forecast models
Please also feel free to use the mvgam Discussion Board to hunt for or
post other discussion topics related to the package, and do check out
the mvgam changelog for any updates about recent upgrades that the
package has incorporated.
I’m actively seeking PhD students and other researchers to work in the
areas of ecological forecasting, multivariate model evaluation and
development of mvgam. Please reach out if you are interested
(n.clark’at’uq.edu.au). Other contributions are also very welcome, but
please see The Contributor Instructions for general guidelines. Note
that by participating in this project you agree to abide by the terms of
its Contributor Code of Conduct.