diff --git a/R/jsdgam.R b/R/jsdgam.R
index ee2f28ba..7abdcb97 100644
--- a/R/jsdgam.R
+++ b/R/jsdgam.R
@@ -9,13 +9,16 @@
 #'
 #'@inheritParams mvgam
 #'@inheritParams ZMVN
-#'@param factor_formula A \code{character} string specifying the linear predictor
-#'effects for the latent factors. These are exactly like the formula
+#'@param formula A \code{character} string specifying the GAM observation model formula. These are exactly like the formula
 #'for a GLM except that smooth terms, `s()`, `te()`, `ti()`, `t2()`, as well as time-varying
 #'`dynamic()` terms and nonparametric `gp()` terms, can be added to the right hand side
 #'to specify that the linear predictor depends on smooth functions of predictors
 #'(or linear functionals of these). Details of the formula syntax used by \pkg{mvgam}
 #'can be found in \code{\link{mvgam_formulae}}
+#'@param factor_formula A \code{character} string specifying the linear predictor
+#'effects for the latent factors. Use `by = trend` within calls to functional terms
+#'(i.e. `s()`, `te()`, `ti()`, `t2()`, `dynamic()`, or `gp()`) to ensure that each factor
+#'captures a different axis of variation. See the example below for an illustration
 #'@param factor_knots An optional \code{list} containing user specified knot values to
 #' be used for basis construction of any smooth terms in `factor_formula`.
 #'For most bases the user simply supplies the knots to be used, which must match up with the `k` value supplied
@@ -39,6 +42,11 @@
 #' of trials
 #' \item`beta_binomial()` as for `binomial()` but allows for overdispersion}
 #'Default is `poisson()`. See \code{\link{mvgam_families}} for more details
+#' @param species An unquoted string representing the `factor` variable that indexes
+#' the different outcome variables in `data` (usually `'species'` in a JSDM).
+#' Defaults to `series` to be consistent with other `mvgam` models
+#'@param n_lv \code{integer} the number of latent dynamic factors to use if \code{use_lv == TRUE}.
+#'Cannot be `> n_species`. Defaults arbitrarily to `2`
 #'@param ... Other arguments to pass to [mvgam]
 #'@author Nicholas J Clark
 #'@details Joint Species Distribution Models allow for responses of multiple species to be
@@ -48,7 +56,29 @@
 #'flexibility to model full communities of species. When calling [jsdgam], an initial State-Space model using
 #'`trend = 'None'` is set up and then modified to include the latent factors and their linear predictors.
 #'Consequently, you can inspect priors for these models using [get_mvgam_priors] by supplying the relevant
-#'`formula`, `factor_formula`, `data` and `family` arguments and using `trend = 'None'`
+#'`formula`, `factor_formula`, `data` and `family` arguments and keeping the default `trend = 'None'`.
+#'
+#' In a JSDGAM, the expectation of response \eqn{Y_{ij}} is modelled with
+#'
+#' \deqn{g(\mu_{ij}) = X_i'\beta + u_i'\theta_j,}
+#'
+#' where \eqn{g(.)} is a known link function,
+#' \eqn{X_i'} is a design matrix of linear predictors (with associated \eqn{\beta} coefficients),
+#' \eqn{u_i} are \eqn{n_{lv}}-variate latent factors
+#' (\eqn{n_{lv}}<<\eqn{n_{species}}) and
+#' \eqn{\theta_j} are species-specific loadings on the latent factors. The design matrix
+#' \eqn{X} and \eqn{\beta} coefficients are constructed and modelled using `formula` and can contain
+#' any of `mvgam`'s predictor effects, including random intercepts and slopes, multidimensional penalized
+#' smooths, GP effects etc. The factor loadings \eqn{\theta_j} are constrained for identifiability but can
+#' be used to reconstruct an estimate of the species' residual variance-covariance matrix (see the example below
+#' for an illustration of this). The latent factors are further modelled using:
+#'\deqn{
+#'u_i \sim \text{Normal}(Q_i\beta_{factor}, 1) \quad
+#'}
+#'where the second design matrix \eqn{Q} and associated \eqn{\beta_{factor}} coefficients are
+#'constructed and modelled using `factor_formula`. Again, the effects that make up this linear
+#'predictor can contain any of `mvgam`'s allowed predictor effects, providing enormous flexibility for
+#'modelling species' communities.
 #'@seealso [mvgam]
 #'@references Nicholas J Clark & Konstans Wells (2020). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series.
 #'Methods in Ecology and Evolution. 14:3, 771-784.
@@ -58,7 +88,7 @@
 #'@return A \code{list} object of class \code{mvgam} containing model output,
 #'the text representation of the model file,
 #'the mgcv model output (for easily generating simulations at
-#'unsampled covariate values), Dunn-Smyth residuals for each series and key information needed
+#'unsampled covariate values), Dunn-Smyth residuals for each species and key information needed
 #'for other functions in the package. See \code{\link{mvgam-class}} for details.
 #'Use `methods(class = "mvgam")` for an overview on available methods
 #'@examples
@@ -195,7 +225,7 @@
 #' # The data and the grouping variables
 #' data = dat,
 #' unit = site,
-#' subgr = species,
+#' species = species,
 #'
 #' # Poisson observations
 #' family = poisson(),
@@ -261,7 +291,7 @@ jsdgam = function(formula,
                   newdata,
                   family = poisson(),
                   unit = time,
-                  subgr = series,
+                  species = series,
                   share_obs_params = FALSE,
                   priors,
                   n_lv = 2,
@@ -286,7 +316,7 @@ jsdgam = function(formula,
   # Prep the trend so that the data can be structured in the usual
   # mvgam fashion (with 'time' and 'series' variables)
   unit <- deparse0(substitute(unit))
-  subgr <- deparse0(substitute(subgr))
+  subgr <- deparse0(substitute(species))
   prepped_trend <- prep_jsdgam_trend(unit = unit,
                                      subgr = subgr,
                                      data = data)
@@ -617,7 +647,7 @@ prep_jsdgam_trend = function(data, unit, subgr){
 #' @noRd
 prep_jsdgam_trendmap = function(data, n_lv){
   if(n_lv > nlevels(data$series)){
-    stop('Number of factors must be <= number of levels in subgr',
+    stop('Number of factors must be <= number of levels in species',
          call. = FALSE)
   }
   data.frame(trend = rep(1:n_lv,
diff --git a/R/mvgam.R b/R/mvgam.R
index 6a9ceb3a..59521d1b 100644
--- a/R/mvgam.R
+++ b/R/mvgam.R
@@ -100,9 +100,9 @@
 #'@param share_obs_params \code{logical}. If \code{TRUE} and the \code{family}
 #'has additional family-specific observation parameters (e.g. variance components in
 #'`student_t()` or `gaussian()`, or dispersion parameters in `nb()` or `betar()`),
-#'these parameters will be shared across all series. This is handy if you have multiple
-#'time series that you believe share some properties, such as being from the same
-#'species over different spatial units. Default is \code{FALSE}.
+#'these parameters will be shared across all outcome variables. This is handy if you have multiple
+#'outcomes (time series in most `mvgam` models) that you believe share some properties,
+#'such as being from the same species over different spatial units. Default is \code{FALSE}.
 #'@param use_lv \code{logical}. If \code{TRUE}, use dynamic factors to estimate series'
 #'latent trends in a reduced dimension format. Only available for
 #'`RW()`, `AR()` and `GP()` trend models. Defaults to \code{FALSE}
@@ -167,9 +167,9 @@
 #'up by any other means. Only available for some families(`poisson()`, `nb()`, `gaussian()`) and
 #'when using \code{Cmdstan} as the backend
 #'@param priors An optional \code{data.frame} with prior
-#'definitions (in JAGS or Stan syntax). if using Stan, this can also be an object of
-#'class `brmsprior` (see. \code{\link[brms]{prior}} for details). See [get_mvgam_priors] and
-#''Details' for more information on changing default prior distributions
+#'definitions (in JAGS or Stan syntax) or, preferably, if using Stan, a vector containing
+#' objects of class `brmsprior` (see \code{\link[brms]{prior}} for details).
+#' See [get_mvgam_priors] and 'Details' for more information on changing default prior distributions
 #'@param refit Logical indicating whether this is a refit, called using [update.mvgam]. Users should leave
 #'as `FALSE`
 #'@param lfo Logical indicating whether this is part of a call to [lfo_cv.mvgam]. Returns a
diff --git a/docs/reference/jsdgam.html b/docs/reference/jsdgam.html
index 49f3bcaa..d0eb1ad8 100644
--- a/docs/reference/jsdgam.html
+++ b/docs/reference/jsdgam.html
@@ -87,7 +87,7 @@
te()
, ti()
, t2()
, as well as time-varying
dynamic()
terms and nonparametric gp()
terms, can be added to the right hand side
to specify that the linear predictor depends on smooth functions of predictors
-(or linear functionals of these). In nmix()
family models, the formula
is used to
-set up a linear predictor for the detection probability. Details of the formula syntax used by mvgam
+(or linear functionals of these). Details of the formula syntax used by mvgam
can be found in mvgam_formulae
A character
string specifying the linear predictor
-effects for the latent factors. These are exactly like the formula
-for a GLM except that smooth terms, s()
, te()
, ti()
, t2()
, as well as time-varying
-dynamic()
terms and nonparametric gp()
terms, can be added to the right hand side
-to specify that the linear predictor depends on smooth functions of predictors
-(or linear functionals of these). Details of the formula syntax used by mvgam
-can be found in mvgam_formulae
+effects for the latent factors. Use by = trend
within calls to functional terms
+(i.e. s()
, te()
, ti()
, t2()
, dynamic()
, or gp()
) to ensure that each factor
+captures a different axis of variation. See the example below for an illustration
gr
)
-should not include a series
element in data
. Rather, this element will be created internally based
-on the supplied variables for gr
and subgr
. For example, if you are modelling
-counts for a group of species (labelled as species
in the data) across sampling sites
-(labelled as site
in the data) in three
-different geographical regions (labelled as region
), and you would like the residuals to be correlated
-within regions, then you should specify
-unit = site
, gr = region
, and subgr = species
. Internally, mvgam()
will appropriately order
-the data by unit
(in this case, by site
) and create
-the series
element for the data using something like: series = as.factor(paste0(group, '_', subgroup))
+An unquoted string representing the factor
variable that indexes
+the different outcome variables in data
(usually 'species'
in a JSDM).
+Defaults to series
to be consistent with other mvgam
models
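The way an unquoted name such as `species = species` can be turned into a column name is sketched below in base R. This is an illustration only: `capture_species()` and the toy `dat` are hypothetical and not part of `mvgam`, and base `deparse()` stands in for the package's internal `deparse0()`.

```r
# Hypothetical helper (not part of mvgam): resolve an unquoted column name
capture_species <- function(data, species = series) {
  # substitute() returns the unevaluated expression supplied for `species`
  # (or the default, `series`); deparse() converts it to a character string
  nm <- deparse(substitute(species))
  if (!nm %in% names(data)) {
    stop("Variable '", nm, "' not found in data", call. = FALSE)
  }
  if (!is.factor(data[[nm]])) {
    stop("Variable '", nm, "' must be a factor", call. = FALSE)
  }
  nm
}

# Toy data (assumed for illustration): two species observed at three sites
dat <- data.frame(
  site = rep(1:3, times = 2),
  species = factor(rep(c("sp_a", "sp_b"), each = 3)),
  count = c(0, 2, 1, 4, 0, 3)
)

species_var <- capture_species(dat, species = species)
n_species <- nlevels(dat[[species_var]])
```

Because the argument is never evaluated, the column does not need to exist as an object in the calling environment; only its name is used to index `data`.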
logical
. If TRUE
and the family
has additional family-specific observation parameters (e.g. variance components in
student_t()
or gaussian()
, or dispersion parameters in nb()
or betar()
),
-these parameters will be shared across all series. This is handy if you have multiple
-time series that you believe share some properties, such as being from the same
-species over different spatial units. Default is FALSE
.
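+these parameters will be shared across all outcome variables. This is handy if you have multiple
+outcomes (time series in most mvgam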
models) that you believe share some properties,
+such as being from the same species over different spatial units. Default is FALSE
.
An optional data.frame
with prior
-definitions (in JAGS or Stan syntax). if using Stan, this can also be an object of
-class brmsprior
(see. prior
for details). See get_mvgam_priors and
-'Details' for more information on changing default prior distributions
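+definitions (in JAGS or Stan syntax) or, preferably, if using Stan, a vector containing
+objects of class brmsprior
(see prior
for details).
+See get_mvgam_priors and 'Details' for more information on changing default prior distributions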
integer
the number of latent dynamic factors to use if use_lv == TRUE
.
-Cannot be > n_series
. Defaults arbitrarily to min(2, floor(n_series / 2))
+Cannot be > n_species
. Defaults arbitrarily to 2
list
object of class mvgam
containing model output,
the text representation of the model file,
the mgcv model output (for easily generating simulations at
-unsampled covariate values), Dunn-Smyth residuals for each series and key information needed
+unsampled covariate values), Dunn-Smyth residuals for each species and key information needed
for other functions in the package. See mvgam-class
for details.
Use methods(class = "mvgam")
for an overview on available methods
@@ -341,7 +327,26 @@ trend = 'None'
is set up and then modified to include the latent factors and their linear predictors.
Consequently, you can inspect priors for these models using get_mvgam_priors by supplying the relevant
-formula
, factor_formula
, data
and family
arguments and using trend = 'None'
+formula
, factor_formula
, data
and family
arguments and keeping the default trend = 'None'
.
+In a JSDGAM, the expectation of response \(Y_{ij}\) is modelled with
+$$g(\mu_{ij}) = X_i'\beta + u_i'\theta_j,$$
+where \(g(.)\) is a known link function,
+\(X_i'\) is a design matrix of linear predictors (with associated \(\beta\) coefficients),
+\(u_i\) are \(n_{lv}\)-variate latent factors
+(\(n_{lv}\)<<\(n_{species}\)) and
+\(\theta_j\) are species-specific loadings on the latent factors, respectively. The design matrix
+\(X\) and \(\beta\) coefficients are constructed and modelled using formula
and can contain
+any of mvgam
's predictor effects, including random intercepts and slopes, multidimensional penalized
+smooths, GP effects etc... The factor loadings \(\theta_j\) are constrained for identifiability but can
+be used to reconstruct an estimate of the species' residual variance-covariance matrix (see the example below
+for an illustration of this). The latent factors are further modelled using:
+$$
+u_i \sim \text{Normal}(Q_i\beta_{factor}, 1) \quad
+$$
+where the second design matrix \(Q\) and associated \(\beta_{factor}\) coefficients are
+constructed and modelled using factor_formula
. Again, the effects that make up this linear
+predictor can contain any of mvgam
's allowed predictor effects, providing enormous flexibility for
+modelling species' communities.
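A short worked step shows why the loadings are enough to recover the residual covariance. It is a sketch under two assumptions not spelled out in the text: the \(n_{lv}\) components of \(u_i\) are mutually independent with unit variance, and the loadings \(\theta_j\) are stacked as the rows of a matrix \(\Theta\) (notation introduced here). Conditional on both linear predictors, the link-scale covariance between species \(j\) and \(k\) at unit \(i\) is
$$\text{Cov}(u_i'\theta_j, u_i'\theta_k) = \theta_j'\,\text{Cov}(u_i)\,\theta_k = \theta_j'\theta_k,$$
so the implied residual variance-covariance matrix is
$$\Sigma = \Theta\Theta'.$$
For any orthogonal matrix \(R\), \((\Theta R)(\Theta R)' = \Theta\Theta'\), so \(\Sigma\) does not depend on the particular rotation of the loadings. The loadings are therefore not identified without constraints, yet the constrained loadings still give a usable estimate of \(\Sigma\).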
gaussian()
, or dispersion parameters in nb()
or betar()
),
-these parameters will be shared across all series. This is handy if you have multiple
-time series that you believe share some properties, such as being from the same
-species over different spatial units. Default is FALSE
.
+these parameters will be shared across all outcome variables. This is handy if you have multiple
+outcomes (time series in most mvgam
models) that you believe share some properties,
+such as being from the same species over different spatial units. Default is FALSE
.
brmsprior
(see prior
for details).
+See get_mvgam_priors and 'Details' for more information on changing default prior distributions
gaussian()
, or dispersion parameters in nb()
or betar()
),
-these parameters will be shared across all series. This is handy if you have multiple
-time series that you believe share some properties, such as being from the same
-species over different spatial units. Default is FALSE
.
+these parameters will be shared across all outcome variables. This is handy if you have multiple
+outcomes (time series in most mvgam
models) that you believe share some properties,
+such as being from the same species over different spatial units. Default is FALSE
.
An optional data.frame
with prior
-definitions (in JAGS or Stan syntax). if using Stan, this can also be an object of
-class brmsprior
(see. prior
for details). See get_mvgam_priors and
-'Details' for more information on changing default prior distributions
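+definitions (in JAGS or Stan syntax) or, preferably, if using Stan, a vector containing
+objects of class brmsprior
(see prior
for details).
+See get_mvgam_priors and 'Details' for more information on changing default prior distributions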