mvgam case study 1: model comparison and data assimilation

Generalised Additive Models (GAMs) are incredibly flexible tools that have found particular application in the analysis of time series. In ecology, a host of recent papers and workshops (e.g. the 2018 Ecological Society of America workshop on GAMs that was hosted by Eric Pedersen, David L. Miller, Gavin Simpson, and Noam Ross) have drawn special attention to the wide range of applications that GAMs have for addressing complex ecological problems. Given the many ways that GAMs can model temporal data, it is tempting to use the smooth functions estimated by a GAM to produce out of sample forecasts. Here we inspect how smoothing splines behave when extrapolating beyond the range of the training data, to examine whether this can be useful in practice.

-

Here we will work in the mvgam package, which fits dynamic GAMs using MCMC sampling via the JAGS software (note that JAGS 4.3.0 is required; installation links are found here). Briefly, assume \(\tilde{\boldsymbol{y}}_{t}\) is the conditional expectation of a discrete response variable \(\boldsymbol{y}\) at time \(\boldsymbol{t}\). Assuming \(\boldsymbol{y}\) is drawn from a distribution in the exponential family (such as Poisson or Negative Binomial) with a log link function, the linear predictor for a dynamic GAM is written as:

-

\[log(\tilde{\boldsymbol{y}}_{t})={\boldsymbol{B}}_0+\sum\limits_{i=1}^I\boldsymbol{s}_{i,t}\boldsymbol{x}_{i,t}+\boldsymbol{z}_{t}\,,\] Here \(\boldsymbol{B}_{0}\) is the unknown intercept, the \(\boldsymbol{s}\)'s are unknown smooth functions of the covariates (\(\boldsymbol{x}\)'s) and \(\boldsymbol{z}\) is a dynamic latent trend component. Each smooth function \(\boldsymbol{s}_{i}\) is composed of spline-like basis expansions whose coefficients, which must be estimated, control the shape of the functional relationship between \(\boldsymbol{x}_{i}\) and \(log(\tilde{\boldsymbol{y}})\). The size of the basis expansion limits the smooth’s potential complexity, with a larger set of basis functions allowing greater flexibility. Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson or Negative Binomial) that accommodate ecological features such as zero-inflation, and that they can be formulated to include hierarchical smoothing for multivariate responses. For the dynamic component, in its most basic form we assume a random walk with drift:

-

\[\boldsymbol{z}_{t}=\phi+\boldsymbol{z}_{t-1}+\boldsymbol{e}_{t}\,,\] where \(\phi\) is an optional drift parameter (included if the latent trend is not assumed to be stationary) and \(\boldsymbol{e}\) is drawn from a zero-centred Gaussian distribution. This model is easily modified to include autoregressive terms, which mvgam accommodates up to order = 3.
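To build intuition, here is a minimal base-R sketch (independent of mvgam, with arbitrary values chosen for \(\phi\) and the error standard deviation) that simulates this random walk with drift:

set.seed(100)
# simulate the latent trend z with drift phi and zero-centred Gaussian errors
T_len <- 100
phi <- 0.1
sigma <- 0.5
z <- numeric(T_len)
for (t in 2:T_len) {
    z[t] <- phi + z[t - 1] + rnorm(1, mean = 0, sd = sigma)
}
plot(z, type = "l", xlab = "Time", ylab = "Latent trend (z)")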

-
-

AirPassengers example

-

First we load the AirPassengers data from the forecast package and convert to an xts object. This series is a good starting point as it should be highly forecastable given its stable seasonal pattern and nearly linear trend

-
# devtools::install_github('nicholasjclark/mvgam')
-library(mvgam)
-library(dplyr)
-library(xts)
-library(forecast)
-data("AirPassengers")
-series <- xts::as.xts(floor(AirPassengers))
-colnames(series) <- c("Air")
-

View the raw series, its STL decomposition and the distribution of observations. There is a clear seasonal pattern as well as an increasing trend over time for this series, and the distribution shows evidence of a skew suggestive of overdispersion

-
plot(series)
-

-
plot(stl(series, s.window = "periodic"))
-

-
hist(series)
-

-

Next use the series_to_mvgam function, which converts ts or xts objects into the correct format for mvgam. Here we set train_prop = 0.75, meaning we will include ~ 75% of the observations in the training set and use the remaining ~25% for forecast validation. We also randomly set ~10% of observations to NA so that we can evaluate models based on their predictive abilities

-
series[sample(1:length(series), floor(length(series)/10), 
-    F)] <- NA
-fake_data <- series_to_mvgam(series, freq = 12, 
-    train_prop = 0.75)
-

Examine the returned object

-
head(fake_data$data_train)
-
- -
-
head(fake_data$data_test)
-
- -
-

As a first pass at modelling this series, we will fit a GAM that includes a seasonal smooth and a yearly smooth, as well as their tensor product interaction. As the seasonality is cyclic, we will use a cyclic cubic regression spline for season. The knots are set so that the boundaries of the cyclic smooth match up between December 31 and January 1. We will stick with the default thin plate regression spline for year. This is similar to what we might do when fitting a model in mgcv to try to forecast ahead, except here we also have an explicit model for the residual component. mvgam uses the jagam function from the mgcv package to generate a skeleton JAGS file and updates that file to incorporate any dynamic trend components (so far limited to no trend, random walks or AR models up to order 3). This is advantageous as any GAM that is allowed in mgcv can in principle be used in mvgam to fit dynamic models with smooth functions for nonlinear covariate effects. For multivariate series, mvgam also includes an option for dimension reduction by inducing trend correlations via a dynamic factor process. Here we use the Negative Binomial family and a burnin length of 2000 iterations (in practice we would probably run these models for a bit longer, but they tend to converge rather quickly for univariate series thanks to the useful starting values supplied by jagam). Note that supplying the data_test object does not mean that these values are used in training of the model. Rather, they are included as NA so that we can automatically create a forecast from the posterior predictions for these observations. This is useful for plotting forecasts without needing to run new data through the model's equations later on.

-
mod1 <- mvjagam(data_train = fake_data$data_train, 
-    data_test = fake_data$data_test, formula = y ~ 
-        s(season, bs = c("cc"), k = 12) + 
-            s(year, k = 5) + ti(season, year, 
-            bs = c("cc", "tp"), k = c(12, 
-                3)), knots = list(season = c(0.5, 
-        12.5)), family = "nb", trend_model = "None", 
-    burnin = 2000, chains = 4)
-
## Compiling rjags model...
-## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
-## 'localhost'
-## Simulation complete
-## Note: Summary statistics were not produced as there are >50 monitored
-## variables
-## [To override this behaviour see ?add.summary and ?runjags.options]
-## FALSEFinished running the simulation
-## NOTE: Stopping adaptation
-

We can view the JAGS model file that has been generated to see the additions that have been made to the base gam model. If the user selects return_jags_data = TRUE when calling mvjagam, this file can be modified and the resulting jags_data object can also be modified to fit more complex Bayesian models (a small sketch of this workflow follows the model file below). Note that here the AR and phi terms have been set to zero as this model does not include a dynamic trend component.

-
mod1$model_file
-
##  [1] "model {"                                                        
-##  [2] ""                                                               
-##  [3] "## GAM linear predictor"                                        
-##  [4] "eta <- X %*% b"                                                 
-##  [5] ""                                                               
-##  [6] "## Mean expectations"                                           
-##  [7] "for (i in 1:n) {"                                               
-##  [8] " for (s in 1:n_series) {"                                       
-##  [9] "  mu[i, s] <- exp(eta[ytimes[i, s]] + trend[i, s])"             
-## [10] " }"                                                             
-## [11] "}"                                                              
-## [12] ""                                                               
-## [13] ""                                                               
-## [14] "## State space trend estimates"                                 
-## [15] "for(s in 1:n_series) {"                                         
-## [16] " trend[1, s] <- 0"                                              
-## [17] "}"                                                              
-## [18] ""                                                               
-## [19] "for(s in 1:n_series) {"                                         
-## [20] " trend[2, s] <- 0"                                              
-## [21] "}"                                                              
-## [22] ""                                                               
-## [23] "for(s in 1:n_series) {"                                         
-## [24] " trend[3, s] <- 0"                                              
-## [25] "}"                                                              
-## [26] ""                                                               
-## [27] "for (i in 4:n){"                                                
-## [28] " for (s in 1:n_series){"                                        
-## [29] " trend[i, s] <- 0"                                              
-## [30] " }"                                                             
-## [31] "}"                                                              
-## [32] ""                                                               
-## [33] "## AR components"                                               
-## [34] "for (s in 1:n_series){"                                         
-## [35] " phi[s] <- 0"                                                   
-## [36] " ar1[s] <- 0"                                                   
-## [37] " ar2[s] <- 0"                                                   
-## [38] " ar3[s] <- 0"                                                   
-## [39] " tau[s] <- pow(sigma[s], -2)"                                   
-## [40] " sigma[s] ~ dexp(1)"                                            
-## [41] "}"                                                              
-## [42] ""                                                               
-## [43] "## Negative binomial likelihood functions"                      
-## [44] "for (i in 1:n) {"                                               
-## [45] " for (s in 1:n_series) {"                                       
-## [46] "  y[i, s] ~ dnegbin(rate[i, s], r)"                             
-## [47] "  rate[i, s] <- ifelse((r / (r + mu[i, s])) < min_eps, min_eps,"
-## [48] "                      (r / (r + mu[i, s])))"                    
-## [49] " }"                                                             
-## [50] "}"                                                              
-## [51] "r ~ dexp(0.001)"                                                
-## [52] ""                                                               
-## [53] "## Posterior predictions"                                       
-## [54] "for (i in 1:n) {"                                               
-## [55] " for (s in 1:n_series) {"                                       
-## [56] "  ypred[i, s] ~ dnegbin(rate[i, s], r)"                         
-## [57] " }"                                                             
-## [58] "}"                                                              
-## [59] ""                                                               
-## [60] "  ## Parametric effect priors CHECK tau=1/54^2 is appropriate!" 
-## [61] "  for (i in 1:1) { b[i] ~ dnorm(p_coefs[i], tau_para[i])"       
-## [62] "   sigma_para[i] ~ dexp(2)"                                     
-## [63] " tau_para[i] <- pow(sigma_para[i], -2) }"                       
-## [64] "  ## prior for s(season)... "                                   
-## [65] "  K1 <- S1[1:10,1:10] * lambda[1] "                             
-## [66] "  b[2:11] ~ dmnorm(zero[2:11],K1) "                             
-## [67] "  ## prior for s(year)... "                                     
-## [68] "  K2 <- S2[1:4,1:4] * lambda[2]  + S2[1:4,5:8] * lambda[3]"     
-## [69] "  b[12:15] ~ dmnorm(zero[12:15],K2) "                           
-## [70] "  ## prior for ti(season,year)... "                             
-## [71] "  K3 <- S3[1:20,1:20] * lambda[4]  + S3[1:20,21:40] * lambda[5]"
-## [72] "  b[16:35] ~ dmnorm(zero[16:35],K3) "                           
-## [73] "  ## smoothing parameter priors CHECK..."                       
-## [74] "  for (i in 1:5) {"                                             
-## [75] "   lambda[i] ~ dexp(0.05)"                                      
-## [76] "    rho[i] <- log(lambda[i])"                                   
-## [77] "  }"                                                            
-## [78] "}"
-
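As a hypothetical sketch of the customisation workflow described above (assuming mvjagam was called with return_jags_data = TRUE so that a jags_data object is available; the object names and the prior change are illustrative, not a documented mvgam recipe), one could edit the generated code and refit directly with runjags:

# Hypothetical sketch: tweak the generated JAGS code and refit with runjags
library(runjags)
model_file <- mod1$model_file
# e.g. swap in a slightly different prior for the NB dispersion parameter
model_file[which(model_file == "r ~ dexp(0.001)")] <- "r ~ dexp(0.01)"
writeLines(model_file, "custom_model.txt")
custom_mod <- run.jags(model = "custom_model.txt",
    data = mod1$jags_data, monitor = c("b", "r", "ypred"),
    n.chains = 4, burnin = 2000, sample = 1000)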

Inspect the summary from the model, which is somewhat similar to an mgcv model summary with extra information about convergence for unobserved parameters. The estimated degrees of freedom, smooth coefficients and smooth penalties are all extracted from the mvgam model using sim2jam so that approximate p-values can be calculated using Nychka's method (following Wood (2013) Biometrika 100(1), 221-228). Note however that this function is still in development and approximate p-values may not be entirely accurate.

-
summary_mvgam(mod1)
-
## GAM formula:
-
## y ~ s(season, bs = c("cc"), k = 12) + s(year, k = 5) + ti(season, 
-##     year, bs = c("cc", "tp"), k = c(12, 3))
-
## 
-
## Family:
-
## Negative Binomial
-
## 
-
## N series:
-
## 1
-
## 
-
## N observations per series:
-
## 108
-
## 
-
## Dispersion parameter estimate:
-
##       2.5%      50%    97.5% Rhat n.eff
-## r 1166.408 2840.465 6488.955    1  5303
-
## 
-
## GAM smooth term approximate significances:
-
##                    edf Ref.df      F p-value    
-## s(season)        9.237 10.000 39.059  <2e-16 ***
-## s(year)          3.023  4.000  2.884  0.0027 ** 
-## ti(season,year) 12.647 20.000  0.015  0.9558    
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
## 
-
## GAM coefficient (beta) estimates:
-
##                           2.5%          50%       97.5% Rhat n.eff
-## (Intercept)         5.36161110  5.377037202  5.39078643 1.00  4812
-## s(season).1        -0.27395125 -0.195683936 -0.12748834 1.01   373
-## s(season).2        -0.15882576 -0.108986878 -0.06058414 1.00   672
-## s(season).3        -0.05005644  0.007374626  0.06057666 1.01   544
-## s(season).4        -0.08352341 -0.036483031  0.01123565 1.01   719
-## s(season).5         0.01366607  0.062456843  0.10939919 1.01   688
-## s(season).6         0.17729638  0.226980545  0.27415732 1.01   666
-## s(season).7         0.16115314  0.201809290  0.24212203 1.00   990
-## s(season).8         0.00527439  0.058534115  0.10911778 1.00   662
-## s(season).9        -0.19956690 -0.147148431 -0.09850744 1.00   662
-## s(season).10       -0.20269857 -0.126225004 -0.05640364 1.01   376
-## s(year).1          -0.11042219 -0.030354594  0.02468973 1.03   211
-## s(year).2          -0.10293331  0.012803800  0.17404624 1.13    76
-## s(year).3          -0.09806499  0.040268930  0.23528683 1.12    73
-## s(year).4           0.31935661  0.369656749  0.43594345 1.01   252
-## ti(season,year).1  -0.05295287  0.055320955  0.15291445 1.01   300
-## ti(season,year).2  -0.12990817 -0.038994350  0.04204320 1.01   373
-## ti(season,year).3  -0.03642839  0.053236795  0.14769136 1.02   312
-## ti(season,year).4  -0.13084884 -0.051034084  0.02517270 1.00   425
-## ti(season,year).5  -0.08023771  0.010951993  0.12402632 1.02   267
-## ti(season,year).6  -0.10168859 -0.020381921  0.05678394 1.01   433
-## ti(season,year).7  -0.12291677 -0.035747887  0.05794980 1.01   316
-## ti(season,year).8  -0.05371343  0.019790513  0.09595653 1.02   428
-## ti(season,year).9  -0.13322484 -0.042223151  0.05111752 1.01   319
-## ti(season,year).10 -0.02453354  0.050234988  0.12213985 1.01   510
-## ti(season,year).11 -0.13620192 -0.044711164  0.04004527 1.01   344
-## ti(season,year).12 -0.03258447  0.048864148  0.12770832 1.01   415
-## ti(season,year).13 -0.11312038 -0.026994407  0.05857484 1.01   355
-## ti(season,year).14 -0.02419923  0.041417377  0.10786147 1.02   601
-## ti(season,year).15 -0.09920499 -0.005217718  0.08684654 1.02   329
-## ti(season,year).16 -0.07354610  0.006287603  0.07916318 1.01   392
-## ti(season,year).17 -0.10774977 -0.011918860  0.07952767 1.03   292
-## ti(season,year).18 -0.08303156 -0.006336232  0.06698698 1.04   395
-## ti(season,year).19 -0.09187687  0.003976165  0.09731229 1.03   279
-## ti(season,year).20 -0.10244777 -0.018392718  0.06748642 1.03   355
-
## 
-
## GAM smoothing parameter (rho) estimates:
-
##                        2.5%      50%    97.5% Rhat n.eff
-## s(season)         3.4269741 4.403783 5.164925 1.00  2766
-## s(year)           1.5139504 3.304091 4.547796 1.01  1488
-## s(year)2         -0.0783527 2.290532 3.672388 1.00  7751
-## ti(season,year)   3.6016565 4.601123 5.345439 1.00  4090
-## ti(season,year)2  2.5125925 3.899878 4.854497 1.01  2398
-
## 
-
## Latent trend drift (phi) and AR parameter estimates:
-
##     2.5% 50% 97.5% Rhat n.eff
-## phi    0   0     0  NaN     0
-## ar1    0   0     0  NaN     0
-## ar2    0   0     0  NaN     0
-## ar3    0   0     0  NaN     0
-
## 
-

mvgam takes a fitted gam model and adapts the model file to fit in JAGS, with possible extensions to deal with stochastic trend components and other features. But as this model has not been modified much from the original gam model with the same formula (i.e. we have not added any stochastic trend components to the linear predictor yet, which is the main feature of mvgam), the summary of the unmodified gam model is also useful and fairly accurate

-
summary(mod1$mgcv_model)
-
## 
-## Family: Negative Binomial(944980754.781) 
-## Link function: log 
-## 
-## Formula:
-## y ~ s(season, bs = c("cc"), k = 12) + s(year, k = 5) + ti(season, 
-##     year, bs = c("cc", "tp"), k = c(12, 3))
-## 
-## Parametric coefficients:
-##             Estimate Std. Error t value Pr(>|t|)    
-## (Intercept) 5.389265   0.006976   772.6   <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Approximate significance of smooth terms:
-##                   edf Ref.df        F p-value    
-## s(season)       7.820 10.000   32.326  <2e-16 ***
-## s(year)         1.451  1.763 1404.871  <2e-16 ***
-## ti(season,year) 1.771 20.000    0.287  0.0274 *  
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## R-sq.(adj) =  0.989   Deviance explained = 98.9%
-## fREML = 138.54  Scale est. = 1         n = 100
-

Ordinarily we would be quite pleased with this result, as we have explained most of the variation in the series with a fairly simple model. We can plot the estimated smooth functions with their associated credible intervals, which are interpreted similarly to mgcv plots except they are not centred at zero (mvgam predicts the smooth with all other covariates at their means, rather than at zero). So the plot below, for the season smooth, shows the estimated values for the linear predictor (on the log scale in this case) if year was set to its mean.

-
plot_mvgam_smooth(mod1, series = 1, smooth = "season")
-

-

And here is the plot of the smooth function for year, which has essentially estimated a straight line

-
plot_mvgam_smooth(mod1, series = 1, smooth = "year")
-

-

Perform a series of posterior predictive checks to see if the model is able to simulate data for the training period that looks realistic and unbiased. First, examine simulated kernel densities for posterior predictions (yhat) and compare to the density of the observations (y)

-
plot_mvgam_ppc(mod1, series = 1, type = "density", 
-    legend_position = "bottomright")
-

-

Now plot the distribution of predicted means compared to the observed mean

-
plot_mvgam_ppc(mod1, series = 1, type = "mean")
-

-

Next examine simulated empirical Cumulative Distribution Functions (CDF) for posterior predictions (yhat) and compare to the CDF of the observations (y)

-
plot_mvgam_ppc(mod1, series = 1, type = "cdf")
-

-

Finally, look for any biases in predictions by examining a Probability Integral Transform (PIT) histogram. If our predictions are not biased one way or another (i.e. not consistently under- or over-predicting), this histogram should look roughly uniform (a rough sketch of how these PIT values can be computed follows the plot below).

-
plot_mvgam_ppc(mod1, series = 1, type = "pit")
-

-
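For intuition, PIT values for discrete forecasts use a randomised version of the probability integral transform. A rough sketch (not mvgam's internal implementation; it assumes a hypothetical draws x timepoints matrix yhat of posterior predictions, a vector y of observed counts, and no missing values for simplicity) is:

# Rough sketch of randomised PIT for count forecasts
pit_vals <- sapply(seq_along(y), function(i) {
    F_y <- mean(yhat[, i] <= y[i])         # empirical F(y)
    F_ym1 <- mean(yhat[, i] <= y[i] - 1)   # empirical F(y - 1)
    runif(1, min = F_ym1, max = F_y)       # randomised PIT draw
})
hist(pit_vals, breaks = seq(0, 1, by = 0.1), main = "PIT", xlab = "")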

All of these plots indicate the model is well calibrated against the training data, with no apparent pathological behaviours exhibited. Now for some investigation of the estimated relationships and forecasts. We can also perform residual diagnostics using randomised quantile (Dunn-Smyth) residuals (a minimal sketch of how these are constructed follows the plot below). These look reasonable overall, though there is some autocorrelation left in the residuals for this series.

-
plot_mvgam_resids(mod1, series = 1)
-

-
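Dunn-Smyth residuals map each observation through its fitted CDF, with jittering to handle discreteness, so that a well specified model yields approximately standard normal residuals. A minimal sketch for a Poisson fit (illustrative only; mvgam computes these from posterior draws rather than a single fitted mean) is:

# Minimal sketch of randomised quantile (Dunn-Smyth) residuals for a
# Poisson model with fitted means mu
dunn_smyth_pois <- function(y, mu) {
    a <- ppois(y - 1, lambda = mu)  # F(y - 1)
    b <- ppois(y, lambda = mu)      # F(y)
    qnorm(runif(length(y), min = a, max = b))
}
# residuals should look standard normal if the model is well specified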

Ok so the model is doing well when fitting against the training data, but how good are its forecasts? The yearly trend is being extrapolated into the future, which controls most of the shape and uncertainty in the forecast. We see there is a reasonable estimate of uncertainty, and the out of sample observations (to the right of the dashed line) are all within the model's 95% HPD intervals.

-
plot_mvgam_fc(mod1, series = 1, data_test = fake_data$data_test)
-

-

The extrapolation of the smooth for year can be better viewed by feeding new data to the plot_mvgam_smooth function. Here we feed values of year to cover the training and testing set to see how the extrapolation would continue into the future. The dashed line marks the end of the training period

-
plot_mvgam_smooth(mod1, series = 1, "year", 
-    newdata = expand.grid(year = seq(min(fake_data$data_train$year), 
-        max(fake_data$data_test$year), length.out = 500), 
-        season = mean(fake_data$data_train$season), 
-        series = unique(fake_data$data_train$series)))
-abline(v = max(fake_data$data_train$year), 
-    lty = "dashed")
-

-

We can also re-do the posterior predictive checks, but this time focusing only on the out of sample period. This will give us better insight into how the model is performing and whether it is able to simulate realistic and unbiased future values

-
plot_mvgam_ppc(mod1, series = 1, type = "density", 
-    data_test = fake_data$data_test, legend_position = "bottomright")
-

-
plot_mvgam_ppc(mod1, series = 1, type = "mean", 
-    data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod1, series = 1, type = "cdf", 
-    data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod1, series = 1, type = "pit", 
-    data_test = fake_data$data_test)
-

-

There are some very clear problems with the way this model is generating future predictions. Perhaps a different smooth function for year can help? Here we fit our original model again but use a different smooth term for year to try to capture the long-term trend (using B splines with multiple penalties, following the excellent example by Gavin Simpson on extrapolating with smooth terms). This is similar to what we might do when trying to forecast ahead from a more wiggly function, as B splines usefully allow the penalty to be extended into the range of values we wish to predict (in this case, the years in data_test).

-
mod2 <- mvjagam(data_train = fake_data$data_train, 
-    data_test = fake_data$data_test, formula = y ~ 
-        s(season, bs = c("cc"), k = 12) + 
-            s(year, bs = "bs", m = c(3, 2, 
-                1, 0)) + ti(season, year, 
-            bs = c("cc", "tp"), k = c(12, 
-                3), m = c(2, 1)), knots = list(season = c(0.5, 
-        12.5), year = c(min(fake_data$data_train$year) - 
-        1, min(fake_data$data_train$year), 
-        max(fake_data$data_train$year), max(fake_data$data_test$year))), 
-    family = "nb", trend_model = "None", 
-    burnin = 2000, chains = 4)
-
## Compiling rjags model...
-## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
-## 'localhost'
-## Simulation complete
-## Note: Summary statistics were not produced as there are >50 monitored
-## variables
-## [To override this behaviour see ?add.summary and ?runjags.options]
-## FALSEFinished running the simulation
-## NOTE: Stopping adaptation
-

Again as we haven't modified the base gam much, the summary from the mgcv model is fairly accurate

-
summary_mvgam(mod2)
-
## GAM formula:
-
## y ~ s(season, bs = c("cc"), k = 12) + s(year, bs = "bs", m = c(3, 
-##     2, 1, 0)) + ti(season, year, bs = c("cc", "tp"), k = c(12, 
-##     3), m = c(2, 1))
-
## 
-
## Family:
-
## Negative Binomial
-
## 
-
## N series:
-
## 1
-
## 
-
## N observations per series:
-
## 108
-
## 
-
## Dispersion parameter estimate:
-
##       2.5%      50%    97.5% Rhat n.eff
-## r 1334.131 2672.686 7620.356 1.93   118
-
## 
-
## GAM smooth term approximate significances:
-
##                    edf Ref.df      F p-value    
-## s(season)        8.015 10.000 68.888  <2e-16 ***
-## s(year)          2.378  9.000 33.149  <2e-16 ***
-## ti(season,year)  2.465 10.000  0.487  0.0911 .  
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
## 
-
## GAM coefficient (beta) estimates:
-
##                            2.5%          50%        97.5% Rhat n.eff
-## (Intercept)         5.362070579  5.376806898  5.390820518 1.03  3785
-## s(season).1        -0.213430308 -0.151517156 -0.101007334 1.02   500
-## s(season).2        -0.148922301 -0.099921057 -0.055643721 1.02   676
-## s(season).3        -0.028226483  0.016931885  0.059414423 1.01   792
-## s(season).4        -0.061996975 -0.017003318  0.025089712 1.00   670
-## s(season).5         0.038099953  0.079826082  0.120362340 1.00   834
-## s(season).6         0.199688149  0.239707637  0.282409200 1.00   840
-## s(season).7         0.174262894  0.212309518  0.248851873 1.00   996
-## s(season).8         0.019906807  0.065582961  0.109381762 1.01   763
-## s(season).9        -0.169165279 -0.122570964 -0.078652162 1.02   636
-## s(season).10       -0.153424964 -0.102932723 -0.053071421 1.02   577
-## s(year).1          -0.999313467 -0.882993791  0.224713835 3.06    33
-## s(year).2          -0.450536661 -0.294986783 -0.267501857 1.69   158
-## s(year).3          -0.187506134 -0.062664032 -0.013656990 1.91   152
-## s(year).4           0.144094577  0.196083560  0.298539501 1.31    80
-## s(year).5           0.234552496  0.390496398  0.440267686 2.11    78
-## s(year).6           0.520509735  0.655588193  0.704282327 1.55   124
-## s(year).7           0.708036658  0.773862240  0.885204923 1.30   164
-## s(year).8          -0.038655710  0.872917548  0.969214795 2.61    41
-## s(year).9          -0.693474072  1.123709900  1.271857701 5.32    28
-## ti(season,year).1  -0.043038651 -0.016464163  0.003306236 1.02    42
-## ti(season,year).2  -0.038248661 -0.015365567  0.006086726 1.07    59
-## ti(season,year).3  -0.034843090 -0.006495594  0.020358404 1.18    46
-## ti(season,year).4  -0.017460649  0.006593620  0.033680203 1.32    36
-## ti(season,year).5  -0.009328059  0.015302505  0.044225325 1.27    37
-## ti(season,year).6  -0.003194770  0.021267455  0.046238239 1.05    44
-## ti(season,year).7  -0.004592578  0.018187670  0.044772543 1.11    41
-## ti(season,year).8  -0.011737224  0.012365263  0.037459252 1.17    58
-## ti(season,year).9  -0.020332936  0.001175399  0.020449767 1.26    54
-## ti(season,year).10 -0.025481801 -0.004238640  0.013211187 1.28    46
-
## 
-
## GAM smoothing parameter (rho) estimates:
-
##                        2.5%        50%      97.5% Rhat n.eff
-## s(season)          5.263804   6.420443   7.411276 1.01  1252
-## s(year)            4.116279  10.065229  12.064278 4.51    55
-## s(year)2          -1.193273   1.246155   2.616958 1.02  7090
-## s(year)3         -17.333931 -13.882385 -12.211856 1.00  7055
-## ti(season,year)    8.243846  10.029634  11.275795 1.02   246
-## ti(season,year)2 -13.562819 -10.142882  -8.485700 1.00  6398
-
## 
-
## Latent trend drift (phi) and AR parameter estimates:
-
##     2.5% 50% 97.5% Rhat n.eff
-## phi    0   0     0  NaN     0
-## ar1    0   0     0  NaN     0
-## ar2    0   0     0  NaN     0
-## ar3    0   0     0  NaN     0
-
## 
-
summary(mod2$mgcv_model)
-
## 
-## Family: Negative Binomial(572232647.608) 
-## Link function: log 
-## 
-## Formula:
-## y ~ s(season, bs = c("cc"), k = 12) + s(year, bs = "bs", m = c(3, 
-##     2, 1, 0)) + ti(season, year, bs = c("cc", "tp"), k = c(12, 
-##     3), m = c(2, 1))
-## 
-## Parametric coefficients:
-##             Estimate Std. Error t value Pr(>|t|)    
-## (Intercept) 5.389254   0.006977   772.4   <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Approximate significance of smooth terms:
-##                   edf Ref.df       F p-value    
-## s(season)       7.819     10  32.353  <2e-16 ***
-## s(year)         1.640      7 352.636  <2e-16 ***
-## ti(season,year) 1.766     10   0.571  0.0278 *  
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## R-sq.(adj) =  0.989   Deviance explained = 98.9%
-## fREML = 138.86  Scale est. = 1         n = 100
-

This model explains even more of the variation than the thin plate yearly model above, so we'd be tempted to use it for prediction (though of course we'd want to perform a series of checks following advice from Simon Wood; see lectures by Gavin Simpson for more information on how to perform these checks). So how do the forecasts look?

-
plot_mvgam_fc(mod2, series = 1, data_test = fake_data$data_test)
-

-

Again the forecast is being driven primarily by the extrapolation behaviour of the B spline. Look at this behaviour for the year smooth as we did above by feeding new data for prediction

-
plot_mvgam_smooth(mod2, series = 1, "year", 
-    newdata = expand.grid(year = seq(min(fake_data$data_train$year), 
-        max(fake_data$data_test$year), length.out = 500), 
-        season = mean(fake_data$data_train$season), 
-        series = unique(fake_data$data_train$series)))
-abline(v = max(fake_data$data_train$year), 
-    lty = "dashed")
-

-

Residual and posterior in-training predictive checks for this model will look similar to the above, with some autocorrelation left in residuals but nothing terribly alarming. But the forecast checks again show some problems:

-
plot_mvgam_ppc(mod2, series = 1, type = "density", 
-    data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod2, series = 1, type = "mean", 
-    data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod2, series = 1, type = "cdf", 
-    data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod2, series = 1, type = "pit", 
-    data_test = fake_data$data_test)
-

-

Can we improve the forecasts by removing our reliance on extrapolation? Now we will fit a model in which the GAM component of the linear predictor captures the repeated seasonality (again with a cyclic smooth) and a dynamic latent trend captures the residual process using AR parameters (up to order 3). This model is a mildly modified version of the base mgcv model where the linear predictor is augmented with the latent trend component. Slightly longer burnin is used here due to the added complexity of the time series component, but the model still fits in ~ 30 seconds on most machines

-
mod3 <- mvjagam(data_train = fake_data$data_train, 
-    data_test = fake_data$data_test, formula = y ~ 
-        s(season, bs = c("cc"), k = 12) + 
-            ti(season, year, bs = c("cc", 
-                "tp"), k = c(12, 3), m = c(2, 
-                1)), knots = list(season = c(0.5, 
-        12.5)), family = "nb", trend_model = "AR3", 
-    drift = TRUE, burnin = 2000, chains = 4)
-
## Compiling rjags model...
-## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
-## 'localhost'
-## Simulation complete
-## Note: Summary statistics were not produced as there are >50 monitored
-## variables
-## [To override this behaviour see ?add.summary and ?runjags.options]
-## FALSEFinished running the simulation
-## NOTE: Stopping adaptation
-

In this case the fitted model differs more from the base mgcv model that was used in jagam to produce the skeleton JAGS file, so the summary of that base model is less accurate. But we can still check the model summary for the mvjagam model to examine convergence for key parameters.

-
summary_mvgam(mod3)
-
## GAM formula:
-
## y ~ s(season, bs = c("cc"), k = 12) + ti(season, year, bs = c("cc", 
-##     "tp"), k = c(12, 3), m = c(2, 1))
-
## 
-
## Family:
-
## Negative Binomial
-
## 
-
## N series:
-
## 1
-
## 
-
## N observations per series:
-
## 108
-
## 
-
## Dispersion parameter estimate:
-
##       2.5%      50%    97.5% Rhat n.eff
-## r 1196.621 2948.698 6678.804    1  6741
-
## 
-
## GAM smooth term approximate significances:
-
##                       edf    Ref.df     F  p-value    
-## s(season)        5.667668 10.000000 2.004 0.000153 ***
-## ti(season,year)  0.001279 20.000000 0.000 0.996471    
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
## 
-
## GAM coefficient (beta) estimates:
-
##                             2.5%           50%         97.5% Rhat n.eff
-## (Intercept)         4.6854902871  4.750101e+00  4.8313016766 1.43    45
-## s(season).1        -0.1351084443 -8.320542e-02 -0.0390496157 1.01   472
-## s(season).2        -0.0921056962 -4.486880e-02 -0.0002825825 1.02   538
-## s(season).3        -0.0012647269  4.104691e-02  0.0858638736 1.01   656
-## s(season).4        -0.0322487483  1.093151e-02  0.0513603091 1.01   644
-## s(season).5         0.0549977555  9.608617e-02  0.1347515796 1.01   759
-## s(season).6         0.2005196547  2.402462e-01  0.2807636509 1.00   743
-## s(season).7         0.1671701980  2.040605e-01  0.2376592899 1.01   924
-## s(season).8        -0.0009857419  3.802109e-02  0.0762435832 1.00   850
-## s(season).9        -0.2105717582 -1.639122e-01 -0.1236429061 1.00   521
-## s(season).10       -0.2101943002 -1.634046e-01 -0.1186487128 1.00   545
-## ti(season,year).1  -0.0019294697  3.637336e-05  0.0023289937 1.32   169
-## ti(season,year).2  -0.0017799667 -6.150140e-05  0.0021345201 1.25   175
-## ti(season,year).3  -0.0020113169  1.671438e-06  0.0020285698 1.25   192
-## ti(season,year).4  -0.0021180786  1.266004e-05  0.0017667586 1.23   230
-## ti(season,year).5  -0.0016374544 -3.889510e-05  0.0016336857 1.22   252
-## ti(season,year).6  -0.0016751972  1.712028e-04  0.0018318152 1.24   252
-## ti(season,year).7  -0.0022636098  4.533724e-05  0.0016577034 1.20   171
-## ti(season,year).8  -0.0018772721  4.654626e-05  0.0023905603 1.18   153
-## ti(season,year).9  -0.0028771654  1.257391e-05  0.0014579699 1.24   168
-## ti(season,year).10 -0.0014505372 -6.230076e-05  0.0027049151 1.33   173
-## ti(season,year).11 -0.0019597910 -7.261990e-05  0.0018234465 1.25   237
-## ti(season,year).12 -0.0017144692  2.979686e-05  0.0023041621 1.28   184
-## ti(season,year).13 -0.0018457651  1.133942e-04  0.0019172022 1.24   266
-## ti(season,year).14 -0.0017974155  2.019889e-05  0.0019218896 1.24   210
-## ti(season,year).15 -0.0017507757 -4.452746e-06  0.0018043474 1.24   225
-## ti(season,year).16 -0.0020461346 -8.446822e-05  0.0019519123 1.22   144
-## ti(season,year).17 -0.0022132499  5.598248e-05  0.0016638093 1.35   163
-## ti(season,year).18 -0.0017969733 -6.662812e-05  0.0019232353 1.26   167
-## ti(season,year).19 -0.0016653811 -1.340496e-04  0.0014943742 1.23   270
-## ti(season,year).20 -0.0017798682  8.252861e-05  0.0018383077 1.25   204
-
## 
-
## GAM smoothing parameter (rho) estimates:
-
##                       2.5%       50%     97.5% Rhat n.eff
-## s(season)         5.805841  7.056992  8.120666 1.01   963
-## ti(season,year)  -4.638952 -1.362070  0.308262 1.00  6254
-## ti(season,year)2 11.091656 14.146277 17.685874 4.64    33
-
## 
-
## Latent trend drift (phi) and AR parameter estimates:
-
##            2.5%        50%      97.5% Rhat n.eff
-## phi  0.01348130 0.02565249 0.03748029 1.07   618
-## ar1 -0.14844196 0.28966886 0.76114118 1.01  1191
-## ar2 -0.12923925 0.33436562 0.76015759 1.00  1506
-## ar3 -0.07148772 0.37387705 0.79635136 1.00  1159
-
## 
-

The seasonal term is obviously still very important. Plot it here

-
plot_mvgam_smooth(mod3, series = 1, "season")
-

-

Plot diagnostics of posterior median Dunn-Smyth residuals, where the autocorrelation is now captured by the latent trend process

-
plot_mvgam_resids(mod3, series = 1)
-

-

How does the model's posterior forecast distribution compare to the previous models?

-
plot_mvgam_fc(mod3, series = 1, data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod3, series = 1, type = "density", 
-    data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod3, series = 1, type = "mean", 
-    data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod3, series = 1, type = "cdf", 
-    data_test = fake_data$data_test)
-

-
plot_mvgam_ppc(mod3, series = 1, type = "pit", 
-    data_test = fake_data$data_test)
-

-

None of the models is a perfect representation of the data generating process, but the forecast for our dynamic GAM is more stable and more accurate than the previous models, with the added advantage that we can place more trust in our estimated smooth for season because we have captured the residual autocorrelation. The posterior checks for our dynamic GAM also look much better than those of the two previous models. Plot the estimated latent dynamic trend (which is on the log scale).

-
plot_mvgam_trend(mod3, series = 1, data_test = fake_data$data_test)
-

-

Benchmarking against "null" models is a very important part of evaluating a proposed forecast model. After all, if our complex dynamic model can't generate better predictions than a random walk or mean forecast, is it really telling us anything new about the data-generating process? Here we illustrate the model comparison utilities in mvgam by fitting a simpler model that smooths on a white noise covariate rather than on the seasonal variable. Because the white noise covariate is not informative and we are using a random walk for the trend process, this model essentially becomes a Poisson observation model over a dynamic random walk process.

-
fake_data$data_train$fake_cov <- rnorm(NROW(fake_data$data_train))
-fake_data$data_test$fake_cov <- rnorm(NROW(fake_data$data_test))
-mod4 <- mvjagam(data_train = fake_data$data_train, 
-    data_test = fake_data$data_test, formula = y ~ 
-        s(fake_cov, k = 3), family = "poisson", 
-    trend_model = "RW", drift = TRUE, burnin = 3000, 
-    chains = 4)
-
## Compiling rjags model...
-## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
-## 'localhost'
-## Simulation complete
-## Note: Summary statistics were not produced as there are >50 monitored
-## variables
-## [To override this behaviour see ?add.summary and ?runjags.options]
-## FALSEFinished running the simulation
-## NOTE: Stopping adaptation
-

Look at this model's proposed forecast

-
plot_mvgam_fc(mod4, series = 1, data_test = fake_data$data_test)
-

-

Here we showcase how different dynamic models can be compared using rolling probabilistic forecast evaluation, which is especially useful if we don't already have out of sample observations for comparing forecasts. This function sets up a sequence of evaluation timepoints along a rolling window within the training data to evaluate 'out-of-sample' forecasts. The trends are rolled forward a total of fc_horizon timesteps according to their estimated state space dynamics to generate an 'out-of-sample' forecast that is evaluated against the true observations in the horizon window. We are therefore simulating a situation where the model's parameters had already been estimated but we have only observed data up to the evaluation timepoint and would like to generate forecasts that consider the possible future paths for the latent trends and the true observed values for any other covariates in the horizon window. Evaluation involves calculating the Discrete Rank Probability Score and a binary indicator for whether or not the true value lies within the forecast's 90% prediction interval. For this test we compare the two models on the exact same sequence of 30 evaluation points using horizon = 6
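For a count observation \(y\) and forecast CDF \(F\), the DRPS used for these comparisons (written here to match the drps_score function defined later in this case study) takes the form:

\[DRPS(y,F)=\sum_{z=0}^{\infty}\left(F(z)-\mathbb{1}(z\geq y)\right)^{2}\,,\]

with the infinite sum truncated at a suitably large upper bound in practice.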

-
compare_mvgams(model1 = mod3, model2 = mod4, 
-    fc_horizon = 6, n_evaluations = 30, n_cores = 3)
-
## DRPS summaries per model (lower is better)
-##             Min.   1st Qu.    Median      Mean   3rd Qu.     Max. NA's
-## Model 1 2.800801  3.885114  5.058327  5.778573  5.993394 21.73604   13
-## Model 2 4.658159 10.698314 16.765719 23.569713 27.393819 99.26183   13
-## 
-## 90% interval coverages per model (closer to 0.9 is better)
-## Model 1 0.988024 
-## Model 2 0.9461078
-

-

The series of plots generated by compare_mvgams clearly show that the first dynamic model generates better predictions. In each plot, DRPS for the forecast horizon is lower for the first model than for the second model. This kind of evaluation is often more appropriate for forecast models than complexity-penalising fit metrics such as AIC or BIC. However, comparing forecasts of the dynamic models against the two models with yearly smooth terms (mod1 and mod2) using the rolling window approach is actually not recommended, as the yearly smooth models have already seen all the possible in-sample values of year and so should be able to predict incredibly well by interpolating through the range of the fitted smooth. By contrast, the dynamic component in our dynamic GAMs (models 3 and 4) produces true forecasts when running the rolling window approach. Nevertheless, when we compare some of these models (here mod1 with the thin plate yearly smooth vs mod3 with the AR3 trend process) as we did above for the random walk model, we still find that our dynamic GAM produces superior probabilistic forecasts.

-
compare_mvgams(model1 = mod1, model2 = mod3, 
-    fc_horizon = 6, n_evaluations = 30, n_cores = 3)
-
## DRPS summaries per model (lower is better)
-##              Min.   1st Qu.    Median      Mean   3rd Qu.     Max. NA's
-## Model 1 19.413509 30.945251 38.888128 41.390934 48.733559 79.53264   13
-## Model 2  2.740936  3.891306  5.054429  5.800003  5.977418 21.87450   13
-## 
-## 90% interval coverages per model (closer to 0.9 is better)
-## Model 1 1 
-## Model 2 0.994012
-

-

The same holds true when comparing against the B spline model (mod2)

-
compare_mvgams(model1 = mod2, model2 = mod3, 
-    fc_horizon = 6, n_evaluations = 30, n_cores = 3)
-
## DRPS summaries per model (lower is better)
-##             Min.   1st Qu.    Median      Mean   3rd Qu.     Max. NA's
-## Model 1 18.49409 29.184917 37.871936 38.060259 45.877689 65.60411   13
-## Model 2  2.75831  3.876155  5.066004  5.785132  6.081017 20.60188   13
-## 
-## 90% interval coverages per model (closer to 0.9 is better)
-## Model 1 1 
-## Model 2 0.994012
-

-

Now we proceed by exploring how forecast distributions from an mvgam object can be automatically updated in light of new incoming observations. This works by generating a set of "particles" that each capture a unique proposal about the current state of the system (in this case, the current estimate of the latent trend component). The next observation in data_assim is assimilated and particles are weighted by how well their proposal (i.e. their proposed forecast, prior to seeing the new data) matches the new observations. For univariate models such as the ones we've fitted so far, this weight is represented by the proposal's Negative Binomial log-likelihood (a stylised sketch of the weighting step follows the initialisation below). For multivariate models, a multivariate composite likelihood is used for weights. Once weights are calculated, we use importance sampling to update the model's forecast distribution for the remaining forecast horizon. Begin by initiating a set of 5000 particles by assimilating the next observation in data_test and storing the particles in the default location (in a directory called pfilter within the working directory).

-
pfilter_mvgam_init(object = mod3, n_particles = 5000, 
-    n_cores = 2, data_assim = fake_data$data_test)
-
## Saving particles to pfilter/particles.rda 
-##  ESS = 12.22888
-
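Conceptually, each particle's weight comes from the likelihood of the newly observed value under that particle's proposal. A stylised sketch for the Negative Binomial case (not mvgam's internal code; y_new, r and mu_particles are hypothetical stand-ins for the new observation, the dispersion parameter and each particle's proposed expectation) is:

# Stylised sketch of particle weighting via NB log-likelihoods
log_w <- dnbinom(y_new, size = r, mu = mu_particles, log = TRUE)
w <- exp(log_w - max(log_w))  # subtract the max for numerical stability
w <- w / sum(w)               # normalised importance weights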

Now we are ready to run the particle filter. This function will assimilate the next six out of sample observations in data_test and update the forecast after each assimilation step. This works in an iterative fashion by calculating each particle's weight, then using a kernel smoothing algorithm to "pull" low weight particles toward the high-likelihood space before assimilating the next observation. The strength of the kernel smoother is controlled by kernel_lambda, which in our experience works well when left to the default of 1. If the Effective Sample Size of particles drops too low, suggesting we are putting most of our belief in a very small set of particles, an automatic resampling step is triggered to increase particle diversity and reduce the chance that our forecast intervals become too narrow and incapable of adapting to changing conditions (a generic sketch of the ESS diagnostic follows the output below).

-
pfilter_mvgam_online(data_assim = fake_data$data_test[1:7, 
-    ], n_cores = 2, kernel_lambda = 1)
-
## Particles have already assimilated one or more observations. Skipping these
-## 
-## Assimilating the next 6 observations
-## 
-## Effective sample size is 6.331851 ...
-## 
-## Smoothing particles ...
-## 
-## Effective sample size is 6.269106 ...
-## 
-## Smoothing particles ...
-## 
-## Effective sample size is 37.31189 ...
-## 
-## Smoothing particles ...
-## 
-## Effective sample size is 541.3617 ...
-## 
-## Smoothing particles ...
-## 
-## Effective sample size is 1934.641 ...
-## 
-## Smoothing particles ...
-## 
-## Effective sample size is 3116.806 ...
-## 
-## Smoothing particles ...
-## 
-## Last assimilation time was 7 1958 
-## 
-## Saving particles to pfilter/particles.rda 
-##  ESS = 3116.806
-
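The Effective Sample Size reported above is the usual importance sampling diagnostic. Given normalised weights, it can be computed as below (a generic sketch continuing the hypothetical w from the weighting sketch earlier; the resampling threshold is illustrative, not mvgam's actual rule):

# Generic ESS diagnostic for normalised importance weights w
ess <- 1 / sum(w^2)
if (ess < length(w) / 2) {  # illustrative threshold
    # resample particle indices in proportion to weights to restore diversity
    idx <- sample(seq_along(w), size = length(w), replace = TRUE, prob = w)
    w <- rep(1 / length(w), length(w))
}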

Once assimilation is complete, generate the updated forecast from the particles using the covariate information in the remaining data_test observations. This function is designed to make it simpler to assimilate observations, as all that needs to be provided once the particles are initiated is a dataframe of test data in exactly the same format as the data that were used to train the initial model. If no new observations are found (observations are arranged by year and then by season, so consistent indexing of these two variables is very important; see the small ordering sketch after the next code block), then the function returns a NULL and the particles remain where they are in state space.

-
fc <- pfilter_mvgam_fc(file_path = "pfilter", 
-    n_cores = 2, data_test = fake_data$data_test, 
-    ylim = c(min(fake_data$data_train$y, 
-        na.rm = T), max(fake_data$data_test$y, 
-        na.rm = T) * 1.25))
-
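Because of that indexing requirement, it can be worth explicitly ordering the test data before assimilation. A one-line sketch using dplyr (loaded earlier), which simply enforces the year-then-season ordering the functions expect:

# enforce consistent year/season ordering before assimilation
fake_data$data_test <- dplyr::arrange(fake_data$data_test, year, season)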

Compare the updated forecast to the original forecast to see how it has changed in light of the most recent observations

-
par(mfrow = c(1, 2))
-plot_mvgam_fc(mod3, series = 1, data_test = fake_data$data_test, 
-    ylim = c(min(fake_data$data_train$y, 
-        na.rm = T), max(fake_data$data_test$y, 
-        na.rm = T) * 1.25))
-fc$Air()
-

-

Here it is apparent that the distribution has shifted slightly in light of the 6 observations that have been assimilated, and that our confidence in the remaining forecast horizon has improved (tighter uncertainty intervals). This is an advantageous way of allowing a model to slowly adapt to new conditions while breaking free of restrictive assumptions about residual distributions. See some of the many particle filtering lectures by Nathaniel Osgood for more details. Remove the particles from their stored directory when finished

-
unlink("pfilter", recursive = T)
-
-
-

Lynx example

-

For our next univariate example, we will again explore how challenging it can be to forecast ahead with conventional GAMs and how mvgam overcomes these challenges. We begin by replicating the lynx analysis from the 2018 Ecological Society of America workshop on GAMs that was hosted by Eric Pedersen, David L. Miller, Gavin Simpson, and Noam Ross, with some minor adjustments. First, load the data and plot the series as well as its estimated autocorrelation function.

-
library(mvgam)
-data(lynx)
-lynx_full = data.frame(year = 1821:1934, 
-    population = as.numeric(lynx))
-plot(lynx_full$population, type = "l", ylab = "Lynx trappings", 
-    xlab = "Time")
-acf(lynx_full$population, main = "")
-

-

There is a clear ~19-year cyclic pattern to the data, so we create a season term that can be used to model this effect and give a better representation of the data generating process. We also create a new year term that represents which long-term cycle each observation is in.

-
plot(stl(ts(lynx_full$population, frequency = 19), 
-    s.window = "periodic"))
-

-
lynx_full$season <- (lynx_full$year%%19) + 
-    1
-cycle_ends <- c(which(lynx_full$season == 
-    19), NROW(lynx_full))
-cycle_starts <- c(1, cycle_ends[1:length(which(lynx_full$season == 
-    19))] + 1)
-cycle <- vector(length = NROW(lynx_full))
-for (i in 1:length(cycle_starts)) {
-    cycle[cycle_starts[i]:cycle_ends[i]] <- i
-}
-lynx_full$year <- cycle
-

Add lag indicators needed to fit the nonlinear lag models that gave the best one step ahead point forecasts in the ESA workshop example. As in the example, we specify the default argument in the lag function as the mean log population.

-
mean_pop_l = mean(log(lynx_full$population))
-lynx_full = dplyr::mutate(lynx_full, popl = log(population), 
-    lag1 = dplyr::lag(popl, 1, default = mean_pop_l), 
-    lag2 = dplyr::lag(popl, 2, default = mean_pop_l), 
-    lag3 = dplyr::lag(popl, 3, default = mean_pop_l), 
-    lag4 = dplyr::lag(popl, 4, default = mean_pop_l), 
-    lag5 = dplyr::lag(popl, 5, default = mean_pop_l), 
-    lag6 = dplyr::lag(popl, 6, default = mean_pop_l))
-

For mvgam models, the response needs to be labelled y and we also need an indicator of the series name as a factor variable

-
lynx_full$y <- lynx_full$population
-lynx_full$series <- factor("series1")
-

Split the data into training (first 40 years) and testing (next 10 years of data) to evaluate multi-step ahead forecasts

-
lynx_train = lynx_full[1:40, ]
-lynx_test = lynx_full[41:50, ]
-

The best-forecasting model in the course used nonlinear smooths of lags 1 and 2; we use those here, except that we also include a cyclic smooth for the 19-year cycles as this seems like an important feature, as well as a yearly smooth for the long-term trend. Following the information about spline extrapolation above, we again fit a cubic B spline for the trend with a mix of penalties to try to rein in wacky extrapolation behaviours, and we extend the penalty to cover the years that we wish to predict. This will hopefully give us better uncertainty estimates for the forecast. In this example we assume the observations are Poisson distributed.

-
lynx_mgcv = gam(population ~ s(season, bs = "cc", 
-    k = 19) + s(year, bs = "bs", m = c(3, 
-    2, 1, 0)) + s(lag1, k = 5) + s(lag2, 
-    k = 5), knots = list(season = c(0.5, 
-    19.5), year = c(min(lynx_train$year - 
-    1), min(lynx_train$year), max(lynx_test$year), 
-    max(lynx_test$year) + 1)), data = lynx_train, 
-    family = "poisson", method = "REML")
-

Inspect the model's summary and estimated smooth functions for the season, year and lag terms

-
summary(lynx_mgcv)
-
## 
-## Family: poisson 
-## Link function: log 
-## 
-## Formula:
-## population ~ s(season, bs = "cc", k = 19) + s(year, bs = "bs", 
-##     m = c(3, 2, 1, 0)) + s(lag1, k = 5) + s(lag2, k = 5)
-## 
-## Parametric coefficients:
-##             Estimate Std. Error z value Pr(>|z|)    
-## (Intercept) 6.666631   0.007511   887.6   <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## Approximate significance of smooth terms:
-##              edf Ref.df Chi.sq p-value    
-## s(season) 16.229 17.000 6770.2  <2e-16 ***
-## s(year)    1.990  2.000  244.2  <2e-16 ***
-## s(lag1)    3.984  3.999  712.0  <2e-16 ***
-## s(lag2)    3.892  3.993  488.1  <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-## 
-## R-sq.(adj) =  0.967   Deviance explained = 97.7%
-## -REML = 880.28  Scale est. = 1         n = 40
-
plot(lynx_mgcv, select = 1)
-plot(lynx_mgcv, select = 2)
-plot(lynx_mgcv, select = 3)
-plot(lynx_mgcv, select = 4)
-

-

This model captures most of the deviance in the series and the functions are all confidently estimated to be non-zero and non-flat. So far, so good. Now for some forecasts for the out of sample period. First we must take posterior draws of smooth beta coefficients to incorporate the uncertainties around smooth functions when simulating forecast paths

-
coef_sim <- gam.mh(lynx_mgcv)$bs
-

Now we define a function to perform forecast simulations from the nonlinear lag model in a recursive fashion. Using starting values for the last two lags, the function will iteratively project the path ahead with a random sample from the model's coefficient posterior

-
recurse_nonlin = function(model, lagged_vals, 
-    h) {
-    # Initiate state vector
-    states <- rep(NA, length = h + 2)
-    # Last two values of the conditional
-    # expectations begin the state vector
-    states[1] <- as.numeric(exp(lagged_vals[2]))
-    states[2] <- as.numeric(exp(lagged_vals[1]))
-    # Get a random sample of the smooth
-    # coefficient uncertainty matrix to use
-    # for the entire forecast horizon of this
-    # particular path
-    gam_coef_index <- sample(seq(1, NROW(coef_sim)), 
-        1, T)
-    # For each following timestep,
-    # recursively predict based on the
-    # predictions at each previous lag
-    for (t in 3:(h + 2)) {
-        # Build the GAM linear predictor matrix
-        # using the two previous lags of the
-        # (log) density
-        newdata <- data.frame(lag1 = log(states[t - 
-            1] + 0.01), lag2 = log(states[t - 
-            2] + 0.01), season = lynx_test$season[t - 
-            2], year = lynx_test$year[t - 
-            2])
-        colnames(newdata) <- c("lag1", "lag2", 
-            "season", "year")
-        Xp <- predict(model, newdata = newdata, 
-            type = "lpmatrix")
-        # Calculate the posterior prediction for
-        # this timepoint
-        mu <- rpois(1, lambda = exp(Xp %*% 
-            coef_sim[gam_coef_index, ]))
-        # Fill in the state vector and iterate to
-        # the next timepoint
-        states[t] <- mu
-    }
-    # Return the forecast path
-    states[-c(1:2)]
-}
-

Create the GAM's forecast distribution by generating 1000 simulated forecast paths. Each path is fed the true observed values for the last two lags of the first out of sample timepoint, but paths can deviate when simulating ahead depending on their particular draw of possible coefficients. Note, this is a bit slow and could easily be parallelised to speed up computations (one possible parallel version is sketched after the loop below).

-
gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
-for (i in 1:1000) {
-    gam_sims[i, ] <- recurse_nonlin(lynx_mgcv, 
-        lagged_vals = c(lynx_test$lag1[1], 
-            lynx_test$lag2[1]), h = 10)
-}
-
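As one example of the parallelisation mentioned above, the simulation loop could be distributed across cores with the parallel package (a sketch only; mclapply relies on forking, so it falls back to serial execution on Windows, where parLapply with an exported cluster would be needed instead):

# Possible parallel version of the forecast simulation loop
library(parallel)
sim_list <- mclapply(1:1000, function(i) {
    recurse_nonlin(lynx_mgcv, lagged_vals = c(lynx_test$lag1[1],
        lynx_test$lag2[1]), h = 10)
}, mc.cores = 4)
gam_sims <- do.call(rbind, sim_list)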

Plot the mgcv model's out of sample forecast for the next 10 years ahead

-
cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 
-    0.95))
-yupper <- max(cred_ints) * 1.25
-plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), 
-    type = "l", col = rgb(1, 0, 0, alpha = 0), 
-    ylim = c(0, yupper), ylab = "Predicted lynx trappings", 
-    xlab = "Forecast horizon", main = "mgcv")
-polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
-    c(cred_ints[1, ], rev(cred_ints[3, ])), 
-    col = rgb(150, 0, 0, max = 255, alpha = 100), 
-    border = NA)
-cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 
-    0.68))
-polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
-    c(cred_ints[1, ], rev(cred_ints[3, ])), 
-    col = rgb(150, 0, 0, max = 255, alpha = 180), 
-    border = NA)
-lines(cred_ints[2, ], col = rgb(150, 0, 0, 
-    max = 255), lwd = 2, lty = "dashed")
-points(lynx_test$population[1:10], pch = 16)
-lines(lynx_test$population[1:10])
-

-

A decent forecast? The shape is certainly correct, but the 95% uncertainty intervals appear to be far too wide (i.e. our upper interval extends to up to 8 times the maximum number of trappings that have ever been recorded up to this point). This is almost entirely due to the extrapolation behaviour of the B spline, as the lag smooth functions are not encountering values very far outside the ranges they have already been trained on and so are resorting mostly to interpolation. But a better way to evaluate than simply using visuals is to calculate a probabilistic score. Here we use the Discrete Rank Probability Score, which gives us an indication of how well calibrated our forecast's uncertainty intervals are by comparing the mass of the forecast density against the true observed values. Forecasts with overly wide intervals are penalised, as are forecasts with overly narrow intervals that do not contain the true observations. At the same time we calculate coverage of the forecast's 90% intervals, which is another useful way of evaluating different forecast proposals

-
# Discrete Rank Probability Score and
-# coverage of 90% interval
-drps_score <- function(truth, fc, interval_width = 0.9) {
-    nsum <- 1000
-    Fy = ecdf(fc)
-    ysum <- 0:nsum
-    indicator <- ifelse(ysum - truth >= 0, 
-        1, 0)
-    score <- sum((indicator - Fy(ysum))^2)
-    
-    # Is value within 90% HPD?
-    interval <- hpd(fc, interval_width)
-    in_interval <- ifelse(truth <= interval[3] & 
-        truth >= interval[1], 1, 0)
-    return(c(score, in_interval))
-}
-
-# Wrapper to operate on all observations
-# in fc_horizon
-drps_mcmc_object <- function(truth, fc, interval_width = 0.9) {
-    indices_keep <- which(!is.na(truth))
-    if (length(indices_keep) == 0) {
-        scores = data.frame(drps = rep(NA, 
-            length(truth)), interval = rep(NA, 
-            length(truth)))
-    } else {
-        scores <- matrix(NA, nrow = length(truth), 
-            ncol = 2)
-        for (i in indices_keep) {
-            scores[i, ] <- drps_score(truth = as.vector(truth)[i], 
-                fc = fc[, i], interval_width)
-        }
-    }
-    scores
-}
-
-# Calculate DRPS over the 10-year horizon
-# for the mgcv model
-lynx_mgcv1_drps <- drps_mcmc_object(truth = lynx_test$population[1:10], 
-    fc = gam_sims)
-

What if we remove the yearly trend and let the lag smooths capture more of the temporal dependencies? Will that improve the forecast distribution? Run a second model and plot the forecast (note that this plot will be on quite a different y-axis scale compared to the first plot above)

-
lynx_mgcv2 = gam(population ~ s(season, bs = "cc", 
-    k = 19) + s(lag1, k = 5) + s(lag2, k = 5), 
-    data = lynx_train, family = "poisson", 
-    method = "REML")
-coef_sim <- gam.mh(lynx_mgcv2)$bs
-gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
-for (i in 1:1000) {
-    gam_sims[i, ] <- recurse_nonlin(lynx_mgcv2, 
-        lagged_vals = c(lynx_test$lag1[1], 
-            lynx_test$lag2[1]), h = 10)
-}
-cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 
-    0.95))
-yupper <- max(cred_ints) * 1.25
-plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), 
-    type = "l", col = rgb(1, 0, 0, alpha = 0), 
-    ylim = c(0, yupper), ylab = "Predicted lynx trappings", 
-    xlab = "Forecast horizon", main = "mgcv")
-polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
-    c(cred_ints[1, ], rev(cred_ints[3, ])), 
-    col = rgb(150, 0, 0, max = 255, alpha = 100), 
-    border = NA)
-cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 
-    0.68))
-polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
-    c(cred_ints[1, ], rev(cred_ints[3, ])), 
-    col = rgb(150, 0, 0, max = 255, alpha = 180), 
-    border = NA)
-lines(cred_ints[2, ], col = rgb(150, 0, 0, 
-    max = 255), lwd = 2, lty = "dashed")
-points(lynx_test$population[1:10], pch = 16)
-lines(lynx_test$population[1:10])
-

-

Calculate DRPS over the 10-year horizon for the second mgcv model

-
lynx_mgcv2_drps <- drps_mcmc_object(truth = lynx_test$population[1:10], 
-    fc = gam_sims)
-

This forecast is highly overconfident, with very unrealistic uncertainty intervals due to the interpolation behaviours of the lag smooths. You can certainly keep trying different formulations (our experience is that the B spline variant above produces the best forecasts of any tested mgcv model, but we did not test an exhaustive set), but hopefully it is clear that forecasting using splines is tricky business, and each time you do it you will likely end up homing in on different combinations of penalties, knot selections etc. Now we will fit an mvgam model for comparison. This is similar to the mgcv model directly above but includes a full time series model for the errors (in this case an AR1 process), rather than relying on smoothing splines that do not incorporate a concept of the future. We do not use a year term, both to reduce any possible extrapolation and because the latent dynamic component should capture this temporal variation.
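As a purely illustrative aside that is independent of mvgam, we can simulate forecast paths from an AR1 process with assumed parameter values and watch the interval width settle toward the stationary distribution, rather than exploding the way extrapolated splines can

# Toy AR1 forecast paths with assumed (not estimated) parameters
set.seed(100)
ar1 <- 0.7; sigma <- 0.3; h <- 10
paths <- t(replicate(1000, {
    z <- numeric(h)
    z[1] <- rnorm(1, 0, sigma)
    for (j in 2:h) z[j] <- ar1 * z[j - 1] + rnorm(1, 0, sigma)
    z
}))
# Interval width grows toward the stationary sd, sigma/sqrt(1 - ar1^2)
round(apply(paths, 2, quantile, probs = c(0.025, 0.5, 0.975)), 2)

We estimate the mvgam model in JAGS using MCMC sampling (Note that JAGS 4.3.0 is required; installation links are found here).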

-
lynx_mvgam <- mvjagam(data_train = lynx_train, 
-    data_test = lynx_test, formula = y ~ 
-        s(season, bs = "cc", k = 19), knots = list(season = c(0.5, 
-        19.5)), family = "poisson", trend_model = "AR1", 
-    burnin = 5000, chains = 4)
-
## Compiling rjags model...
-## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
-## 'localhost'
-## Simulation complete
-## Note: Summary statistics were not produced as there are >50 monitored
-## variables
-## [To override this behaviour see ?add.summary and ?runjags.options]
-## FALSEFinished running the simulation
-## NOTE: Stopping adaptation
-

Calculate the out of sample forecast from the fitted mvgam model and plot

-
fits <- MCMCvis::MCMCchains(lynx_mvgam$jags_output, 
-    "ypred")
-fits <- fits[, (NROW(lynx_mvgam$obs_data) + 
-    1):(NROW(lynx_mvgam$obs_data) + 10)]
-cred_ints <- apply(fits, 2, function(x) hpd(x, 
-    0.95))
-yupper <- max(cred_ints) * 1.25
-plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), 
-    type = "l", col = rgb(1, 0, 0, alpha = 0), 
-    ylim = c(0, yupper), ylab = "", xlab = "Forecast horizon", 
-    main = "mvgam")
-polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
-    c(cred_ints[1, ], rev(cred_ints[3, ])), 
-    col = rgb(150, 0, 0, max = 255, alpha = 100), 
-    border = NA)
-cred_ints <- apply(fits, 2, function(x) hpd(x, 
-    0.68))
-polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
-    c(cred_ints[1, ], rev(cred_ints[3, ])), 
-    col = rgb(150, 0, 0, max = 255, alpha = 180), 
-    border = NA)
-lines(cred_ints[2, ], col = rgb(150, 0, 0, 
-    max = 255), lwd = 2, lty = "dashed")
-points(lynx_test$population[1:10], pch = 16)
-lines(lynx_test$population[1:10])
-

-

Calculate DRPS over the 10-year horizon for the mvgam model

-
lynx_mvgam_drps <- drps_mcmc_object(truth = lynx_test$population[1:10], 
-    fc = fits)
-

How do the out of sample DRPS scores stack up for these three models? Remember, our goal is to minimise DRPS while providing 90% interval coverage that is near, but not less than, 0.9. The DRPS and 90% interval coverage for the first mgcv model (with the B spline year term)

-
sum(lynx_mgcv1_drps[, 1])
-
## [1] 1576.198
-
mean(lynx_mgcv1_drps[, 2])
-
## [1] 0.8
-

For the second mgcv model

-
sum(lynx_mgcv2_drps[, 1])
-
## [1] 2041.019
-
mean(lynx_mgcv2_drps[, 2])
-
## [1] 0
-

And for the mvgam model

-
sum(lynx_mvgam_drps[, 1])
-
## [1] 737.153
-
mean(lynx_mvgam_drps[, 2])
-
## [1] 1
-

The mvgam model produces much more realistic uncertainty than the mgcv versions above. Of course this is just one out of sample comparison; to really determine which model is most appropriate for forecasting we would want to run many of these tests using a rolling window approach, as in the sketch below.
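A rough sketch of how such a rolling-origin evaluation could be set up (illustrative only; lynx_full is a hypothetical data frame holding the full series in time order, with the same columns used to fit lynx_mvgam, and drps_mcmc_object is the scoring function defined earlier)

origins <- c(30, 32, 34)  # candidate training lengths to forecast from
roll_drps <- lapply(origins, function(o) {
    train <- lynx_full[1:o, ]
    test <- lynx_full[(o + 1):(o + 10), ]
    mod <- mvjagam(data_train = train, data_test = test,
        formula = y ~ s(season, bs = "cc", k = 19),
        knots = list(season = c(0.5, 19.5)),
        family = "poisson", trend_model = "AR1",
        burnin = 5000, chains = 4)
    fc <- MCMCvis::MCMCchains(mod$jags_output, "ypred")[, (o + 1):(o + 10)]
    drps_mcmc_object(truth = test$population, fc = fc)
})
# sapply(roll_drps, function(x) sum(x[, 1]))  # total DRPS at each origin

Returning to the single train-test split, have a look at this model's summary to see what is being estimated (note that longer MCMC runs would probably be needed to increase effective sample sizes)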

-
summary_mvgam(lynx_mvgam)
-
## GAM formula:
-
## y ~ s(season, bs = "cc", k = 19)
-
## 
-
## Family:
-
## Poisson
-
## 
-
## N series:
-
## 1
-
## 
-
## N observations per series:
-
## 40
-
## 
-
## GAM smooth term approximate significances:
-
##           edf Ref.df Chi.sq p-value    
-## s(season)  16     17  123.1  <2e-16 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
## 
-
## GAM coefficient (beta) estimates:
-
##                    2.5%         50%       97.5% Rhat n.eff
-## (Intercept)   6.3147932  6.66215103  6.89250988 1.17    34
-## s(season).1  -1.3128155 -0.67418798 -0.07776714 1.25    97
-## s(season).2  -0.3940034  0.17016843  0.77839705 1.23    95
-## s(season).3   0.3739162  0.95359629  1.51739338 1.27    47
-## s(season).4   0.8946542  1.49719833  2.30096068 1.34    41
-## s(season).5   1.2548317  1.75534210  2.61569981 1.34    27
-## s(season).6   0.6014648  1.30104444  1.97618846 1.11    40
-## s(season).7  -0.5940465  0.08063823  0.69005974 1.06    68
-## s(season).8  -1.5610967 -0.86593061 -0.15707434 1.07   119
-## s(season).9  -1.9056630 -0.96684704 -0.22116190 1.02   113
-## s(season).10 -1.5232002 -0.51073646  0.20061146 1.04    83
-## s(season).11 -0.6061202  0.31370531  1.03244342 1.14    64
-## s(season).12  0.3937823  1.32583475  2.01290396 1.29    54
-## s(season).13  0.8141474  1.55456240  2.34072562 1.36    43
-## s(season).14  0.5380122  1.21424495  1.94509201 1.35    34
-## s(season).15 -0.4012842  0.14697983  0.76468321 1.18    87
-## s(season).16 -1.2059968 -0.62834129 -0.07789867 1.02   168
-## s(season).17 -1.5776786 -1.05558099 -0.48019170 1.05   119
-
## 
-
## GAM smoothing parameter (rho) estimates:
-
##               2.5%      50%    97.5% Rhat n.eff
-## s(season) 3.534611 4.446877 5.209046 1.01  1090
-
## 
-
## Latent trend drift (phi) and AR parameter estimates:
-
##         2.5%       50%     97.5% Rhat n.eff
-## phi 0.000000 0.0000000 0.0000000  NaN     0
-## ar1 0.466993 0.6939366 0.8978668    1  1259
-## ar2 0.000000 0.0000000 0.0000000  NaN     0
-## ar3 0.000000 0.0000000 0.0000000  NaN     0
-
## 
-

Now inspect each model's estimated smooth for the 19-year cyclic pattern. Note that the mvgam smooth plot is on a different scale compared to the mgcv plot, but interpretation is similar. The mgcv smooth is much wigglier, likely because it is compensating for any remaining autocorrelation not captured by the lag smooths. We could probably remedy this by reducing k in the seasonal smooth for the mgcv model (in practice this works well, but leaving k larger for the mvgam's seasonal smooth is recommended as our experience is that this tends to lead to better performance and convergence)

-
plot(lynx_mgcv, select = 1, shade = T)
-plot_mvgam_smooth(lynx_mvgam, 1, "season")
-

-

We can also view the mvgam's posterior predictions for the entire series (testing and training)

-
plot_mvgam_fc(lynx_mvgam, data_test = lynx_test)
-

-

And the estimated trend

-
plot_mvgam_trend(lynx_mvgam, data_test = lynx_test)
-

-

A key aspect of ecological forecasting is to understand how different components of a model contribute to forecast uncertainty. We can estimate contributions to forecast uncertainty for the GAM smooth functions and the latent trend using mvgam

-
plot_mvgam_uncertainty(lynx_mvgam, data_test = lynx_test)
-

-

Both components contribute to forecast uncertainty, with the trend component contributing more over time (as it should, since this is the stochastic forecast component). This suggests we would still need some more work to learn about factors driving the dynamics of the system, but we will leave the model as-is for this example. Diagnostics of the model can also be performed using mvgam. Have a look at the model's residuals, which are posterior medians of Dunn-Smyth randomised quantile residuals and so should be approximately standard normal if the model is well specified. We are primarily looking for a lack of autocorrelation, which would suggest our AR1 model is appropriate for the latent trend
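To make the residual definition concrete, here is a toy sketch of Dunn-Smyth (randomised quantile) residuals for a Poisson sample, using hypothetical values rather than mvgam's internal posterior calculations

set.seed(42)
y <- rpois(40, lambda = 5)
mu_hat <- rep(mean(y), length(y))  # stand-in fitted means
# Randomise between the CDF just below and at each observed count
u <- runif(length(y), min = ppois(y - 1, mu_hat), max = ppois(y, mu_hat))
ds_resids <- qnorm(u)  # approximately N(0, 1) when the model is adequate
acf(ds_resids)  # inspect for residual autocorrelation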

-
plot_mvgam_resids(lynx_mvgam)
-

-


AirPassengers example

+

First we load the AirPassengers data from the forecast package and convert to an xts object. This series is a good starting point as it should be highly forecastable given its stable seasonal pattern and nearly linear trend

+
# devtools::install_github('nicholasjclark/mvgam')
+library(mvgam)
+library(dplyr)
+library(xts)
+library(forecast)
+data("AirPassengers")
+series <- xts::as.xts(floor(AirPassengers))
+colnames(series) <- c("Air")
+

View the raw series, its STL decomposition and the distribution of observations. There is a clear seasonal pattern as well as an increasing trend over time for this series, and the distribution shows evidence of a skew suggestive of overdispersion

+
plot(series)
+

+
plot(stl(series, s.window = "periodic"))
+

+
hist(series)
+

+

Next use the series_to_mvgam function, which converts ts or xts objects into the correct format for mvgam. Here we set train_prop = 0.75, meaning we will include ~ 75% of the observations in the training set and use the remaining ~25% for forecast validation. We also randomly set ~10% of observations to NA so that we can evaluate models based on their predictive abilities

+
series[sample(1:length(series), floor(length(series)/10), 
+    F)] <- NA
+fake_data <- series_to_mvgam(series, freq = 12, 
+    train_prop = 0.75)
+

Examine the returned object

+
head(fake_data$data_train)
+
+
head(fake_data$data_test)
+
+

As a first pass at modelling this series, we will fit a GAM that includes a seasonal smooth and a yearly smooth, as well as their tensor product interaction. As the seasonality is cyclic, we will use a cyclic cubic regression spline for season. The knots are set so that the boundaries of the cyclic smooth match up between December 31 and January 1. We will stick with the default thin plate regression spline for year. This is similar to what we might do when fitting a model in mgcv to try to forecast ahead, except here we also have an explicit model for the residual component. mvgam uses the jagam function from the mgcv package to generate a skeleton JAGS file and updates that file to incorporate any dynamic trend components (so far limited to no trend, random walks or AR models up to order 3). This is advantageous as any GAM that is allowed in mgcv can in principle be used in mvgam to fit dynamic linear models with smooth functions for nonlinear covariate effects. For multivariate series, mvgam also includes an option for dimension reduction by inducing trend correlations via a dynamic factor process. Here we use the Negative Binomial family and a burnin length of 2000 iterations (in practice we would probably run these models for a bit longer, but they tend to converge rather quickly for univariate series thanks to the useful starting values supplied by jagam). Note that feeding the data_test object does not mean that these values are used to train the model. Rather, they are included as NA so that we can automatically create a forecast from the posterior predictions for these observations. This is useful for plotting forecasts without needing to run new data through the model's equations later on

+
mod1 <- mvjagam(data_train = fake_data$data_train, 
+    data_test = fake_data$data_test, formula = y ~ 
+        s(season, bs = c("cc"), k = 12) + 
+            s(year, k = 5) + ti(season, year, 
+            bs = c("cc", "tp"), k = c(12, 
+                3)), knots = list(season = c(0.5, 
+        12.5)), family = "nb", trend_model = "None", 
+    burnin = 2000, chains = 4)
+
## Compiling rjags model...
+## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
+## 'localhost'
+## Simulation complete
+## Note: Summary statistics were not produced as there are >50 monitored
+## variables
+## [To override this behaviour see ?add.summary and ?runjags.options]
+## FALSEFinished running the simulation
+## NOTE: Stopping adaptation
+

We can view the JAGS model file that has been generated to see the additions that have been made to the base gam model. If the user selects return_jags_data = TRUE when calling mvjagam, this file can be modified and the resulting jags_data object can also be modified to fit more complex Bayesian models. Note that here the AR and phi terms have been set to zero as this model does not include a dynamic trend component

+
mod1$model_file
+
##  [1] "model {"                                                        
+##  [2] ""                                                               
+##  [3] "## GAM linear predictor"                                        
+##  [4] "eta <- X %*% b"                                                 
+##  [5] ""                                                               
+##  [6] "## Mean expectations"                                           
+##  [7] "for (i in 1:n) {"                                               
+##  [8] " for (s in 1:n_series) {"                                       
+##  [9] "  mu[i, s] <- exp(eta[ytimes[i, s]] + trend[i, s])"             
+## [10] " }"                                                             
+## [11] "}"                                                              
+## [12] ""                                                               
+## [13] ""                                                               
+## [14] "## State space trend estimates"                                 
+## [15] "for(s in 1:n_series) {"                                         
+## [16] " trend[1, s] <- 0"                                              
+## [17] "}"                                                              
+## [18] ""                                                               
+## [19] "for(s in 1:n_series) {"                                         
+## [20] " trend[2, s] <- 0"                                              
+## [21] "}"                                                              
+## [22] ""                                                               
+## [23] "for(s in 1:n_series) {"                                         
+## [24] " trend[3, s] <- 0"                                              
+## [25] "}"                                                              
+## [26] ""                                                               
+## [27] "for (i in 4:n){"                                                
+## [28] " for (s in 1:n_series){"                                        
+## [29] " trend[i, s] <- 0"                                              
+## [30] " }"                                                             
+## [31] "}"                                                              
+## [32] ""                                                               
+## [33] "## AR components"                                               
+## [34] "for (s in 1:n_series){"                                         
+## [35] " phi[s] <- 0"                                                   
+## [36] " ar1[s] <- 0"                                                   
+## [37] " ar2[s] <- 0"                                                   
+## [38] " ar3[s] <- 0"                                                   
+## [39] " tau[s] <- pow(sigma[s], -2)"                                   
+## [40] " sigma[s] ~ dexp(1)"                                            
+## [41] "}"                                                              
+## [42] ""                                                               
+## [43] "## Negative binomial likelihood functions"                      
+## [44] "for (i in 1:n) {"                                               
+## [45] " for (s in 1:n_series) {"                                       
+## [46] "  y[i, s] ~ dnegbin(rate[i, s], r)"                             
+## [47] "  rate[i, s] <- ifelse((r / (r + mu[i, s])) < min_eps, min_eps,"
+## [48] "                      (r / (r + mu[i, s])))"                    
+## [49] " }"                                                             
+## [50] "}"                                                              
+## [51] "r ~ dexp(0.001)"                                                
+## [52] ""                                                               
+## [53] "## Posterior predictions"                                       
+## [54] "for (i in 1:n) {"                                               
+## [55] " for (s in 1:n_series) {"                                       
+## [56] "  ypred[i, s] ~ dnegbin(rate[i, s], r)"                         
+## [57] " }"                                                             
+## [58] "}"                                                              
+## [59] ""                                                               
+## [60] "  ## Parametric effect priors CHECK tau=1/54^2 is appropriate!" 
+## [61] "  for (i in 1:1) { b[i] ~ dnorm(p_coefs[i], tau_para[i])"       
+## [62] "   sigma_para[i] ~ dexp(2)"                                     
+## [63] " tau_para[i] <- pow(sigma_para[i], -2) }"                       
+## [64] "  ## prior for s(season)... "                                   
+## [65] "  K1 <- S1[1:10,1:10] * lambda[1] "                             
+## [66] "  b[2:11] ~ dmnorm(zero[2:11],K1) "                             
+## [67] "  ## prior for s(year)... "                                     
+## [68] "  K2 <- S2[1:4,1:4] * lambda[2]  + S2[1:4,5:8] * lambda[3]"     
+## [69] "  b[12:15] ~ dmnorm(zero[12:15],K2) "                           
+## [70] "  ## prior for ti(season,year)... "                             
+## [71] "  K3 <- S3[1:20,1:20] * lambda[4]  + S3[1:20,21:40] * lambda[5]"
+## [72] "  b[16:35] ~ dmnorm(zero[16:35],K3) "                           
+## [73] "  ## smoothing parameter priors CHECK..."                       
+## [74] "  for (i in 1:5) {"                                             
+## [75] "   lambda[i] ~ dexp(0.05)"                                      
+## [76] "    rho[i] <- log(lambda[i])"                                   
+## [77] "  }"                                                            
+## [78] "}"
+

Inspect the summary from the model, which is somewhat similar to an mgcv model summary with extra information about convergences for unobserved parameters. The estimated degrees of freedom, smooth coefficients and smooth penalties are all extracted from the mvgam model using sim2jam so that approximate p-values can be calculated using Nychka's method (following Wood (2013) Biometrika 100(1), 221-228). Note however that this function is still in development and approximate p-values may not be entirely accurate

+
summary_mvgam(mod1)
+
## GAM formula:
+
## y ~ s(season, bs = c("cc"), k = 12) + s(year, k = 5) + ti(season, 
+##     year, bs = c("cc", "tp"), k = c(12, 3))
+
## 
+
## Family:
+
## Negative Binomial
+
## 
+
## N series:
+
## 1
+
## 
+
## N observations per series:
+
## 108
+
## 
+
## Dispersion parameter estimate:
+
##       2.5%      50%    97.5% Rhat n.eff
+## r 1166.408 2840.465 6488.955    1  5303
+
## 
+
## GAM smooth term approximate significances:
+
##                    edf Ref.df      F p-value    
+## s(season)        9.237 10.000 39.059  <2e-16 ***
+## s(year)          3.023  4.000  2.884  0.0027 ** 
+## ti(season,year) 12.647 20.000  0.015  0.9558    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+
## 
+
## GAM coefficient (beta) estimates:
+
##                           2.5%          50%       97.5% Rhat n.eff
+## (Intercept)         5.36161110  5.377037202  5.39078643 1.00  4812
+## s(season).1        -0.27395125 -0.195683936 -0.12748834 1.01   373
+## s(season).2        -0.15882576 -0.108986878 -0.06058414 1.00   672
+## s(season).3        -0.05005644  0.007374626  0.06057666 1.01   544
+## s(season).4        -0.08352341 -0.036483031  0.01123565 1.01   719
+## s(season).5         0.01366607  0.062456843  0.10939919 1.01   688
+## s(season).6         0.17729638  0.226980545  0.27415732 1.01   666
+## s(season).7         0.16115314  0.201809290  0.24212203 1.00   990
+## s(season).8         0.00527439  0.058534115  0.10911778 1.00   662
+## s(season).9        -0.19956690 -0.147148431 -0.09850744 1.00   662
+## s(season).10       -0.20269857 -0.126225004 -0.05640364 1.01   376
+## s(year).1          -0.11042219 -0.030354594  0.02468973 1.03   211
+## s(year).2          -0.10293331  0.012803800  0.17404624 1.13    76
+## s(year).3          -0.09806499  0.040268930  0.23528683 1.12    73
+## s(year).4           0.31935661  0.369656749  0.43594345 1.01   252
+## ti(season,year).1  -0.05295287  0.055320955  0.15291445 1.01   300
+## ti(season,year).2  -0.12990817 -0.038994350  0.04204320 1.01   373
+## ti(season,year).3  -0.03642839  0.053236795  0.14769136 1.02   312
+## ti(season,year).4  -0.13084884 -0.051034084  0.02517270 1.00   425
+## ti(season,year).5  -0.08023771  0.010951993  0.12402632 1.02   267
+## ti(season,year).6  -0.10168859 -0.020381921  0.05678394 1.01   433
+## ti(season,year).7  -0.12291677 -0.035747887  0.05794980 1.01   316
+## ti(season,year).8  -0.05371343  0.019790513  0.09595653 1.02   428
+## ti(season,year).9  -0.13322484 -0.042223151  0.05111752 1.01   319
+## ti(season,year).10 -0.02453354  0.050234988  0.12213985 1.01   510
+## ti(season,year).11 -0.13620192 -0.044711164  0.04004527 1.01   344
+## ti(season,year).12 -0.03258447  0.048864148  0.12770832 1.01   415
+## ti(season,year).13 -0.11312038 -0.026994407  0.05857484 1.01   355
+## ti(season,year).14 -0.02419923  0.041417377  0.10786147 1.02   601
+## ti(season,year).15 -0.09920499 -0.005217718  0.08684654 1.02   329
+## ti(season,year).16 -0.07354610  0.006287603  0.07916318 1.01   392
+## ti(season,year).17 -0.10774977 -0.011918860  0.07952767 1.03   292
+## ti(season,year).18 -0.08303156 -0.006336232  0.06698698 1.04   395
+## ti(season,year).19 -0.09187687  0.003976165  0.09731229 1.03   279
+## ti(season,year).20 -0.10244777 -0.018392718  0.06748642 1.03   355
+
## 
+
## GAM smoothing parameter (rho) estimates:
+
##                        2.5%      50%    97.5% Rhat n.eff
+## s(season)         3.4269741 4.403783 5.164925 1.00  2766
+## s(year)           1.5139504 3.304091 4.547796 1.01  1488
+## s(year)2         -0.0783527 2.290532 3.672388 1.00  7751
+## ti(season,year)   3.6016565 4.601123 5.345439 1.00  4090
+## ti(season,year)2  2.5125925 3.899878 4.854497 1.01  2398
+
## 
+
## Latent trend drift (phi) and AR parameter estimates:
+
##     2.5% 50% 97.5% Rhat n.eff
+## phi    0   0     0  NaN     0
+## ar1    0   0     0  NaN     0
+## ar2    0   0     0  NaN     0
+## ar3    0   0     0  NaN     0
+
## 
+

mvgam takes a fitted gam model and adapts the model file to fit in JAGS, with possible extensions to deal with stochastic trend components and other features. But as this model has not been modified much from the original gam model with the same formula (i.e. we have not added any stochastic trend components to the linear predictor yet, which is the main feature of mvgam), the summary of the unmodified gam model is also useful and fairly accurate

+
summary(mod1$mgcv_model)
+
## 
+## Family: Negative Binomial(944980754.781) 
+## Link function: log 
+## 
+## Formula:
+## y ~ s(season, bs = c("cc"), k = 12) + s(year, k = 5) + ti(season, 
+##     year, bs = c("cc", "tp"), k = c(12, 3))
+## 
+## Parametric coefficients:
+##             Estimate Std. Error t value Pr(>|t|)    
+## (Intercept) 5.389265   0.006976   772.6   <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##                   edf Ref.df        F p-value    
+## s(season)       7.820 10.000   32.326  <2e-16 ***
+## s(year)         1.451  1.763 1404.871  <2e-16 ***
+## ti(season,year) 1.771 20.000    0.287  0.0274 *  
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.989   Deviance explained = 98.9%
+## fREML = 138.54  Scale est. = 1         n = 100
+

Ordinarily we would be quite pleased with this result, as we have explained most of the variation in the series with a fairly simple model. We can plot the estimated smooth functions with their associated credible intervals, which are interpreted similarly to mgcv plots except they are not centred at zero (mvgam predicts the smooth with all other covariates at their means, rather than at zero). So the plot below, for the season smooth, shows the estimated values for the linear predictor (on the log scale in this case) if year were set to its mean

+
plot_mvgam_smooth(mod1, series = 1, smooth = "season")
+

+

And here is the plot of the smooth function for year, which has essentially estimated a straight line

+
plot_mvgam_smooth(mod1, series = 1, smooth = "year")
+

+

Perform a series of posterior predictive checks to see if the model is able to simulate data for the training period that looks realistic and unbiased. First, examine simulated kernel densities for posterior predictions (yhat) and compare to the density of the observations (y)

+
plot_mvgam_ppc(mod1, series = 1, type = "density", 
+    legend_position = "bottomright")
+

+

Now plot the distribution of predicted means compared to the observed mean

+
plot_mvgam_ppc(mod1, series = 1, type = "mean")
+

+

Next examine simulated empirical Cumulative Distribution Functions (CDF) for posterior predictions (yhat) and compare to the CDF of the observations (y)

+
plot_mvgam_ppc(mod1, series = 1, type = "cdf")
+

+

Finally look for any biases in predictions by examining a Probability Integral Transform (PIT) histogram. If our predictions are not biased one way or another (i.e. not consistently under- or over-predicting), this histogram should look roughly uniform
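For intuition, a randomised PIT value for a discrete forecast can be sketched as below (hypothetical function and object names; mvgam computes these internally)

pit_discrete = function(draws, y_obs) {
    Fy <- ecdf(draws)
    # Randomise between F(y - 1) and F(y); calibrated forecasts give uniform PITs
    runif(1, min = Fy(y_obs - 1), max = Fy(y_obs))
}
# e.g. for a forecast draw matrix 'fc_draws' and observed vector 'y_true':
# pits <- sapply(seq_along(y_true), function(h) pit_discrete(fc_draws[, h], y_true[h]))
# hist(pits)  # should look roughly uniform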

+
plot_mvgam_ppc(mod1, series = 1, type = "pit")
+

+

All of these plots indicate the model is well calibrated against the training data, with no apparent pathological behaviours exhibited. Now for some investigation of the estimated relationships and forecasts. We can also perform residual diagnostics using randomised quantile (Dunn-Smyth) residuals. These look reasonable overall, though there is some autocorrelation left in the residuals for this series

+
plot_mvgam_resids(mod1, series = 1)
+

+

Ok so the model is doing well when fitting against the training data, but how are its forecasts? The yearly trend is being extrapolated into the future, which controls most of the shape and uncertainty in the forecast. We see there is a reasonable estimate of uncertainty and the out of sample observations (to the right of the dashed line) are all within the model's 95% HPD intervals

+
plot_mvgam_fc(mod1, series = 1, data_test = fake_data$data_test)
+

+

The extrapolation of the smooth for year can be better viewed by feeding new data to the plot_mvgam_smooth function. Here we feed values of year to cover the training and testing set to see how the extrapolation would continue into the future. The dashed line marks the end of the training period

+
plot_mvgam_smooth(mod1, series = 1, "year", 
+    newdata = expand.grid(year = seq(min(fake_data$data_train$year), 
+        max(fake_data$data_test$year), length.out = 500), 
+        season = mean(fake_data$data_train$season), 
+        series = unique(fake_data$data_train$series)))
+abline(v = max(fake_data$data_train$year), 
+    lty = "dashed")
+

+

We can also re-do the posterior predictive checks, but this time focusing only on the out of sample period. This will give us better insight into how the model is performing and whether it is able to simulate realistic and unbiased future values

+
plot_mvgam_ppc(mod1, series = 1, type = "density", 
+    data_test = fake_data$data_test, legend_position = "bottomright")
+

+
plot_mvgam_ppc(mod1, series = 1, type = "mean", 
+    data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod1, series = 1, type = "cdf", 
+    data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod1, series = 1, type = "pit", 
+    data_test = fake_data$data_test)
+

+

There are some very clear problems with the way this model is generating future predictions. Perhaps a different smooth function for year can help? Here we fit our original model again but use a different smooth term for year to try to capture the long-term trend (using B splines with multiple penalties, following the excellent example by Gavin Simpson about extrapolating with smooth terms). This is similar to what we might do when trying to forecast ahead from a more wiggly function, as B splines have useful properties by allowing the penalty to be extended into the range of values we wish to predict (in this case, the years in data_test)

+
mod2 <- mvjagam(data_train = fake_data$data_train, 
+    data_test = fake_data$data_test, formula = y ~ 
+        s(season, bs = c("cc"), k = 12) + 
+            s(year, bs = "bs", m = c(3, 2, 
+                1, 0)) + ti(season, year, 
+            bs = c("cc", "tp"), k = c(12, 
+                3), m = c(2, 1)), knots = list(season = c(0.5, 
+        12.5), year = c(min(fake_data$data_train$year) - 
+        1, min(fake_data$data_train$year), 
+        max(fake_data$data_train$year), max(fake_data$data_test$year))), 
+    family = "nb", trend_model = "None", 
+    burnin = 2000, chains = 4)
+
## Compiling rjags model...
+## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
+## 'localhost'
+## Simulation complete
+## Note: Summary statistics were not produced as there are >50 monitored
+## variables
+## [To override this behaviour see ?add.summary and ?runjags.options]
+## FALSEFinished running the simulation
+## NOTE: Stopping adaptation
+

Again as we haven't modified the base gam much, the summary from the mgcv model is fairly accurate

+
summary_mvgam(mod2)
+
## GAM formula:
+
## y ~ s(season, bs = c("cc"), k = 12) + s(year, bs = "bs", m = c(3, 
+##     2, 1, 0)) + ti(season, year, bs = c("cc", "tp"), k = c(12, 
+##     3), m = c(2, 1))
+
## 
+
## Family:
+
## Negative Binomial
+
## 
+
## N series:
+
## 1
+
## 
+
## N observations per series:
+
## 108
+
## 
+
## Dispersion parameter estimate:
+
##       2.5%      50%    97.5% Rhat n.eff
+## r 1334.131 2672.686 7620.356 1.93   118
+
## 
+
## GAM smooth term approximate significances:
+
##                    edf Ref.df      F p-value    
+## s(season)        8.015 10.000 68.888  <2e-16 ***
+## s(year)          2.378  9.000 33.149  <2e-16 ***
+## ti(season,year)  2.465 10.000  0.487  0.0911 .  
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+
## 
+
## GAM coefficient (beta) estimates:
+
##                            2.5%          50%        97.5% Rhat n.eff
+## (Intercept)         5.362070579  5.376806898  5.390820518 1.03  3785
+## s(season).1        -0.213430308 -0.151517156 -0.101007334 1.02   500
+## s(season).2        -0.148922301 -0.099921057 -0.055643721 1.02   676
+## s(season).3        -0.028226483  0.016931885  0.059414423 1.01   792
+## s(season).4        -0.061996975 -0.017003318  0.025089712 1.00   670
+## s(season).5         0.038099953  0.079826082  0.120362340 1.00   834
+## s(season).6         0.199688149  0.239707637  0.282409200 1.00   840
+## s(season).7         0.174262894  0.212309518  0.248851873 1.00   996
+## s(season).8         0.019906807  0.065582961  0.109381762 1.01   763
+## s(season).9        -0.169165279 -0.122570964 -0.078652162 1.02   636
+## s(season).10       -0.153424964 -0.102932723 -0.053071421 1.02   577
+## s(year).1          -0.999313467 -0.882993791  0.224713835 3.06    33
+## s(year).2          -0.450536661 -0.294986783 -0.267501857 1.69   158
+## s(year).3          -0.187506134 -0.062664032 -0.013656990 1.91   152
+## s(year).4           0.144094577  0.196083560  0.298539501 1.31    80
+## s(year).5           0.234552496  0.390496398  0.440267686 2.11    78
+## s(year).6           0.520509735  0.655588193  0.704282327 1.55   124
+## s(year).7           0.708036658  0.773862240  0.885204923 1.30   164
+## s(year).8          -0.038655710  0.872917548  0.969214795 2.61    41
+## s(year).9          -0.693474072  1.123709900  1.271857701 5.32    28
+## ti(season,year).1  -0.043038651 -0.016464163  0.003306236 1.02    42
+## ti(season,year).2  -0.038248661 -0.015365567  0.006086726 1.07    59
+## ti(season,year).3  -0.034843090 -0.006495594  0.020358404 1.18    46
+## ti(season,year).4  -0.017460649  0.006593620  0.033680203 1.32    36
+## ti(season,year).5  -0.009328059  0.015302505  0.044225325 1.27    37
+## ti(season,year).6  -0.003194770  0.021267455  0.046238239 1.05    44
+## ti(season,year).7  -0.004592578  0.018187670  0.044772543 1.11    41
+## ti(season,year).8  -0.011737224  0.012365263  0.037459252 1.17    58
+## ti(season,year).9  -0.020332936  0.001175399  0.020449767 1.26    54
+## ti(season,year).10 -0.025481801 -0.004238640  0.013211187 1.28    46
+
## 
+
## GAM smoothing parameter (rho) estimates:
+
##                        2.5%        50%      97.5% Rhat n.eff
+## s(season)          5.263804   6.420443   7.411276 1.01  1252
+## s(year)            4.116279  10.065229  12.064278 4.51    55
+## s(year)2          -1.193273   1.246155   2.616958 1.02  7090
+## s(year)3         -17.333931 -13.882385 -12.211856 1.00  7055
+## ti(season,year)    8.243846  10.029634  11.275795 1.02   246
+## ti(season,year)2 -13.562819 -10.142882  -8.485700 1.00  6398
+
## 
+
## Latent trend drift (phi) and AR parameter estimates:
+
##     2.5% 50% 97.5% Rhat n.eff
+## phi    0   0     0  NaN     0
+## ar1    0   0     0  NaN     0
+## ar2    0   0     0  NaN     0
+## ar3    0   0     0  NaN     0
+
## 
+
summary(mod2$mgcv_model)
+
## 
+## Family: Negative Binomial(572232647.608) 
+## Link function: log 
+## 
+## Formula:
+## y ~ s(season, bs = c("cc"), k = 12) + s(year, bs = "bs", m = c(3, 
+##     2, 1, 0)) + ti(season, year, bs = c("cc", "tp"), k = c(12, 
+##     3), m = c(2, 1))
+## 
+## Parametric coefficients:
+##             Estimate Std. Error t value Pr(>|t|)    
+## (Intercept) 5.389254   0.006977   772.4   <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##                   edf Ref.df       F p-value    
+## s(season)       7.819     10  32.353  <2e-16 ***
+## s(year)         1.640      7 352.636  <2e-16 ***
+## ti(season,year) 1.766     10   0.571  0.0278 *  
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.989   Deviance explained = 98.9%
+## fREML = 138.86  Scale est. = 1         n = 100
+

This model explains even more of the variation than the thin plate yearly model above, so we'd be tempted to use it for prediction (though of course we'd want to perform a series of checks following advice from Simon Wood; see lectures by Gavin Simpson for more information on how to perform these checks). One such check on the underlying mgcv skeleton is sketched below.
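As a minimal example of one such check (the skeleton gam is only a proxy for the full dynamic model, so treat these diagnostics as indicative rather than definitive)

# Basis dimension and residual diagnostics on the skeleton mgcv fit
gam.check(mod2$mgcv_model)

So how do the forecasts look?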

+
plot_mvgam_fc(mod2, series = 1, data_test = fake_data$data_test)
+

+

Again the forecast is being driven primarily by the extrapolation behaviour of the B spline. Look at this behaviour for the year smooth as we did above by feeding new data for prediction

+
plot_mvgam_smooth(mod2, series = 1, "year", 
+    newdata = expand.grid(year = seq(min(fake_data$data_train$year), 
+        max(fake_data$data_test$year), length.out = 500), 
+        season = mean(fake_data$data_train$season), 
+        series = unique(fake_data$data_train$series)))
+abline(v = max(fake_data$data_train$year), 
+    lty = "dashed")
+

+

Residual and posterior in-training predictive checks for this model will look similar to the above, with some autocorrelation left in residuals but nothing terribly alarming. But the forecast checks again show some problems:

+
plot_mvgam_ppc(mod2, series = 1, type = "density", 
+    data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod2, series = 1, type = "mean", 
+    data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod2, series = 1, type = "cdf", 
+    data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod2, series = 1, type = "pit", 
+    data_test = fake_data$data_test)
+

+

Can we improve the forecasts by removing our reliance on extrapolation? Now we will fit a model in which the GAM component of the linear predictor captures the repeated seasonality (again with a cyclic smooth) and a dynamic latent trend captures the residual process using AR parameters (up to order 3). This model is a mildly modified version of the base mgcv model where the linear predictor is augmented with the latent trend component. Slightly longer burnin is used here due to the added complexity of the time series component, but the model still fits in ~ 30 seconds on most machines

+
mod3 <- mvjagam(data_train = fake_data$data_train, 
+    data_test = fake_data$data_test, formula = y ~ 
+        s(season, bs = c("cc"), k = 12) + 
+            ti(season, year, bs = c("cc", 
+                "tp"), k = c(12, 3), m = c(2, 
+                1)), knots = list(season = c(0.5, 
+        12.5)), family = "nb", trend_model = "AR3", 
+    drift = TRUE, burnin = 2000, chains = 4)
+
## Compiling rjags model...
+## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
+## 'localhost'
+## Simulation complete
+## Note: Summary statistics were not produced as there are >50 monitored
+## variables
+## [To override this behaviour see ?add.summary and ?runjags.options]
+## FALSEFinished running the simulation
+## NOTE: Stopping adaptation
+

In this case the fitted model differs more from the base mgcv model that was used in jagam to produce the skeleton JAGS file, so the summary of that base model is less accurate. But we can still check the model summary for the mvjagam model to examine convergence for key parameters

+
summary_mvgam(mod3)
+
## GAM formula:
+
## y ~ s(season, bs = c("cc"), k = 12) + ti(season, year, bs = c("cc", 
+##     "tp"), k = c(12, 3), m = c(2, 1))
+
## 
+
## Family:
+
## Negative Binomial
+
## 
+
## N series:
+
## 1
+
## 
+
## N observations per series:
+
## 108
+
## 
+
## Dispersion parameter estimate:
+
##       2.5%      50%    97.5% Rhat n.eff
+## r 1196.621 2948.698 6678.804    1  6741
+
## 
+
## GAM smooth term approximate significances:
+
##                       edf    Ref.df     F  p-value    
+## s(season)        5.667668 10.000000 2.004 0.000153 ***
+## ti(season,year)  0.001279 20.000000 0.000 0.996471    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+
## 
+
## GAM coefficient (beta) estimates:
+
##                             2.5%           50%         97.5% Rhat n.eff
+## (Intercept)         4.6854902871  4.750101e+00  4.8313016766 1.43    45
+## s(season).1        -0.1351084443 -8.320542e-02 -0.0390496157 1.01   472
+## s(season).2        -0.0921056962 -4.486880e-02 -0.0002825825 1.02   538
+## s(season).3        -0.0012647269  4.104691e-02  0.0858638736 1.01   656
+## s(season).4        -0.0322487483  1.093151e-02  0.0513603091 1.01   644
+## s(season).5         0.0549977555  9.608617e-02  0.1347515796 1.01   759
+## s(season).6         0.2005196547  2.402462e-01  0.2807636509 1.00   743
+## s(season).7         0.1671701980  2.040605e-01  0.2376592899 1.01   924
+## s(season).8        -0.0009857419  3.802109e-02  0.0762435832 1.00   850
+## s(season).9        -0.2105717582 -1.639122e-01 -0.1236429061 1.00   521
+## s(season).10       -0.2101943002 -1.634046e-01 -0.1186487128 1.00   545
+## ti(season,year).1  -0.0019294697  3.637336e-05  0.0023289937 1.32   169
+## ti(season,year).2  -0.0017799667 -6.150140e-05  0.0021345201 1.25   175
+## ti(season,year).3  -0.0020113169  1.671438e-06  0.0020285698 1.25   192
+## ti(season,year).4  -0.0021180786  1.266004e-05  0.0017667586 1.23   230
+## ti(season,year).5  -0.0016374544 -3.889510e-05  0.0016336857 1.22   252
+## ti(season,year).6  -0.0016751972  1.712028e-04  0.0018318152 1.24   252
+## ti(season,year).7  -0.0022636098  4.533724e-05  0.0016577034 1.20   171
+## ti(season,year).8  -0.0018772721  4.654626e-05  0.0023905603 1.18   153
+## ti(season,year).9  -0.0028771654  1.257391e-05  0.0014579699 1.24   168
+## ti(season,year).10 -0.0014505372 -6.230076e-05  0.0027049151 1.33   173
+## ti(season,year).11 -0.0019597910 -7.261990e-05  0.0018234465 1.25   237
+## ti(season,year).12 -0.0017144692  2.979686e-05  0.0023041621 1.28   184
+## ti(season,year).13 -0.0018457651  1.133942e-04  0.0019172022 1.24   266
+## ti(season,year).14 -0.0017974155  2.019889e-05  0.0019218896 1.24   210
+## ti(season,year).15 -0.0017507757 -4.452746e-06  0.0018043474 1.24   225
+## ti(season,year).16 -0.0020461346 -8.446822e-05  0.0019519123 1.22   144
+## ti(season,year).17 -0.0022132499  5.598248e-05  0.0016638093 1.35   163
+## ti(season,year).18 -0.0017969733 -6.662812e-05  0.0019232353 1.26   167
+## ti(season,year).19 -0.0016653811 -1.340496e-04  0.0014943742 1.23   270
+## ti(season,year).20 -0.0017798682  8.252861e-05  0.0018383077 1.25   204
+
## 
+
## GAM smoothing parameter (rho) estimates:
+
##                       2.5%       50%     97.5% Rhat n.eff
+## s(season)         5.805841  7.056992  8.120666 1.01   963
+## ti(season,year)  -4.638952 -1.362070  0.308262 1.00  6254
+## ti(season,year)2 11.091656 14.146277 17.685874 4.64    33
+
## 
+
## Latent trend drift (phi) and AR parameter estimates:
+
##            2.5%        50%      97.5% Rhat n.eff
+## phi  0.01348130 0.02565249 0.03748029 1.07   618
+## ar1 -0.14844196 0.28966886 0.76114118 1.01  1191
+## ar2 -0.12923925 0.33436562 0.76015759 1.00  1506
+## ar3 -0.07148772 0.37387705 0.79635136 1.00  1159
+
## 
+

The seasonal term is obviously still very important. Plot it here

+
plot_mvgam_smooth(mod3, series = 1, "season")
+

+

Plot diagnostics of posterior median Dunn-Smyth residuals, where the autocorrelation is now captured by the latent trend process

+
plot_mvgam_resids(mod3, series = 1)
+

+

How does the model's posterior forecast distribution compare to the previous models?

+
plot_mvgam_fc(mod3, series = 1, data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod3, series = 1, type = "density", 
+    data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod3, series = 1, type = "mean", 
+    data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod3, series = 1, type = "cdf", 
+    data_test = fake_data$data_test)
+

+
plot_mvgam_ppc(mod3, series = 1, type = "pit", 
+    data_test = fake_data$data_test)
+

+

None of the models is a perfect representation of the data generating process, but the forecast for our dynamic GAM is more stable and more accurate than the previous models, with the added advantage that we can place more trust in our estimated smooth for season because we have captured the residual autocorrelation. The posterior checks for our dynamic GAM also look much better than for the two previous models. Plot the estimated latent dynamic trend (which is on the log scale)

+
plot_mvgam_trend(mod3, series = 1, data_test = fake_data$data_test)
+

+

Benchmarking against "null" models is a very important part of evaluating a proposed forecast model. After all, if our complex dynamic model can't generate better predictions than a random walk or mean forecast, is it really telling us anything new about the data-generating process? Here we illustrate the model comparison utilities in mvgam by fitting a simpler benchmark that smooths on a white noise covariate rather than on the seasonal variable. Because the white noise covariate is uninformative and we use a random walk for the trend process, this model essentially reduces to a Poisson observation model over a dynamic random walk.

+
fake_data$data_train$fake_cov <- rnorm(NROW(fake_data$data_train))
+fake_data$data_test$fake_cov <- rnorm(NROW(fake_data$data_test))
+mod4 <- mvjagam(data_train = fake_data$data_train, 
+    data_test = fake_data$data_test, formula = y ~ 
+        s(fake_cov, k = 3), family = "poisson", 
+    trend_model = "RW", drift = TRUE, burnin = 3000, 
+    chains = 4)
+
## Compiling rjags model...
+## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
+## 'localhost'
+## Simulation complete
+## Note: Summary statistics were not produced as there are >50 monitored
+## variables
+## [To override this behaviour see ?add.summary and ?runjags.options]
+## FALSEFinished running the simulation
+## NOTE: Stopping adaptation
+

Look at this model's proposed forecast

+
plot_mvgam_fc(mod4, series = 1, data_test = fake_data$data_test)
+

+

Here we showcase how different dynamic models can be compared using rolling probabilistic forecast evaluation, which is especially useful if we don't already have out of sample observations for comparing forecasts. This function sets up a sequence of evaluation timepoints along a rolling window within the training data to evaluate 'out-of-sample' forecasts. The trends are rolled forward a total of fc_horizon timesteps according to their estimated state space dynamics to generate an 'out-of-sample' forecast that is evaluated against the true observations in the horizon window. We are therefore simulating a situation where the model's parameters have already been estimated but we have only observed data up to the evaluation timepoint and would like to generate forecasts that consider the possible future paths for the latent trends and the true observed values for any other covariates in the horizon window. Evaluation involves calculating the Discrete Rank Probability Score and a binary indicator for whether or not the true value lies within the forecast's 90% prediction interval. For this test we compare the two models on the exact same sequence of 30 evaluation points using fc_horizon = 6

+
compare_mvgams(model1 = mod3, model2 = mod4, 
+    fc_horizon = 6, n_evaluations = 30, n_cores = 3)
+
## DRPS summaries per model (lower is better)
+##             Min.   1st Qu.    Median      Mean   3rd Qu.     Max. NA's
+## Model 1 2.800801  3.885114  5.058327  5.778573  5.993394 21.73604   13
+## Model 2 4.658159 10.698314 16.765719 23.569713 27.393819 99.26183   13
+## 
+## 90% interval coverages per model (closer to 0.9 is better)
+## Model 1 0.988024 
+## Model 2 0.9461078
+

+

The series of plots generated by compare_mvgams clearly show that the first dynamic model generates better predictions. In each plot, DRPS for the forecast horizon is lower for the first model than for the second model. This kind of evaluation is often more appropriate for forecast models than complexity-penalising fit metrics such as AIC or BIC. However, comparing forecasts of the dynamic models against the two models with yearly smooth terms (mod1 and mod2) using the rolling window approach is actually not recommended, as the yearly smooth models have already seen all the possible in-sample values of year and so should be able to predict incredibly well by interpolating through the range of the fitted smooth. By contrast, the dynamic component in our dynamic GAMs (models 3 and 4) produces true forecasts when running the rolling window approach. Nevertheless, when we compare some of these models (here mod1 with the thin plate yearly smooth vs mod3 with the AR3 trend process) as we did above for the random walk model, we still find that our dynamic GAM produces superior probabilistic forecasts

+
compare_mvgams(model1 = mod1, model2 = mod3, 
+    fc_horizon = 6, n_evaluations = 30, n_cores = 3)
+
## DRPS summaries per model (lower is better)
+##              Min.   1st Qu.    Median      Mean   3rd Qu.     Max. NA's
+## Model 1 19.413509 30.945251 38.888128 41.390934 48.733559 79.53264   13
+## Model 2  2.740936  3.891306  5.054429  5.800003  5.977418 21.87450   13
+## 
+## 90% interval coverages per model (closer to 0.9 is better)
+## Model 1 1 
+## Model 2 0.994012
+

+

The same holds true when comparing against the B spline model (mod2)

+
compare_mvgams(model1 = mod2, model2 = mod3, 
+    fc_horizon = 6, n_evaluations = 30, n_cores = 3)
+
## DRPS summaries per model (lower is better)
+##             Min.   1st Qu.    Median      Mean   3rd Qu.     Max. NA's
+## Model 1 18.49409 29.184917 37.871936 38.060259 45.877689 65.60411   13
+## Model 2  2.75831  3.876155  5.066004  5.785132  6.081017 20.60188   13
+## 
+## 90% interval coverages per model (closer to 0.9 is better)
+## Model 1 1 
+## Model 2 0.994012
+

+

Now we proceed by exploring how forecast distributions from an mvgam object can be automatically updated in light of new incoming observations. This works by generating a set of "particles" that each captures a unique proposal about the current state of the system (in this case, the current estimate of the latent trend component). The next observation in data_assim is assimilated and particles are weighted by how well their proposal (i.e. their proposed forecast, prior to seeing the new data) matched the new observations. For univariate models such as the ones we've fitted so far, this weight is represented by the proposal's Negative Binomial log-likelihood. For multivariate models, a multivariate composite likelihood is used for weights. Once weights are calculated, we use importance sampling to update the model's forecast distribution for the remaining forecast horizon. Begin by initiating a set of 5000 particles by assimilating the next observation in data_test and storing the particles in the default location (in a directory called particles within the working directory)
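As a toy illustration of the weighting step (assumed parameter values only; this is not the package's internal code), suppose each particle proposes a Negative Binomial mean for the incoming observation

set.seed(500)
particle_mus <- rgamma(5000, shape = 2, rate = 0.01)  # toy particle proposals
r <- 100  # assumed Negative Binomial size parameter
y_new <- 250  # the newly observed count being assimilated
log_w <- dnbinom(y_new, size = r, mu = particle_mus, log = TRUE)
w <- exp(log_w - max(log_w))  # subtract the max for numerical stability
w <- w / sum(w)  # normalised importance weights

Particles whose proposals sit close to the new observation now carry most of the weight.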

+
pfilter_mvgam_init(object = mod3, n_particles = 5000, 
+    n_cores = 2, data_assim = fake_data$data_test)
+
## Saving particles to pfilter/particles.rda 
+##  ESS = 12.22888
+

Now we are ready to run the particle filter. This function will assimilate the next six out of sample observations in data_test and update the forecast after each assimilation step. This works in an iterative fashion by calculating each particle's weight, then using a kernel smoothing algorithm to "pull" low weight particles toward the high-likelihood space before assimilating the next observation. The strength of the kernel smoother is controlled by kernel_lambda, which in our experience works well when left to the default of 1. If the Effective Sample Size of particles drops too low, suggesting we are putting most of our belief in a very small set of particles, an automatic resampling step is triggered to increase particle diversity and reduce the chance that our forecast intervals become too narrow and incapable of adapting to changing conditions

+
pfilter_mvgam_online(data_assim = fake_data$data_test[1:7, 
+    ], n_cores = 2, kernel_lambda = 1)
+
## Particles have already assimilated one or more observations. Skipping these
+## 
+## Assimilating the next 6 observations
+## 
+## Effective sample size is 6.331851 ...
+## 
+## Smoothing particles ...
+## 
+## Effective sample size is 6.269106 ...
+## 
+## Smoothing particles ...
+## 
+## Effective sample size is 37.31189 ...
+## 
+## Smoothing particles ...
+## 
+## Effective sample size is 541.3617 ...
+## 
+## Smoothing particles ...
+## 
+## Effective sample size is 1934.641 ...
+## 
+## Smoothing particles ...
+## 
+## Effective sample size is 3116.806 ...
+## 
+## Smoothing particles ...
+## 
+## Last assimilation time was 7 1958 
+## 
+## Saving particles to pfilter/particles.rda 
+##  ESS = 3116.806
+
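To make the Effective Sample Size messages above concrete, here is a minimal, self-contained sketch of the weighting and ESS logic. All object names here are hypothetical illustrations (none come from mvgam), and the resampling threshold shown is a common convention rather than mvgam's exact rule

# Hypothetical particle weighting sketch
set.seed(1)
particle_mus <- rgamma(5000, shape = 10, rate = 2)  # each particle's proposed mean
y_new <- 4                                          # newly assimilated observation
log_w <- dnbinom(y_new, size = 10, mu = particle_mus, log = TRUE)
w <- exp(log_w - max(log_w))   # stabilise before exponentiating
w <- w / sum(w)                # normalised importance weights
(ess <- 1 / sum(w^2))          # effective sample size
if (ess < length(w) / 2) {
    # resample indices in proportion to weights to restore particle diversity
    idx <- sample(seq_along(w), length(w), replace = TRUE, prob = w)
}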

Once assimilation is complete, generate the updated forecast from the particles using the covariate information in the remaining data_test observations. This function is designed to make assimilation simpler: once the particles are initiated, all that needs to be provided is a dataframe of test data in exactly the same format as the data that were used to train the initial model. If no new observations are found (observations are arranged by year and then by season, so consistent indexing of these two variables is very important!) then the function returns NULL and the particles remain where they are in state space.

+
fc <- pfilter_mvgam_fc(file_path = "pfilter", 
+    n_cores = 2, data_test = fake_data$data_test, 
+    ylim = c(min(fake_data$data_train$y, 
+        na.rm = T), max(fake_data$data_test$y, 
+        na.rm = T) * 1.25))
+

Compare the updated forecast to the original forecast to see how it has changed in light of the most recent observations

+
par(mfrow = c(1, 2))
+plot_mvgam_fc(mod3, series = 1, data_test = fake_data$data_test, 
+    ylim = c(min(fake_data$data_train$y, 
+        na.rm = T), max(fake_data$data_test$y, 
+        na.rm = T) * 1.25))
+fc$Air()
+

+

Here it is apparent that the distribution has shifted slightly in light of the 6 observations that have been assimilated, and that our confidence in the remaining forecast horizon has improved (tighter uncertainty intervals). This is an advantageous way of allowing a model to slowly adapt to new conditions while breaking free of restrictive assumptions about residual distributions. See some of the many particle filtering lectures by Nathaniel Osgood for more details. Remove the particles from their stored directory when finished

+
unlink("pfilter", recursive = T)
+
+
+

Lynx example

+

For our next univariate example, we will again explore how challenging it can be to forecast ahead with conventional GAMs and how mvgam overcomes these challenges. We begin by replicating the lynx analysis from the 2018 Ecological Society of America workshop on GAMs hosted by Eric Pedersen, David L. Miller, Gavin Simpson, and Noam Ross, with some minor adjustments. First, load the data and plot the series as well as its estimated autocorrelation function

+
library(mvgam)
+data(lynx)
+lynx_full = data.frame(year = 1821:1934, 
+    population = as.numeric(lynx))
+plot(lynx_full$population, type = "l", ylab = "Lynx trappings", 
+    xlab = "Time")
+acf(lynx_full$population, main = "")
+

+

There is a clear ~19-year cyclic pattern to the data, so we create a season term that can be used to model this effect and give a better representation of the data generating process. We also create a new year term that represents which long-term cycle each observation is in

+
plot(stl(ts(lynx_full$population, frequency = 19), 
+    s.window = "periodic"))
+

+
lynx_full$season <- (lynx_full$year%%19) + 
+    1
+cycle_ends <- c(which(lynx_full$season == 
+    19), NROW(lynx_full))
+cycle_starts <- c(1, cycle_ends[1:length(which(lynx_full$season == 
+    19))] + 1)
+cycle <- vector(length = NROW(lynx_full))
+for (i in 1:length(cycle_starts)) {
+    cycle[cycle_starts[i]:cycle_ends[i]] <- i
+}
+lynx_full$year <- cycle
+

Add lag indicators needed to fit the nonlinear lag models that gave the best one step ahead point forecasts in the ESA workshop example. As in the example, we specify the default argument in the lag function as the mean log population.

+
mean_pop_l = mean(log(lynx_full$population))
+lynx_full = dplyr::mutate(lynx_full, popl = log(population), 
+    lag1 = dplyr::lag(popl, 1, default = mean_pop_l), 
+    lag2 = dplyr::lag(popl, 2, default = mean_pop_l), 
+    lag3 = dplyr::lag(popl, 3, default = mean_pop_l), 
+    lag4 = dplyr::lag(popl, 4, default = mean_pop_l), 
+    lag5 = dplyr::lag(popl, 5, default = mean_pop_l), 
+    lag6 = dplyr::lag(popl, 6, default = mean_pop_l))
+

For mvgam models, the response needs to be labelled y and we also need an indicator of the series name as a factor variable

+
lynx_full$y <- lynx_full$population
+lynx_full$series <- factor("series1")
+

Split the data into training (first 40 years) and testing (next 10 years of data) to evaluate multi-step ahead forecasts

+
lynx_train = lynx_full[1:40, ]
+lynx_test = lynx_full[41:50, ]
+

The best-forecasting model in the course was one with nonlinear smooths of lags 1 and 2; we use those here, except that we also include a cyclic smooth for the 19-year cycles (as this seems like an important feature) as well as a yearly smooth for the long-term trend. Following the information about spline extrapolation above, we again fit a cubic B spline for the trend with a mix of penalties to try and rein in wacky extrapolation behaviours, and we extend the penalty to cover the years that we wish to predict. This will hopefully give us better uncertainty estimates for the forecast. In this example we assume the observations are Poisson distributed

+
lynx_mgcv = gam(population ~ s(season, bs = "cc", 
+    k = 19) + s(year, bs = "bs", m = c(3, 
+    2, 1, 0)) + s(lag1, k = 5) + s(lag2, 
+    k = 5), knots = list(season = c(0.5, 
+    19.5), year = c(min(lynx_train$year - 
+    1), min(lynx_train$year), max(lynx_test$year), 
+    max(lynx_test$year) + 1)), data = lynx_train, 
+    family = "poisson", method = "REML")
+

Inspect the model's summary and estimated smooth functions for the season, year and lag terms

+
summary(lynx_mgcv)
+
## 
+## Family: poisson 
+## Link function: log 
+## 
+## Formula:
+## population ~ s(season, bs = "cc", k = 19) + s(year, bs = "bs", 
+##     m = c(3, 2, 1, 0)) + s(lag1, k = 5) + s(lag2, k = 5)
+## 
+## Parametric coefficients:
+##             Estimate Std. Error z value Pr(>|z|)    
+## (Intercept) 6.666631   0.007511   887.6   <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Approximate significance of smooth terms:
+##              edf Ref.df Chi.sq p-value    
+## s(season) 16.229 17.000 6770.2  <2e-16 ***
+## s(year)    1.990  2.000  244.2  <2e-16 ***
+## s(lag1)    3.984  3.999  712.0  <2e-16 ***
+## s(lag2)    3.892  3.993  488.1  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## R-sq.(adj) =  0.967   Deviance explained = 97.7%
+## -REML = 880.28  Scale est. = 1         n = 40
+
plot(lynx_mgcv, select = 1)
+plot(lynx_mgcv, select = 2)
+plot(lynx_mgcv, select = 3)
+plot(lynx_mgcv, select = 4)
+

+

This model captures most of the deviance in the series and the functions are all confidently estimated to be non-zero and non-flat. So far, so good. Now for some forecasts for the out of sample period. First we must take posterior draws of smooth beta coefficients to incorporate the uncertainties around smooth functions when simulating forecast paths

+
coef_sim <- gam.mh(lynx_mgcv)$bs
+

Now we define a function to perform forecast simulations from the nonlinear lag model in a recursive fashion. Using starting values for the last two lags, the function will iteratively project the path ahead with a random sample from the model's coefficient posterior

+
recurse_nonlin = function(model, lagged_vals, 
+    h) {
+    # Initiate state vector
+    states <- rep(NA, length = h + 2)
+    # Last two values of the conditional
+    # expectations begin the state vector
+    states[1] <- as.numeric(exp(lagged_vals[2]))
+    states[2] <- as.numeric(exp(lagged_vals[1]))
+    # Get a random sample of the smooth
+    # coefficient uncertainty matrix to use
+    # for the entire forecast horizon of this
+    # particular path
+    gam_coef_index <- sample(seq(1, NROW(coef_sim)), 
+        1, T)
+    # For each following timestep,
+    # recursively predict based on the
+    # predictions at each previous lag
+    for (t in 3:(h + 2)) {
+        # Build the GAM linear predictor matrix
+        # using the two previous lags of the
+        # (log) density
+        newdata <- data.frame(lag1 = log(states[t - 
+            1] + 0.01), lag2 = log(states[t - 
+            2] + 0.01), season = lynx_test$season[t - 
+            2], year = lynx_test$year[t - 
+            2])
+        colnames(newdata) <- c("lag1", "lag2", 
+            "season", "year")
+        Xp <- predict(model, newdata = newdata, 
+            type = "lpmatrix")
+        # Calculate the posterior prediction for
+        # this timepoint
+        mu <- rpois(1, lambda = exp(Xp %*% 
+            coef_sim[gam_coef_index, ]))
+        # Fill in the state vector and iterate to
+        # the next timepoint
+        states[t] <- mu
+    }
+    # Return the forecast path
+    states[-c(1:2)]
+}
+

Create the GAM's forecast distribution by generating 1000 simulated forecast paths. Each path is fed the true observed values for the last two lags of the first out of sample timepoint, but paths can deviate when simulating ahead depending on their particular draw of possible coefficients. Note that this is a bit slow and could easily be parallelised to speed up computations (a sketch of one way to do so follows the loop below)

+
gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
+for (i in 1:1000) {
+    gam_sims[i, ] <- recurse_nonlin(lynx_mgcv, 
+        lagged_vals = c(lynx_test$lag1[1], 
+            lynx_test$lag2[1]), h = 10)
+}
+
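As a sketch of that parallelisation, using the parallel package that ships with R (the two-worker cluster size is only an example; run times will vary by machine)

library(parallel)
cl <- makeCluster(2)
clusterSetRNGStream(cl, 1)       # reproducible random draws across workers
clusterEvalQ(cl, library(mgcv))  # predict() inside recurse_nonlin needs mgcv
clusterExport(cl, c("recurse_nonlin", "lynx_mgcv", "lynx_test", "coef_sim"))
gam_sims_par <- do.call(rbind, parLapply(cl, 1:1000, function(i) {
    recurse_nonlin(lynx_mgcv, lagged_vals = c(lynx_test$lag1[1],
        lynx_test$lag2[1]), h = 10)
}))
stopCluster(cl)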

Plot the mgcv model's out of sample forecast for the next 10 years ahead

+
cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 
+    0.95))
+yupper <- max(cred_ints) * 1.25
+plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), 
+    type = "l", col = rgb(1, 0, 0, alpha = 0), 
+    ylim = c(0, yupper), ylab = "Predicted lynx trappings", 
+    xlab = "Forecast horizon", main = "mgcv")
+polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
+    c(cred_ints[1, ], rev(cred_ints[3, ])), 
+    col = rgb(150, 0, 0, max = 255, alpha = 100), 
+    border = NA)
+cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 
+    0.68))
+polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
+    c(cred_ints[1, ], rev(cred_ints[3, ])), 
+    col = rgb(150, 0, 0, max = 255, alpha = 180), 
+    border = NA)
+lines(cred_ints[2, ], col = rgb(150, 0, 0, 
+    max = 255), lwd = 2, lty = "dashed")
+points(lynx_test$population[1:10], pch = 16)
+lines(lynx_test$population[1:10])
+

+

A decent forecast? The shape is certainly correct, but the 95% uncertainty intervals appear to be far too wide (i.e. our upper interval extends up to 8 times the maximum number of trappings that has ever been recorded up to this point). This is almost entirely due to the extrapolation behaviour of the B spline, as the lag smooth functions are not encountering values very far outside the ranges they've already been trained on, so they are resorting mostly to interpolation. But a better way to evaluate than simply using visuals is to calculate a probabilistic score. Here we use the Discrete Rank Probability Score, which gives us an indication of how well calibrated our forecast's uncertainty intervals are by comparing the mass of the forecast density against the true observed values. Forecasts with overly wide intervals are penalised, as are forecasts with overly narrow intervals that do not contain the true observations. At the same time we calculate coverage of the forecast's 90% intervals, which is another useful way of evaluating different forecast proposals
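In symbols, the score computed below for a single forecast-observation pair is

\[DRPS(F,y_{obs})=\sum\limits_{y=0}^{\infty}\left(F(y)-\mathbb{1}(y\geq y_{obs})\right)^2\,,\]

where \(F\) is the forecast's empirical cumulative distribution function and \(y_{obs}\) is the observed count; the infinite sum is truncated at a suitably large count (1000 in the function below).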

+
# Discrete Rank Probability Score and
+# coverage of 90% interval
+drps_score <- function(truth, fc, interval_width = 0.9) {
+    nsum <- 1000
+    Fy = ecdf(fc)
+    ysum <- 0:nsum
+    indicator <- ifelse(ysum - truth >= 0, 
+        1, 0)
+    score <- sum((indicator - Fy(ysum))^2)
+    
+    # Is value within 90% HPD?
+    interval <- hpd(fc, interval_width)
+    in_interval <- ifelse(truth <= interval[3] & 
+        truth >= interval[1], 1, 0)
+    return(c(score, in_interval))
+}
+
+# Wrapper to operate on all observations
+# in fc_horizon
+drps_mcmc_object <- function(truth, fc, interval_width = 0.9) {
+    indices_keep <- which(!is.na(truth))
+    if (length(indices_keep) == 0) {
+        scores = data.frame(drps = rep(NA, 
+            length(truth)), interval = rep(NA, 
+            length(truth)))
+    } else {
+        scores <- matrix(NA, nrow = length(truth), 
+            ncol = 2)
+        for (i in indices_keep) {
+            scores[i, ] <- drps_score(truth = as.vector(truth)[i], 
+                fc = fc[, i], interval_width)
+        }
+    }
+    scores
+}
+
+# Calculate DRPS over the 10-year horizon
+# for the mgcv model
+lynx_mgcv1_drps <- drps_mcmc_object(truth = lynx_test$population[1:10], 
+    fc = gam_sims)
+

What if we remove the yearly trend and let the lag smooths capture more of the temporal dependencies? Will that improve the forecast distribution? Run a second model and plot the forecast (note that this plot will be on quite a different y-axis scale compared to the first plot above)

+
lynx_mgcv2 = gam(population ~ s(season, bs = "cc", 
+    k = 19) + s(lag1, k = 5) + s(lag2, k = 5), 
+    data = lynx_train, family = "poisson", 
+    method = "REML")
+coef_sim <- gam.mh(lynx_mgcv2)$bs
+gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
+for (i in 1:1000) {
+    gam_sims[i, ] <- recurse_nonlin(lynx_mgcv2, 
+        lagged_vals = c(lynx_test$lag1[1], 
+            lynx_test$lag2[1]), h = 10)
+}
+cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 
+    0.95))
+yupper <- max(cred_ints) * 1.25
+plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), 
+    type = "l", col = rgb(1, 0, 0, alpha = 0), 
+    ylim = c(0, yupper), ylab = "Predicted lynx trappings", 
+    xlab = "Forecast horizon", main = "mgcv")
+polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
+    c(cred_ints[1, ], rev(cred_ints[3, ])), 
+    col = rgb(150, 0, 0, max = 255, alpha = 100), 
+    border = NA)
+cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 
+    0.68))
+polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
+    c(cred_ints[1, ], rev(cred_ints[3, ])), 
+    col = rgb(150, 0, 0, max = 255, alpha = 180), 
+    border = NA)
+lines(cred_ints[2, ], col = rgb(150, 0, 0, 
+    max = 255), lwd = 2, lty = "dashed")
+points(lynx_test$population[1:10], pch = 16)
+lines(lynx_test$population[1:10])
+

+

Calculate DRPS over the 10-year horizon for the second mgcv model

+
lynx_mgcv2_drps <- drps_mcmc_object(truth = lynx_test$population[1:10], 
+    fc = gam_sims)
+

This forecast is highly overconfident, with very unrealistic uncertainty intervals due to the interpolation behaviours of the lag smooths. You can certainly keep trying different formulations (our experience is that the B spline variant above produces the best forecasts of any tested mgcv model, but we did not test an exhaustive set), but hopefully it is clear that forecasting using splines is tricky business, and it is likely that each time you do it you'll end up homing in on different combinations of penalties, knot selections etc. Now we will fit an mvgam model for comparison. This fits a model similar to the mgcv model directly above but with a full time series model for the errors (in this case an AR1 process), rather than smoothing splines that do not incorporate a concept of the future. We do not use a year term, both to reduce any possible extrapolation and because the latent dynamic component should capture this temporal variation. We estimate the model in JAGS using MCMC sampling (Note that JAGS 4.3.0 is required; installation links are found here).

+
lynx_mvgam <- mvjagam(data_train = lynx_train, 
+    data_test = lynx_test, formula = y ~ 
+        s(season, bs = "cc", k = 19), knots = list(season = c(0.5, 
+        19.5)), family = "poisson", trend_model = "AR1", 
+    burnin = 5000, chains = 4)
+
## Compiling rjags model...
+## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
+## 'localhost'
+## Simulation complete
+## Note: Summary statistics were not produced as there are >50 monitored
+## variables
+## [To override this behaviour see ?add.summary and ?runjags.options]
+## FALSEFinished running the simulation
+## NOTE: Stopping adaptation
+

Calculate the out of sample forecast from the fitted mvgam model and plot

+
fits <- MCMCvis::MCMCchains(lynx_mvgam$jags_output, 
+    "ypred")
+fits <- fits[, (NROW(lynx_mvgam$obs_data) + 
+    1):(NROW(lynx_mvgam$obs_data) + 10)]
+cred_ints <- apply(fits, 2, function(x) hpd(x, 
+    0.95))
+yupper <- max(cred_ints) * 1.25
+plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), 
+    type = "l", col = rgb(1, 0, 0, alpha = 0), 
+    ylim = c(0, yupper), ylab = "", xlab = "Forecast horizon", 
+    main = "mvgam")
+polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
+    c(cred_ints[1, ], rev(cred_ints[3, ])), 
+    col = rgb(150, 0, 0, max = 255, alpha = 100), 
+    border = NA)
+cred_ints <- apply(fits, 2, function(x) hpd(x, 
+    0.68))
+polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))), 
+    c(cred_ints[1, ], rev(cred_ints[3, ])), 
+    col = rgb(150, 0, 0, max = 255, alpha = 180), 
+    border = NA)
+lines(cred_ints[2, ], col = rgb(150, 0, 0, 
+    max = 255), lwd = 2, lty = "dashed")
+points(lynx_test$population[1:10], pch = 16)
+lines(lynx_test$population[1:10])
+

+

Calculate DRPS over the 10-year horizon for the mvgam model

+
lynx_mvgam_drps <- drps_mcmc_object(truth = lynx_test$population[1:10], 
+    fc = fits)
+

How do the out of sample DRPS scores stack up for these three models? Remember, our goal is to minimise DRPS while providing 90% interval coverage that is near, but not less than, 0.9. The DRPS and 90% interval coverage for the first mgcv model (with the B spline year term)

+
sum(lynx_mgcv1_drps[, 1])
+
## [1] 1576.198
+
mean(lynx_mgcv1_drps[, 2])
+
## [1] 0.8
+

For the second mgcv model

+
sum(lynx_mgcv2_drps[, 1])
+
## [1] 2041.019
+
mean(lynx_mgcv2_drps[, 2])
+
## [1] 0
+

And for the mvgam model

+
sum(lynx_mvgam_drps[, 1])
+
## [1] 737.153
+
mean(lynx_mvgam_drps[, 2])
+
## [1] 1
+

The mvgam model shows much more realistic uncertainty than the mgcv versions above. Of course this is just one out of sample comparison, and to really determine which model is most appropriate for forecasting we would want to run many of these tests using a rolling window approach. Have a look at this model's summary to see what is being estimated (note that longer MCMC runs would probably be needed to increase effective sample sizes)

+
summary_mvgam(lynx_mvgam)
+
## GAM formula:
+
## y ~ s(season, bs = "cc", k = 19)
+
## 
+
## Family:
+
## Poisson
+
## 
+
## N series:
+
## 1
+
## 
+
## N observations per series:
+
## 40
+
## 
+
## GAM smooth term approximate significances:
+
##           edf Ref.df Chi.sq p-value    
+## s(season)  16     17  123.1  <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+
## 
+
## GAM coefficient (beta) estimates:
+
##                    2.5%         50%       97.5% Rhat n.eff
+## (Intercept)   6.3147932  6.66215103  6.89250988 1.17    34
+## s(season).1  -1.3128155 -0.67418798 -0.07776714 1.25    97
+## s(season).2  -0.3940034  0.17016843  0.77839705 1.23    95
+## s(season).3   0.3739162  0.95359629  1.51739338 1.27    47
+## s(season).4   0.8946542  1.49719833  2.30096068 1.34    41
+## s(season).5   1.2548317  1.75534210  2.61569981 1.34    27
+## s(season).6   0.6014648  1.30104444  1.97618846 1.11    40
+## s(season).7  -0.5940465  0.08063823  0.69005974 1.06    68
+## s(season).8  -1.5610967 -0.86593061 -0.15707434 1.07   119
+## s(season).9  -1.9056630 -0.96684704 -0.22116190 1.02   113
+## s(season).10 -1.5232002 -0.51073646  0.20061146 1.04    83
+## s(season).11 -0.6061202  0.31370531  1.03244342 1.14    64
+## s(season).12  0.3937823  1.32583475  2.01290396 1.29    54
+## s(season).13  0.8141474  1.55456240  2.34072562 1.36    43
+## s(season).14  0.5380122  1.21424495  1.94509201 1.35    34
+## s(season).15 -0.4012842  0.14697983  0.76468321 1.18    87
+## s(season).16 -1.2059968 -0.62834129 -0.07789867 1.02   168
+## s(season).17 -1.5776786 -1.05558099 -0.48019170 1.05   119
+
## 
+
## GAM smoothing parameter (rho) estimates:
+
##               2.5%      50%    97.5% Rhat n.eff
+## s(season) 3.534611 4.446877 5.209046 1.01  1090
+
## 
+
## Latent trend drift (phi) and AR parameter estimates:
+
##         2.5%       50%     97.5% Rhat n.eff
+## phi 0.000000 0.0000000 0.0000000  NaN     0
+## ar1 0.466993 0.6939366 0.8978668    1  1259
+## ar2 0.000000 0.0000000 0.0000000  NaN     0
+## ar3 0.000000 0.0000000 0.0000000  NaN     0
+
## 
+

Now inspect each model's estimated smooth for the 19-year cyclic pattern. Note that the mvgam smooth plot is on a different scale compared to the mgcv plot, but interpretation is similar. The mgcv smooth is much wigglier, likely because it is compensating for any remaining autocorrelation not captured by the lag smooths. We could probably remedy this by reducing k in the seasonal smooth for the mgcv model (a sketch of such a model follows the plots below). In practice this works well, but leaving k larger for the mvgam's seasonal smooth is recommended, as in our experience this tends to lead to better performance and convergence

+
plot(lynx_mgcv, select = 1, shade = T)
+plot_mvgam_smooth(lynx_mvgam, 1, "season")
+

+
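For instance, a lower-rank seasonal smooth could be trialled as follows (a sketch only; we have not tested this variant's forecasts, and k = 8 is an arbitrary illustrative choice)

lynx_mgcv_lowk = gam(population ~ s(season, bs = "cc", k = 8) + 
    s(lag1, k = 5) + s(lag2, k = 5), knots = list(season = c(0.5, 
    19.5)), data = lynx_train, family = "poisson", method = "REML")
plot(lynx_mgcv_lowk, select = 1, shade = T)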

We can also view the mvgam's posterior predictions for the entire series (testing and training)

+
plot_mvgam_fc(lynx_mvgam, data_test = lynx_test)
+

+

And the estimated trend

+
plot_mvgam_trend(lynx_mvgam, data_test = lynx_test)
+

+

A key aspect of ecological forecasting is to understand how different components of a model contribute to forecast uncertainty. We can estimate contributions to forecast uncertainty for the GAM smooth functions and the latent trend using mvgam

+
plot_mvgam_uncertainty(lynx_mvgam, data_test = lynx_test)
+

+

Both components contribute to forecast uncertainty, with the trend component contributing more over time (as it should, since this is the stochastic forecast component). This suggests we would still need some more work to learn about the factors driving the dynamics of the system, but we will leave the model as-is for this example. Diagnostics of the model can also be performed using mvgam. Have a look at the model's residuals, which are posterior medians of Dunn-Smyth randomised quantile residuals and so should follow approximate normality (a small sketch of the residual calculation follows the plot below). We are primarily looking for a lack of autocorrelation, which would suggest our AR1 model is appropriate for the latent trend

+
plot_mvgam_resids(lynx_mvgam)
+

+
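For reference, here is a minimal sketch of how a Dunn-Smyth randomised quantile residual can be computed for a single Poisson observation (mvgam's internal implementation may differ in its details; y and mu are illustrative names)

# Randomise within the CDF jump at the observed count, then map to the
# standard normal scale
ds_resid <- function(y, mu) {
    a <- ppois(y - 1, lambda = mu)  # CDF just below the observation
    b <- ppois(y, lambda = mu)      # CDF at the observation
    qnorm(runif(1, min = a, max = b))
}
ds_resid(y = 3, mu = 2.5)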
+ + + + + + + + + + + + + + + diff --git a/NEON_manuscript/Case studies/mvgam_case_study2.Rmd b/NEON_manuscript/Case studies/mvgam_case_study2.Rmd index cb1bef58..dd18b236 100644 --- a/NEON_manuscript/Case studies/mvgam_case_study2.Rmd +++ b/NEON_manuscript/Case studies/mvgam_case_study2.Rmd @@ -132,17 +132,17 @@ Plot the series to see how similar their seasonal shapes are over time plot(series, legend.loc = 'topleft') ``` -Now we will fit an `mvgam` model with shared seasonality and random intercepts per series. Our first attempt will ignore any temporal component in the residuals so that we can identidy which GAM predictor combination gives us the best fit, prior to investigating how to deal with any remaining autocorrelation. We assume a Negative Binomial distrubution with shared overdispersion parameter for the response. Also note that any smooths using the random effects basis (`s(series, bs = "re")` below) are automatically re-parameterised to use the [non-centred parameterisation that is necessary to help avoid common posterior degeneracies in hierarchical models](https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html) +Now we will fit an `mvgam` model with shared seasonality and random intercepts per series. Our first attempt will ignore any temporal component in the residuals so that we can identidy which GAM predictor combination gives us the best fit, prior to investigating how to deal with any remaining autocorrelation. We assume a Negative Binomial distrubution with series-specific overdispersion parameters for the response. We use a complexity-penalising prior for the overdispersion, which allows the model to reduce toward a more simple Poisson observation process unless the data provide adequate information to support overdispersion. Also note that any smooths using the random effects basis (`s(series, bs = "re")` below) are automatically re-parameterised to use the [non-centred parameterisation that is necessary to help avoid common posterior degeneracies in hierarchical models](https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html). We suppress the global intercept as it is not needed and will lead to identifiability issues when estimating the series-specific random intercepts ```{r, message=FALSE, warning=FALSE} trends_mod1 <- mvjagam(data_train = trends_data$data_train, data_test = trends_data$data_test, formula = y ~ s(series, bs = 're') + - s(season, k = 12, m = 2, bs = 'cc'), + s(season, k = 12, m = 2, bs = 'cc') - 1, knots = list(season = c(0.5, 12.5)), trend_model = 'None', family = 'nb', chains = 4, - burnin = 5000) + burnin = 8000) ``` Given that these series could potentially be following a hierarchical seasonality, we will also trial a slghtly more complex model with an extra smoothing term per series that allows its seasonal curve to deviate from the global seasonal smooth. Ignore the warning about repeated smooths, as this is not an issue for estimation. @@ -155,10 +155,10 @@ trends_mod2 <- mvjagam(data_train = trends_data$data_train, trend_model = 'None', family = 'nb', chains = 4, - burnin = 5000) + burnin = 8000) ``` -How can we compare these models to ensure we choose one that performs well and provides useful inferences? Beyond posterior retrodictive and predictive checks, we can take advantage of the fact that `mvgam` fits an `mgcv` model to provide all the necessary penalty matrices, as well as to identify good initial values for smoothing and beta parameters. 
Because we did not modify this model by adding a trend component we can still inspect the usual `mgcv` model comparison routines +How can we compare these models to ensure we choose one that performs well and provides useful inferences? Beyond posterior retrodictive and predictive checks, we can take advantage of the fact that `mvgam` fits an `mgcv` model to provide all the necessary penalty matrices, as well as to identify good initial values for smoothing and beta parameters. Because we did not modify this model by adding a trend component (the only modification is that we estimated series-specific overdispersion parameters), we can still inspect the usual `mgcv` model comparison routines ```{r} anova(trends_mod1$mgcv_model, trends_mod2$mgcv_model, test = 'LRT') @@ -186,20 +186,19 @@ runjags::extract(trends_mod1$jags_model, 'dic') runjags::extract(trends_mod2$jags_model, 'dic') ``` -Model 2 seems to fit better so far, suggesting that hierarchical seasonality gives better performance for these series. But a problem with both of the above models is that their forecast uncertainties will not increase into the future, which is not how time series forecasts should behave. Here we fit Model 2 again but specifying a time series model for the latent trends. We assume the dynamic trends can be represented using latent factors that each follow an AR1 process, and we will rely on the exponential penalties to help regularise any un-needed factors by setting `n_lv = 4` +Model 1 seems to fit better so far, suggesting that hierarchical seasonality does not give better performance for these series. But a problem with both of the above models is that their forecast uncertainties will not increase into the future, which is not how time series forecasts should behave. Here we fit Model 1 again but specifying a time series model for the latent trends. We assume the dynamic trends can be represented using latent factors that each follow an AR1 process, and we will rely on the exponential penalties to help regularise any un-needed factors by setting `n_lv = 4` ```{r, message=FALSE, warning=FALSE} trends_mod3 <- mvjagam(data_train = trends_data$data_train, data_test = trends_data$data_test, - formula = y ~ s(season, k = 12, m = 2, bs = 'cc') + - s(season, series, k = 5, bs = 'fs', m = 1), + formula = y ~ s(series, bs = 're') + + s(season, k = 12, m = 2, bs = 'cc') - 1, knots = list(season = c(0.5, 12.5)), trend_model = 'AR1', use_lv = T, n_lv = 4, family = 'nb', chains = 4, - drift = T, - burnin = 5000) + burnin = 8000) ``` @@ -215,7 +214,7 @@ plot_mvgam_factors(trends_mod3) How do forecasts for this model compare to the previous one that did not include any trend component? ```{r, fig.width = 6, fig.height = 5, fig.align='center'} -compare_mvgams(trends_mod2, trends_mod3, fc_horizon = 6, +compare_mvgams(trends_mod1, trends_mod3, fc_horizon = 6, n_cores = 3, n_evaluations = 20) ``` @@ -242,7 +241,7 @@ plot_mvgam_resids(trends_mod3, 3) plot_mvgam_resids(trends_mod3, 4) ``` -Perform posterior predictive checks to see if the model is able to simulate data that looks realistic and unbiased by examining simulated kernel densities for posterior predictions (`yhat`) compared to the density of the observations (`y`). This will be particularly useful for examining whether the shared overdispersion parameter is able to produce realistic looking simulations for each individual series. 
+Perform posterior predictive checks to see if the model is able to simulate data that looks realistic and unbiased by examining simulated kernel densities for posterior predictions (`yhat`) compared to the density of the observations (`y`). This will be particularly useful for examining whether the Negative Binomial observation model is able to produce realistic looking simulations for each individual series. ```{r, fig.width = 5, fig.height = 4, fig.align='center'} plot_mvgam_ppc(trends_mod3, series = 1, type = 'density') ``` @@ -298,7 +297,7 @@ plot_mvgam_trend(object = trends_mod3, series = 3, data_test = trends_data$data_ plot_mvgam_trend(object = trends_mod3, series = 4, data_test = trends_data$data_test) ``` -Given that we fit a model with hierarchical seasonality, the seasonal smooths are allowed to differ somewhat +Given that we fit a model with sharedseasonality, the seasonal smooths will not differ ```{r, fig.width = 4, fig.height = 3, fig.align='center'} plot_mvgam_smooth(object = trends_mod3, series = 1, smooth = 'season') ``` @@ -345,7 +344,7 @@ plot(stl(ts(as.vector(series$`dog tick`), frequency = 12), 'periodic')) plot(stl(ts(as.vector(series$`tick bite`), frequency = 12), 'periodic')) ``` -A logical next step for this analysis would be to trial varying overdispersion parameters for each series. This may be necessary as the residual diagnostic plots and forecast period posterior predictive checks suggest that the estimated Negative Binomial overdispersion parameter is more suitable for some series than for others: +Forecast period posterior predictive checks suggest that the model still has room for improvement: ```{r, fig.width = 5, fig.height = 4, fig.align='center'} plot_mvgam_ppc(trends_mod3, series = 1, type = 'density', data_test = trends_data$data_test) ``` diff --git a/NEON_manuscript/Case studies/mvgam_case_study2.html b/NEON_manuscript/Case studies/mvgam_case_study2.html index 33e36e52..f0c6cf9c 100644 --- a/NEON_manuscript/Case studies/mvgam_case_study2.html +++ b/NEON_manuscript/Case studies/mvgam_case_study2.html @@ -219,16 +219,16 @@


Look at a few plots. The estimated smooth function

plot_mvgam_smooth(object = mod1, series = 1, 
     smooth = "season")
-

+

And the true seasonal function in the simulation

plot(dat$global_seasonality[1:12], type = "l")

Check whether each factor was retained using the plot_mvgam_factors function. Here, each factor is tested against a null hypothesis of white noise by calculating the sum of the factor's first derivatives. A factor that has a larger contribution to the series' latent trends will have a larger sum, both because that factor's absolute magnitudes will be larger (due to the weaker penalty on the factor's precision) and because the factor will move around more. By normalising these estimated first derivative sums, it should be apparent whether some factors have been dropped from the model (a small sketch of this normalisation follows the plot below). Here we see that each factor is contributing to the series' latent trends, and the plots show that neither has been forced to evolve as white noise

plot_mvgam_factors(mod1)
- +
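A plausible sketch of that normalisation, assuming a hypothetical matrix factor_draws with one column of posterior median estimates per factor

set.seed(1)
# one wandering factor and one near-white-noise factor for illustration
factor_draws <- cbind(cumsum(rnorm(50)), rnorm(50, sd = 0.1))
deriv_sums <- apply(factor_draws, 2, function(x) sum(abs(diff(x))))
deriv_sums / sum(deriv_sums)  # normalised contributions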

Now we fit the same model but assume that we know nothing about how many factors to use, so we specify the maximum allowed (the total number of series; 8). Note that this model is computationally more expensive, so it will take longer to fit

@@ -251,15 +251,15 @@


The same plots as model 1 to see if this model has also fit the data well

plot_mvgam_smooth(object = mod2, series = 1, 
     smooth = "season")
-

+

plot(dat$global_seasonality[1:12], type = "l")

Examining the factor contributions gives us some insight into whether we set n_lv larger than we perhaps needed to. These contributions can be interpreted similarly to ordination axes when deciding how many latent variables to specify

plot_mvgam_factors(mod2)
- +

The very weak contributions by some of the factors are a result of the penalisation, which will become more important as the dimensionality of the data grows. Now onto an empirical example. Here we will access monthly search volume data from Google Trends, focusing on the relative importance of search terms related to tick paralysis in Queensland, Australia
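A hypothetical sketch of pulling such series with the gtrendsR package (the exact query settings used for this case study are not shown here, so treat the keyword, region and time arguments as illustrative)

library(gtrendsR)
res <- gtrends(keyword = c("tick paralysis", "paralysis tick dog", 
    "dog tick", "tick bite"), geo = "AU-QLD", time = "all")
head(res$interest_over_time)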

@@ -285,14 +285,15 @@

train_prop = 0.9)

Plot the series to see how similar their seasonal shapes are over time

plot(series, legend.loc = "topleft")
-

-

Now we will fit an mvgam model with shared seasonality and random intercepts per series. Our first attempt will ignore any temporal component in the residuals so that we can identify which GAM predictor combination gives us the best fit, prior to investigating how to deal with any remaining autocorrelation. We assume a Negative Binomial distribution with shared overdispersion parameter for the response. Also note that any smooths using the random effects basis (s(series, bs = "re") below) are automatically re-parameterised to use the non-centred parameterisation that is necessary to help avoid common posterior degeneracies in hierarchical models

+

+

Now we will fit an mvgam model with shared seasonality and random intercepts per series. Our first attempt will ignore any temporal component in the residuals so that we can identify which GAM predictor combination gives us the best fit, prior to investigating how to deal with any remaining autocorrelation. We assume a Negative Binomial distribution with series-specific overdispersion parameters for the response. We use a complexity-penalising prior for the overdispersion, which allows the model to reduce toward a simpler Poisson observation process unless the data provide adequate information to support overdispersion. Also note that any smooths using the random effects basis (s(series, bs = "re") below) are automatically re-parameterised to use the non-centred parameterisation that is necessary to help avoid common posterior degeneracies in hierarchical models. We suppress the global intercept as it is not needed and will lead to identifiability issues when estimating the series-specific random intercepts

trends_mod1 <- mvjagam(data_train = trends_data$data_train, 
     data_test = trends_data$data_test, formula = y ~ 
         s(series, bs = "re") + s(season, 
-            k = 12, m = 2, bs = "cc"), knots = list(season = c(0.5, 
-        12.5)), trend_model = "None", family = "nb", 
-    chains = 4, burnin = 5000)
+ k = 12, m = 2, bs = "cc") - 1, + knots = list(season = c(0.5, 12.5)), + trend_model = "None", family = "nb", + chains = 4, burnin = 8000)
## Compiling rjags model...
 ## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
 ## 'localhost'
@@ -309,7 +310,7 @@ 

s(season, series, k = 5, bs = "fs", m = 1), knots = list(season = c(0.5, 12.5)), trend_model = "None", family = "nb", - chains = 4, burnin = 5000)

+ chains = 4, burnin = 8000)
## Compiling rjags model...
 ## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
 ## 'localhost'
@@ -319,46 +320,41 @@ 

## [To override this behaviour see ?add.summary and ?runjags.options] ## FALSEFinished running the simulation ## NOTE: Stopping adaptation

-

How can we compare these models to ensure we choose one that performs well and provides useful inferences? Beyond posterior retrodictive and predictive checks, we can take advantage of the fact that mvgam fits an mgcv model to provide all the necessary penalty matrices, as well as to identify good initial values for smoothing and beta parameters. Because we did not modify this model by adding a trend component we can still inspect the usual mgcv model comparison routines

+

How can we compare these models to ensure we choose one that performs well and provides useful inferences? Beyond posterior retrodictive and predictive checks, we can take advantage of the fact that mvgam fits an mgcv model to provide all the necessary penalty matrices, as well as to identify good initial values for smoothing and beta parameters. Because we did not modify this model by adding a trend component (the only modification is that we estimated series-specific overdispersion parameters), we can still inspect the usual mgcv model comparison routines

anova(trends_mod1$mgcv_model, trends_mod2$mgcv_model, 
     test = "LRT")
AIC(trends_mod1$mgcv_model, trends_mod2$mgcv_model)
summary(trends_mod1$mgcv_model)
## 
-## Family: Negative Binomial(3.919) 
+## Family: Negative Binomial(4.829) 
 ## Link function: log 
 ## 
 ## Formula:
-## y ~ s(series, bs = "re") + s(season, k = 12, m = 2, bs = "cc")
-## 
-## Parametric coefficients:
-##             Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)    2.019      0.424   4.761 2.74e-06 ***
-## ---
-## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## y ~ s(series, bs = "re") + s(season, k = 12, m = 2, bs = "cc") - 
+##     1
 ## 
 ## Approximate significance of smooth terms:
 ##             edf Ref.df      F p-value    
-## s(series) 2.981      3 118.77  <2e-16 ***
-## s(season) 5.157     10  15.27  <2e-16 ***
+## s(series) 3.996      4 1194.2  <2e-16 ***
+## s(season) 5.712     10  533.6  <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ## 
-## R-sq.(adj) =  0.558   Deviance explained = 51.1%
-## fREML = 600.14  Scale est. = 1         n = 392
+## R-sq.(adj) = 0.6 Deviance explained = 54.6% +## fREML = 607.22 Scale est. = 1 n = 392
summary(trends_mod2$mgcv_model)
## 
-## Family: Negative Binomial(4.521) 
+## Family: Negative Binomial(4.912) 
 ## Link function: log 
 ## 
 ## Formula:
@@ -367,43 +363,42 @@ 

## ## Parametric coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 2.003 0.422 4.746 2.96e-06 *** +## (Intercept) 1.8548 0.4101 4.523 8.17e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Approximate significance of smooth terms: -## edf Ref.df F p-value -## s(season) 5.055 10 3.879 <2e-16 *** -## s(season,series) 10.154 19 20.828 <2e-16 *** +## edf Ref.df F p-value +## s(season) 5.704 10 16.23 <2e-16 *** +## s(season,series) 4.097 19 18.54 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## R-sq.(adj) = 0.593 Deviance explained = 54.2% -## fREML = 619.8 Scale est. = 1 n = 392

+## R-sq.(adj) = 0.605 Deviance explained = 54.9% +## fREML = 603.28 Scale est. = 1 n = 392

We can also use JAGS routines for interrogating models. Which model minimises in-sample DIC?

runjags::extract(trends_mod1$jags_model, 
     "dic")
## Compiling rjags model and adapting for 1000 iterations...
 ## Obtaining DIC samples from 2000 iterations...
-
## Mean deviance:  2307 
-## penalty 10.73 
-## Penalized deviance: 2317
+
## Mean deviance:  2138 
+## penalty 14.87 
+## Penalized deviance: 2153
runjags::extract(trends_mod2$jags_model, 
     "dic")
## Compiling rjags model and adapting for 1000 iterations...
 ## Obtaining DIC samples from 2000 iterations...
-
## Mean deviance:  2285 
-## penalty 21.48 
-## Penalized deviance: 2306
-

Model 2 seems to fit better so far, suggesting that hierarchical seasonality gives better performance for these series. But a problem with both of the above models is that their forecast uncertainties will not increase into the future, which is not how time series forecasts should behave. Here we fit Model 2 again but specifying a time series model for the latent trends. We assume the dynamic trends can be represented using latent factors that each follow an AR1 process, and we will rely on the exponential penalties to help regularise any un-needed factors by setting n_lv = 4

+
## Mean deviance:  2133 
+## penalty 20.57 
+## Penalized deviance: 2154
+

Model 1 seems to fit better so far, suggesting that hierarchical seasonality does not give better performance for these series. But a problem with both of the above models is that their forecast uncertainties will not increase into the future, which is not how time series forecasts should behave. Here we fit Model 1 again but specifying a time series model for the latent trends. We assume the dynamic trends can be represented using latent factors that each follow an AR1 process, and we will rely on the exponential penalties to help regularise any un-needed factors by setting n_lv = 4

trends_mod3 <- mvjagam(data_train = trends_data$data_train, 
     data_test = trends_data$data_test, formula = y ~ 
-        s(season, k = 12, m = 2, bs = "cc") + 
-            s(season, series, k = 5, bs = "fs", 
-                m = 1), knots = list(season = c(0.5, 
-        12.5)), trend_model = "AR1", use_lv = T, 
-    n_lv = 4, family = "nb", chains = 4, 
-    drift = T, burnin = 5000)
+ s(series, bs = "re") + s(season, + k = 12, m = 2, bs = "cc") - 1, + knots = list(season = c(0.5, 12.5)), + trend_model = "AR1", use_lv = T, n_lv = 4, + family = "nb", chains = 4, burnin = 8000)
## Fitting a multivariate GAM with latent dynamic factors for the trends...
 ## Compiling rjags model...
 ## Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
@@ -451,7 +446,7 @@ 

## [33] "" ## [34] " ## AR components" ## [35] " for (s in 1:n_lv){" -## [36] " phi[s] ~ dnorm(0, 10)" +## [36] " phi[s] <- 0" ## [37] " ar1[s] ~ dnorm(0, 10)" ## [38] " ar2[s] <- 0" ## [39] " ar3[s] <- 0" @@ -525,59 +520,66 @@

Nicholas Clark (n.clark@uq ## [107] " ## Negative binomial likelihood functions" ## [108] " for (i in 1:n) {" ## [109] " for (s in 1:n_series) {" -## [110] " y[i, s] ~ dnegbin(rate[i, s], r)" -## [111] " rate[i, s] <- ifelse((r / (r + mu[i, s])) < min_eps, min_eps," -## [112] " (r / (r + mu[i, s])))" +## [110] " y[i, s] ~ dnegbin(rate[i, s], r[s])" +## [111] " rate[i, s] <- ifelse((r[s] / (r[s] + mu[i, s])) < min_eps, min_eps," +## [112] " (r[s] / (r[s] + mu[i, s])))" ## [113] " }" ## [114] " }" -## [115] " r ~ dexp(0.001)" -## [116] "" -## [117] " ## Posterior predictions" -## [118] " for (i in 1:n) {" -## [119] " for (s in 1:n_series) {" -## [120] " ypred[i, s] ~ dnegbin(rate[i, s], r)" -## [121] " }" +## [115] "" +## [116] " ## Complexity penalising prior for the overdispersion parameter;" +## [117] " ## where the likelihood reduces to a 'base' model (Poisson) unless" +## [118] " ## the data support overdispersion" +## [119] " for(s in 1:n_series){" +## [120] " r[s] <- 1 / r_raw[s]" +## [121] " r_raw[s] ~ dexp(0.5)" ## [122] " }" -## [123] " " -## [124] " ## parametric effect priors (regularised for identifiability)" -## [125] " for (i in 1:1) { b[i] ~ dnorm(p_coefs[i], p_taus[i]) }" -## [126] " ## prior for s(season)... " -## [127] " K1 <- S1[1:10,1:10] * lambda[1] " -## [128] " b[2:11] ~ dmnorm(zero[2:11],K1) " -## [129] " ## prior for s(season,series)... " -## [130] " for (i in c(12:15,17:20,22:25,27:30)) { b[i] ~ dnorm(0, lambda[2]) }" -## [131] " for (i in c(16,21,26,31)) { b[i] ~ dnorm(0, lambda[3]) }" -## [132] " ## smoothing parameter priors..." -## [133] " for (i in 1:3) {" -## [134] " lambda[i] ~ dexp(1/sp[i])" -## [135] " rho[i] <- log(lambda[i])" -## [136] " }" -## [137] "}"

+## [123] "" +## [124] " ## Posterior predictions" +## [125] " for (i in 1:n) {" +## [126] " for (s in 1:n_series) {" +## [127] " ypred[i, s] ~ dnegbin(rate[i, s], r[s])" +## [128] " }" +## [129] " }" +## [130] " " +## [131] " ## prior (non-centred) for s(series)..." +## [132] " for (i in 1:4) { b_raw[i] ~ dnorm(0, 1)" +## [133] " b[i] <- b_raw[i] * sigma_raw1 " +## [134] " }" +## [135] " sigma_raw1 ~ dexp(1)" +## [136] " ## prior for s(season)... " +## [137] " K2 <- S2[1:10,1:10] * lambda[2] " +## [138] " b[5:14] ~ dmnorm(zero[5:14],K2) " +## [139] " ## smoothing parameter priors..." +## [140] " for (i in 1:2) {" +## [141] " lambda[i] ~ dexp(1/sp[i])" +## [142] " rho[i] <- log(lambda[i])" +## [143] " }" +## [144] "}"

Inspection of the dynamic factors and their relative contributions indicates that the first factor is by far the most important

plot_mvgam_factors(trends_mod3)
- +

How do forecasts for this model compare to the previous one that did not include any trend component?

-
compare_mvgams(trends_mod2, trends_mod3, 
+
compare_mvgams(trends_mod1, trends_mod3, 
     fc_horizon = 6, n_cores = 3, n_evaluations = 20)
## DRPS summaries per model (lower is better)
-##             Min.   1st Qu.   Median     Mean  3rd Qu.     Max.
-## Model 1 8.101701 12.176904 14.52162 15.73781 19.15470 31.90079
-## Model 2 4.317081  8.441469 10.43314 11.67726 13.32969 30.46248
+##             Min.  1st Qu.    Median     Mean  3rd Qu.     Max.
+## Model 1 5.826137 9.639554 12.291386 13.53729 15.91000 30.84393
+## Model 2 3.461601 6.756917  8.463104 10.03756 11.90309 31.37406
 ## 
 ## 90% interval coverages per model (closer to 0.9 is better)
-## Model 1 0.9979167 
-## Model 2 0.9208333
-

+## Model 1 1 +## Model 2 0.9125
+

Model 3 (with the dynamic trend) provides far superior forecasts compared to relying only on the estimated smooths. Inspect the model summary (note again that p-value approximations are a work in progress here and so may not be totally reliable).

summary_mvgam(trends_mod3)
## GAM formula:
-
## y ~ s(season, k = 12, m = 2, bs = "cc") + s(season, series, k = 5, 
-##     bs = "fs", m = 1)
+
## y ~ s(series, bs = "re") + s(season, k = 12, m = 2, bs = "cc") - 
+##     1
## 
## Family:
## Negative Binomial
@@ -591,136 +593,121 @@

## N observations per series:
## 98
## 
-
## Dispersion parameter estimate:
-
##      2.5%      50%    97.5% Rhat n.eff
-## r 8.05956 56.56785 2441.941 1.18   183
+
## Dispersion parameter estimates:
+
##           2.5%        50%      97.5% Rhat n.eff
+## r[1] 3.3584017  5.5804300  41.596852 1.17  5783
+## r[2] 0.4499711  0.7655672   1.712845 1.22  1827
+## r[3] 3.0929973  5.6888491  27.317153 1.05  4651
+## r[4] 5.0651734 10.2326561 138.094892 1.11  5705
## 
## GAM smooth term approximate significances:
-
##                     edf Ref.df      F p-value    
-## s(season)         6.617 10.000  1.115  0.0799 .  
-## s(season,series)  8.926 20.000 16.441  <2e-16 ***
+
##              edf Ref.df      F p-value    
+## s(series)  3.999  4.000 831.81  <2e-16 ***
+## s(season)  6.824 10.000  19.04  <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## GAM coefficient (beta) estimates:
-
##                            2.5%         50%       97.5% Rhat n.eff
-## (Intercept)          0.49081732  1.54053886  2.58258410 1.14    24
-## s(season).1         -0.43774935 -0.15106098  0.17520061 1.01    89
-## s(season).2         -0.74418298 -0.39577543 -0.09289377 1.03   107
-## s(season).3         -0.80004176 -0.36725786 -0.01339273 1.09    89
-## s(season).4         -0.73068115 -0.24485481  0.19808258 1.11    72
-## s(season).5         -0.62694112 -0.17067752  0.32235656 1.10    72
-## s(season).6         -0.46370820 -0.01584879  0.39928092 1.02    74
-## s(season).7         -0.20456979  0.30351197  0.66691363 1.02    67
-## s(season).8          0.07932437  0.54116629  0.89569089 1.05    80
-## s(season).9          0.02152754  0.33083800  0.69745897 1.05    84
-## s(season).10        -0.21389836  0.12879638  0.47657140 1.10    81
-## s(season,series).1  -0.50862694 -0.21191317  0.05523811 1.03   163
-## s(season,series).2  -0.19853312  0.08823503  0.38759533 1.02   393
-## s(season,series).3  -0.48327001 -0.15718235  0.15770678 1.07    59
-## s(season,series).4  -0.16625336 -0.01591696  0.13847319 1.06    82
-## s(season,series).5  -0.04938682  1.00474966  2.03881186 1.16    26
-## s(season,series).6  -0.51082952 -0.08447597  0.30861135 1.01  1919
-## s(season,series).7  -0.30250823  0.11827328  0.59979666 1.00  3091
-## s(season,series).8  -0.53247244 -0.13633950  0.23349896 1.03   411
-## s(season,series).9  -0.36520151 -0.14722658  0.06716595 1.02   429
-## s(season,series).10 -2.60869409 -1.24241792 -0.07117206 1.06    80
-## s(season,series).11 -0.09001946  0.22511721  0.53633512 1.02   254
-## s(season,series).12 -0.48211404 -0.14816937  0.17816446 1.02   572
-## s(season,series).13 -0.15984700  0.17875202  0.54271671 1.05    78
-## s(season,series).14 -0.12794784  0.03663710  0.20266260 1.04   109
-## s(season,series).15 -0.78301294  0.33847446  1.40796493 1.14    40
-## s(season,series).16 -0.29764854  0.01661297  0.31683453 1.02   269
-## s(season,series).17 -0.18421426  0.13999092  0.47509273 1.02   511
-## s(season,series).18 -0.31109705  0.04546471  0.37519762 1.05    91
-## s(season,series).19 -0.27198085 -0.10385901  0.05526921 1.04   105
-## s(season,series).20 -0.27692348  0.79292014  1.90755358 1.11    34
+
##                     2.5%         50%       97.5% Rhat n.eff
+## s(series).1   2.25161814  2.37192720  2.48814851 1.00  2279
+## s(series).2   0.28514880  0.60863907  0.90712282 1.00  2189
+## s(series).3   1.78364063  1.92803432  2.06067658 1.01  2393
+## s(series).4   2.27025820  2.39829491  2.50733730 1.02  1812
+## s(season).1  -0.38112106 -0.24595730 -0.11420910 1.01   832
+## s(season).2  -0.78453442 -0.58071520 -0.40324507 1.00   551
+## s(season).3  -0.72207575 -0.53878642 -0.36014771 1.01   624
+## s(season).4  -0.53344796 -0.34992147 -0.17822407 1.00   654
+## s(season).5  -0.31621849 -0.14552370  0.02206627 1.00   769
+## s(season).6  -0.15291451  0.02440621  0.19703077 1.01   633
+## s(season).7   0.15123269  0.33797101  0.52024561 1.01   609
+## s(season).8   0.54205552  0.72080789  0.90541571 1.01   588
+## s(season).9   0.38081157  0.55161726  0.71413119 1.00   679
+## s(season).10  0.08775866  0.22736269  0.36602647 1.00   800
## 
## GAM smoothing parameter (rho) estimates:
-
##                        2.5%       50%      97.5% Rhat n.eff
-## s(season)          4.657417  6.199455  7.4003621 1.02   263
-## s(season,series)   1.758397  2.628169  3.3530152 1.01  1122
-## s(season,series)2 -3.430563 -1.962096 -0.9573349 1.00  8000
+
##                2.5%       50%      97.5% Rhat n.eff
+## s(series) -4.869349 -1.716592 -0.1090149    1  7992
+## s(season)  4.672467  5.955537  6.9891581    1   969
## 
## Latent trend drift (phi) and AR parameter estimates:
-
##              2.5%        50%     97.5% Rhat n.eff
-## phi[1] -0.5265406 0.10233094 0.7476648 1.01   338
-## phi[2] -0.6032245 0.04908621 0.6435254 1.02   471
-## phi[3] -0.5121290 0.06920593 0.6654636 1.01   466
-## phi[4] -0.5657808 0.02903801 0.6027817 1.01   473
-## ar1[1] -0.1759793 0.12167447 0.4145937 1.00  2500
-## ar1[2] -0.3409213 0.09740945 0.4674631 1.00  1529
-## ar1[3] -0.2583167 0.12947142 0.4960456 1.00  2043
-## ar1[4] -0.3687174 0.12033481 0.5132579 1.00  1375
-## ar2[1]  0.0000000 0.00000000 0.0000000  NaN     0
-## ar2[2]  0.0000000 0.00000000 0.0000000  NaN     0
-## ar2[3]  0.0000000 0.00000000 0.0000000  NaN     0
-## ar2[4]  0.0000000 0.00000000 0.0000000  NaN     0
-## ar3[1]  0.0000000 0.00000000 0.0000000  NaN     0
-## ar3[2]  0.0000000 0.00000000 0.0000000  NaN     0
-## ar3[3]  0.0000000 0.00000000 0.0000000  NaN     0
-## ar3[4]  0.0000000 0.00000000 0.0000000  NaN     0
+
##              2.5%         50%     97.5% Rhat n.eff
+## phi[1]  0.0000000  0.00000000 0.0000000  NaN     0
+## phi[2]  0.0000000  0.00000000 0.0000000  NaN     0
+## phi[3]  0.0000000  0.00000000 0.0000000  NaN     0
+## phi[4]  0.0000000  0.00000000 0.0000000  NaN     0
+## ar1[1] -0.5268512 -0.01852659 0.5489168 1.00  1420
+## ar1[2] -0.5316807  0.07457614 0.7712737 1.01   765
+## ar1[3] -0.5569900  0.04383589 0.6329749 1.00   953
+## ar1[4] -0.5827366  0.02283613 0.6315647 1.00  1092
+## ar2[1]  0.0000000  0.00000000 0.0000000  NaN     0
+## ar2[2]  0.0000000  0.00000000 0.0000000  NaN     0
+## ar2[3]  0.0000000  0.00000000 0.0000000  NaN     0
+## ar2[4]  0.0000000  0.00000000 0.0000000  NaN     0
+## ar3[1]  0.0000000  0.00000000 0.0000000  NaN     0
+## ar3[2]  0.0000000  0.00000000 0.0000000  NaN     0
+## ar3[3]  0.0000000  0.00000000 0.0000000  NaN     0
+## ar3[4]  0.0000000  0.00000000 0.0000000  NaN     0
## 

Look at Dunn-Smyth residuals for some series from this preferred model to ensure that our dynamic factor process has captured most of the temporal dependencies in the observations

plot_mvgam_resids(trends_mod3, 1)
-

+

plot_mvgam_resids(trends_mod3, 2)
-

+

plot_mvgam_resids(trends_mod3, 3)
-

+

plot_mvgam_resids(trends_mod3, 4)
-

-

Perform posterior predictive checks to see if the model is able to simulate data that looks realistic and unbiased by examining simulated kernel densities for posterior predictions (yhat) compared to the density of the observations (y). This will be particularly useful for examining whether the shared overdispersion parameter is able to produce realistic looking simulations for each individual series.

+

+

Perform posterior predictive checks to see if the model is able to simulate data that looks realistic and unbiased by examining simulated kernel densities for posterior predictions (yhat) compared to the density of the observations (y). This will be particularly useful for examining whether the Negative Binomial observation model is able to produce realistic looking simulations for each individual series.

plot_mvgam_ppc(trends_mod3, series = 1, type = "density")
-

+

plot_mvgam_ppc(trends_mod3, series = 2, type = "density")
-

+

plot_mvgam_ppc(trends_mod3, series = 3, type = "density")
-

+

plot_mvgam_ppc(trends_mod3, series = 4, type = "density")
-

+

Look at traceplots for the smoothing parameters (rho)

plot_mvgam_trace(object = trends_mod3, param = "rho")
-

+

Plot posterior predictive distributions for the training and testing periods for each series

plot_mvgam_fc(object = trends_mod3, series = 1, 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_fc(object = trends_mod3, series = 2, 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_fc(object = trends_mod3, series = 3, 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_fc(object = trends_mod3, series = 4, 
     data_test = trends_data$data_test)

[plot image updated]

Plot posterior distributions for the latent trend estimates, again for the training and testing periods

plot_mvgam_trend(object = trends_mod3, series = 1, 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_trend(object = trends_mod3, series = 2, 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_trend(object = trends_mod3, series = 3, 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_trend(object = trends_mod3, series = 4, 
     data_test = trends_data$data_test)

[plot image updated]

-Given that we fit a model with hierarchical seasonality, the seasonal smooths are allowed to differ somewhat
+Given that we fit a model with shared seasonality, the seasonal smooths will not differ

plot_mvgam_smooth(object = trends_mod3, series = 1, 
     smooth = "season")

[plot image updated]

plot_mvgam_smooth(object = trends_mod3, series = 2, 
     smooth = "season")

[plot image updated]

plot_mvgam_smooth(object = trends_mod3, series = 3, 
     smooth = "season")

[plot image updated]

plot_mvgam_smooth(object = trends_mod3, series = 4, 
     smooth = "season")

[plot image updated]

Plot posterior mean estimates of latent trend correlations. These correlations are more useful than the latent factor loadings themselves (which might otherwise be used to inspect ordinations, for example), because the order of the loadings (although constrained for identifiability purposes) can vary from chain to chain

correlations <- lv_correlations(object = trends_mod3)
library(ggplot2)
@@ -739,45 +726,45 @@ Nicholas Clark (n.clark@uq

     labs(x = "", y = "") + theme_dark() + theme(axis.text.x = element_text(angle = 45, hjust = 1))


[plot image updated]

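The hunk above enters part-way through the heatmap code, so only the tail of the ggplot call survives here. A hedged sketch of what the complete plotting step might look like (the mean_correlations element name and the reshaping steps are assumptions, not taken verbatim from the package):

# Sketch only: reshape the posterior mean correlation matrix for a heatmap;
# 'mean_correlations' is an assumed element of the lv_correlations output
mean_correlations <- correlations$mean_correlations
mean_correlations[upper.tri(mean_correlations)] <- NA
plot_dat <- data.frame(correlation = as.vector(mean_correlations),
                       series1 = rep(rownames(mean_correlations),
                                     NCOL(mean_correlations)),
                       series2 = rep(colnames(mean_correlations),
                                     each = NROW(mean_correlations)))
ggplot(plot_dat, aes(x = series1, y = series2, fill = correlation)) +
     geom_tile() +
     scale_fill_gradient2(low = 'darkred', mid = 'white', high = 'darkblue',
                          na.value = 'grey70', limits = c(-1, 1)) +
     labs(x = "", y = "") + theme_dark() +
     theme(axis.text.x = element_text(angle = 45, hjust = 1))
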
There is certainly some evidence of positive trend correlations for a few of these search terms, which is not surprising given how similar some of them are and how closely linked they should be to interest in tick paralysis in Queensland. Plot some STL decompositions of these series to see if these trends are noticeable in the data

plot(stl(ts(as.vector(series$`tick paralysis`), 
     frequency = 12), "periodic"))

[plot image updated]

plot(stl(ts(as.vector(series$`paralysis tick dog`), 
     frequency = 12), "periodic"))

[plot image updated]

plot(stl(ts(as.vector(series$`dog tick`), 
     frequency = 12), "periodic"))

[plot image updated]

plot(stl(ts(as.vector(series$`tick bite`), 
     frequency = 12), "periodic"))

[plot image updated]

-A logical next step for this analysis would be to trial varying overdispersion parameters for each series. This may be necessary as the residual diagnostic plots and forecast period posterior predictive checks suggest that the estimated Negative Binomial overdispersion parameter is more suitable for some series than for others:
+Forecast period posterior predictive checks suggest that the model still has room for improvement:

plot_mvgam_ppc(trends_mod3, series = 1, type = "density", 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_ppc(trends_mod3, series = 1, type = "mean", 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_ppc(trends_mod3, series = 2, type = "density", 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_ppc(trends_mod3, series = 2, type = "mean", 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_ppc(trends_mod3, series = 3, type = "density", 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_ppc(trends_mod3, series = 3, type = "mean", 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_ppc(trends_mod3, series = 4, type = "density", 
     data_test = trends_data$data_test)

[plot image updated]

plot_mvgam_ppc(trends_mod3, series = 4, type = "mean", 
     data_test = trends_data$data_test)

[plot image updated]

Other next steps could involve devising a more goal-specific set of posterior predictive checks (see this paper by Gelman et al. and relevant works by Betancourt for examples) and comparing out of sample Discrete Rank Probability Scores for this model against versions with other latent trend specifications (i.e. AR2, AR3, Random Walk).

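A hedged sketch of how that comparison could proceed with the package's rolling evaluation tools (the Random Walk refit below is hypothetical and the evaluation settings are arbitrary):

# Sketch only: score 6-step forecasts from timepoint 30 for the current AR1
# model and a hypothetical refit with trend_model = 'RW'; element names here
# follow the package README and depend on the fitted series names
mod3_eval <- eval_mvgam(trends_mod3, eval_timepoint = 30, fc_horizon = 6)
rw_eval <- eval_mvgam(trends_mod_rw, eval_timepoint = 30, fc_horizon = 6)
summary(mod3_eval$series1$drps)
summary(rw_eval$series1$drps)
# compare_mvgams() automates the same comparison across many timepoints
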
diff --git a/NEON_manuscript/Case studies/rsconnect/.DS_Store b/NEON_manuscript/Case studies/rsconnect/.DS_Store
new file mode 100644
index 00000000..6ca3a5fd
Binary files /dev/null and b/NEON_manuscript/Case studies/rsconnect/.DS_Store differ
diff --git a/NEON_manuscript/Case studies/rsconnect/documents/.DS_Store b/NEON_manuscript/Case studies/rsconnect/documents/.DS_Store
new file mode 100644
index 00000000..15248328
Binary files /dev/null and b/NEON_manuscript/Case studies/rsconnect/documents/.DS_Store differ
diff --git a/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/.DS_Store b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/.DS_Store
new file mode 100644
index 00000000..b4aee0ce
Binary files /dev/null and b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/.DS_Store differ
diff --git a/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/rpubs.com/.DS_Store b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/rpubs.com/.DS_Store
new file mode 100644
index 00000000..9f1da32b
Binary files /dev/null and b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/rpubs.com/.DS_Store differ
diff --git a/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/rpubs.com/rpubs/Publish Document.dcf b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/rpubs.com/rpubs/Publish Document.dcf
index a97daeaf..a3927fcc 100644
--- a/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/rpubs.com/rpubs/Publish Document.dcf
+++ b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study1.Rmd/rpubs.com/rpubs/Publish Document.dcf
@@ -1,10 +1,10 @@
-name: Publish Document
-title:
-username:
-account: rpubs
-server: rpubs.com
-hostUrl: rpubs.com
-appId: https://api.rpubs.com/api/v1/document/856585/540d5299dc584aff9095b986b978429b
-bundleId: https://api.rpubs.com/api/v1/document/856585/540d5299dc584aff9095b986b978429b
-url: http://rpubs.com/NickClark47/856585
-when: 1643680227.13467
+name: Publish Document
+title:
+username:
+account: rpubs
+server: rpubs.com
+hostUrl: rpubs.com
+appId: https://api.rpubs.com/api/v1/document/856585/540d5299dc584aff9095b986b978429b
+bundleId: https://api.rpubs.com/api/v1/document/856585/540d5299dc584aff9095b986b978429b
+url: http://rpubs.com/NickClark47/856585
+when: 1643680227.13467
diff --git a/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/.DS_Store b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/.DS_Store
new file mode 100644
index 00000000..b4aee0ce
Binary files /dev/null and b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/.DS_Store differ
diff --git a/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/.DS_Store b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/.DS_Store
new file mode 100644
index 00000000..9f1da32b
Binary files /dev/null and b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/.DS_Store differ
diff --git a/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/rpubs/Publish Document.dcf b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/rpubs/Publish Document.dcf
index fa24e0ee..735a9b35 100644
--- a/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/rpubs/Publish Document.dcf
+++ b/NEON_manuscript/Case studies/rsconnect/documents/mvgam_case_study2.Rmd/rpubs.com/rpubs/Publish Document.dcf
@@ -7,4 +7,4 @@
 hostUrl: rpubs.com
 appId: https://api.rpubs.com/api/v1/document/856733/5be2f57c523545a694d50c31f9a87205
 bundleId: https://api.rpubs.com/api/v1/document/856733/5be2f57c523545a694d50c31f9a87205
 url: http://rpubs.com/NickClark47/856733
-when: 1643695525.11005
+when: 1643793559.32827
diff --git a/NEON_manuscript/Figures/.DS_Store b/NEON_manuscript/Figures/.DS_Store
new file mode 100644
index 00000000..210c3da0
Binary files /dev/null and b/NEON_manuscript/Figures/.DS_Store differ
diff --git a/R/eval_mvgam.R b/R/eval_mvgam.R
index b6d70bc6..c20d31f7 100644
--- a/R/eval_mvgam.R
+++ b/R/eval_mvgam.R
@@ -148,8 +148,7 @@ eval_mvgam = function(object,
   ar3s <- MCMCvis::MCMCchains(object$jags_output, 'ar3')

   # Negative binomial size estimate
-  sizes <- as.matrix(rep(hpd(MCMCvis::MCMCchains(object$jags_output, 'r'))[2],
-                         dim(betas)[1]))
+  sizes <- MCMCvis::MCMCchains(object$jags_output, 'r')

   # Generate sample sequence for n_samples
   if(n_samples < dim(phis)[1]){
@@ -249,7 +248,7 @@ eval_mvgam = function(object,
       trunc_preds <- rnbinom(fc_horizon,
                              mu = exp(as.vector((Xp[which(as.numeric(data_assim$series) == series),] %*%
                                                    betas)) + (trend_preds)),
-                             size = size)
+                             size = size[series])
       trunc_preds
       })
@@ -289,7 +288,7 @@ eval_mvgam = function(object,
       fc <- rnbinom(fc_horizon,
                     mu = exp(as.vector((Xp[which(as.numeric(data_assim$series) == series),] %*%
                                           betas)) + (trend_preds)),
-                    size = size)
+                    size = size[series])
       fc
       })
     }
@@ -344,7 +343,7 @@ eval_mvgam = function(object,
       trunc_preds <- rnbinom(fc_horizon,
                              mu = exp(as.vector((Xp[which(as.numeric(data_assim$series) == series),] %*%
                                                    betas)) + (trend_preds)),
-                             size = size)
+                             size = size[series])
       trunc_preds
       })
@@ -384,7 +383,7 @@ eval_mvgam = function(object,
       fc <- rnbinom(fc_horizon,
                     mu = exp(as.vector((Xp[which(as.numeric(data_assim$series) == series),] %*%
                                           betas)) + (trend_preds)),
-                    size = size)
+                    size = size[series])
       fc
       })
     }
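The `eval_mvgam` change above stops collapsing the posterior for `r` to a single highest posterior density point and instead keeps the full matrix of draws, indexed per series when simulating forecasts. A toy illustration of the new indexing (all values hypothetical):

``` r
# Sketch only: per-series Negative Binomial forecast draws; 'size' now holds
# one overdispersion estimate per series (hypothetical values)
size <- c(5.2, 0.8, 12.4, 2.1)
mu <- 10
series <- 2
rnbinom(6, mu = mu, size = size[series])  # series 2 uses its own dispersion
```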
diff --git a/R/mvjagam.R b/R/mvjagam.R
index 3ba34ef7..6897d7ee 100644
--- a/R/mvjagam.R
+++ b/R/mvjagam.R
@@ -62,7 +62,9 @@
 #'@param ar_prior \code{character} specifying (in JAGS syntax) the prior distribution for the AR terms
 #'in the latent trends
 #'@param r_prior \code{character} specifying (in JAGS syntax) the prior distribution for the Negative Binomial
-#'overdispersion parameter. Ignored if \code{family = 'poisson'}
+#'overdispersion parameters. Note that this prior acts on the INVERSE of \code{r}, which is convenient
+#'for inducing a complexity-penalising prior model whereby the observation process reduces to a Poisson
+#'as the sampled parameter approaches \code{0}. Ignored if \code{family = 'poisson'}
 #'@param tau_prior \code{character} specifying (in JAGS syntax) the prior distributions for the independent gaussian
 #'precisions used for the latent trends (ignored if \code{use_lv == TRUE})
 #'@param upper_bounds Optional \code{vector} of \code{integer} values specifying upper limits for each series. If supplied,
@@ -80,9 +82,7 @@
 #'factors' variances to zero. This is done to help protect against selecting too many latent factors than are needed to
 #'capture dependencies in the data, so it can often be advantageous to set n_lv to a slightly larger number. However
 #'larger numbers of factors do come with additional computational costs so these should be balanced as well.
-#'For the time being, all series are assumed to have the same overdispersion
-#'parameter when using a negative binomial distribution, though this will be relaxed in future versions. For each
-#'series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics using
+#' For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics using
 #'medians of posterior predictions. If the fitted model is appropriate then Dunn-Smyth residuals will be
 #'standard normal in distribution and no autocorrelation will be evident
 #'
@@ -327,17 +327,25 @@ for (s in 1:n_series){
 ## Negative binomial likelihood functions
 for (i in 1:n) {
   for (s in 1:n_series) {
-    y[i, s] ~ dnegbin(rate[i, s], r)T(, upper_bound[s]);
-    rate[i, s] <- ifelse((r / (r + mu[i, s])) < min_eps, min_eps,
-                         (r / (r + mu[i, s])))
+    y[i, s] ~ dnegbin(rate[i, s], r[s])T(, upper_bound[s]);
+    rate[i, s] <- ifelse((r[s] / (r[s] + mu[i, s])) < min_eps, min_eps,
+                         (r[s] / (r[s] + mu[i, s])))
   }
 }
-r ~ dexp(0.001)
+
+## Complexity penalising prior for the overdispersion parameter;
+## where the likelihood reduces to a 'base' model (Poisson) unless
+## the data support overdispersion
+for(s in 1:n_series){
+  r[s] <- 1 / r_raw[s]
+  r_raw[s] ~ dexp(0.5)
+}
+
 ## Posterior predictions
 for (i in 1:n) {
   for (s in 1:n_series) {
-    ypred[i, s] ~ dnegbin(rate[i, s], r)T(, upper_bound[s])
+    ypred[i, s] ~ dnegbin(rate[i, s], r[s])T(, upper_bound[s])
   }
 }
 ")
@@ -379,8 +387,8 @@ for (i in 1:n) {
   } else {
     if(missing(upper_bounds)){
-      model_file[grep('y\\[i, s\\] ~', model_file)] <- '  y[i, s] ~ dnegbin(rate[i, s], r)'
-      model_file[grep('ypred\\[i, s\\] ~', model_file)] <- '  ypred[i, s] ~ dnegbin(rate[i, s], r)'
+      model_file[grep('y\\[i, s\\] ~', model_file)] <- '  y[i, s] ~ dnegbin(rate[i, s], r[s])'
+      model_file[grep('ypred\\[i, s\\] ~', model_file)] <- '  ypred[i, s] ~ dnegbin(rate[i, s], r[s])'
     }
   }
@@ -478,13 +486,6 @@ for (i in 1:n) {
   ar3[s] ~ dnorm(0, 10)
 }
-## Latent factor loadings are penalized using a regularized horseshoe prior
-## to allow loadings for entire factors to be 'dropped', reducing overfitting. Still
-## need to impose identifiability constraints
-
-## Global shrinkage penalty (T distribution on the standard deviation)
-lv_tau ~ dscaled.gamma(0.5, 5)
-
 ## Shrinkage penalties for each factor squeeze the factor to a flat line and squeeze
 ## the entire factor's set of coefficients toward zero if supported by
 ## the data. The prior for individual factor penalties allows each factor to possibly
@@ -510,6 +511,7 @@ for (i in 1:n) {
   penalty[t] <- theta.prime[t] * l.var[t]
 }
+## Latent factor loadings: standard normal with identifiability constraints
 ## Upper triangle of loading matrix set to zero
 for(j in 1:(n_lv - 1)){
   for(j2 in (j + 1):n_lv){
@@ -519,20 +521,20 @@ for (i in 1:n) {
 ## Positive constraints on loading diagonals
 for(j in 1:n_lv) {
-  lv_coefs[j, j] ~ dnorm(0, lv_tau * penalty[j])T(0, 1);
+  lv_coefs[j, j] ~ dnorm(0, 1)T(0, 1);
 }
 ## Lower diagonal free
 for(j in 2:n_lv){
   for(j2 in 1:(j - 1)){
-    lv_coefs[j, j2] ~ dnorm(0, lv_tau * penalty[j2])T(-1, 1);
+    lv_coefs[j, j2] ~ dnorm(0, 1)T(-1, 1);
   }
 }
 ## Other elements also free
 for(j in (n_lv + 1):n_series) {
   for(j2 in 1:n_lv){
-    lv_coefs[j, j2] ~ dnorm(0, lv_tau * penalty[j2])T(-1, 1);
+    lv_coefs[j, j2] ~ dnorm(0, 1)T(-1, 1);
   }
 }
@@ -546,12 +548,19 @@ for (i in 1:n) {
 ## Negative binomial likelihood functions
 for (i in 1:n) {
   for (s in 1:n_series) {
-    y[i, s] ~ dnegbin(rate[i, s], r)T(, upper_bound[s]);
-    rate[i, s] <- ifelse((r / (r + mu[i, s])) < min_eps, min_eps,
-                         (r / (r + mu[i, s])))
+    y[i, s] ~ dnegbin(rate[i, s], r[s])T(, upper_bound[s]);
+    rate[i, s] <- ifelse((r[s] / (r[s] + mu[i, s])) < min_eps, min_eps,
+                         (r[s] / (r[s] + mu[i, s])))
+  }
 }
+
+## Complexity penalising prior for the overdispersion parameter;
+## where the likelihood reduces to a 'base' model (Poisson) unless
+## the data support overdispersion
+for(s in 1:n_series){
+  r[s] <- 1 / r_raw[s]
+  r_raw[s] ~ dexp(0.5)
 }
-r ~ dexp(0.001)

 ## Posterior predictions
 for (i in 1:n) {
@@ -594,8 +603,8 @@ for (i in 1:n) {
   } else {
     if(missing(upper_bounds)){
-      model_file[grep('y\\[i, s\\] ~', model_file)] <- '  y[i, s] ~ dnegbin(rate[i, s], r)'
-      model_file[grep('ypred\\[i, s\\] ~', model_file)] <- '  ypred[i, s] ~ dnegbin(rate[i, s], r)'
+      model_file[grep('y\\[i, s\\] ~', model_file)] <- '  y[i, s] ~ dnegbin(rate[i, s], r[s])'
+      model_file[grep('ypred\\[i, s\\] ~', model_file)] <- '  ypred[i, s] ~ dnegbin(rate[i, s], r[s])'
    }
  }
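The prior block above is the central modelling change in this diff: the exponential prior is placed on `r_raw = 1 / r`, with its mode at zero, which corresponds to `r` approaching infinity and the Negative Binomial collapsing to its Poisson 'base' model. A quick prior simulation (not part of the package) sketches the implied behaviour:

``` r
# Sketch only: variance inflation implied by r = 1 / r_raw with r_raw ~ Exp(0.5);
# the Negative Binomial variance is mu + mu^2 / r, so mu^2 / r is the excess
set.seed(100)
r_raw <- rexp(10000, rate = 0.5)
r <- 1 / r_raw
mu <- 5
quantile(mu^2 / r, probs = c(0.1, 0.5, 0.9))
```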
diff --git a/R/pfilter_mvgam_fc.R b/R/pfilter_mvgam_fc.R
index 2a5796a7..75802430 100644
--- a/R/pfilter_mvgam_fc.R
+++ b/R/pfilter_mvgam_fc.R
@@ -144,7 +144,7 @@ pfilter_mvgam_fc = function(file_path = 'pfilter',
                              mu = exp(as.vector((Xp[which(as.numeric(series_test$series) == series),] %*%
                                                    particles[[x]]$betas)) + (trend_preds)),
-                             size = particles[[x]]$size)
+                             size = particles[[x]]$size[series])
       trunc_preds
       })
@@ -161,7 +161,7 @@ pfilter_mvgam_fc = function(file_path = 'pfilter',
       fc <- rnbinom(fc_horizon,
                     mu = exp(as.vector((Xp[which(as.numeric(series_test$series) == series),] %*%
                                           particles[[x]]$betas)) + (trend_preds)),
-                    size = particles[[x]]$size)
+                    size = particles[[x]]$size[series])
       fc
       })
     }
diff --git a/R/pfilter_mvgam_init.R b/R/pfilter_mvgam_init.R
index 2721f749..db4e453f 100644
--- a/R/pfilter_mvgam_init.R
+++ b/R/pfilter_mvgam_init.R
@@ -209,7 +209,7 @@ particles <- pbapply::pblapply(sample_seq, function(x){
     if(is.na(truth[series])){
       weight <- 1
     } else {
-      weight <- dnbinom(truth[series], size = size,
+      weight <- dnbinom(truth[series], size = size[series],
                         mu = exp(((Xp[which(as.numeric(series_test$series) == series),] %*%
                                      betas)) + (trends[series])))
     }
@@ -259,7 +259,7 @@ particles <- pbapply::pblapply(sample_seq, function(x){
     if(is.na(truth[series])){
       weight <- 1
     } else {
-      weight <- dnbinom(truth[series], size = size,
+      weight <- dnbinom(truth[series], size = size[series],
                         mu = exp(((Xp[which(as.numeric(series_test$series) == series),] %*%
                                      betas)) + (trends[series])))
     }
diff --git a/R/pfilter_mvgam_smooth.R b/R/pfilter_mvgam_smooth.R
index ba391d10..07ab2b80 100644
--- a/R/pfilter_mvgam_smooth.R
+++ b/R/pfilter_mvgam_smooth.R
@@ -113,7 +113,7 @@ pfilter_mvgam_smooth = function(particles,
       series_weight <- 1
     } else {
       series_weight <- dnbinom(next_assim$y[series],
-                               size = particles[[x]]$size,
+                               size = particles[[x]]$size[series],
                                mu = exp(((Xp[which(as.numeric(next_assim$series) == series),] %*%
                                             particles[[x]]$betas)) +
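The particle filter edits apply the same per-series idea when weighting particles against newly assimilated observations. A toy sketch of a single Negative Binomial importance weight (all values hypothetical):

``` r
# Sketch only: weight one particle by the likelihood of the new observation,
# using that particle's own dispersion estimate for the series in question
truth <- 8              # newly observed count for series 1
mu_particle <- 6.5      # the particle's predicted mean for series 1
size <- c(5.2, 0.8)     # per-series dispersion estimates carried by the particle
weight <- dnbinom(truth, size = size[1], mu = mu_particle)
```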
diff --git a/R/plot_mvgam_factors.R b/R/plot_mvgam_factors.R
index 1a1d227f..af6f87fe 100644
--- a/R/plot_mvgam_factors.R
+++ b/R/plot_mvgam_factors.R
@@ -10,9 +10,8 @@
 #'contributed to the estimated trends. This is due to the regularisation penalty that acts independently on each
 #'factor's Gaussian precision, which will squeeze un-needed factors to a white noise process (effectively dropping
 #'that factor from the model). In this function, each factor is tested against a null hypothesis of white noise by
-#'calculating the sum of the factor's 1st derivatives. A factor that has a larger contribution will have a larger
-#'sum, both because that factor's absolute magnitudes will be larger (due to the weaker penalty on the factor's
-#'precision) and because the factor will move around more. If
+#'calculating the sum of the factor's 2nd derivatives. A factor that has a larger contribution will have a larger
+#'sum due to the weaker penalty on the factor's precision. If
 #'\code{plot == TRUE}, the factors are also plotted.
 #'@return A \code{dataframe} of factor contributions and, optionally, a series of base \code{R} plots
 #'@export
@@ -57,13 +56,12 @@ plot_mvgam_factors = function(object, plot = TRUE){
   # Keep only the in-sample observations for testing against the null of white noise
   preds <- preds[,1:(NROW(object$obs_data) / NCOL(object$ytimes))]
+  cred <- sapply(1:NCOL(preds),
+                 function(n) quantile(preds[,n],
+                                      probs = probs))

   # If plot = TRUE, plot the LVs
   if(plot){
     preds_last <- preds[1,]
-    cred <- sapply(1:NCOL(preds),
-                   function(n) quantile(preds[,n],
-                                        probs = probs))
-
     ylim <- c(min(cred) - 1, max(cred) + 1)
     ylab <- paste0('Factor ', x)
     pred_vals <- seq(1:length(preds_last))
@@ -84,14 +82,15 @@ plot_mvgam_factors = function(object, plot = TRUE){
   }

-  uppers <- cred[9,]
-  cred_deriv = vector(length = length(uppers) + 1)
-  cred_deriv[1] <- 0
-  for(i in 2:length(uppers)){
-    cred_deriv[i] <- abs(uppers[i] - uppers[i - 1])
-  }
-
-  data.frame('Contribution' = sum(cred_deriv))
+  # Calculate second derivatives of empirical medians and upper / lower intervals;
+  # factors with small second derivatives are moving in roughly a straight line and not
+  # likely contributing much (or at all) to the latent trend estimates
+  meds <- cred[5,]
+  uppers <- cred[8,]
+  lowers <- cred[2,]
+  data.frame('Contribution' = sum(abs(diff(diff(meds)) +
+                                        diff(diff(uppers)) +
+                                        diff(diff(lowers)))))
   }))
 rownames(lv_estimates) <- paste0('Factor', 1:object$n_lv)
diff --git a/R/plot_mvgam_uncertainty.R b/R/plot_mvgam_uncertainty.R
index 9c3c1b24..9e5345e6 100644
--- a/R/plot_mvgam_uncertainty.R
+++ b/R/plot_mvgam_uncertainty.R
@@ -59,7 +59,7 @@ plot_mvgam_uncertainty = function(object, series, data_test, legend_position = '
   }
   n_samples <- NROW(trend)
-  size <- MCMCvis::MCMCsummary(object$jags_output, 'r')$mean
+  size <- MCMCvis::MCMCsummary(object$jags_output, 'r')[,series]$mean

   # Full uncertainty interval for the mean
   if(class(data_test) == 'list'){
diff --git a/R/predict_mvgam.R b/R/predict_mvgam.R
index 7a8f7bb9..17675e45 100644
--- a/R/predict_mvgam.R
+++ b/R/predict_mvgam.R
@@ -56,7 +56,7 @@ predict_mvgam = function(object, series = 1, newdata, type = 'link'){
       as.vector(((Xp[which(as.numeric(newdata$series) == series),] %*%
                     betas[x,])) + ( lv_preds %*% lv_coefs[[series]][x,]))
     } else {
-      rnbinom(n = NROW(newdata), size = sizes[x,],
+      rnbinom(n = NROW(newdata), size = sizes[x, series],
               mu = exp(((Xp[which(as.numeric(newdata$series) == series),] %*%
                            betas[x,])) + ( lv_preds %*% lv_coefs[[series]][x,])))
     }
@@ -66,7 +66,7 @@ predict_mvgam = function(object, series = 1, newdata, type = 'link'){
       as.vector(((Xp[which(as.numeric(newdata$series) == series),] %*%
                     betas[x,])) + (rnorm(NROW(newdata), 0, sqrt(1 / taus[x,series]))))
     } else {
-      rnbinom(n = NROW(newdata), size = sizes[x,],
+      rnbinom(n = NROW(newdata), size = sizes[x, series],
              mu = exp(((Xp[which(as.numeric(newdata$series) == series),] %*%
                           betas[x,])) + (rnorm(NROW(newdata), 0, sqrt(1 / taus[x,series])))))
     }
diff --git a/R/summary_mvgam.R b/R/summary_mvgam.R
index 660a8eba..90b07c79 100644
--- a/R/summary_mvgam.R
+++ b/R/summary_mvgam.R
@@ -293,7 +293,7 @@ if(class(object$obs_data) == 'list'){
 message()
 if(object$family == 'Negative Binomial'){
-  message("Dispersion parameter estimate:")
+  message("Dispersion parameter estimates:")
   print(MCMCvis::MCMCsummary(object$jags_output, 'r')[,c(3:7)])
   message()
 }
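The new contribution metric in `plot_mvgam_factors.R` is easy to sanity-check: second differences of a factor squeezed to a straight line are essentially zero, while a wiggly factor accumulates a large sum. A sketch with simulated inputs:

``` r
# Sketch only: second differences approximate second derivatives
flat_factor <- 0.02 * (1:50)         # a factor squeezed to a straight line
wiggly_factor <- sin((1:50) / 3)     # a factor that moves around
sum(abs(diff(diff(flat_factor))))    # effectively zero: factor contributes little
sum(abs(diff(diff(wiggly_factor))))  # clearly larger: factor contributes
```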
diff --git a/docs/README.md b/docs/README.md
index 26c1668d..bfe5f7d1 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,294 +1,294 @@

# *mvgam*

The goal of `mvgam` is to use a Bayesian framework to estimate parameters of Generalised Additive Models for discrete time series with dynamic trend components.

## Installation

Install the development version from `GitHub` using: `devtools::install_github("nicholasjclark/mvgam")`

## Brief introduction to the package

We can explore the model’s primary functions using a test dataset that is available with all `R` installations. We introduce dynamic Generalised Additive Models and some of the key utility functions provided in `mvgam`. First, load the `lynx` data and plot the series as well as its estimated autocorrelation function

``` r
library(mvgam)
#> Loading required package: mgcv
#> Loading required package: nlme
#> This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
#> Loading required package: rjags
#> Loading required package: coda
#> Linked to JAGS 4.3.0
#> Loaded modules: basemod,bugs
#> Loading required package: parallel
#> Loading required package: runjags
#> Warning: package 'runjags' was built under R version 4.0.5
data(lynx)
lynx_full = data.frame(year = 1821:1934,
                       population = as.numeric(lynx))
plot(lynx_full$population, type = 'l', ylab = 'Lynx trappings',
     xlab = 'Time')
```

[figure]

``` r
acf(lynx_full$population, main = '')
```

[figure]

Along with serial autocorrelation, there is a clear ~19-year cyclic pattern to the data. Create a `season` term that can be used to model this effect and give a better representation of the data generating process than we would likely get with a linear model

``` r
plot(stl(ts(lynx_full$population, frequency = 19), s.window = 'periodic'))
```

[figure]

``` r
lynx_full$season <- (lynx_full$year %% 19) + 1
```

For `mvgam` models, the response needs to be labelled `y` and we also need an indicator of the series name as a `factor` variable

``` r
lynx_full$y <- lynx_full$population
lynx_full$series <- factor('series1')
```

Split the data into training (first 50 years) and testing (next 10 years of data) to evaluate multi-step ahead forecasts

``` r
lynx_train = lynx_full[1:50, ]
lynx_test = lynx_full[51:60, ]
```

Now fit an `mvgam` model; it fits a GAM in which a cyclic smooth function for `season` is estimated jointly with a full time series model for the errors (in this case an `AR3` process), rather than relying on smoothing splines that do not incorporate a concept of the future. We assume the outcome follows a Poisson distribution and estimate the model in `JAGS` using MCMC sampling (Note that `JAGS` 4.3.0 is required; installation links are found [here](https://sourceforge.net/projects/mcmc-jags/files/))

``` r
lynx_mvgam <- mvjagam(data_train = lynx_train,
                      data_test = lynx_test,
                      formula = y ~ s(season, bs = 'cc', k = 19),
                      knots = list(season = c(0.5, 19.5)),
                      family = 'poisson',
                      trend_model = 'AR3',
                      drift = F,
                      burnin = 10000,
                      chains = 4)
#> Compiling rjags model...
#> Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
#> 'localhost'
#> Simulation complete
#> Note: Summary statistics were not produced as there are >50 monitored
#> variables
#> [To override this behaviour see ?add.summary and ?runjags.options]
#> FALSEFinished running the simulation
#> NOTE: Stopping adaptation
```

Perform a series of posterior predictive checks to see if the model is able to simulate data for the training period that looks realistic and unbiased. First, examine simulated kernel densities for posterior predictions (`yhat`) and compare to the density of the observations (`y`)

``` r
plot_mvgam_ppc(lynx_mvgam, series = 1, type = 'density')
```

[figure]

Now plot the distribution of predicted means compared to the observed mean

``` r
plot_mvgam_ppc(lynx_mvgam, series = 1, type = 'mean')
```

[figure]

Next examine simulated empirical Cumulative Distribution Functions (CDF) for posterior predictions (`yhat`) and compare to the CDF of the observations (`y`)

``` r
plot_mvgam_ppc(lynx_mvgam, series = 1, type = 'cdf')
```

[figure]

Finally look for any biases in predictions by examining a Probability Integral Transform (PIT) histogram. If our predictions are not biased one way or another (i.e. not consistently under- or over-predicting), this histogram should look roughly uniform

``` r
plot_mvgam_ppc(lynx_mvgam, series = 1, type = 'pit')
```

[figure]

All of these plots indicate the model is well calibrated against the training data, with no apparent pathological behaviors exhibited. Have a look at this model's summary to see what is being estimated (note that longer MCMC runs would probably be needed to increase effective sample sizes)

``` r
summary_mvgam(lynx_mvgam)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 19)
#>
#> Family:
#> Poisson
#>
#> N series:
#> 1
#>
#> N observations per series:
#> 50
#>
#> GAM smooth term approximate significances:
#>             edf Ref.df Chi.sq p-value    
#> s(season) 16.19  17.00  159.7  <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> GAM coefficient (beta) estimates:
#>                    2.5%         50%       97.5% Rhat n.eff
#> (Intercept)   6.2770901  6.68528234  6.89327788 1.43    28
#> s(season).1  -1.1564530 -0.64458777 -0.11872920 1.27   132
#> s(season).2  -0.2683529  0.25073954  0.94169775 1.14    70
#> s(season).3   0.3308783  1.09336242  1.79402756 1.07    28
#> s(season).4   0.8341058  1.67841585  2.18185213 1.34    47
#> s(season).5   1.0001588  1.80584298  2.44659910 2.01    59
#> s(season).6   0.1578135  1.00852951  1.69310142 1.55    50
#> s(season).7  -0.8748761 -0.21508874  0.54066675 1.12    77
#> s(season).8  -1.3830175 -0.69447020  0.05220082 1.11    97
#> s(season).9  -1.5430305 -0.81352062  0.06923342 1.29   150
#> s(season).10 -1.1355176 -0.43732912  0.37036136 1.19   110
#> s(season).11 -0.4439438  0.30838553  0.97091844 1.26    78
#> s(season).12  0.3253495  1.18666706  1.95251673 1.75    45
#> s(season).13  0.3082855  1.36908045  2.13585080 2.87    55
#> s(season).14  0.2477303  1.12496685  1.82336145 2.17    56
#> s(season).15 -0.4690048  0.02562886  0.55971985 1.13   139
#> s(season).16 -1.2397070 -0.68863456 -0.14501644 1.02   166
#> s(season).17 -1.4897623 -1.00618670 -0.42730783 1.18   140
#>
#> GAM smoothing parameter (rho) estimates:
#>               2.5%      50%    97.5% Rhat n.eff
#> s(season) 3.092496 3.911626 4.617757 1.01  1668
#>
#> Latent trend drift (phi) and AR parameter estimates:
#>           2.5%         50%     97.5% Rhat n.eff
#> phi  0.0000000  0.00000000 0.0000000  NaN     0
#> ar1  0.4241680  0.74162051 1.0587950 1.16  1333
#> ar2 -0.4899124 -0.14435058 0.2108217 1.02  2999
#> ar3 -0.3483214  0.04447393 0.3626631 1.32   646
#>
```

Inspect the model's estimated smooth for the 19-year cyclic pattern. Note that the `mvgam` smooth plot is on a different scale compared to `mgcv` plots, but interpretation is similar

``` r
plot_mvgam_smooth(lynx_mvgam, 1, 'season')
```

[figure]

We can also view the mvgam's posterior retrodictions and predictions for the entire series (testing and training)

``` r
plot_mvgam_fc(lynx_mvgam, data_test = lynx_test)
```

[figure]

And the estimated latent trend component

``` r
plot_mvgam_trend(lynx_mvgam, data_test = lynx_test)
```

[figure]

We can also re-do the posterior predictive checks, but this time focusing only on the out of sample period. This will give us better insight into how the model is performing and whether it is able to simulate realistic and unbiased future values

``` r
plot_mvgam_ppc(lynx_mvgam, series = 1, type = 'density', data_test = lynx_test)
```

[figure]

``` r
plot_mvgam_ppc(lynx_mvgam, series = 1, type = 'mean', data_test = lynx_test)
```

[figure]

``` r
plot_mvgam_ppc(lynx_mvgam, series = 1, type = 'cdf', data_test = lynx_test)
```

[figure]

``` r
plot_mvgam_ppc(lynx_mvgam, series = 1, type = 'pit', data_test = lynx_test)
```

[figure]

A key aspect of ecological forecasting is to understand [how different components of a model contribute to forecast uncertainty](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/eap.1589). We can estimate relative contributions to forecast uncertainty for the GAM component and the latent trend component using `mvgam`

``` r
plot_mvgam_uncertainty(lynx_mvgam, data_test = lynx_test)
```

[figure]

Both components contribute to forecast uncertainty, suggesting we would still need some more work to learn about factors driving the dynamics of the system. But we will leave the model as-is for this example. Diagnostics of the model can also be performed using `mvgam`. Have a look at the model's residuals, which are posterior medians of Dunn-Smyth randomised quantile residuals so should follow approximate normality. We are primarily looking for a lack of autocorrelation, which would suggest our AR3 model is appropriate for the latent trend

``` r
plot_mvgam_resids(lynx_mvgam)
```

[figure]

Another useful utility of `mvgam` is the ability to use rolling window forecasts to evaluate competing models that may represent different hypotheses about the series dynamics. Here we will fit a poorly specified model to showcase how this evaluation works. In this model, we ignore the cyclic pattern of seasonality and force it to be fairly non-wiggly. We also use a random walk process for the trend

``` r
lynx_mvgam_poor <- mvjagam(data_train = lynx_train,
                           data_test = lynx_test,
                           formula = y ~ s(season, bs = 'gp', k = 3),
                           family = 'poisson',
                           trend_model = 'RW',
                           drift = FALSE,
                           burnin = 10000,
                           chains = 4)
#> Compiling rjags model...
#> Starting 4 rjags simulations using a PSOCK cluster with 4 nodes on host
#> 'localhost'
#> Simulation complete
#> Note: Summary statistics were not produced as there are >50 monitored
#> variables
#> [To override this behaviour see ?add.summary and ?runjags.options]
#> FALSEFinished running the simulation
#> NOTE: Stopping adaptation
```

We choose a set of timepoints within the training data to forecast from, allowing us to simulate a situation where the model's parameters had already been estimated but we have only observed data up to the evaluation timepoint and would like to generate forecasts from the latent trends. Here we use year 10 as our last observation and forecast ahead for the next 10 years.

``` r
mod1_eval <- eval_mvgam(lynx_mvgam, eval_timepoint = 10, fc_horizon = 10)
mod2_eval <- eval_mvgam(lynx_mvgam_poor, eval_timepoint = 10, fc_horizon = 10)
```

Summary statistics of the two models' out of sample Discrete Rank Probability Score (DRPS) indicate that the well-specified model performs markedly better (far lower DRPS) for this evaluation timepoint

``` r
summary(mod1_eval$series1$drps)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   5.537  20.306  87.331  83.300 119.164 212.100
summary(mod2_eval$series1$drps)
#>   Min. 1st Qu.  Median    Mean 3rd Qu.   Max.
#>  56.86   76.82  298.87  301.55  433.57 720.72
```

Nominal coverages for both models' 90% prediction intervals

``` r
mean(mod1_eval$series1$in_interval)
#> [1] 1
mean(mod2_eval$series1$in_interval)
#> [1] 1
```

The `compare_mvgams` function automates this process by rolling along a set of timepoints for each model, ensuring a more in-depth evaluation of each competing model at the same set of timepoints. There are many more extended uses for `mvgam` models, including the ability to fit dynamic factor processes for analysing and forecasting sets of multivariate discrete time series
diff --git a/man/mvjagam.Rd b/man/mvjagam.Rd
index 57779696..4abf8e2f 100644
--- a/man/mvjagam.Rd
+++ b/man/mvjagam.Rd
@@ -103,7 +103,9 @@ in the latent trends}
 in the latent trends}
 \item{r_prior}{\code{character} specifying (in JAGS syntax) the prior distribution for the Negative Binomial
-overdispersion parameter. Ignored if \code{family = 'poisson'}}
+overdispersion parameters. Note that this prior acts on the INVERSE of \code{r}, which is convenient
+for inducing a complexity-penalising prior model whereby the observation process reduces to a Poisson
+as the sampled parameter approaches \code{0}. Ignored if \code{family = 'poisson'}}
 \item{tau_prior}{\code{character} specifying (in JAGS syntax) the prior distributions for the independent gaussian
 precisions used for the latent trends (ignored if \code{use_lv == TRUE})}
@@ -137,9 +139,7 @@ regularized penalty priors to theoretically allow some factors to be dropped fro
 factors' variances to zero. This is done to help protect against selecting too many latent factors than are needed to
 capture dependencies in the data, so it can often be advantageous to set n_lv to a slightly larger number. However
 larger numbers of factors do come with additional computational costs so these should be balanced as well.
-For the time being, all series are assumed to have the same overdispersion
-parameter when using a negative binomial distribution, though this will be relaxed in future versions. For each
-series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics using
+For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics using
 medians of posterior predictions. If the fitted model is appropriate then Dunn-Smyth residuals will be
 standard normal in distribution and no autocorrelation will be evident
 }
diff --git a/man/plot_mvgam_factors.Rd b/man/plot_mvgam_factors.Rd
index c8669a4d..7757870a 100644
--- a/man/plot_mvgam_factors.Rd
+++ b/man/plot_mvgam_factors.Rd
@@ -23,9 +23,8 @@ If the model in \code{object} was estimated using dynamic factors, it is possibl
 contributed to the estimated trends. This is due to the regularisation penalty that acts independently on each
 factor's Gaussian precision, which will squeeze un-needed factors to a white noise process (effectively dropping
 that factor from the model). In this function, each factor is tested against a null hypothesis of white noise by
-calculating the sum of the factor's 1st derivatives. A factor that has a larger contribution will have a larger
-sum, both because that factor's absolute magnitudes will be larger (due to the weaker penalty on the factor's
-precision) and because the factor will move around more. If
+calculating the sum of the factor's 2nd derivatives. A factor that has a larger contribution will have a larger
+sum due to the weaker penalty on the factor's precision. If
 \code{plot == TRUE}, the factors are also plotted.
}
\author{