Name	Name	Last commit message	Last commit date
Latest commit nicholasjclark add lambda inits; update readme Jul 28, 2022 5c8316a · Jul 28, 2022 History 131 Commits
NEON_manuscript	NEON_manuscript	add cmdstanr support	Jul 28, 2022
R	R	add lambda inits; update readme	Jul 28, 2022
data	data	first addition of code	Oct 19, 2021
docs	docs	add lambda inits; update readme	Jul 28, 2022
man	man	add cmdstanr support	Jul 28, 2022
.Rbuildignore	.Rbuildignore	add vignette 1	Jan 11, 2022
.gitignore	.gitignore	add vignette 1	Jan 11, 2022
DESCRIPTION	DESCRIPTION	Remove strict jags dependencies	Jun 14, 2022
LICENSE	LICENSE	first addition of code	Oct 19, 2021
LICENSE.md	LICENSE.md	first addition of code	Oct 19, 2021
NAMESPACE	NAMESPACE	move all variable declarations to top of blocks in stan; add plot_ser…	Jul 20, 2022
mvgam.Rproj	mvgam.Rproj	update portal example	Feb 9, 2022

mvgam

The goal of mvgam is to use a Bayesian framework to estimate parameters of Generalized Additive Models for discrete time series with dynamic trend components. The motivation for the package and some of its primary objectives are described in detail by Clark & Wells 2022, with additional inspiration on the use of Bayesian probabilistic modelling to quantify uncertainty and advise principled decision making coming from Michael Betancourt, Michael Dietze and Emily Fox, among many others.

Installation

Install the development version from GitHub using: devtools::install_github("nicholasjclark/mvgam")

Brief introduction to the package

We can explore the model’s primary functions using a test dataset that is available with all R installations. We introduce dynamic Generalised Additive Models and some of the key utility functions provided in mvgam. First, load the lynx data and plot the series as well as its estimated autocorrelation function

library(mvgam)
#> Loading required package: mgcv
#> Loading required package: nlme
#> This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
#> Loading required package: parallel
data(lynx)
lynx_full = data.frame(year = 1821:1934, 
                       population = as.numeric(lynx))
plot(lynx_full$population, type = 'l', ylab = 'Lynx trappings',
     xlab = 'Time')

acf(lynx_full$population, main = '')

Along with serial autocorrelation, there is a clear ~19-year cyclic pattern to the data. Create a season term that can be used to model this effect and give a better representation of the data generating process than we would likely get with a linear model

plot(stl(ts(lynx_full$population, frequency = 19), s.window = 'periodic'))

lynx_full$season <- (lynx_full$year %%19) + 1

For mvgam models, the response needs to be labelled y and we also need an indicator of the series name as a factor variable (if the column series is missing, this will be added automatically by assuming that all observations are from a single time series). Finally, a time column is needed to index time

lynx_full$y <- lynx_full$population
lynx_full$time <- 1:NROW(lynx_full)
lynx_full$series <- factor('series1')

Split the data into training (first 50 years) and testing (next 10 years of data) to evaluate multi-step ahead forecasts

lynx_train = lynx_full[1:50, ]
lynx_test = lynx_full[51:60, ]

Inspect the series in a bit more detail using mvgam’s plotting utility

plot_mvgam_series(data_train = lynx_train)

Now fit an mvgam model; it fits a GAM in which a cyclic smooth function for season is estimated jointly with a full time series model for the errors (in this case an AR3 process), rather than relying on smoothing splines that do not incorporate a concept of the future. We assume the outcome follows a Poisson distribution and estimate the model in Stan using MCMC sampling (installation links for rstan and cmdstanr are found here and here).

lynx_mvgam <- mvgam(data_train = lynx_train,
               data_test = lynx_test,
               formula = y ~ s(season, bs = 'cc', k = 19),
               knots = list(season = c(0.5, 19.5)),
               family = 'poisson',
               trend_model = 'AR3',
               use_stan = TRUE,
               burnin = 1000,
               chains = 4)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 43.4 seconds.
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 44.5 seconds.
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 44.8 seconds.
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 finished in 45.5 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 44.5 seconds.
#> Total execution time: 45.8 seconds.

Inspect the resulting model file, which is written in the Stan probabilistic programming language

lynx_mvgam$model_file
#>   [1] ""                                                                                                                     
#>   [2] "//Stan model code generated by package mvgam"                                                                         
#>   [3] "data {"                                                                                                               
#>   [4] "int<lower=0> total_obs; // total number of observations"                                                              
#>   [5] "int<lower=0> n; // number of timepoints per series"                                                                   
#>   [6] "int<lower=0> n_sp; // number of smoothing parameters"                                                                 
#>   [7] "int<lower=0> n_series; // number of series"                                                                           
#>   [8] "int<lower=0> num_basis; // total number of basis coefficients"                                                        
#>   [9] "vector[num_basis] zero; // prior locations for basis coefficients"                                                    
#>  [10] "real p_taus[1]; // prior precisions for parametric coefficients"                                                      
#>  [11] "real p_coefs[1]; // prior locations for parametric coefficients"                                                      
#>  [12] "matrix[num_basis, total_obs] X; // transposed mgcv GAM design matrix"                                                 
#>  [13] "int<lower=0> ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)"
#>  [14] "matrix[17,17] S1; // mgcv smooth penalty matrix S1"                                                                   
#>  [15] "int<lower=0, upper=1> y_observed[n, n_series]; // indices of missing vs observed"                                     
#>  [16] "int<lower=-1> y[n, n_series]; // time-ordered observations, with -1 indicating missing"                               
#>  [17] "}"                                                                                                                    
#>  [18] ""                                                                                                                     
#>  [19] "parameters {"                                                                                                         
#>  [20] "// raw basis coefficients"                                                                                            
#>  [21] "row_vector[num_basis] b_raw;"                                                                                         
#>  [22] ""                                                                                                                     
#>  [23] "// latent trend AR1 terms"                                                                                            
#>  [24] "vector<lower=-1.5,upper=1.5>[n_series] ar1;"                                                                          
#>  [25] ""                                                                                                                     
#>  [26] "// latent trend AR2 terms"                                                                                            
#>  [27] "vector<lower=-1.5,upper=1.5>[n_series] ar2;"                                                                          
#>  [28] ""                                                                                                                     
#>  [29] "// latent trend AR3 terms"                                                                                            
#>  [30] "vector<lower=-1.5,upper=1.5>[n_series] ar3;"                                                                          
#>  [31] ""                                                                                                                     
#>  [32] "// latent trend variance parameters"                                                                                  
#>  [33] "vector<lower=0.05,upper=5>[n_series] sigma;"                                                                          
#>  [34] ""                                                                                                                     
#>  [35] "// latent trends"                                                                                                     
#>  [36] "matrix[n, n_series] trend;"                                                                                           
#>  [37] ""                                                                                                                     
#>  [38] "// smoothing parameters"                                                                                              
#>  [39] "vector<lower=0.0005>[n_sp] lambda;"                                                                                   
#>  [40] "}"                                                                                                                    
#>  [41] ""                                                                                                                     
#>  [42] "transformed parameters {"                                                                                             
#>  [43] "// GAM contribution to expectations (log scale)"                                                                      
#>  [44] "vector[total_obs] eta;"                                                                                               
#>  [45] ""                                                                                                                     
#>  [46] "// basis coefficients"                                                                                                
#>  [47] "row_vector[num_basis] b;"                                                                                             
#>  [48] ""                                                                                                                     
#>  [49] "for (i in 1:num_basis) {"                                                                                             
#>  [50] "b[i] = b_raw[i];"                                                                                                     
#>  [51] "}"                                                                                                                    
#>  [52] "eta = to_vector(b * X);"                                                                                              
#>  [53] "}"                                                                                                                    
#>  [54] ""                                                                                                                     
#>  [55] "model {"                                                                                                              
#>  [56] "// parametric effect priors (regularised for identifiability)"                                                        
#>  [57] "for (i in 1:1) {"                                                                                                     
#>  [58] "b_raw[i] ~ normal(p_coefs[i], 1 / p_taus[i]);"                                                                        
#>  [59] "}"                                                                                                                    
#>  [60] ""                                                                                                                     
#>  [61] "// prior for s(season)..."                                                                                            
#>  [62] "b_raw[2:18] ~ multi_normal_prec(zero[2:18],S1[1:17,1:17] * lambda[1]);"                                               
#>  [63] ""                                                                                                                     
#>  [64] "// priors for AR parameters"                                                                                          
#>  [65] "ar1 ~ normal(0, 0.5);"                                                                                                
#>  [66] "ar2 ~ normal(0, 0.5);"                                                                                                
#>  [67] "ar3 ~ normal(0, 0.5);"                                                                                                
#>  [68] ""                                                                                                                     
#>  [69] "// priors for smoothing parameters"                                                                                   
#>  [70] "lambda ~ exponential(0.05);"                                                                                          
#>  [71] ""                                                                                                                     
#>  [72] "// priors for latent trend variance parameters"                                                                       
#>  [73] "sigma ~ exponential(1);"                                                                                              
#>  [74] ""                                                                                                                     
#>  [75] "// trend estimates"                                                                                                   
#>  [76] "for (s in 1:n_series) {"                                                                                              
#>  [77] "trend[1, s] ~ normal(0, sigma[s]);"                                                                                   
#>  [78] "}"                                                                                                                    
#>  [79] ""                                                                                                                     
#>  [80] "for (s in 1:n_series) {"                                                                                              
#>  [81] "trend[2, s] ~ normal(trend[1, s] * ar1[s], sigma[s]);"                                                                
#>  [82] "}"                                                                                                                    
#>  [83] ""                                                                                                                     
#>  [84] "for (s in 1:n_series) {"                                                                                              
#>  [85] "trend[3, s] ~ normal(trend[2, s] * ar1[s] + trend[1, s] * ar2[s], sigma[s]);"                                         
#>  [86] "}"                                                                                                                    
#>  [87] ""                                                                                                                     
#>  [88] "for (i in 4:n) {"                                                                                                     
#>  [89] "for (s in 1:n_series) {"                                                                                              
#>  [90] "trend[i, s] ~ normal(ar1[s] * trend[i - 1, s] + ar2[s] * trend[i - 2, s] + ar3[s] * trend[i - 3, s], sigma[s]);"      
#>  [91] "}"                                                                                                                    
#>  [92] "}"                                                                                                                    
#>  [93] ""                                                                                                                     
#>  [94] "// likelihood functions"                                                                                              
#>  [95] "for (i in 1:n) {"                                                                                                     
#>  [96] "for (s in 1:n_series) {"                                                                                              
#>  [97] "if (y_observed[i, s])"                                                                                                
#>  [98] "y[i, s] ~ poisson_log(eta[ytimes[i, s]] + trend[i, s]);"                                                              
#>  [99] "}"                                                                                                                    
#> [100] "}"                                                                                                                    
#> [101] "}"                                                                                                                    
#> [102] ""                                                                                                                     
#> [103] "generated quantities {"                                                                                               
#> [104] "vector[n_sp] rho;"                                                                                                    
#> [105] "vector[n_series] tau;"                                                                                                
#> [106] "matrix[n, n_series] ypred;"                                                                                           
#> [107] "rho = log(lambda);"                                                                                                   
#> [108] "for (s in 1:n_series) {"                                                                                              
#> [109] "tau[s] = pow(sigma[s], -2.0);"                                                                                        
#> [110] "}"                                                                                                                    
#> [111] ""                                                                                                                     
#> [112] "// posterior predictions"                                                                                             
#> [113] "for(i in 1:n){"                                                                                                       
#> [114] "for(s in 1:n_series){"                                                                                                
#> [115] "ypred[i, s] = poisson_log_rng(eta[ytimes[i, s]] + trend[i, s]);"                                                      
#> [116] "}"                                                                                                                    
#> [117] "}"                                                                                                                    
#> [118] "}"                                                                                                                    
#> [119] ""

Perform a series of posterior predictive checks to see if the model is able to simulate data for the training period that looks realistic and unbiased. First, examine histograms for posterior predictions (yhat) and compare to the histogram of the observations (y)

ppc(lynx_mvgam, series = 1, type = 'hist')

Now plot the distribution of predicted means compared to the observed mean

ppc(lynx_mvgam, series = 1, type = 'mean')

Next examine simulated empirical Cumulative Distribution Functions (CDF) for posterior predictions (yhat) and compare to the CDF of the observations (y)

ppc(lynx_mvgam, series = 1, type = 'cdf')

Rootograms are becoming popular graphical tools for checking a discrete model’s ability to capture dispersion properties of the response variable. Posterior predictive hanging rootograms can be displayed using the ppc() function in mvgam. In the plot below, we bin the unique observed values into 25 bins to prevent overplotting and help with interpretation. This plot compares the frequencies of observed vs predicted values for each bin, which can help to identify aspects of poor model fit. For example, if the gray bars (representing observed frequencies) tend to stretch below zero, this suggests the model’s simulations predict the values in that particular bin less frequently than they are observed in the data. A well-fitting model that can generate realistic simulated data will provide a rootogram in which the lower boundaries of the grey bars are generally near zero

ppc(lynx_mvgam, series = 1, type = 'rootogram', n_bins = 25)

Finally look for any biases in predictions by examining a Probability Integral Transform (PIT) histogram. If our predictions are not biased one way or another (i.e. not consistently under- or over-predicting), this histogram should look roughly uniform

ppc(lynx_mvgam, series = 1, type = 'pit')

All of these plots indicate the model is well calibrated against the training data, with no apparent pathological behaviors exhibited. Have a look at this model’s summary to see what is being estimated (note that longer MCMC runs would probably be needed to increase effective sample sizes)

summary(lynx_mvgam)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 19)
#> 
#> Family:
#> Poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR3
#> 
#> N series:
#> 1
#> 
#> N observations:
#> 50
#> 
#> Status:
#> Fitted using Stan
#> 
#> GAM smooth term estimated degrees of freedom:
#>            edf df
#> s(season) 9799 17
#> 
#> GAM coefficient (beta) estimates:
#>                     2.5%         50%       97.5% Rhat n.eff
#> (Intercept)   6.79015950  6.80218000  6.81431000    1  6808
#> s(season).1  -1.20317400 -0.68312750  0.01746606    1  1039
#> s(season).2  -0.34217647  0.25609150  0.90302028    1  1803
#> s(season).3   0.38097955  1.10338500  1.79252475    1  1762
#> s(season).4   0.80091535  1.72666500  2.44061925    1  1272
#> s(season).5   1.03276175  1.99367500  2.72996700    1  1117
#> s(season).6   0.29843772  1.17278500  1.89230325    1  1392
#> s(season).7  -0.84398567 -0.08065355  0.68787680    1  1811
#> s(season).8  -1.41318800 -0.64321500  0.20129283    1  1323
#> s(season).9  -1.56880575 -0.76334700  0.31739575    1  1094
#> s(season).10 -1.24017525 -0.42983750  0.64996255    1  1280
#> s(season).11 -0.60767142  0.24961800  1.17260975    1  1705
#> s(season).12  0.08591474  1.17380000  2.00801000    1  1208
#> s(season).13 -0.03388641  1.29483500  2.15689075    1   839
#> s(season).14 -0.33679847  0.99392500  1.82216725    1   813
#> s(season).15 -0.96419433 -0.11696200  0.54524333    1  1123
#> s(season).16 -1.42184425 -0.80551700 -0.22757412    1  1959
#> s(season).17 -1.57080350 -1.05019000 -0.41779690    1  1377
#> 
#> GAM smoothing parameter (rho) estimates:
#>               2.5%      50%    97.5% Rhat n.eff
#> s(season) 3.321265 4.190605 4.934154    1  2996
#> 
#> Latent trend parameter estimates:
#>                2.5%         50%     97.5% Rhat n.eff
#> ar1[1]    0.5430310  0.89219950 1.2556305    1  1493
#> ar2[1]   -0.6911293 -0.27678200 0.1547917    1  2967
#> ar3[1]   -0.3486109  0.05792415 0.4070008    1  1333
#> sigma[1]  0.3651319  0.45930700 0.5985543    1  1955
#> 
#> [1] "n_eff / iter looks reasonable for all parameters"
#> [1] "Rhat looks reasonable for all parameters"
#> [1] "0 of 4000 iterations ended with a divergence (0%)"
#> [1] "24 of 4000 iterations saturated the maximum tree depth of 11 (0.6%)"
#> [1] "  Run again with max_treedepth set to a larger value to avoid saturation"
#> [1] "E-FMI indicated no pathological behavior"

The plot_mvgam_...() functions offer more flexibility than the generic S3 plot.mvgam() functions. For example, we can inpsect traceplots when sampling from a posterior with MCMC methods. Here for the GAM component (smoothing parameters).

plot_mvgam_trace(lynx_mvgam, 'rho')

and for the latent trend component parameters

MCMCvis::MCMCtrace(lynx_mvgam$model_output, c('ar1', 'ar2', 'sigma'), pdf = F, n.eff = T, Rhat = T)

Inspect the model’s estimated smooth for the 19-year cyclic pattern, which is shown as a ribbon plot of posterior empirical quantiles. We can also overlay posterior quantiles of partial residuals (shown as ribbon rectangles in red), which represent the leftover variation that the model expects would remain if this smooth term was dropped but all other parameters remained unchanged. Note that these are on a different scale to those from mgcv::plot.gam as these are randomised quantile residuals that are essentially standard normal in distribution. But either way, a strong pattern in the partial residuals suggests there would be strong patterns left unexplained in the model if we were to drop this term, giving us further confidence that this function is important in the model

plot(lynx_mvgam, type = 'smooths', residuals = T)

First derivatives of smooth functions can also be plotted to inspect how the slope of the function changes across its length. To plot these we use the more flexible plot_mvgam_smooth() function

plot_mvgam_smooth(lynx_mvgam, 1, 'season', derivatives = T)

We can also view the mvgam’s posterior retrodictions and predictions for the entire series (testing and training)

plot(lynx_mvgam, type = 'forecast', data_test = lynx_test)
#> Out of sample DRPS:
#> [1] 682.8037
#>

And the estimated latent trend component, again using the more flexible plot_mvgam_...() option to show first derivatives of the estimated trend

plot_mvgam_trend(lynx_mvgam, data_test = lynx_test, derivatives = T)

We can also re-do the posterior predictive checks, but this time focusing only on the out of sample period. This will give us better insight into how the model is performing and whether it is able to simulate realistic and unbiased future values

ppc(lynx_mvgam, series = 1, type = 'rootogram', data_test = lynx_test)

ppc(lynx_mvgam, series = 1, type = 'mean', data_test = lynx_test)

ppc(lynx_mvgam, series = 1, type = 'cdf', data_test = lynx_test)

ppc(lynx_mvgam, series = 1, type = 'pit', data_test = lynx_test)

A key aspect of ecological forecasting is to understand how different components of a model contribute to forecast uncertainty. We can estimate relative contributions to forecast uncertainty for the GAM component and the latent trend component using mvgam

plot_mvgam_uncertainty(lynx_mvgam, data_test = lynx_test, legend_position = 'none')
text(1, 0.2, cex = 1.5, label="GAM component", 
     pos = 4, col="white", family = 'serif')
text(1, 0.8, cex = 1.5, label="Trend component", 
     pos = 4, col="#7C0000", family = 'serif')

Both components contribute to forecast uncertainty, suggesting we would still need some more work to learn about factors driving the dynamics of the system. But we will leave the model as-is for this example. Diagnostics of the model can also be performed using mvgam. Have a look at the model’s residuals, which are posterior medians of Dunn-Smyth randomised quantile residuals so should follow approximate normality. We are primarily looking for a lack of autocorrelation, which would suggest our AR2 model is appropriate for the latent trend

plot(lynx_mvgam, type = 'residuals')

Another useful utility of mvgam is the ability to use rolling window forecasts to evaluate competing models that may represent different hypotheses about the series dynamics. Here we will fit a poorly specified model to showcase how this evaluation works. In this model, we ignore the cyclic pattern of seasonality and force it to be fairly non-wiggly. We also use a random walk process for the trend

lynx_mvgam_poor <- mvgam(data_train = lynx_train,
               data_test = lynx_test,
               formula = y ~ s(season, k = 3),
               family = 'poisson',
               trend_model = 'RW',
               drift = FALSE,
               use_stan = TRUE,
               burnin = 1000,
               chains = 4)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 finished in 6.7 seconds.
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 7.1 seconds.
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 7.7 seconds.
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 8.8 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 7.6 seconds.
#> Total execution time: 8.9 seconds.

We choose a set of timepoints within the training data to forecast from, allowing us to simulate a situation where the model’s parameters had already been estimated but we have only observed data up to the evaluation timepoint and would like to generate forecasts from the latent trends. Here we use year 10 as our last observation and forecast ahead for the next 10 years.

mod1_eval <- eval_mvgam(lynx_mvgam, eval_timepoint = 10, fc_horizon = 10)
mod2_eval <- eval_mvgam(lynx_mvgam_poor, eval_timepoint = 10, fc_horizon = 10)

Summary statistics of the two models’ out of sample Discrete Rank Probability Score (DRPS) indicate that the well-specified model performs markedly better (far lower DRPS) for this evaluation timepoint

summary(mod1_eval$series1$drps)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   2.815  14.735 111.162 103.530 151.525 244.586
summary(mod2_eval$series1$drps)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   34.93   41.65  311.68  289.32  456.17  675.74

Nominal coverages for both models’ 90% prediction intervals

mean(mod1_eval$series1$in_interval)
#> [1] 0.9
mean(mod2_eval$series1$in_interval)
#> [1] 0.7

The compare_mvgams function automates this process by rolling along a set of timepoints for each model, ensuring a more in-depth evaluation of each competing model at the same set of timepoints. There are many more extended uses for mvgam models, including the ability to fit dynamic factor processes for analysing and forecasting sets of multivariate discrete time series

License

This project is licensed under an MIT open source license

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

mvgam

Installation

Brief introduction to the package

License

About

Licenses found

Releases 13

Sponsor this project

Packages

Contributors 4

Languages

License

nicholasjclark/mvgam

Folders and files

Latest commit

History

Repository files navigation

mvgam

Installation

Brief introduction to the package

License

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases 13

Sponsor this project

Packages 0

Contributors 4

Languages

Packages