From 4683b4050185920278bfc462c51a369521253aa8 Mon Sep 17 00:00:00 2001
From: Nicholas Clark Nicholas J
Clark
- 2024-01-12
+ 2024-01-15
Source: vignettes/mvgam_overview.Rmd
mvgam_overview.Rmd
Dynamic GAMs\(\tilde{\boldsymbol{y}}_{i,t}\) is the
+independently of the observation process. An introduction to the package
+and some worked examples are also shown in this seminar: Ecological Forecasting with Dynamic Generalized Additive
+Models. Briefly, assume \(\tilde{\boldsymbol{y}}_{i,t}\) is the
conditional expectation of response variable \(\boldsymbol{i}\) at time \(\boldsymbol{t}\). Assuming \(\boldsymbol{y_i}\) is drawn from an
exponential distribution with an invertible link function, the linear
predictor for a multivariate Dynamic GAM can be written as:
## y season year series time
-## 1 2 1 1 series_1 1
-## 2 0 1 1 series_2 1
-## 3 1 1 1 series_3 1
-## 4 0 1 1 series_4 1
-## 5 5 2 1 series_1 2
-## 6 4 2 1 series_2 2
-## 7 7 2 1 series_3 2
-## 8 0 2 1 series_4 2
-## 9 0 3 1 series_1 3
-## 10 1 3 1 series_2 3
-## 11 4 3 1 series_3 3
+## 1 1 1 1 series_1 1
+## 2 7 1 1 series_2 1
+## 3 8 1 1 series_3 1
+## 4 1 1 1 series_4 1
+## 5 2 2 1 series_1 2
+## 6 5 2 1 series_2 2
+## 7 1 2 1 series_3 2
+## 8 2 2 1 series_4 2
+## 9 1 3 1 series_1 3
+## 10 0 3 1 series_2 3
+## 11 1 3 1 series_3 3
## 12 0 3 1 series_4 3
Notice how we have four different time series in these simulated data, but we do not spread the outcome values into different columns. @@ -647,8 +649,8 @@
## Rows: 2,000
## Columns: 17
-## $ `s(year_fac).1` <dbl> 2.06435, 2.18026, 2.10259, 2.15858, 2.08945, 2.03449,…
-## $ `s(year_fac).2` <dbl> 2.66297, 2.76049, 2.58599, 2.84501, 2.65126, 2.77843,…
-## $ `s(year_fac).3` <dbl> 2.99574, 3.07175, 3.07301, 3.05926, 3.09694, 2.96840,…
-## $ `s(year_fac).4` <dbl> 3.24798, 3.28896, 3.30628, 3.25715, 3.22578, 3.27270,…
-## $ `s(year_fac).5` <dbl> 2.13139, 2.07544, 1.97331, 2.08521, 2.29660, 2.08273,…
-## $ `s(year_fac).6` <dbl> 1.42739, 2.02722, 1.71548, 1.92531, 1.73130, 1.49182,…
-## $ `s(year_fac).7` <dbl> 1.93186, 2.21940, 2.11423, 2.18194, 2.03794, 2.12441,…
-## $ `s(year_fac).8` <dbl> 2.94445, 3.01923, 3.01388, 2.95011, 3.01846, 2.93042,…
-## $ `s(year_fac).9` <dbl> 3.19531, 3.27323, 3.38481, 3.30585, 3.13491, 3.18199,…
-## $ `s(year_fac).10` <dbl> 2.74548, 2.88965, 2.78905, 2.90565, 2.72434, 2.75425,…
-## $ `s(year_fac).11` <dbl> 3.01341, 3.09803, 3.14241, 3.10061, 3.05058, 3.02510,…
-## $ `s(year_fac).12` <dbl> 3.13730, 3.30920, 3.24322, 3.20736, 3.18905, 3.23454,…
-## $ `s(year_fac).13` <dbl> 2.18607, 2.26127, 2.15356, 2.14097, 2.43450, 2.22882,…
-## $ `s(year_fac).14` <dbl> 2.38431, 2.50610, 2.49737, 2.66538, 2.47568, 2.75505,…
-## $ `s(year_fac).15` <dbl> 2.22367, 2.34576, 2.24644, 2.22469, 2.27891, 2.17492,…
-## $ `s(year_fac).16` <dbl> 2.27256, 2.11722, 1.97474, 2.07540, 2.08201, 1.86904,…
-## $ `s(year_fac).17` <dbl> 1.189810, 1.151540, 0.804486, 1.526610, 0.851250, 1.1…
+## $ `s(year_fac).1` <dbl> 2.30080, 2.19798, 2.04816, 2.01919, 2.19020, 1.91288,…
+## $ `s(year_fac).2` <dbl> 2.72378, 2.69498, 2.64144, 2.73021, 2.80479, 2.63429,…
+## $ `s(year_fac).3` <dbl> 3.04055, 2.94148, 3.27943, 3.05446, 3.13977, 3.14182,…
+## $ `s(year_fac).4` <dbl> 3.31494, 3.28852, 3.19607, 3.15786, 3.12430, 3.15365,…
+## $ `s(year_fac).5` <dbl> 2.24893, 2.16990, 1.99415, 2.11878, 2.20520, 2.09616,…
+## $ `s(year_fac).6` <dbl> 1.69947, 1.60053, 1.92989, 1.49806, 1.58634, 1.98859,…
+## $ `s(year_fac).7` <dbl> 2.05134, 2.03602, 2.13280, 2.11045, 2.26266, 1.83901,…
+## $ `s(year_fac).8` <dbl> 2.99787, 2.97008, 2.99874, 2.91971, 2.91366, 3.02957,…
+## $ `s(year_fac).9` <dbl> 3.33622, 3.20377, 3.32396, 3.24947, 3.20354, 3.23998,…
+## $ `s(year_fac).10` <dbl> 2.82800, 2.76225, 2.76828, 2.72482, 2.75613, 2.79170,…
+## $ `s(year_fac).11` <dbl> 3.10896, 3.01869, 3.16489, 2.99758, 3.12221, 3.17723,…
+## $ `s(year_fac).12` <dbl> 3.21659, 3.04013, 3.37160, 3.30859, 3.26822, 3.30016,…
+## $ `s(year_fac).13` <dbl> 2.36270, 2.33583, 2.26894, 2.07640, 2.18187, 2.36848,…
+## $ `s(year_fac).14` <dbl> 2.69838, 2.64463, 2.66172, 2.59959, 2.59452, 2.64214,…
+## $ `s(year_fac).15` <dbl> 2.25933, 2.25514, 2.07515, 2.15506, 2.17237, 2.20526,…
+## $ `s(year_fac).16` <dbl> 2.09584, 2.10991, 1.83435, 2.20681, 2.26212, 1.92478,…
+## $ `s(year_fac).17` <dbl> 0.4150130, 0.4578770, 0.2064460, 0.9136730, 1.0479400…
With any model fitted in mvgam
, the underlying
Stan
code can be viewed using the code
function:
## [1] -2.24966 3.46014
+## [1] -1.65123 3.47762
Objects of class mvgam_forecast
have an associated plot
function as well:
@@ -940,14 +942,14 @@Automatic forecasting for new dataplot(model1b, type = 'forecast')
## Out of sample DRPS:
-## [1] 180.2378
+## [1] 182.1551
We can also view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set
plot(model1b, type = 'forecast', newdata = data_test)
## Out of sample DRPS:
-## [1] 180.2378
+## [1] 182.1551
As with the hindcast
function, we can use the
forecast
function to automatically extract the posterior
distributions for these predictions. This also returns an object of
@@ -975,12 +977,12 @@
coef(model2)
## 2.5% 50% 97.5% Rhat n_eff
-## ndvi 0.3219343 0.3896115 0.4561312 1 1941
-## s(year_fac).1 1.1213810 1.4014750 1.6763905 1 2704
-## s(year_fac).2 1.8000682 2.0033650 2.2080812 1 2165
-## s(year_fac).3 2.1762575 2.3832750 2.5682495 1 2034
-## s(year_fac).4 2.3162657 2.5098600 2.6993935 1 2187
-## s(year_fac).5 1.1848087 1.4253400 1.6454560 1 2687
-## s(year_fac).6 1.0015992 1.2699300 1.5162433 1 2437
-## s(year_fac).7 1.1421785 1.4162100 1.6830155 1 3079
-## s(year_fac).8 2.0869212 2.2735250 2.4629585 1 2295
-## s(year_fac).9 2.7231583 2.8611750 2.9968915 1 2301
-## s(year_fac).10 1.9826495 2.1869100 2.3888603 1 2581
-## s(year_fac).11 2.2645230 2.4408950 2.6059805 1 2190
-## s(year_fac).12 2.5381555 2.6944350 2.8382668 1 2097
-## s(year_fac).13 1.3885932 1.6143500 1.8470717 1 3245
-## s(year_fac).14 0.5566057 1.9835550 3.3532650 1 1688
-## s(year_fac).15 0.7313315 1.9786150 3.3730625 1 1698
-## s(year_fac).16 0.5209962 1.9957750 3.1762922 1 1581
-## s(year_fac).17 0.6614678 1.9759650 3.2837728 1 1633
+## ndvi 0.3214333 0.3900135 0.4563159 1 1786
+## s(year_fac).1 1.1140800 1.4079600 1.6670795 1 2685
+## s(year_fac).2 1.7910845 2.0025550 2.1936545 1 2607
+## s(year_fac).3 2.1944393 2.3821600 2.5643877 1 2057
+## s(year_fac).4 2.3301122 2.5071400 2.6851682 1 2129
+## s(year_fac).5 1.1836370 1.4268450 1.6446277 1 2567
+## s(year_fac).6 1.0164710 1.2687300 1.5169110 1 2267
+## s(year_fac).7 1.1292298 1.4117000 1.6742532 1 2636
+## s(year_fac).8 2.0858535 2.2694550 2.4555455 1 2082
+## s(year_fac).9 2.7214870 2.8538250 2.9875630 1 2173
+## s(year_fac).10 1.9630840 2.1858000 2.3866162 1 2669
+## s(year_fac).11 2.2738888 2.4383000 2.6029485 1 2050
+## s(year_fac).12 2.5519082 2.6920100 2.8434890 1 2114
+## s(year_fac).13 1.4001263 1.6197550 1.8443220 1 2484
+## s(year_fac).14 0.6821051 1.9962300 3.3376105 1 1519
+## s(year_fac).15 0.7618126 2.0004150 3.3627562 1 1648
+## s(year_fac).16 0.6390487 1.9976450 3.2277897 1 1658
+## s(year_fac).17 0.6054341 1.9780000 3.3044415 1 1370
Look at the estimated effect of ndvi
using
plot.mvgam
with type = 'pterms'
@@ -1118,24 +1120,24 @@Adding predictors as “fixed” eff dplyr::glimpse(beta_post)
## Rows: 2,000
## Columns: 18
-## $ ndvi <dbl> 0.355167, 0.354315, 0.400233, 0.388748, 0.341074, 0.4…
-## $ `s(year_fac).1` <dbl> 1.45330, 1.39405, 1.45254, 1.11547, 1.66909, 1.17323,…
-## $ `s(year_fac).2` <dbl> 2.05506, 2.07679, 1.97119, 1.99876, 2.01560, 2.07635,…
-## $ `s(year_fac).3` <dbl> 2.59536, 2.35567, 2.37899, 2.32072, 2.48880, 2.40466,…
-## $ `s(year_fac).4` <dbl> 2.63194, 2.56439, 2.55642, 2.43839, 2.67447, 2.47027,…
-## $ `s(year_fac).5` <dbl> 1.54866, 1.44661, 1.48310, 1.35734, 1.53211, 1.42122,…
-## $ `s(year_fac).6` <dbl> 1.28962, 1.44256, 1.13824, 1.43169, 1.27521, 1.37027,…
-## $ `s(year_fac).7` <dbl> 1.60251, 1.39342, 1.47103, 1.40323, 1.46742, 1.40967,…
-## $ `s(year_fac).8` <dbl> 2.33166, 2.34704, 2.19045, 2.30872, 2.34791, 2.26251,…
-## $ `s(year_fac).9` <dbl> 2.96146, 2.87900, 2.85271, 2.76788, 2.92536, 2.87675,…
-## $ `s(year_fac).10` <dbl> 2.16140, 2.41945, 1.98341, 2.39724, 2.20351, 2.18907,…
-## $ `s(year_fac).11` <dbl> 2.51912, 2.52120, 2.45377, 2.36322, 2.59581, 2.37504,…
-## $ `s(year_fac).12` <dbl> 2.75224, 2.76068, 2.70921, 2.68268, 2.78127, 2.68005,…
-## $ `s(year_fac).13` <dbl> 1.63046, 1.77451, 1.48580, 1.66704, 1.72238, 1.52178,…
-## $ `s(year_fac).14` <dbl> 1.496390, 1.016750, 1.710010, 2.178540, 2.644870, 2.6…
-## $ `s(year_fac).15` <dbl> 1.871760, 2.004820, 1.210480, 2.188060, 2.624220, 2.5…
-## $ `s(year_fac).16` <dbl> 2.634900, 2.017970, 2.073360, 2.412940, 2.959150, 2.1…
-## $ `s(year_fac).17` <dbl> 1.91701, 1.50308, 1.17690, 1.54644, 1.92542, 2.15811,…
+## $ ndvi <dbl> 0.365016, 0.383852, 0.382832, 0.374714, 0.385262, 0.3…
+## $ `s(year_fac).1` <dbl> 1.26080, 1.58275, 1.32614, 1.34134, 1.36449, 1.43590,…
+## $ `s(year_fac).2` <dbl> 2.11456, 1.97718, 1.83386, 1.89789, 1.95921, 2.05343,…
+## $ `s(year_fac).3` <dbl> 2.30024, 2.37424, 2.42140, 2.45382, 2.38137, 2.43344,…
+## $ `s(year_fac).4` <dbl> 2.58320, 2.48406, 2.51264, 2.51145, 2.41215, 2.49424,…
+## $ `s(year_fac).5` <dbl> 1.65731, 1.21655, 1.27151, 1.29860, 1.40799, 1.44363,…
+## $ `s(year_fac).6` <dbl> 1.11182, 1.40885, 1.23147, 1.32212, 1.30500, 1.24335,…
+## $ `s(year_fac).7` <dbl> 1.55856, 1.20594, 1.60187, 1.65128, 1.18561, 1.65570,…
+## $ `s(year_fac).8` <dbl> 2.31500, 2.17933, 2.19752, 2.27384, 2.21278, 2.26591,…
+## $ `s(year_fac).9` <dbl> 2.86616, 2.85139, 2.92218, 2.84696, 2.80555, 2.81140,…
+## $ `s(year_fac).10` <dbl> 2.25933, 2.12217, 2.14634, 2.24707, 2.19890, 2.17169,…
+## $ `s(year_fac).11` <dbl> 2.52557, 2.49356, 2.62267, 2.52679, 2.43452, 2.40733,…
+## $ `s(year_fac).12` <dbl> 2.72165, 2.70566, 2.67655, 2.70252, 2.68610, 2.71396,…
+## $ `s(year_fac).13` <dbl> 1.55743, 1.70503, 1.48361, 1.55631, 1.58906, 1.62060,…
+## $ `s(year_fac).14` <dbl> 2.566090, 1.447150, 1.984190, 2.192650, 1.231120, 1.9…
+## $ `s(year_fac).15` <dbl> 1.05225, 2.12175, 1.72275, 1.93314, 1.75620, 1.66417,…
+## $ `s(year_fac).16` <dbl> 1.447540, 2.158620, 1.625380, 1.515920, 2.205820, 0.7…
+## $ `s(year_fac).17` <dbl> 0.828068, 0.831492, 2.645610, 2.822590, 0.700252, 2.3…
The posterior distribution for the effect of ndvi
is
stored in the ndvi
column. A quick histogram confirms our
inference that log(counts)
respond positively to increases
@@ -1272,26 +1274,26 @@
mvgam
plot(model3, type = 'forecast', newdata = data_test)
## Out of sample DRPS:
-## [1] 287.0755
+## [1] 286.5788
Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly forever beyond the edge of the training data. Any slight @@ -1512,33 +1514,32 @@
mvgam
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
-## (Intercept) 0.580 1.9000 2.700 1.17 36
-## s(ndvi).1 -0.078 0.0069 0.120 1.01 1388
-## s(ndvi).2 -0.120 0.0150 0.210 1.00 915
-## s(ndvi).3 -0.038 -0.0012 0.038 1.00 2092
-## s(ndvi).4 -0.200 0.1000 0.900 1.01 561
-## s(ndvi).5 -0.049 0.1600 0.350 1.00 489
+## (Intercept) 0.890 1.9000 2.500 1.03 61
+## s(ndvi).1 -0.082 0.0110 0.220 1.00 533
+## s(ndvi).2 -0.160 0.0140 0.400 1.00 454
+## s(ndvi).3 -0.054 -0.0018 0.061 1.01 760
+## s(ndvi).4 -0.300 0.1200 1.700 1.01 268
+## s(ndvi).5 -0.100 0.1500 0.340 1.00 462
##
## Approximate significance of GAM observation smooths:
-## edf Chi.sq p-value
-## s(ndvi) 1.05 89.1 0.055 .
+## edf Chi.sq p-value
+## s(ndvi) 0.978 95.5 0.088 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Latent trend parameter AR estimates:
## 2.5% 50% 97.5% Rhat n_eff
-## ar1[1] 0.70 0.81 0.93 1.05 76
-## sigma[1] 0.67 0.80 0.95 1.00 555
+## ar1[1] 0.69 0.81 0.94 1.01 173
+## sigma[1] 0.67 0.80 0.96 1.01 518
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
-## Rhats above 1.05 found for 131 parameters
-## *Diagnose further to investigate why the chains have not mixed
+## Rhat looks reasonable for all parameters
## 0 of 2000 iterations ended with a divergence (0%)
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
-## Samples were drawn using NUTS(diag_e) at Fri Jan 12 3:54:58 PM 2024.
+## Samples were drawn using NUTS(diag_e) at Mon Jan 15 9:10:06 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
@@ -1553,8 +1554,8 @@ mvgam
plot(model4, type = 'forecast', newdata = data_test)
-## Out of sample CRPS:
-## [1] 150.7294
+## Out of sample DRPS:
+## [1] 150.9452
The trend is evolving as an AR1 process, which we can also view:
plot(model4, type = 'trend', newdata = data_test)
mvgam
## Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
## elpd_diff se_diff
## model4 0.0 0.0
-## model3 -560.0 66.5
+## model3 -560.2 66.5
The higher estimated log predictive density (ELPD) value for the dynamic model suggests it provides a better fit to the in-sample data.
@@ -1584,7 +1585,7 @@mvgam
score_mod3 <- score(fc_mod3, score = 'drps')
score_mod4 <- score(fc_mod4, score = 'drps')
sum(score_mod4$PP$score, na.rm = TRUE) - sum(score_mod3$PP$score, na.rm = TRUE)
-## [1] -136.3461
+## [1] -135.6337
A strongly negative value here suggests the score for the dynamic model (model 4) is much smaller than the score for the model with a smooth function of time (model 3)
diff --git a/docs/articles/mvgam_overview_files/figure-html/Histogram of NDVI effects-1.png b/docs/articles/mvgam_overview_files/figure-html/Histogram of NDVI effects-1.png index f099c57676234e4e81ea6faf053b43afe3c62106..56b72cb68712cc9de0c21f867371f661bc68c199 100644 GIT binary patch literal 18717 zcmeHv2T)UKzjqK;U@hQUK)P#aD@8<_NU_0&sz5-RAd2)BIspCbA!Eq793_Q^E4CY>L*7?=jeWf_nv#i{(;Dm$Kigk+E1$b`m`4}WxyAWn(Kx@?t
z3O|5#>C} `x^9^K2_3Z#3+d6*GMTz+)N((CUy#4?G%aHO`dOK$lPBL3h3>V
z+3 Dx)Zbgs{)iUGOwgl~G*brb+1BI6>XDn~ZE%LF7n lM@z7#c+&e6wZ!RXmsnVGWeM;6`t~#7h7)shbHi0@2T8uO-_fCG
z=3V$5on{5-C{r7Z#>Oiv)kr=q
?ly5ElT`EB%|9R?9yJn<>83EO(7XF$urNtp^K{__n(A!__`IZ6J?q`5%-m+o$C
zvszTros!H-Iwa@vHP7bg(v94BQ)_z*La&mFsL&LMf?%qTJSDbklw!Aq^Kc(s*gnm3upL20g
zTcqpXEJ