Skip to content

Commit

Permalink
add predict; update all beta priors
Browse files Browse the repository at this point in the history
  • Loading branch information
nicholasjclark committed Feb 1, 2022
1 parent cdabc12 commit f180a3d
Show file tree
Hide file tree
Showing 80 changed files with 751 additions and 608 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ export(plot_mvgam_smooth)
export(plot_mvgam_trace)
export(plot_mvgam_trend)
export(plot_mvgam_uncertainty)
export(predict_mvgam)
export(roll_eval_mvgam)
export(series_to_mvgam)
export(sim_mvgam)
Expand Down
576 changes: 290 additions & 286 deletions NEON_manuscript/Case studies/mvgam_case_study1.html

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
9 changes: 7 additions & 2 deletions NEON_manuscript/Case studies/mvgam_case_study2.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ mod1 <- mvjagam(data_train = dat$data_train,
n_lv = 2,
family = 'nb',
trend_model = 'RW',
chains = 4,
burnin = 2000)
```

Expand Down Expand Up @@ -75,6 +76,7 @@ mod2 <- mvjagam(data_train = dat$data_train,
n_lv = 8,
family = 'nb',
trend_model = 'RW',
chains = 4,
burnin = 2000)
```

Expand Down Expand Up @@ -130,7 +132,7 @@ Plot the series to see how similar their seasonal shapes are over time
plot(series, legend.loc = 'topleft')
```

Now we will fit an `mvgam` model with shared seasonality and random intercepts per series. Our first attempt will ignore any temporal component in the residuals so that we can identidy which GAM predictor combination gives us the best fit, prior to investigating how to deal with any remaining autocorrelation. We assume a Negative Binomial distrubution with shared overdispersion parameter for the response
Now we will fit an `mvgam` model with shared seasonality and random intercepts per series. Our first attempt will ignore any temporal component in the residuals so that we can identidy which GAM predictor combination gives us the best fit, prior to investigating how to deal with any remaining autocorrelation. We assume a Negative Binomial distrubution with shared overdispersion parameter for the response. Also note that any smooths using the random effects basis (`s(series, bs = "re")` below) are automatically re-parameterised to use the [non-centred parameterisation that is necessary to help avoid common posterior degeneracies in hierarchical models](https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html)
```{r, message=FALSE, warning=FALSE}
trends_mod1 <- mvjagam(data_train = trends_data$data_train,
data_test = trends_data$data_test,
Expand All @@ -139,6 +141,7 @@ trends_mod1 <- mvjagam(data_train = trends_data$data_train,
knots = list(season = c(0.5, 12.5)),
trend_model = 'None',
family = 'nb',
chains = 4,
burnin = 5000)
```

Expand All @@ -151,6 +154,7 @@ trends_mod2 <- mvjagam(data_train = trends_data$data_train,
knots = list(season = c(0.5, 12.5)),
trend_model = 'None',
family = 'nb',
chains = 4,
burnin = 5000)
```

Expand Down Expand Up @@ -193,12 +197,13 @@ trends_mod3 <- mvjagam(data_train = trends_data$data_train,
use_lv = T,
n_lv = 4,
family = 'nb',
chains = 4,
drift = T,
burnin = 5000)
```


Have a look at the returned `JAGS` model file to see how the dynamic factors are incorporated. Notice that the precisions of factors, together with each factor's set of loadings, are penalised using a regularised horseshoe prior
Have a look at the returned `JAGS` model file to see how the dynamic factors are incorporated. Notice that the precisions of factors, together with each factor's set of loadings, are penalised using a regularised horseshoe prior
```{r}
trends_mod3$model_file
```
Expand Down
470 changes: 232 additions & 238 deletions NEON_manuscript/Case studies/mvgam_case_study2.html

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,4 @@ hostUrl: rpubs.com
appId: https://api.rpubs.com/api/v1/document/856585/540d5299dc584aff9095b986b978429b
bundleId: https://api.rpubs.com/api/v1/document/856585/540d5299dc584aff9095b986b978429b
url: http://rpubs.com/NickClark47/856585
when: 1643596147.89081
when: 1643680227.13467
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,4 @@ hostUrl: rpubs.com
appId: https://api.rpubs.com/api/v1/document/856733/5be2f57c523545a694d50c31f9a87205
bundleId: https://api.rpubs.com/api/v1/document/856733/5be2f57c523545a694d50c31f9a87205
url: http://rpubs.com/NickClark47/856733
when: 1643603987.43444
when: 1643695525.11005
25 changes: 12 additions & 13 deletions NEON_manuscript/portal_example.R
Original file line number Diff line number Diff line change
Expand Up @@ -48,29 +48,28 @@ data_test <- list(lag = data_all$lag[175:length(data_all$y),],
mintemp = data_all$mintemp[175:length(data_all$y),])

# Fit a dynamic GAM with a distributed lag term for precipitation and
# an AR1 process for the trend; note that we suppress the intercept here as sometimes
# this can be difficult to estimate alongside the latent dynamic process, particularly if that
# process is relatively stationary and if the global intercept is relatively small
# an AR1 process for the trend
test <- mvjagam(formula = y ~ s(season, bs = "cc", k = 12) +
te(mintemp, lag, k = c(8, 3)) +
te(precip, lag, k = c(8, 3)),
drop_intercept = T,
knots = list(season = c(0.5, 12.5)),
data_train = data_train,
data_test = data_test,
family = 'poisson',
chains = 3,
burnin = 5000,
trend_model = 'AR1')
te(mintemp, lag, k = c(8, 4)) +
te(precip, lag, k = c(8, 4)),
knots = list(season = c(0.5, 12.5)),
data_train = data_train,
data_test = data_test,
family = 'poisson',
chains = 4,
burnin = 5000,
trend_model = 'AR1')

plot_mvgam_fc(test, series = 1, data_test = data_test, ylim = c(0, 100))
plot_mvgam_trend(test, series = 1, data_test = data_test)
plot_mvgam_uncertainty(test, series = 1, data_test = data_test)
plot_mvgam_smooth(test, series = 1, smooth ='season')
plot_mvgam_smooth(test, series = 1, smooth = 2)
plot_mvgam_smooth(test, series = 1, smooth = 3)
plot_mvgam_trace(test, 'trend')
summary_mvgam(test)


# Initiate particle filter
pfilter_mvgam_init(test, data_assim = data_test)

Expand Down
82 changes: 54 additions & 28 deletions R/mvjagam.R
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,6 @@
#'in addition to any other variables included in the linear predictor of \code{formula}. If included, the
#'observations in \code{data_test} will be set to \code{NA} when fitting the model so that posterior
#'simulations can be obtained
#'@param drop_intercept \code{logical}. If \code{TRUE}, the global intercept will be set to zero in the \code{JAGS} model,
#'which may be necessary when estimating models for univariate series with latent trend components to preserve identifiability
#'(otherwise the latent trend may compete with the intercept, especially if it is relatively stationary)
#'@param prior_simulation \code{logical}. If \code{TRUE}, no observations are fed to the model, and instead
#'simulations from prior distributions are returned
#'@param return_jags_data \code{logical}. If \code{TRUE}, the list of data that is needed to fit the \code{JAGS}
Expand Down Expand Up @@ -76,7 +73,9 @@
#'state space trends. Prior distributions for most important model parameters can be altered by the user to inspect model
#'sensitivities to given priors. Note that latent trends are estimated on the log scale so choose tau, AR and phi priors
#'accordingly. The model parameters are esimated in a Bayesian framework using Gibbs sampling
#'in \code{\link[run.jags]{runjags}}. When using a dynamic factor model for the trends, factor precisions are given
#'in \code{\link[run.jags]{runjags}}. For any smooth terms using the random effect basis (\code{\link[mcgv]{smooth.construct.re.smooth.spec}}),
#'a non-centred parameterisation is employed to avoid degeneracies that are common in hierarchical models.
#'When using a dynamic factor model for the trends, factor precisions are given
#'regularized penalty priors to theoretically allow some factors to be dropped from the model by squeezing increasing
#'factors' variances to zero. This is done to help protect against selecting too many latent factors than are needed to
#'capture dependencies in the data, so it can often be advantageous to set n_lv to a slightly larger number. However
Expand All @@ -101,7 +100,6 @@ mvjagam = function(formula,
knots,
data_train,
data_test,
drop_intercept = FALSE,
prior_simulation = FALSE,
return_jags_data = FALSE,
family = 'nb',
Expand Down Expand Up @@ -199,7 +197,8 @@ mvjagam = function(formula,

# Update initial values of lambdas using the full estimates from the
# fitted bam model to speed convergence
ss_jagam$jags.ini$b <- coef(ss_gam)
#ss_jagam$jags.ini$b <- coef(ss_gam)
ss_jagam$jags.ini$b <- NULL

if(length(ss_gam$sp) == length(ss_jagam$jags.ini$lambda)){
ss_jagam$jags.ini$lambda <- ss_gam$sp
Expand All @@ -220,38 +219,66 @@ mvjagam = function(formula,
lines_remove <- c(1:grep('## response', base_model))
base_model <- base_model[-lines_remove]

# Any parametric effects in the gam need to get more sensible priors
# Any parametric effects in the gam (particularly the intercept) need to get more sensible priors to ensure they
# do not directly compete with the latent trends
if(any(grepl('Parametric effect priors', base_model))){

in_parenth <- regmatches(base_model[grep('Parametric effect priors',
base_model) + 1],
gregexpr( "(?<=\\().+?(?=\\))", base_model[grep('Parametric effect priors',
base_model) + 1], perl = T))[[1]][1]
n_terms <- as.numeric(sub(".*:", "", in_parenth))
ss_jagam$jags.data$p_coefs <- coef(ss_gam)[1:n_terms]

if(n_terms == 1 && drop_intercept){
base_model[grep('Parametric effect priors',
base_model) + 1] <- paste0(' for (i in 1:',
n_terms,
') { b[i] ~ dunif(-0.0001, 0.0001) }')
ss_jagam$jags.ini$b[1] <- 0
} else if(n_terms > 1 && drop_intercept){
rmvn <- function(n,mu,sig) {
L <- mroot(sig);m <- ncol(L);
t(mu + L%*%matrix(rnorm(m*n),m,n))
}

base_model[grep('Parametric effect priors',
base_model) + 1] <- paste0(' b[i] ~ dunif(-0.0001, 0.0001)\n for (i in 2:',
n_terms,
') { b[i] ~ dnorm(0, tau_para[i])\n tau_para[i] ~ dexp(0.05) }')
beta_sims <- rmvn(1000, coef(ss_gam), ss_gam$Vp)
ss_jagam$jags.data$p_taus <- apply(as.matrix(beta_sims[,1:n_terms]),
2, function(x) 1 / sd(x))

ss_jagam$jags.ini$b[1] <- 0
} else {
base_model[grep('Parametric effect priors',
base_model[grep('Parametric effect priors',
base_model) + 1] <- paste0(' for (i in 1:',
n_terms,
') { b[i] ~ dnorm(0, tau_para[i])\n tau_para[i] ~ dexp(0.05) }')
') { b[i] ~ dnorm(p_coefs[i], p_taus[i]) }')
base_model[grep('Parametric effect priors',
base_model)] <- c(' ## parametric effect priors (regularised for identifiability)')
}

# For any random effect smooths, use the non-centred parameterisation to avoid degeneracies
smooth_labs <- do.call(rbind, lapply(seq_along(ss_gam$smooth), function(x){
data.frame(label = ss_gam$smooth[[x]]$label, class = class(ss_gam$smooth[[x]])[1])
}))

if(any(smooth_labs$class == 'random.effect')){
re_smooths <- smooth_labs %>%
dplyr::filter(class == 'random.effect') %>%
dplyr::pull(label)

for(i in 1:length(re_smooths)){
in_parenth <- regmatches(base_model[grep(re_smooths[i],
base_model, fixed = T) + 1],
gregexpr( "(?<=\\().+?(?=\\))", base_model[grep(re_smooths[i],
base_model, fixed = T) + 1],
perl = T))[[1]][1]
n_terms <- as.numeric(sub(".*:", "", in_parenth))
n_start <- as.numeric(strsplit(sub(".*\\(", "", in_parenth), ':')[[1]][1])
base_model[grep(re_smooths[i],
base_model, fixed = T) + 1] <- paste0(' for (i in ', n_start, ':',
n_terms,
') { b_raw[i] ~ dnorm(0, 1)\n b[i] <- b_raw[i] * ',
paste0('sigma_raw', i), ' \n }\n ',
paste0('sigma_raw', i), ' ~ dexp(1)')
base_model[grep(re_smooths[i],
base_model, fixed = T)] <- paste0(' ## prior (non-centred) for ', re_smooths[i], '...')
}

}

base_model[grep('smoothing parameter priors',
base_model)] <- c(' ## smoothing parameter priors...')

# Add replacement lines for trends and the linear predictor
if(!use_lv){
Expand Down Expand Up @@ -293,7 +320,8 @@ for (s in 1:n_series){
ar1[s] ~ dnorm(0, 10)
ar2[s] ~ dnorm(0, 10)
ar3[s] ~ dnorm(0, 10)
tau[s] ~ dgamma(0.01, 0.001)
tau[s] <- pow(sigma[s], -2)
sigma[s] ~ dexp(1)
}
## Negative binomial likelihood functions
Expand Down Expand Up @@ -454,9 +482,7 @@ for (i in 1:n) {
## to allow loadings for entire factors to be 'dropped', reducing overfitting. Still
## need to impose identifiability constraints
## Global shrinkage penalty (half cauchy, following Gelman et
## al. 2008: A weakly informative default prior distribution for logistic and other
## regression models)
## Global shrinkage penalty (T distribution on the standard deviation)
lv_tau ~ dscaled.gamma(0.5, 5)
## Shrinkage penalties for each factor squeeze the factor to a flat line and squeeze
Expand Down Expand Up @@ -663,7 +689,7 @@ for (i in 1:n) {
X$series <- as.numeric(rbind(data_train, data_test)$series)

# Add an outcome variable
X$outcome <- c(orig_y, rep(NA, NROW(data_test)))
X$outcome <- c(data_train$y, rep(NA, NROW(data_test)))
}

} else {
Expand Down Expand Up @@ -691,7 +717,7 @@ for (i in 1:n) {
dplyr::pull(time)
}

X$outcome <- c(orig_y)
X$outcome <- c(data_train$y)
X$series <- as.numeric(data_train$series)
}

Expand Down
3 changes: 1 addition & 2 deletions R/pfilter_mvgam_init.R
Original file line number Diff line number Diff line change
Expand Up @@ -117,8 +117,7 @@ ar2s <- MCMCvis::MCMCchains(object$jags_output, 'ar2')
ar3s <- MCMCvis::MCMCchains(object$jags_output, 'ar3')

# Negative binomial size estimate
sizes <- as.matrix(rep(hpd(MCMCvis::MCMCchains(object$jags_output, 'r'))[2],
dim(betas)[1]))
sizes <- MCMCvis::MCMCchains(object$jags_output, 'r')

# Generate sample sequence for n_particles
if(n_particles < dim(phis)[1]){
Expand Down
1 change: 1 addition & 0 deletions R/pfilter_mvgam_online.R
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ pfilter_mvgam_online = function(data_assim,

# Get next observations in line to be assimilated
if(class(data_assim) == 'list'){
n_series <- length(unique(obs_data$series))
all_needed_names <- names(obs_data)
# Find indices of next observation
data_assim_orig <- data_assim
Expand Down
4 changes: 2 additions & 2 deletions R/plot_mvgam_fc.R
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#'Plot mvjagam posterior predictions for a specified series
#'@param object \code{list} object returned from \code{mvjagam}
#'@param series \code{integer} specifying which series in the set is to be plotted
#'@param data_test Optional \code{dataframe} of test data containing at least 'series', 'season' and 'year'
#'@param data_test Optional \code{dataframe} or \code{list} of test data containing at least 'series', 'season' and 'year'
#'for the forecast horizon, in addition to any other variables included in the linear predictor of \code{formula}. If
#'included, the test values are shown as points on the plot to help visualise the accuracy of the model's forecast.
#'Note this is only useful if the same \code{data_test} was also included when fitting the original model.
Expand All @@ -14,7 +14,7 @@
#'that was used to train the model.
#'@return A base \code{R} graphics plot
#'@export
plot_mvgam_fc = function(object, series, data_test, hide_xlabels = FALSE, ylab, ylim){
plot_mvgam_fc = function(object, series = 1, data_test, hide_xlabels = FALSE, ylab, ylim){

data_train <- object$obs_data
ends <- seq(0, dim(MCMCvis::MCMCchains(object$jags_output, 'ypred'))[2],
Expand Down
6 changes: 5 additions & 1 deletion R/plot_mvgam_ppc.R
Original file line number Diff line number Diff line change
Expand Up @@ -71,8 +71,12 @@ plot_mvgam_ppc = function(object, data_test, series, type = 'density', legend_po
c_dark_highlight <- c("#7C0000")

if(type == 'mean'){
# Plot observed and predicted location (means)
# Plot observed and predicted means
pred_means <- apply(preds, 1, mean)
lower <- quantile(pred_means, probs = 0.01)
upper <- quantile(pred_means, probs = 0.99)
pred_means <- pred_means[-which(pred_means > upper)]
pred_means <- pred_means[-which(pred_means < lower)]
obs_mean <- mean(truths)
hist(pred_means,
xlim = c(min(min(pred_means), min(obs_mean)),
Expand Down
Loading

0 comments on commit f180a3d

Please sign in to comment.