-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathBBS_trends_analysis.Rmd
230 lines (196 loc) · 8.08 KB
/
BBS_trends_analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
---
title: "Example BBS trends analysis"
subtitle: '83 species from 1992 - 2021'
author: Nicholas Clark, School of Veterinary Science, University of Queensland^[[email protected], https://github.com/nicholasjclark]
output:
html_document:
toc: true
toc_float: true
---
```{css, echo=FALSE}
details > summary {
padding: 4px;
background-color: #8F2727;
color: white;
border: none;
box-shadow: 1px 1px 2px #bbbbbb;
cursor: pointer;
}
details > summary:hover {
background-color: #DCBCBC;
color: #8F2727;
}
.scroll-300 {
max-height: 300px;
overflow-y: auto;
background-color: inherit;
}
h1, #TOC>ul>li {
color: #8F2727;
}
h2, #TOC>ul>ul>li {
color: #8F2727;
}
h3, #TOC>ul>ul>li {
color: #8F2727;
}
.list-group-item.active, .list-group-item.active:focus, .list-group-item.active:hover {
z-index: 2;
color: #fff;
background-color: #DCBCBC;
border-color: #DCBCBC;
}
a {
color: purple;
font-weight: bold;
}
a:hover {
color: #C79999;
}
::selection {
background: #DCBCBC;
color: #8F2727;
}
.button_red {
background-color: #8F2727;
border: #8F2727;
color: white;
}
.button_red:hover {
background-color: #DCBCBC;
color: #8F2727;
}
```
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
dpi = 150,
fig.asp = 0.8,
fig.width = 6,
out.width = "60%",
fig.align = "center")
```
## Source libraries
```{r include=FALSE}
library(mgcv)
library(dplyr)
library(ggplot2)
library(marginaleffects)
source('Functions/utilities.R')
```
```{r eval=FALSE}
library(mgcv)
library(dplyr)
library(ggplot2)
library(marginaleffects)
source('Functions/utilities.R')
```
```{r, class.output="scroll-300"}
sessionInfo()
```
## Model summary and AIC calculation
```{r}
load("./data/model_objects.rda")
mod <- readRDS("./models/mod.rds")
```
Don't run full `mgcv` summaries; they are extremely computationally expensive for these models
```{r}
summary(mod, re.test = FALSE)
```
AIC is available, if we choose to use it
```{r}
AIC(mod)
```
## Contributions to 'wiggliness'
Calculate phylogenetic contributions to species' nonlinear trend estimates. This works by taking advantage of the `predict.gam(type = 'terms', ...)` functionality in `mgcv`. Essentially we compute the partial contribution of *each* term in the fitted model to the nonlinear trend estimate for each species x region combination. Summing these contributions gives the trend estimate for each row in `newdata`, while taking the squared second derivative of this sum gives the *slope of the slope* for each row. If we sum these squared second derivatives we arrive at a useful definition of "wiggliness". The idea of the `compute_derivs()` function is to do this while either excluding phylogenetic terms or excluding all terms apart from the phylogenetic terms. This allows us to isolate the partial contribution of phylogeny to the overall wigliness (i.e. nonlinearity) of a species' trend estimate. It is computationally expensive so I operate over 8 cores here
```{r}
phylo_derivs <- compute_derivs(mod_data,
model = mod,
type = 'phylogenetic',
n_cores = 8)
```
There is also a work-in-progress plotting routine for the returned object. In this plot, bluer shades show higher contributions of phylogeny, while redder shades show smaller contributions of phylogeny. They are plotted on the log scale, so a value of `0` means phylogeny contributes equally compared to all other non-phylogenetic components (space and non-phylogenetic relationships)
```{r}
plot_trait_conts(trait_derivs = phylo_derivs,
type = 'phylogenetic')
```
We can see that, indeed for some species phylogeny plays a sizeable role. Here I compute the proportion of trends in which phylogenetic information contributes at least 50% to the estimated wiggliness of the function compared to other information (i.e. spatial and non-phylogenetic variation). This is an arbitrary threshold of course
```{r}
phylo_derivs %>%
# Filter out those with very small wiggliness estimates, as
# the contributions don't add any meaningful information here
# because the estimated trend is flat
dplyr::filter(wiggliness >= quantile(wiggliness, probs = 0.15)) %>%
dplyr::summarise(prop = 100 * length(which(trait_conts >= 0.5)) /
dplyr::n())
```
## Plotting trends
Plot some predictions against truth (note, these predictions will not include contributions from the residual AR process). The `plot_sp_trends()` function will take the supplied data and compute predictions that include the offset terms, so what we are seeing is a combination of the estimated trend and the supplied offset. We can supply specific regions we'd want, but I'm just using the default here. These give us a sense of whether the model has done a reasonable job of capturing the complexity of the observations. For some species it is clear that we'd need more complexity (i.e. higher `k` values for some of the smooth terms)
```{r}
plot_sp_trends(model = mod,
data = mod_data,
species = 'Hirundo_rustica',
type = 'response',
median_records = FALSE)
```
```{r}
plot_sp_trends(model = mod,
data = mod_data,
species = levels(mod_data$sp_latin)[8],
type = 'response',
median_records = FALSE)
```
```{r}
plot_sp_trends(model = mod,
data = mod_data,
species = levels(mod_data$sp_latin)[9],
type = 'response',
median_records = FALSE)
```
```{r}
plot_sp_trends(model = mod,
data = mod_data,
species = levels(mod_data$sp_latin)[30],
type = 'response',
median_records = FALSE)
```
We can also plot some expected trends by fixing the offsets and asking how the model would expect species' abundances to change over time. This is a more useful way of asking what the estimated trend is for each species x region combination, because it gives what the model would expect to see if we had a standardized sampling effort per year
```{r}
plot_sp_trends(model = mod,
data = mod_data,
species = 'Hirundo_rustica',
type = 'expected',
median_records = TRUE)
```
```{r}
plot_sp_trends(model = mod,
data = mod_data,
species = levels(mod_data$sp_latin)[8],
type = 'expected',
median_records = TRUE)
```
```{r}
plot_sp_trends(model = mod,
data = mod_data,
species = levels(mod_data$sp_latin)[9],
type = 'expected',
median_records = TRUE)
```
```{r}
plot_sp_trends(model = mod,
data = mod_data,
species = levels(mod_data$sp_latin)[30],
type = 'expected',
median_records = TRUE)
```
Finally, we can compute some attempt at an "average" temporal trend for the last 25 years of data. I use only 65% of species to save computational time, but of course including more species would give a more accurate estimate of the "average" expected trend. Note that this has been zero-centred for interpretability
```{r}
plot_av_trend(model = mod, mod_data = mod_data,
prop_species = 0.65)
```
## To do:
1. fit simpler models that ignore all relationship information, as well as phylogenetic/functional slopes (linear) models for comparisons
2. leave 10% of combos (strata x species) out and generate predictions; compute CRPS from each model; run several CV folds
3. assess variation in prediction accuracy across space and across the trees
4. visualise the proportional change in variance explained (as calculated above) on the tree to see if there are clusters of species that depend more heavily on phylogenetic relationships
5. assess support for nonlinearity of trends; determine when trends were accelerating / decelerating most rapidly