-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathindex.Rmd
662 lines (540 loc) · 18.8 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
---
title: "Working with spatial data in R"
subtitle: "https://github.com/ecodatasci-tlv/spatial"
date: "`r Sys.Date()`"
output:
xaringan::moon_reader:
lib_dir: libs
css: ["default", "custom-fonts.css"]
nature:
highlightStyle: github
highlightLines: true
ratio: '16:9'
beforeInit: "https://platform.twitter.com/widgets.js"
---
```{r setup, include=FALSE}
if(!require(pacman))install.packages("pacman")
pacman::p_load(knitr)
pacman::p_load(tidyverse)
pacman::p_load(rgdal)
pacman::p_load(sp)
countries <- readOGR("Data", layer = "countries")
conflicted::conflict_prefer("filter", "dplyr")
options(htmltools.dir.version = FALSE,
tibble.width = 70,
ggplot2.continuous.colour = "viridis",
ggplot2.continuous.fill = "viridis")
opts_chunk$set(
fig.width = 6.5,
fig.height = 4.5,
fig.align = "center",
cache = TRUE
)
theme_set(theme_minimal() +
theme(text = element_text(size = 20)))
```
# Topics we'll cover today:
* Types of spatial data
* Importing/Exporting spatial data
* Projections, extents, and units
* Comparison of different GIS tools out there
* Spatial data manipulation
* Intro to spatial analysis
---
class: inverse, middle, center
#Data types
.small[
https://www.gislounge.com/geodatabases-explored-vector-and-raster-data/
]
---
.center[
.img-med[
![](pres_pictures/Screenshot 2019-05-22 13.08.45.png)
]]
---
class: middle, center
# **Vector data**
Equivalent to a spreadsheet with a geometry column that describes how to plot the data that represents that feature
---
### **Points**
Used to represent nonadjacent features
.center[
.img-small[
![](pres_pictures/points-vector.png)
]]
---
### **Lines**
One dimensional and are only used to measure length
.center[
.img-small[
![](pres_pictures/line-vector.png)
]]
---
### **Polygons**
Two dimensional and are used for measurement of area and parameters of geographical feature.
.center[
.img-small[
![](pres_pictures/polygon-vector.png)
]]
---
# File type
* Polygon and lines come in a shapefile format, point usually are stored in XY table (a spreadsheet with latitude and longitude columns)
* A shapefile is structured from 4 main files:
- **shp** - the geometry of each feature
- **dbf** - a dBase file with the attribute data for each feature (can be opened in excel; stores the metadata of each feature)
- **prj** - The projection file
- **shx** - a spatial index that helps a GIS program find the features in the .shp file quickly
**IMPORTANT:** each shapefile can only store one type of geometry - point/ polygon/ line
---
class: middle, center
# **Raster data**
Cell-based data, represents surfaces. Each cell represents a value
---
## **Raster can represent two types of data**
### **Continuous** - temperature, elevation, etc.
### **Discrete** - population density, species richness, etc.
---
# **Rasters have three types of datasets:**
### **Thematic**
Used to represent discrete data, such as soil type or land use
.center[
.img-small[
![](pres_pictures/thematic_map.png)
]]
---
### **Spectral**
Aerial or setalite imagery used during digital map generation
.center[
.img-med[
![](pres_pictures/mx.jpg)
]]
---
### **Surface**
Represents continuous changes across a landscape (for example elevation)
.center[
.img-med[
![](pres_pictures/gmted2010.png)
]]
---
class: center
# Vector and raster comparison
```{r echo=FALSE}
pacman::p_load(flextable,officer)
my_table<- data.frame(Data_type = c(rep("Vector",3),rep("Raster",3)),Pros = c("High geographical accuracy",
"Follows topology rules which increases integrity",
"Used for measurements of proximity","Very good for representing remote sensing data","Performs well with map algebra",""),Cons = c("Doesn’t work well with continuous data","Processing intensive due to topology checks","","Less aesthetic pleasing","Lacks attribute data flexibility
","Data size increases with the increase in resolution"))
myft <- flextable(
head(my_table),
col_keys = c("Data_type", "Pros", "Cons"))
myft<- bold(merge_v(theme_vanilla(myft),j="Data_type"),j = "Data_type")
align(width(fontsize(x = myft,size = 20,part = "all"),width = 4),align = 'left',part = 'all')
```
---
class: center
# Importing Spatial Data
.center[
.img-med[
![](pres_pictures/dims.jpeg)
]]
---
Now that we've learned about different types of spatial data, let's load some. We'll use the `rgdal` package for this:
```{r rgdal, message = F}
pacman::p_load(rgdal)
pkmng_countries <- readOGR(dsn = "Data",
layer = "pkmng_polygons")
```
---
```{r rgdal 2, message = F}
head(pkmng_countries)
```
---
As usual, there is a `tidyverse` alternative to view spatial objects and edit the associated data
.small[
```{r spdplyr}
pacman::p_load(spdplyr)
pkmng_countries
```
]
---
With this package, you can use `tidyverse` functions to manipulate your spatial data:
.small[
```{r spdplyr 2}
#filter out all countries without any Pokemon sightings, and let's rename the column while we're at it:
pkmng_countries %>%
filter(pkmn_rc > 0) %>%
rename(Country = NAME)
```
]
---
Now let's load our point data:
.small[
```{r points, message = F}
pkmng_points_dat <- read_csv("Data/pkmng_points.csv")
pkmng_points_dat
```
]
---
We are going to create a shapefile from this table.
```{r points 2}
pacman::p_load(sp)
xy <- tibble(longitude = pkmng_points_dat$longitude,
latitude = pkmng_points_dat$latitude)
pkmng_points <- SpatialPointsDataFrame(coords = xy,
data = pkmng_points_dat)
pkmng_points
```
---
# GeoJSON
So far we've been working with ESRI shapefiles. This is a format to encode geographic data, used widely in GIS systems.
Alternatively, there is the java-based GeoJSON format for encoding geographic data, which can support the features we've discussed (points, polygons).
There are advantages to GeoJSON, including reduced file sizes. Compare:
---
```{r geojson}
pacman::p_load(geojsonio)
pacman::p_load(rmapshaper)
#we now convert our ESRI shapefile (a SpatialPoylgonsDataFrame object) to GeoJSON format:
pkmng_countries_json <- geojson_json(pkmng_countries) #this can be VERY time-consuming
```
Compare the file sizes: 6.4 Mb for the ESRI format, compared to 1.1 Mb for the GeoJSON format!
Read more about GeoJSON here: https://macwright.org/2015/03/23/geojson-second-bite.html
---
# Rasters
Finally, importing rasters is easily done with the `raster` package:
```{r raster, message = F, warnings = F}
pacman::p_load(raster)
pkmng_raster <- raster("Data/pkmng_raster.tif")
pkmng_raster
```
---
# Projections
One of the most important things to remember when dealing with spatial data is the projection.
A projection is a mathematical transformation of coordinates from the surface of a sphere on a 2D plane.
.center[![](img/m-c.jpg)]
---
# Types of projections
## Projections by surface
Classification based on the type of surface onto which the globe is projected:
* Cylindrical
* Conic
* Azimuthal
etc.
---
## Projections by preservation of a metric property
Classification based on which property is maintained in the projection:
* Conformal (preserves shape, usually distorts area)
* Equal-area (preserves area, usually distorts shape)
* Equidistant (preserves distance from some point or line)
etc.
---
You can view the Coordinate Reference System of your spatial objects:
```{r projections, message = F}
crs(pkmng_countries)
```
You can see different values here, including the datum, units, etc.
---
Notice our points shapefile has no projection, because we didn't set it when creating the object:
```{r projections 2, message=F}
crs(pkmng_points)
```
---
We can set the projection for this object, either by writing a string of our desired projection, or extracting the projection from an existing object:
```{r projections 3, message=F}
proj1 <- crs("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
proj1
proj2 <- crs(pkmng_countries)
proj2
```
---
```{r projections 4, message=F}
proj4string(pkmng_points) <- proj1
pkmng_points
```
---
You can also transform already existing objects to have different projections:
```{r projections 5, message=F}
pkmng_points_p <- spTransform(pkmng_points,
crs("+proj=laea +lat_0=0 +lon_0=0"))
pkmng_points_p
```
---
Why are projections so important?
Here's our points mapped on top of a shapefile of countries:
```{r map, message=F, fig.height=5, fig.width=9}
plot(countries, axes = T)
plot(pkmng_points, add = T, pch = 1, col = "red")
```
---
Now, when the projections do not match:
```{r map 2, messages=F, fig.height=5, fig.width=9}
countries_p <- spTransform(countries,
crs("+proj=laea +lat_0=0 +lon_0=0"))
plot(countries_p, axes = T)
plot(pkmng_points, add = T, pch = 1, col = "red")
```
---
Whereas matching projections give us all the coordinates in the right place:
```{r map 3, message=F, fig.height=5, fig.width=9}
plot(countries_p, axes = T)
plot(pkmng_points_p, add = T, pch = 1, col = "red")
```
---
# Extent
Spatial objects also have extents - these are the minimum and maximum X and Y coordinates of the object. They don't have to have data in them. The coordinates of course depend on your projection and units:
``` {r extent, message=F}
countries@bbox
countries_p@bbox
```
---
For raster objects, you find the extent with this syntax:
```{r extent 2, message=F}
pkmng_raster@extent
```
---
You can change the extent of objects using the `crop()` or `extend()` functions, depending on whether you want to decrease or increase the extent:
``` {r extent 3, message=F}
pkmng_raster_c <- crop(pkmng_raster,
extent(-50,
100,
0,
80))
```
---
```{r extent 4, message=F, fig.height=5, fig.width=9}
plot(pkmng_raster)
plot(pkmng_raster_c, add = T, col = "red")
```
All the red cells are in the new, limited extent!
---
## Why is this important?
If you try to do spatial arithmetics on objects with mismatching extents and projections, you'll get errors. For example:
```{r extent 5, error = T}
raster::intersect(countries_p, pkmng_raster_c)
```
---
class: center, middle
# Data manipulation
---
class: center, middle
# Geoprocessing
a GIS operation used to manipulate spatial data. A typical geoprocessing operation takes an input dataset, performs an operation on that dataset, and returns the result of the operation as an output dataset
---
### **Buffer points** - creating a polygon based on proximity
**For example:** buffer some of the pokemon points to 100km radius
We'll first subset the data to have less points
```{r fig.height=6, fig.width=10}
sub_pkmng_p<- subset(pkmng_points,pokemonId %in%c(8:10))
plot(countries)
plot(sub_pkmng_p,add = T, col = "red")
```
---
Now we'll buffer the points in a 100km radius to make them into polygons
```{r fig.height=6, fig.width=10}
buffer_pkmng_points<- raster::buffer(sub_pkmng_p,width = 100000)
plot(countries)
plot(buffer_pkmng_points,add = T, col = "red")
```
---
### **Clip** - an overlay function that cuts out an input layer with the extent of a defined feature boundary. The result of this tool is a new clipped output layer.
There are a few ways of clipping polygons in R, we will look at using the rgdal package.
---
Lets clip the buffered points to to have the ones that fall in Italy and to fit the boundaries of the Italian coastline.
First, well filter the countries shapefile to have a shapefile of Italy
```{r}
Italy <- countries %>%
filter(NAME == "Italy")
plot(Italy)
plot(buffer_pkmng_points,add = T, col = "red")
```
---
# `rgeos` package example
Now, lets clip the the buffered points based on the Italian shapefile we created
```{r}
clip_pkmng_rgeos<- rgeos::gIntersection(Italy, buffer_pkmng_points, byid = TRUE, drop_lower_td = TRUE)
plot(Italy)
plot(clip_pkmng_rgeos,add = T, col = "red")
```
---
### **Intersect** - similar to the clip tool because the extents of input features defines the output. The only exception is that it preserves attributes from all the data sets that overlap each other in the output
Lets intersect the points with the country shapefile to find which points belong to which country
```{r}
intersect_pkmg<- raster::intersect(pkmng_points,countries)
```
---
### **Merge** - combines data sets that are the same data type
```{r}
#subset the countries
sub_count<- countries %>% filter(NAME==c("France","Italy"))
sub_count_sf<- sf::st_as_sf(sub_count)
merge_count<- sf::st_combine(sub_count_sf)
plot(merge_count)
```
---
### **Dissolve** - The Dissolve Tool unifies boundaries based on common attribute values
```{r}
diss_sub_count<- rgeos::gUnaryUnion(sub_count)
plot(diss_sub_count)
```
---
# Zonal statistics
Vector-Raster analysis. Allows you to calculate statistics from raster data for each feature in the vector data
**For Example:** We want to calculate the temperature in each point that a pokemon was collected
---
Download and plot the mean annual temperature from the web using the getData function
```{r fig.height=6, fig.width=10}
climate <- raster::getData('worldclim', var='bio', res=10)
annual_prec<- climate$bio12
plot(annual_prec)
```
---
Now lets calculate the mean temperature in each point
```{r}
poke_annual_prec<- raster::extract(annual_prec, pkmng_points,fun = mean,na.rm=T,df=T)
head(poke_annual_prec)
```
---
The data has no identification of which point it belongs to. To have that we need to merge it with the points identifiers
```{r}
poke_annual_prec_final<- cbind(pkmng_points@data$pokemonId,poke_annual_prec$bio12)
names(poke_annual_prec_final)<- c("pokemonId","annual_prec")
head(poke_annual_prec_final)
```
---
# Spatial autocorrelation
Spatial data are not independent - violating assumptions of statistical tests.
Spatial autocorrelation decreases with distance - but the distance depends on your data.
So how do we test for spatial autocorrelation and what can we do about it?
---
## Testing for spatial autocorrelation
We'll use the `pgirmess` package:
```{r sp ac, message=F}
pacman::p_load(pgirmess)
#first we'll take a small sample of the points data, because this test is very time-consuming
pkmng_points_dat_s <- sample_frac(pkmng_points_dat, 0.05)
pgi.cor <- correlog(coords = pkmng_points_dat_s[,c("longitude","latitude")], #longitude and latitude, in that order
z = pkmng_points_dat_s$temperature, #the variable we want to test for autocorrelation in
method = "Moran")
```
---
The X axis is distance, and the Y axis is our measure of autocorrelation. Values higher than 0 mean there is spatial autocorrelation in that distance class.
```{r sp ac 2, message=F}
plot(pgi.cor)
```
---
And this is what it looks like when there's no spatial autocorrelation:
```{r sp ac 3, message=F}
pgi.cor2 <- correlog(coords = pkmng_points_dat_s[,c("longitude","latitude")], #longitude and latitude, in that order
z = pkmng_points_dat_s$pokemonId, #the variable we want to test for autocorrelation in
method = "Moran")
plot(pgi.cor2)
```
---
## Dealing with spatial autocorrelation
There are various methods to "eliminate" spatial autocorrelation for hypothesis testing and statistical inference. All are flawed, each in their own way.
We won't talk about them here, but you can read more in the literature:
https://onlinelibrary.wiley.com/doi/10.1111/j.2007.0906-7590.05171.x
https://onlinelibrary.wiley.com/doi/10.1111/jbi.12953
---
# Spatial non-stationarity
Often, relationships between variables can shift in space. So global coefficients of models can be very poor at describing the actual behavior of your data.
For example, let's look for a relationship between wind speed and temperature in our Pokemon Go sightings data:
.small[
```{r gwr, message=F}
lm.model <- lm(windSpeed ~
temperature,
data = pkmng_points_dat_s)
summary(lm.model)
```
]
---
Now, we'll map the residuals from this model:
```{r gwr 2, message=F}
resids<-residuals(lm.model)
colours <- c("dark blue", "blue", "red", "dark red")
map.resids <- SpatialPointsDataFrame(data = data.frame(resids),
coords = cbind(pkmng_points_dat_s$longitude,
pkmng_points_dat_s$latitude)
)
```
---
```{r gwr 3, message=F}
spplot(map.resids, cuts=quantile(resids), col.regions=colours, cex=1)
```
Notice how the red and blue dots are clustered together, and not randomly distributed - showing a spatial pattern in the residuals.
---
We can use a method called GWR (Globally Weighted Regression) to account for this - basically, you calculate a regression coefficient for each datum, weighing the contribution of all data based on their spatial distance from your focal datum.
.small[
```{r gwr 4, message=F}
pacman::p_load(spgwr)
GWRbandwidth <- gwr.sel(windSpeed ~
temperature,
data = pkmng_points_dat_s,
coords = cbind(pkmng_points_dat_s$longitude,
pkmng_points_dat_s$latitude),
adapt = T)
```
]
---
```{r gwr 5, message=F}
gwr.model <- gwr(windSpeed ~
temperature,
data = pkmng_points_dat_s,
coords = cbind(pkmng_points_dat_s$longitude,
pkmng_points_dat_s$latitude),
adapt = GWRbandwidth,
hatmatrix = T,
se.fit = T
)
gwr.results <- as_tibble(gwr.model$SDF)
```
---
Now we can view the table of the results, and see the coefficient for each datum in our dataset - i.e. for each point locality. We can see they vary quite a bit.
.small[
```{r gwr 6, message=F}
gwr.results
```
]
---
Now let's plot this and see exactly what this non-stationarity looks like:
.small[
```{r gwr 7, message=F}
pkmng_points_dat_s$coef <- gwr.results$temperature
countries_outline <- fortify(countries,
region = "NAME")
mapWorld <- borders("world", colour=NA, fill="gray")
gwr.map <- ggplot(data = pkmng_points_dat_s,
aes(x = longitude,
y = latitude,
colour = coef)
) +
mapWorld +
geom_point() +
scale_colour_gradient2(low = "red",
mid = "white",
high = "blue",
midpoint = 0,
space = "rgb",
na.value = "grey50",
guide = "colourbar",
guide_legend(title="Coefs")) +
coord_equal()
```
]
---
And see how the regression coefficients change with space - blue for positive coefficients and red for negative ones:
```{r gwr 8, message=F,fig.width=14}
gwr.map
```
---
class: center, middle
# Different GIS tools out there
.center[
.img-med[
![](pres_pictures/Screenshot 2019-05-21 20.53.18.png)
]]
---
# TAKE HOME MESSAGES
* Be mindful of projections!
* Be mindful of spatial autocorrelation!
* Visualize spatial patterns!
* No one tool that fits them all