-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathChicagoFinalProject.Rmd
243 lines (104 loc) · 6.86 KB
/
ChicagoFinalProject.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
---
title: "ChicagoFinalProject"
output: html_document
date: "2023-04-10"
---
```{r}
# install packages
library(devtools)
library(usethis)
library(rcrimeanalysis)
```
```{r}
data("crimes")
dim(crimes)
head(crimes)
```
```{r}
# Geocoding is an essential task in crime analysis/policing and is commonly needed since location data is collected upon report (call for service or statement/affidavit) as an address from the caller or person involved. Once the data is transformed into a geospatial coordinate (latitude, longitude), analysis can take place.
```
```{r}
library(ggmap)
ggmap::register_google("AIzaSyDAzfQTKegUkuRuoUIk3u7ka5YaZTnowVo")
addresses <- c("The White House, Washington DC", "Capitol Building, Washington DC")
geocode_address(addresses)
```
```{r}
# kernel density estimation
narcotics <- subset(crimes, crimes$primary_type == "NARCOTICS")
# Plot without Points
p1 <- narcotics %>% kde_map(pts = FALSE)
# Plot with Incident Points
p2 <- narcotics %>% kde_map()
leafsync::sync(p1,p2)
```
```{r}
# The above maps illustrate hot spots and concentrations of criminal activities by place. The pts parameter was included to facilitate the visualization of the data. The population of the pop-up boxes from the data is automated, given that the data is structured as is the example crimes data.
```
```{r}
# The id_repeat() function identifies crime incidents which occur at the same location. The identification of repeat crimes can be vital to the linkage of crime incidents and assessment of places.
# This information could be used to investigate whether this is a location where the narcotics are being dealt, or if there is another rationale for these incidents.
narco_repeats <- id_repeat(narcotics)
narco_repeats[2]
```
```{r}
# The following code detects 168 repeat crime series. The second detected repeat series is printed which shows that 4 different possession incidents occurred on the same sidewalk throughout 2019.
```
```{r}
# Temporal Functionality
# The ts_daily_decomp() and ts_month_decomp() functions perform time series decomposition of crime incident data collected over time for pattern detection. Each function is optimized for a certain frequency interval of data in the time series.
# The decomposition functions transform the raw crime data into a time series, perform locally weighted regression for smoothing, and plot the resultant decomposed components of the time series into identified seasonal, trend and irregular components.
narco_ts <- ts_month_decomp(narcotics, start = (2017))
plot(narco_ts)
```
```{r}
# The different components can be useful for both the development and evaluation of strategic action. For example, seasonality can be used for administrative planning through personnel and resource allocation. Changepoints in the trend component can be used to identify changes in the crime patterns. For example, if a policy change took place mid 2017, this change may have stimulated narcotics activity or have increased detection and closure rates of narcotics incidents.
```
```{r}
# Crime Forecasting
thefts <- subset(crimes, crimes$primary_type == "THEFT")
ts_forecast(thefts, start = c(2017, 1, 1))
```
```{r}
# The practice of predictive modeling is the process of developing a framework or model which enables the understanding and quantification of the prediction accuracy on future, yet-to-be-seen data. This can be useful in a forward-looking sense to gain an understanding of what future criminal activity could look like. The ts_forecast() function uses the crime data to predict the future crime trend and daily frequency with different confidence levels for the next 365 days.
```
```{r}
# Spatio-Temporal Functionality
# The kde_int_comp() function extends the traditional heat map (as seen with kde_map()) to perform a comparsion across time intervals. Combining the distribution of incidents with time intervals effectively visualizes the time series data in space, which can be very useful in identifying displacements in criminal activity.
# The kde_int_comp() function was used to evaluate the narcotics incidents from the beginning of 2017 and 2018. Returned is a net difference raster and then 3 leaflet (Interval 1, Interval 2, Net Difference) widgets for detailed comparison.
interval <- kde_int_comp(narcotics,
start1="1/1/2017",
end1="3/1/2017",
start2="1/1/2018",
end2="3/1/2018")
```
```{r}
# Traditionally, space and time have been treated as isolated entities in crime analysis. Combination of these data into a spatio-temporal crime analysis workflow can facilitate a more robust analysis because an incident occurs as an interaction between persons or objects within both space and time domains.
```
```{r}
interval
```
```{r}
# The kernel density bandwidth is self calculated so that the levels can be kept consistent. The net difference plot is in terms of levels from the above interval maps. Note the multiple plots are synced such that a translation on one image also occurs on the other images. The interval map headers populate based on the user specified intervals. Since the purpose of the function is to compare the general trend across intervals, there is no option which can specify the plotting of crime incidents as in the kde_map() function.
```
```{r}
# A contemporary method for near repeat analysis is provided. The near_repeat_analysis() function searches crime data for spatio-temporal clustering of crime. The function identifies patterns under specified thresholds (distance and time) and determines whether incident x is within this distance and time for all other incidents in the dataset. An adjacency matrix method is employed to link these incidents in igraph networks.
```
```{r}
crime_samp <- head(crimes, n = 1000)
out <- near_repeat_analysis(data = crime_samp,
tz = "America/Chicago",
epsg = "32616")
igraph::plot.igraph(out[[26]])
```
```{r}
# Each vertex of the graph is a crime incident and each edge represents the linkage of these incidents within the specified spatial and temporal parameters. The linkage networks provide potential crime networks discovered under the near repeat premise for intelligence driven police investigation and intervention.
```
```{r}
# If the thresholds for near repeat analysis are unknown to the analyst, the near_repeat_eval() function recommends optimal parameters for an area through three dimensional modelling of the spatio-temporal clustering present in the data. The function returns a 2x2 table with the optimal distance and time threshold parameters. This function is computationally intensive, therefore, a progress bar is integrated for the user to track completion percentage.
```
```{r}
near_repeat_eval(data = crime_samp,
tz = "America/Chicago",
epsg = "32616")
```