This tutorial demonstrates how to join individual/payer-level health information to environmental data. Individual-level health data from the Healthcare Cost and Utilization Project (HCUP) State Emergency Department Databases (SEDD) will be joined to environmental data from the `amadeus`[@r-amadeus] package. The SEDD provides individual-level emergency department encounter data with variables including the month of the encounter and the zip code of the patient.
-
As an illustrative example, emergency department (ED) visits for rheumatoid arthritis (RA) were extracted from the SEDD for the state of Utah from 2016 - 2020. RA is a chronic autoimmune disease in which the immune system attacks the joints. This can cause symptoms including swelling, pain, and joint stiffness. ICD-10 codes (M05.X, M06.X) for the primary discharge diagnosis variable (I10_DX1) in the SEDD database were used to extract RA ED encounters.
+
As an example, emergency department (ED) visits for rheumatoid arthritis (RA) were extracted from the SEDD for the state of Utah from 2016 - 2020. RA is a chronic autoimmune disease in which the immune system attacks the joints. This can cause symptoms including swelling, pain, and joint stiffness. ICD-10 codes (M05.X, M06.X) for the primary discharge diagnosis variable (*I10_DX1*) in the SEDD database were used to extract RA ED encounters.
### Outline
@@ -53,7 +53,7 @@ The exploratory analyses in this tutorial are for educational purposes only.
In this step, the analyst will read in individual-level health data that is spatially referenced (i.e., contains information on the location of the patient such as county or zip code of residence). For the purposes of this tutorial, the individual-level health data are RA ED encounters for the state of Utah from 2016 - 2020 that are spatially referenced at the zip code level.
Data cleaning needs to be completed before joining the environmental and health data. In the code chunk below, we create variables for month and year from the environmental data.
-
This data cleaning step needs to be completed in order to join the environmental data to the health data.
+
Data cleaning must be completed before joining the environmental and health data. In the code chunk below, we create variables for month and year from the environmental data.
+
This data cleaning step is necessary to join the environmental data to the health data.
-
```{r, echo = TRUE}
+
```{r, echo = TRUE, eval = FALSE}
# create a column for month and year based on the source_file variable
env_dat$year <- str_sub(env_dat$source_file, start = 5, end = 8)
@@ -118,7 +118,7 @@ In this example, we will focus on monthly mean, daily maximum temperature data f
We will also explore the notion of delayed effects: environmental exposures from the recent past may be associated with health outcomes. To reflect this, we will calculate a 2-month rolling mean for our environmental variables.
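As a rough illustration of the rolling-mean idea, the sketch below computes a 2-month rolling mean per zip code with `dplyr`. It is not the tutorial's own code: the data frame `env_toy`, its column names, and the temperature values are made up, and it assumes the window covers the current month and the month before it.

```r
library(dplyr)

# Hypothetical toy data: one temperature value per zip code and month
env_toy <- data.frame(
  zip   = rep(c("84101", "84102"), each = 4),
  year  = "2016",
  month = rep(c("01", "02", "03", "04"), times = 2),
  tmmx  = c(2, 4, 10, 16, 1, 5, 9, 15)
)

env_roll <- env_toy %>%
  group_by(zip) %>%
  # order months chronologically within each zip code
  arrange(year, month, .by_group = TRUE) %>%
  # average the current and previous month; the first month in each zip is NA
  mutate(roll_tmmx = (tmmx + lag(tmmx)) / 2) %>%
  ungroup()
```

Grouping by zip code before calling `lag()` keeps one area's values from leaking into the next area's first month.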
# add a leading zero to month column in patient data
ra_dat$AMONTH <- str_pad(ra_dat$AMONTH, width = 2, side = "left", pad = "0")
```
We will join the environmental data to the health data using both temporal and spatial information that is common between the two datasets. For our health data, we have information on the month, year and zip code of the ED visits. For our environmental data, we have information on the month, year and zip code for our temperature and surface pressure variables. Our join, therefore, will be based on year, month, and zip code.
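To make the join keys concrete, here is a minimal sketch on toy data. The data frames `ra_toy` and `env_toy` and all of their column names are hypothetical stand-ins, not the tutorial's actual variables. It assumes the key columns have the same names and the same character format in both tables, which is why months are zero-padded.

```r
library(dplyr)

# Hypothetical encounter-level health data
ra_toy <- data.frame(
  person_id = c(1, 1, 2),
  year  = c("2016", "2016", "2017"),
  month = c("01", "03", "01"),
  zip   = c("84101", "84101", "84102")
)

# Hypothetical zip-by-month environmental data
env_toy <- data.frame(
  year  = c("2016", "2016", "2017"),
  month = c("01", "02", "01"),
  zip   = c("84101", "84101", "84102"),
  tmmx  = c(2, 4, 1)
)

# A left join keeps every encounter; encounters without a matching
# environmental row receive NA for the environmental columns
joined <- ra_toy %>%
  left_join(env_toy, by = c("year", "month", "zip"))
```

In this toy example the March 2016 encounter has no environmental record, so its `tmmx` value is `NA` while the encounter row itself is retained.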
-
```{r echo=TRUE}
+
```{r echo=TRUE, eval = FALSE}
# Join to environmental data based on month, year and zip
res_df <- ra_dat %>%
left_join(env_cleaned,
@@ -155,25 +155,13 @@ res_df <- ra_dat %>%
## Visualizing data
-
What do the first few rows of our combined dataset look like?
Now let's zoom in on visualizing changes in environmental variables over time for a select group of RA patients who had multiple ED encounters over the study period.
+
Now let's zoom in to visualize changes in environmental variables over time for a select group of RA patients who had multiple ED encounters over the study period.
Let's calculate the number of visits each person had and restrict the data to patients who had 10+ RA ED visits from 2016 - 2020.
-
```{r, echo = TRUE}
+
```{r, echo = TRUE, eval = FALSE}
# Count number of encounters over time
res_df <- res_df %>%
group_by(person_id) %>%
mutate(n_visits = n())
```
-
```{r, echo = TRUE}
+
```{r, echo = TRUE, eval = FALSE}
res_long <- res_df %>%
# Filter to only individuals who had 10+ encounters for visualization purposes
dplyr::filter(n_visits > 9) %>%
@@ -220,7 +210,7 @@ res_long <- res_df %>%
#### Spaghetti plot (temperature)
-
```{r}
+
```{r, eval = FALSE}
# 2-month rolling tmax
ggplot(data = res_long, aes(x = pseudo_date, y = roll_tmmx,

+
Here we can observe how the 2-month rolling mean of monthly maximum temperature changes across RA ED visits for 3 patients. Each encounter is marked by a point on the plot.
#### Spaghetti plot (surface pressure)
This same plot can be created to examine trends in surface pressure for this subset of patients.
-
```{r, warning = FALSE}
+
```{r, warning = FALSE, eval = FALSE}
ggplot(data = res_long, aes(x = pseudo_date, y = ps,
color = person_id, group = person_id)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
labs(x = "Year", y = "Surface pressure (Pa)", color = "Person ID",
caption = "Each point represents an ED visit for a given patient.
-
\nPerson ID 228 had missing surface pressure data for 1
+
\nPerson ID 228 had missing surface pressure data for\n1
encounter-month leading to discontinuity in line chart.") +
theme_minimal() +
theme(axis.title = element_text(size = 12),
axis.text = element_text(size = 11),
legend.text = element_text(size = 11))
```
+
