Commit 2d4924b

Merge pull request #91 from NIEHS/ar-indvdata-0929
Added static images
2 parents cd876c1 + fb27588 commit 2d4924b

5 files changed: +24 -31 lines changed

chapters/06-01-hcup-individual-usecase.Rmd

Lines changed: 24 additions & 31 deletions
@@ -20,7 +20,7 @@ output:

 ### "Linking Individual-level Health Data to Environmental Variables: a Case Study of Rheumatoid Arthritis in Utah"

-**Date Modified**: September 29, 2025
+**Date Modified**: October 1, 2025

 **Author(s)**: Austin Rau [![author-lpc](images/orcid.png){width=10}](https://orcid.org/0000-0002-5818-4864)

@@ -30,7 +30,7 @@ output:

 The purpose of this tutorial is to demonstrate joining individual/payer level health information to environmental data. In this tutorial, individual-level health data from the Healthcare Utilization Project (HCUP) state emergency department database (SEDD) will be joined to environmental data from the `amadeus` [@r-amadeus] package. The SEDD provides individual-level emergency department encounter data with variables including the month of the encounter and the zip code of the patient.

-As an illustrative example, emergency department (ED) vists for rheumatoid arthritis (RA) were extracted from the SEDD for the state of Utah from 2016 - 2020. RA is a chronic autoimmune disease in which the immune system attacks the joints. This can cause symptoms including swelling, pain, and joint stiffness. ICD-10 codes (M05.X, M06.X) for the primary discharge diagnosis variable (I10_DX1) in the SEDD database were used to extract RA ED encounters.
+As an example, emergency department (ED) vists for rheumatoid arthritis (RA) were extracted from the SEDD for the state of Utah from 2016 - 2020. RA is a chronic autoimmune disease in which the immune system attacks the joints. This can cause symptoms including swelling, pain, and joint stiffness. ICD-10 codes (M05.X, M06.X) for the primary discharge diagnosis variable (*I10_DX1*) in the SEDD database were used to extract RA ED encounters.

 ### Outline

@@ -53,7 +53,7 @@ The exploratory analyses in this tutorial are for educational purposes only.

 ### Load R Packages

-```{r, echo = TRUE, warning = FALSE, message = FALSE}
+```{r, echo = TRUE, warning = FALSE, message = FALSE, eval = FALSE}
 # load libraries
 library(tidyverse)
 library(slider)
@@ -77,15 +77,15 @@ wd <- "/ddn/gs1/home/rauat/PCOR_bookdown_tools/"

 In this step, the analyst will read in individual-level health data that is spatially referenced (i.e., contains information on the location of the patient such as county or zip code of residence). For the purposes of this tutorial, the individual-level health data are RA ED encounters for the state of Utah from 2016 - 2020 that are spatially referenced at the zip code level.

-```{r, echo = TRUE}
+```{r, echo = TRUE, eval = FALSE}

 ra_dat <- readRDS(paste0(wd, "dataset/ra_utah_px.rds"))

 ```

 Next, we will load pre-processed environmental data from the `amadeus` package at the zip code level.

-```{r, echo = TRUE, message = FALSE}
+```{r, echo = TRUE, message = FALSE, eval = FALSE}
 # read in entire folder of environmental data
 # at zip code level as single dataframe
 env_dat <- list.files(paste0(wd, "dataset/env_data"),
@@ -100,10 +100,10 @@ env_dat <- env_dat %>%
 ```


-Data cleaning needs to be completed before joining the environmental and health data. In the code chunk below, we create variables for month and year from the environmental data.
-This data cleaning steps needs to be completed in order to join the environmental data to the health data.
+Data cleaning must be completed before joining the environmental and health data. In the code chunk below, we create variables for month and year from the environmental data.
+This data cleaning steps is necessary in order to join the environmental data to the health data.

-```{r, echo = TRUE}
+```{r, echo = TRUE, eval = FALSE}
 # create a column for month and year based on the source_file variable
 env_dat$year <- str_sub(env_dat$source_file, start = 5, end = 8)

@@ -118,7 +118,7 @@ In this example, we will focus on monthly mean, daily maximum temperature data f

 We will also explore the notion of delayed effects - environmental exposures from the recent past may be associated with health outcomes. To reflect this, we will calculate a 2-month rolling mean for our environmental variables.

-```{r, echo = TRUE}
+```{r, echo = TRUE, eval = FALSE}

 env_cleaned <- env_dat %>%
 select(geoid, tmmx, ps, pseudo_date, month, year) %>%
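
Note on the hunk above: the chunk is cut off in this view, so the rolling-mean step itself is not shown. As a minimal sketch of what a 2-month rolling mean could look like, assuming the `slider` package loaded earlier is what computes it, the toy example below averages each month with the month before it within each zip code. The data frame `env_toy` and its values are hypothetical; only the column names `geoid`, `pseudo_date`, `tmmx`, and `roll_tmmx` appear in the tutorial.

```r
library(dplyr)
library(slider)

# Hypothetical toy data: three months of temperature for two made-up zip codes
env_toy <- tibble::tibble(
  geoid = rep(c("84101", "84102"), each = 3),
  pseudo_date = rep(as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
                    times = 2),
  tmmx = c(270, 274, 280, 268, 272, 276)
)

env_roll <- env_toy %>%
  group_by(geoid) %>%
  # order months within each zip code so the window covers consecutive months
  arrange(pseudo_date, .by_group = TRUE) %>%
  # window = current month plus 1 month before; incomplete windows become NA
  mutate(roll_tmmx = slide_dbl(tmmx, mean, .before = 1, .complete = TRUE)) %>%
  ungroup()

print(env_roll)
```

With `.complete = TRUE`, the first month of each zip code has no preceding month and is left as `NA` instead of being averaged over a single value. Whether the tutorial makes the same choice cannot be confirmed from this diff.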
@@ -135,15 +135,15 @@ env_cleaned <- env_dat %>%

 ## Joining data

-```{r, echo = FALSE}
+```{r, echo = FALSE, eval = FALSE}
 # add a leading zero to month column in patient data
 ra_dat$AMONTH <- str_pad(ra_dat$AMONTH, width = 2, side = "left", pad = "0")

 ```

 We will join the environmental data to the health data using both temporal and spatial information that is common between the two datasets. For our health data, we have information on the month, year and zip code of the ED visits. For our environmental data, we have information on the month, year and zip code for our temperature and surface pressure variables. Our join, therefore, will be based on year, month, and zip code.

-```{r echo=TRUE}
+```{r echo=TRUE, eval = FALSE}
 # Join to environmental data based on month, year and zip
 res_df <- ra_dat %>%
 left_join(env_cleaned,
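
Note on the hunk above: the `left_join()` call is cut off before its `by` argument, so the exact key mapping is not visible here. The sketch below shows one way a join on year, month, and zip code could be written with toy data. `AYEAR`, `AMONTH`, `geoid`, `year`, and `month` appear in the tutorial; the patient zip code column name `ZIP` and all values are assumptions made for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical patient encounters; `ZIP` is an assumed column name
ra_toy <- tibble::tibble(
  person_id = c(1, 2),
  AYEAR = c("2016", "2016"),
  AMONTH = c("1", "11"),
  ZIP = c("84101", "84102")
)

# Hypothetical environmental data keyed by zip code, year, and 2-digit month
env_toy <- tibble::tibble(
  geoid = c("84101", "84102"),
  year = c("2016", "2016"),
  month = c("01", "11"),
  tmmx = c(270, 285)
)

res_toy <- ra_toy %>%
  # pad the month to two digits so "1" matches "01" in the environmental data
  mutate(AMONTH = str_pad(AMONTH, width = 2, side = "left", pad = "0")) %>%
  # keep every encounter; attach environmental values where all keys match
  left_join(env_toy,
            by = c("AYEAR" = "year", "AMONTH" = "month", "ZIP" = "geoid"))

print(res_toy)
```

Without the padding step, the first encounter's month `"1"` would not match `"01"` and its `tmmx` would come back as `NA`, which is why the leading-zero fix precedes the join.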
@@ -155,25 +155,13 @@ res_df <- ra_dat %>%

 ## Visualizing data

-What do the first few rows of our combined dataset look like?
-
-```{r, message = FALSE}
-res_df %>%
-select(person_id, AMONTH, AYEAR, contains("tmmx"), contains("ps")) %>%
-head(n = 5)
-
-```
-
-We can see that each person in our dataset has associated environmental data.
-
 ### Histograms

 Using histograms, we will explore the distribution of the environmental data for all RA ED visits in the dataset.

-
 #### Histogram (temperature)

-```{r, echo = TRUE, warning = FALSE, message = FALSE}
+```{r, echo = TRUE, warning = FALSE, message = FALSE, eval = FALSE}
 ggplot(data = res_df, aes(x = roll_tmmx)) +
 geom_histogram() +
 theme_minimal() +
@@ -183,10 +171,11 @@ ggplot(data = res_df, aes(x = roll_tmmx)) +
 axis.text = element_text(size = 11))
 ```

+![Temperature histogram](/ddn/gs1/home/rauat/PCOR_bookdown_tools/images/hcup_ra_usecase/temp_hist.png)

 #### Histogram (surface pressure)

-```{r, echo = TRUE, message = FALSE, warning = FALSE}
+```{r, echo = TRUE, message = FALSE, warning = FALSE, eval = FALSE}
 ggplot(data = res_df, aes(x = ps)) +
 geom_histogram() +
 theme_minimal() +
@@ -195,21 +184,22 @@ ggplot(data = res_df, aes(x = ps)) +
 axis.text = element_text(size = 11))
 ```

+![Surface pressure histogram](/ddn/gs1/home/rauat/PCOR_bookdown_tools/images/hcup_ra_usecase/pressure_hist.png)

 ### Spaghetti plots

-Now lets' zoom in on visualizing changes in environmental variables over time for a select group of RA patients who had multiple ED encounters over the study period.
+Now lets' zoom in to visualize changes in environmental variables over time for a select group of RA patients who had multiple ED encounters over the study period.
 Let's calculate the number of visits each person had and restrict the data to patients who had 10+ RA ED visits from 2016 - 2020.

-```{r, echo = TRUE}
+```{r, echo = TRUE, eval = FALSE}
 # Count number of encounters over time
 res_df <- res_df %>%
 group_by(person_id) %>%
 mutate(n_visits = n())
 ```


-```{r, echo = TRUE}
+```{r, echo = TRUE, eval = FALSE}
 res_long <- res_df %>%
 # Filter to only individuals who had 10+ encounters for visualization purposes
 dplyr::filter(n_visits > 9) %>%
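
Note on the hunk above: the count-then-filter step can be checked on a small example. The sketch below uses hypothetical toy data (`visits_toy` and its values are made up) and `dplyr::add_count()`, which is swapped in here as an alternative to the `group_by()` plus `mutate(n_visits = n())` pattern in the tutorial. It is not the tutorial's own code.

```r
library(dplyr)

# Hypothetical encounters: person "A" has 10 visits, person "B" has 3
visits_toy <- tibble::tibble(
  person_id = c(rep("A", 10), rep("B", 3))
)

# add_count() attaches the per-person row count and returns an ungrouped frame
visits_counted <- visits_toy %>%
  add_count(person_id, name = "n_visits")

# keep only people with 10 or more encounters, as in the tutorial's filter
frequent <- visits_counted %>%
  filter(n_visits > 9)

print(unique(frequent$person_id))
```

One difference worth noting: the tutorial's `group_by()` version leaves `res_df` grouped by `person_id`, whereas `add_count()` returns an ungrouped data frame. Either works for the filter that follows.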
@@ -220,7 +210,7 @@ res_long <- res_df %>%

 #### Spaghetti plot (temperature)

-```{r}
+```{r, eval = FALSE}

 # 2-month rolling tmax
 ggplot(data = res_long, aes(x = pseudo_date, y = roll_tmmx,
@@ -237,27 +227,30 @@ ggplot(data = res_long, aes(x = pseudo_date, y = roll_tmmx,
 legend.title = element_text(size = 12))
 ```

+![Temperature line chart](/ddn/gs1/home/rauat/PCOR_bookdown_tools/images/hcup_ra_usecase/temp_line.png)
+
 Here we can observe the change in the 2-month rolling mean monthly max temperature across RA ED visits for 3 patients. Each encounter is marked by a point on the plot.

 #### Spaghetti plot (surface pressure)

 This same plot can be created to examine trends in surface pressure for this subset of patients.

-```{r, warning = FALSE}
+```{r, warning = FALSE, eval = FALSE}
 ggplot(data = res_long, aes(x = pseudo_date, y = ps,
 color = person_id, group = person_id)) +
 geom_line(linewidth = 1) +
 geom_point(size = 2) +
 labs(x = "Year", y = "Surface pressure (Pa)", color = "Person ID",
 caption = "Each point represents an ED visit for a given patient.
-\nPerson ID 228 had missing surface pressure data for 1
+\nPerson ID 228 had missing surface pressure data for\n1
 encounter-month leading to discontinuity in line chart.") +
 theme_minimal() +
 theme(axis.title = element_text(size = 12),
 axis.text = element_text(size = 11),
 legend.text = element_text(size = 11))
 ```

+![Surface pressure line chart](/ddn/gs1/home/rauat/PCOR_bookdown_tools/images/hcup_ra_usecase/pressure_line.png)

 ## Concluding Remarks

Four binary files (66.3 KB, 197 KB, 74.3 KB, 294 KB); previews not shown.
