---
title: "Combine results"
author: "Media Innovation Group"
code-fold: true
code-summary: "Expand this to see code"
---

This notebook combines Voting Tabulation District (VTD) election returns from multiple years, totals them into statewide results for each candidate, then filters those results for State Representative races.

Originally created by MIG data fellow Isabella Zeff, it has since been refactored by Christian McDonald.

The data comes from the Texas Legislative Council's [data portal](https://data.capitol.texas.gov/dataset/comprehensive-election-datasets-compressed-format). The [documentation is available here](https://data.capitol.texas.gov/dataset/vtds). We used the CSV version of the 2024 General VTDs Election Data, which includes 2012-2024 election data reported by 2024 primary election VTDs.

## Setup

```{r}
#| label: setup
#| message: FALSE

library(tidyverse)
library(janitor)
```

## Functions

This function sums the county-by-county results into one total per candidate for each election and office.

```{r}
fun_totals_all <- function(.data){
  .data |>
    group_by(year, election, office, name, party, incumbent) |>
    summarize(candvotes = sum(votes), .groups = "drop") |>
    arrange(year, office, candvotes |> desc())
}
```
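To make the grouping concrete, here is a minimal sketch on made-up rows. The counties, candidate names, office label and vote counts are invented for illustration; only the column names mirror the ones `fun_totals_all()` expects.

```r
library(dplyr)
library(tibble)

# Toy rows; every value here is invented for illustration.
toy_returns <- tribble(
  ~county,  ~year,  ~election, ~office,        ~name,   ~party, ~incumbent, ~votes,
  "Travis", "2024", "General", "State Rep 99", "Alpha", "D",    "Y",        100,
  "Travis", "2024", "General", "State Rep 99", "Beta",  "R",    "N",         80,
  "Hays",   "2024", "General", "State Rep 99", "Alpha", "D",    "Y",         50,
  "Hays",   "2024", "General", "State Rep 99", "Beta",  "R",    "N",         90
)

# Same grouping and sum as fun_totals_all(), written out step by step.
toy_totals <- toy_returns |>
  group_by(year, election, office, name, party, incumbent) |>
  summarize(candvotes = sum(votes), .groups = "drop") |>
  arrange(year, office, desc(candvotes))

toy_totals
```

The four input rows collapse to one row per candidate, with the higher total listed first within the office.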

## Import and combine

```{r}
all_files <- list.files(
  "data-original/vdt-returns",
  pattern = "\\.csv$", # escape the dot so only names ending in .csv match
  full.names = TRUE)

# all_files
```

### Check if we have all files

This check makes sure we have the main results for the period we are interested in, which is specific to the Texas House spending analysis. It does not take special elections into account. At one point we were missing runoff results for 2024.

```{r}
main_races <- c(
  "Democratic_Primary",
  "Democratic_Runoff",
  "Republican_Primary",
  "Republican_Runoff",
  "General"
)

all_files |>
  as_tibble() |>
  mutate(
    value = str_remove(value, "data-original/vdt-returns/"),
    value = str_remove(value, "_Election_Returns.csv"),
    year = str_sub(value, 1, 4) |> as.numeric(),
    election = str_remove(value, "^\\d{4}_")
  ) |>
  filter(year >= 2016) |>
  filter(election %in% main_races) |>
  count(year, sort = TRUE)
```
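The count shows how many main races were found for each year, but not which one is absent when a year comes up short. One possible way to name the gaps is to build every expected year and election pair and anti-join the files against it. This is only a sketch: the file names below are hypothetical examples that follow the naming pattern the chunk above parses, and a listed gap is a prompt to go check the source, not proof that a file exists upstream.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tibble)

main_races_example <- c(
  "Democratic_Primary", "Democratic_Runoff",
  "Republican_Primary", "Republican_Runoff",
  "General"
)

# Hypothetical file names; 2024 is deliberately missing both runoffs.
example_files <- c(
  paste0("2022_", main_races_example, "_Election_Returns.csv"),
  "2024_Democratic_Primary_Election_Returns.csv",
  "2024_Republican_Primary_Election_Returns.csv",
  "2024_General_Election_Returns.csv"
)

found <- tibble(file = example_files) |>
  mutate(
    year = str_sub(file, 1, 4) |> as.numeric(),
    election = file |>
      str_remove("^\\d{4}_") |>
      str_remove("_Election_Returns\\.csv$")
  )

# Every year/election pair we expect, minus the pairs we actually found.
missing_races <- expand_grid(
  year = unique(found$year),
  election = main_races_example
) |>
  anti_join(found, by = c("year", "election"))

missing_races
```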

### Combine the files

```{r}
#| message: false
#| warning: false

all_raw <- all_files |>
  set_names(basename) |>
  map(\(x) read_csv(x, col_types = cols(.default = col_character()))) |>
  list_rbind(names_to = "source") |>
  clean_names()
```
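Reading every column as character sidesteps type-guessing differences between files, and `names_to = "source"` keeps the file name on each row so the year and election can be recovered later. The sketch below repeats the same pattern on two tiny CSVs written to a temporary folder. The folder name, file names and values are invented for illustration.

```r
library(readr)
library(purrr)

# Write two tiny, made-up CSVs to a temporary folder.
tmp_dir <- file.path(tempdir(), "toy-returns")
dir.create(tmp_dir, showWarnings = FALSE)
write_csv(
  data.frame(County = "Travis", Votes = "10"),
  file.path(tmp_dir, "2022_General_Election_Returns.csv")
)
write_csv(
  data.frame(County = "Hays", Votes = "7"),
  file.path(tmp_dir, "2024_General_Election_Returns.csv")
)

toy_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its file name, read as character, then stack the rows.
toy_raw <- toy_files |>
  set_names(basename) |>
  map(\(x) read_csv(x, col_types = cols(.default = col_character()))) |>
  list_rbind(names_to = "source")

toy_raw
```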

## Clean source

Here we use the name of the file to find the election year and name. We also turn votes into a number.

```{r}
all_returns <- all_raw |>
  mutate(
    year = str_sub(source, 1, 4),
    election = str_sub(source, 6, -22) |> str_replace_all("_", " "),
    .before = county
  ) |>
  mutate(votes = votes |> as.numeric()) |>
  select(-source)

all_returns |> head()
```
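The `str_sub(source, 6, -22)` call works because each file name starts with a four-digit year plus an underscore and ends with the 21-character suffix `_Election_Returns.csv`. As a sketch of an alternative that does not depend on counting characters, a regular expression can capture both pieces. The two file names here are hypothetical examples of that naming pattern.

```r
library(stringr)

example_sources <- c(
  "2024_Democratic_Primary_Election_Returns.csv",
  "2018_General_Election_Returns.csv"
)

# Capture the four-digit year and the text between it and the fixed suffix.
parts <- str_match(
  example_sources,
  "^(\\d{4})_(.+)_Election_Returns\\.csv$"
)

year <- parts[, 2]
election <- str_replace_all(parts[, 3], "_", " ")

# The fixed-offset approach from the notebook, for comparison.
election_by_offset <- example_sources |>
  str_sub(6, -22) |>
  str_replace_all("_", " ")

identical(election, election_by_offset)
```

With the regular expression, a file name that does not fit the pattern yields `NA` instead of a silently truncated election name, which makes odd files easier to spot.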

## Tally votes

```{r}
all_totals <- all_returns |>
  fun_totals_all() |>
  arrange(year, election, office, candvotes |> desc())

all_totals |> head()
```

### Filter for State Reps

Find the state reps in the data.

```{r}
rep_totals <- all_totals |>
  filter(str_detect(office, "State Rep")) |>
  mutate(district = parse_number(office), .after = office)
```
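Here is a small sketch of what the filter and `parse_number()` do together. The office labels are invented, and the exact labels in the Texas Legislative Council files may differ. The final check relies on Texas House districts being numbered 1 through 150.

```r
library(dplyr)
library(stringr)
library(readr)
library(tibble)

# Invented office labels, assumed to resemble the real ones.
toy_offices <- tibble(
  office = c("State Rep 5", "State Rep 112", "State Sen 14", "U.S. Rep 10", "Governor")
)

# Keep only State Rep rows, then pull the district number out of the label.
toy_reps <- toy_offices |>
  filter(str_detect(office, "State Rep")) |>
  mutate(district = parse_number(office), .after = office)

# Any district outside 1-150 would point to a parsing problem.
stopifnot(all(toy_reps$district %in% 1:150))

toy_reps
```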

## Export results

Export both the totals for all offices and the State Rep totals.

```{r}
all_totals |>
  write_rds("data-processed/01-all-totals.rds")

rep_totals |>
  write_rds("data-processed/01-house-totals.rds")
```


## Checking unopposed races

This is just to confirm that unopposed races are the reason (or at least a reason) we don't have results for every House district.

### 2024 General

Of all the Texas House results we have, here are the districts for the 2024 general election.

```{r}
rep_totals |>
  distinct(year, election, district) |>
  arrange(year, election, district) |>
  filter(year == 2024, election == "General")
```

We are missing districts 1, 3, 9 and 11, for starters.

If we look at the results for this election on [Ballotpedia](https://ballotpedia.org/Texas_House_of_Representatives_elections,_2024#General_election), we can see those same races did not have more than one candidate on the ballot. Even a race with a write-in candidate made it into the data (District 5).
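Rather than scanning the list by eye, the gaps can be computed with `setdiff()`. This is a sketch on a hypothetical vector of district numbers, limited to districts 1 through 10 to keep it short.

```r
# Hypothetical districts present in the results for one election.
# In the notebook this could come from something like:
#   rep_totals |> filter(year == 2024, election == "General") |> pull(district)
districts_present <- c(2, 4, 5, 6, 7, 8, 10)

# Districts in the checked range that have no results.
# For the full chamber the range would be 1:150.
districts_missing <- setdiff(1:10, districts_present)

districts_missing
```

The resulting vector can then be compared against the unopposed races listed by an outside source.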

### Republican Primary

And then if we look at the same for the Republican Primary:

```{r}
rep_totals |>
  distinct(year, election, district) |>
  arrange(year, election, district) |>
  filter(year == 2024, election == "Republican Primary")
```

If we compare the list above with [Ballotpedia's primary election list](https://ballotpedia.org/Texas_House_of_Representatives_elections,_2024#Primary), the first missing district is 3, which tracks with Cecil Bell Jr. being the only candidate. District 6 was also unopposed, and so on.

### Democratic Primary

For the Democrats, that same list shows there was no contested primary in the first 18 districts. Each had one or zero candidates, so the primary was canceled.

```{r}
rep_totals |>
  distinct(year, election, district) |>
  arrange(year, election, district) |>
  filter(year == 2024, election == "Democratic Primary")
```