Scrape and enrich US political election data from Wikipedia and Ballotpedia into a normalized SQLite database.
```mermaid
---
config:
  theme: neutral
---
erDiagram
    elections ||--o{ candidates : has
    candidates ||--o{ contact_links : has
    elections {
        int election_id PK
        text state
        text race_type
        int year
        text district
        text election_stage
        text wikipedia_url
    }
    candidates {
        int candidate_id PK
        int election_id FK
        text party
        text candidate_name
        text wikipedia_url
        text ballotpedia_url
        real vote_pct
        int is_winner
    }
    contact_links {
        int contact_link_id PK
        int candidate_id FK
        text link_type
        text url
        text source
    }
```
- `link_type` values: `campaign_site`, `campaign_site_archived`, `campaign_facebook`, `campaign_x`, `campaign_instagram`, `personal_website`, `personal_facebook`, `personal_linkedin`
- `source` values: `wikipedia`, `ballotpedia`, `web_search`, `wayback`, `csv_import`
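The schema above can be expressed as plain SQLite DDL. A minimal sketch derived from the ER diagram (table and column names come from the diagram; the `NOT NULL`, `REFERENCES`, and `DEFAULT` details are assumptions, not taken from the project):

```python
import sqlite3

# Schema sketch matching the ER diagram; constraint details are assumptions.
DDL = """
CREATE TABLE IF NOT EXISTS elections (
    election_id     INTEGER PRIMARY KEY,
    state           TEXT NOT NULL,
    race_type       TEXT NOT NULL,
    year            INTEGER NOT NULL,
    district        TEXT,
    election_stage  TEXT,
    wikipedia_url   TEXT
);
CREATE TABLE IF NOT EXISTS candidates (
    candidate_id    INTEGER PRIMARY KEY,
    election_id     INTEGER NOT NULL REFERENCES elections(election_id),
    party           TEXT,
    candidate_name  TEXT NOT NULL,
    wikipedia_url   TEXT,
    ballotpedia_url TEXT,
    vote_pct        REAL,
    is_winner       INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS contact_links (
    contact_link_id INTEGER PRIMARY KEY,
    candidate_id    INTEGER NOT NULL REFERENCES candidates(candidate_id),
    link_type       TEXT NOT NULL,
    url             TEXT NOT NULL,
    source          TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['candidates', 'contact_links', 'elections']
```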
```shell
# Install
uv sync
```
```shell
# Scrape 2024 House races and enrich with contact info
python -m camplinks --year 2024 --race house

# Scrape 2024 Senate races
python -m camplinks --year 2024 --race senate

# Scrape 2025 gubernatorial races
python -m camplinks --year 2025 --race governor

# Scrape 2025 mayoral elections (Wikipedia, 62+ cities)
python -m camplinks --year 2025 --race municipal

# Scrape 2023-2026 mayoral elections (Ballotpedia, top-100 cities)
python -m camplinks --year 2023 --race bp_municipal --stage scrape

# Scrape gubernatorial elections from Ballotpedia (all 50 states)
python -m camplinks --year 2026 --race bp_governor --stage scrape

# Run all registered race types
python -m camplinks --year 2024 --race all
```

| Key | Description |
|---|---|
| house | US House of Representatives |
| senate | US Senate |
| governor | Governor (statewide) |
| attorney_general | Attorney General (statewide) |
| special_house | House special elections |
| state_leg | State legislature (regular sessions) |
| state_leg_special | State legislature special elections |
| municipal | Mayoral elections (Wikipedia) |
| bp_municipal | Mayoral elections (Ballotpedia, top-100 cities) |
| bp_governor | Gubernatorial elections (Ballotpedia, all states) |
| judicial | State Supreme Court elections |
| all | Run all of the above |
The database is written to `camplinks.db` by default. Override with `--db path/to/db`.

The pipeline runs four stages in order. Each stage is idempotent (safe to re-run).
| Stage | What it does | Data source |
|---|---|---|
| scrape | Fetch election results from Wikipedia | Wikipedia state election pages |
| enrich | Extract campaign websites from candidate Wikipedia pages | Wikipedia candidate infoboxes |
| search | Find missing contact info via Ballotpedia and web search | Ballotpedia + DuckDuckGo |
| validate | Check campaign site accessibility, archive dead links | Wayback Machine API |
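Idempotency of the scrape stage can be achieved with SQLite upserts keyed on a natural unique constraint. A hypothetical sketch, not the project's actual implementation (the `UNIQUE` column choice is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elections (
    election_id INTEGER PRIMARY KEY,
    state TEXT, race_type TEXT, year INTEGER, district TEXT,
    UNIQUE (state, race_type, year, district)
);
""")

def upsert_election(state, race_type, year, district):
    # Re-running the same scrape hits the UNIQUE constraint and becomes a no-op.
    conn.execute(
        """INSERT INTO elections (state, race_type, year, district)
           VALUES (?, ?, ?, ?)
           ON CONFLICT (state, race_type, year, district) DO NOTHING""",
        (state, race_type, year, district),
    )

for _ in range(3):  # safe to re-run
    upsert_election("California", "US House", 2024, "12")

count = conn.execute("SELECT COUNT(*) FROM elections").fetchone()[0]
print(count)  # 1
```

`ON CONFLICT ... DO NOTHING` (SQLite 3.24+) keeps repeated runs from duplicating rows without any read-before-write logic.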
Run individual stages with `--stage`:

```shell
python -m camplinks --year 2024 --race house --stage scrape
python -m camplinks --year 2024 --race house --stage enrich
python -m camplinks --year 2024 --race house --stage search
python -m camplinks --year 2024 --race house --stage validate
```

```python
import sqlite3

conn = sqlite3.connect("camplinks.db")
conn.row_factory = sqlite3.Row

# All 2024 House winners with their campaign sites
rows = conn.execute("""
    SELECT c.candidate_name, c.party, e.state, e.district, cl.url
    FROM candidates c
    JOIN elections e ON c.election_id = e.election_id
    LEFT JOIN contact_links cl ON c.candidate_id = cl.candidate_id
        AND cl.link_type = 'campaign_site'
    WHERE c.is_winner = 1 AND e.year = 2024 AND e.race_type = 'US House'
    ORDER BY e.state, e.district
""").fetchall()

for r in rows:
    print(f"{r['state']}-{r['district']}: {r['candidate_name']} ({r['party']}) - {r['url']}")
```

Or with Polars:
```python
import polars as pl

df = pl.read_database_uri(
    "SELECT * FROM candidates c JOIN elections e ON c.election_id = e.election_id",
    "sqlite:///camplinks.db",
)
```

See USAGE.md for a walkthrough with examples.
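The same database supports quick coverage checks, such as how many contact links each source contributed. A sketch using only columns from the schema above; the table definition and sample rows here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact_links (contact_link_id INTEGER PRIMARY KEY, "
    "candidate_id INTEGER, link_type TEXT, url TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO contact_links (candidate_id, link_type, url, source) VALUES (?, ?, ?, ?)",
    [
        (1, "campaign_site", "https://example.com/a", "wikipedia"),
        (2, "campaign_site", "https://example.com/b", "ballotpedia"),
        (3, "campaign_site_archived", "https://web.archive.org/web/2024/https://example.com/c", "wayback"),
        (4, "campaign_site", "https://example.com/d", "wikipedia"),
    ],
)

# Links per source, most productive source first
for source, n in conn.execute(
    "SELECT source, COUNT(*) AS n FROM contact_links "
    "GROUP BY source ORDER BY n DESC, source"
):
    print(f"{source}: {n}")
```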
If you have an existing `house_races_2024.csv` from the old wide-format pipeline:

```shell
python convert_to_tidy.py --csv house_races_2024.csv --db camplinks.db
```

To set up a development environment and run the checks:

```shell
uv sync
uv run pytest tests/
uv run mypy camplinks/
uv run ruff check .
```

See CONTRIBUTING.md for setup instructions and guidelines.