# camplinks

Scrape and enrich US political election data from Wikipedia and Ballotpedia into a normalized SQLite database.

## Data Model

```mermaid
---
config:
  theme: neutral
---
erDiagram
    elections ||--o{ candidates : has
    candidates ||--o{ contact_links : has

    elections {
        int election_id PK
        text state
        text race_type
        int year
        text district
        text election_stage
        text wikipedia_url
    }

    candidates {
        int candidate_id PK
        int election_id FK
        text party
        text candidate_name
        text wikipedia_url
        text ballotpedia_url
        real vote_pct
        int is_winner
    }

    contact_links {
        int contact_link_id PK
        int candidate_id FK
        text link_type
        text url
        text source
    }
```
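For reference, the diagram translates to roughly the following SQLite DDL. This is an illustrative sketch, not the pipeline's actual schema-creation code; the foreign-key and type details beyond the diagram are assumptions.

```python
import sqlite3

# Illustrative DDL matching the ER diagram above. The pipeline creates the
# real schema itself; constraint details here are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS elections (
    election_id    INTEGER PRIMARY KEY,
    state          TEXT,
    race_type      TEXT,
    year           INTEGER,
    district       TEXT,
    election_stage TEXT,
    wikipedia_url  TEXT
);
CREATE TABLE IF NOT EXISTS candidates (
    candidate_id    INTEGER PRIMARY KEY,
    election_id     INTEGER REFERENCES elections(election_id),
    party           TEXT,
    candidate_name  TEXT,
    wikipedia_url   TEXT,
    ballotpedia_url TEXT,
    vote_pct        REAL,
    is_winner       INTEGER
);
CREATE TABLE IF NOT EXISTS contact_links (
    contact_link_id INTEGER PRIMARY KEY,
    candidate_id    INTEGER REFERENCES candidates(candidate_id),
    link_type       TEXT,
    url             TEXT,
    source          TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```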

`link_type` values: `campaign_site`, `campaign_site_archived`, `campaign_facebook`, `campaign_x`, `campaign_instagram`, `personal_website`, `personal_facebook`, `personal_linkedin`

`source` values: `wikipedia`, `ballotpedia`, `web_search`, `wayback`, `csv_import`
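A quick way to see how these two columns combine is to group over them. The snippet below runs against a tiny in-memory sample rather than a real `camplinks.db`, so the table contents are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact_links ("
    "contact_link_id INTEGER PRIMARY KEY, candidate_id INTEGER, "
    "link_type TEXT, url TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO contact_links (candidate_id, link_type, url, source) VALUES (?, ?, ?, ?)",
    [
        (1, "campaign_site", "https://example.com", "wikipedia"),
        (1, "campaign_facebook", "https://facebook.com/example", "ballotpedia"),
        (2, "campaign_site", "https://example.org", "web_search"),
    ],
)

# Count links by (link_type, source) to see where coverage comes from
rows = conn.execute("""
    SELECT link_type, source, COUNT(*) AS n
    FROM contact_links
    GROUP BY link_type, source
    ORDER BY n DESC, link_type
""").fetchall()
for link_type, source, n in rows:
    print(f"{link_type:<20} {source:<12} {n}")
```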

## Quickstart

```bash
# Install
uv sync

# Scrape 2024 House races and enrich with contact info
python -m camplinks --year 2024 --race house

# Scrape 2024 Senate races
python -m camplinks --year 2024 --race senate

# Scrape 2025 gubernatorial races
python -m camplinks --year 2025 --race governor

# Scrape 2025 mayoral elections (Wikipedia, 62+ cities)
python -m camplinks --year 2025 --race municipal

# Scrape 2023-2026 mayoral elections (Ballotpedia, top-100 cities)
python -m camplinks --year 2023 --race bp_municipal --stage scrape

# Scrape gubernatorial elections from Ballotpedia (all 50 states)
python -m camplinks --year 2026 --race bp_governor --stage scrape

# Run all registered race types
python -m camplinks --year 2024 --race all
```

### Available `--race` keys

| Key | Description |
| --- | --- |
| `house` | US House of Representatives |
| `senate` | US Senate |
| `governor` | Governor (statewide) |
| `attorney_general` | Attorney General (statewide) |
| `special_house` | House special elections |
| `state_leg` | State legislature (regular sessions) |
| `state_leg_special` | State legislature special elections |
| `municipal` | Mayoral elections (Wikipedia) |
| `bp_municipal` | Mayoral elections (Ballotpedia, top-100 cities) |
| `bp_governor` | Gubernatorial elections (Ballotpedia, all states) |
| `judicial` | State Supreme Court elections |
| `all` | Run all of the above |

The database is written to `camplinks.db` by default. Override with `--db path/to/db`.

## Pipeline Stages

The pipeline runs four stages in order. Each stage is idempotent (safe to re-run).

| Stage | What it does | Data source |
| --- | --- | --- |
| `scrape` | Fetch election results from Wikipedia | Wikipedia state election pages |
| `enrich` | Extract campaign websites from candidate Wikipedia pages | Wikipedia candidate infoboxes |
| `search` | Find missing contact info via Ballotpedia and web search | Ballotpedia + DuckDuckGo |
| `validate` | Check campaign site accessibility, archive dead links | Wayback Machine API |

Run individual stages with `--stage`:

```bash
python -m camplinks --year 2024 --race house --stage scrape
python -m camplinks --year 2024 --race house --stage enrich
python -m camplinks --year 2024 --race house --stage search
python -m camplinks --year 2024 --race house --stage validate
```
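Idempotency of this kind typically comes from upsert-style writes. The sketch below shows the general pattern with SQLite's `ON CONFLICT ... DO NOTHING`; it is an illustration of the idea, not the pipeline's actual implementation, and the unique-constraint columns are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contact_links (
        contact_link_id INTEGER PRIMARY KEY,
        candidate_id INTEGER,
        link_type TEXT,
        url TEXT,
        source TEXT,
        UNIQUE (candidate_id, link_type, url)
    )
""")

def record_link(candidate_id, link_type, url, source):
    # Re-running a stage re-inserts the same rows; the unique constraint
    # turns the second pass into a no-op instead of creating duplicates.
    conn.execute(
        """INSERT INTO contact_links (candidate_id, link_type, url, source)
           VALUES (?, ?, ?, ?)
           ON CONFLICT (candidate_id, link_type, url) DO NOTHING""",
        (candidate_id, link_type, url, source),
    )

record_link(1, "campaign_site", "https://example.com", "wikipedia")
record_link(1, "campaign_site", "https://example.com", "wikipedia")  # no-op
count = conn.execute("SELECT COUNT(*) FROM contact_links").fetchone()[0]
print(count)  # → 1
```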

## Querying the Database

```python
import sqlite3

conn = sqlite3.connect("camplinks.db")
conn.row_factory = sqlite3.Row

# All 2024 House winners with their campaign sites
rows = conn.execute("""
    SELECT c.candidate_name, c.party, e.state, e.district, cl.url
    FROM candidates c
    JOIN elections e ON c.election_id = e.election_id
    LEFT JOIN contact_links cl ON c.candidate_id = cl.candidate_id
        AND cl.link_type = 'campaign_site'
    WHERE c.is_winner = 1 AND e.year = 2024 AND e.race_type = 'US House'
    ORDER BY e.state, e.district
""").fetchall()

for r in rows:
    print(f"{r['state']}-{r['district']}: {r['candidate_name']} ({r['party']}) - {r['url']}")
```

Or with Polars (URI connections go through `pl.read_database_uri`, which needs the `connectorx` package; explicit columns avoid the duplicate `election_id` from `SELECT *` across the join):

```python
import polars as pl

df = pl.read_database_uri(
    """SELECT c.candidate_name, c.party, c.vote_pct, c.is_winner,
              e.state, e.race_type, e.year, e.district
       FROM candidates c JOIN elections e ON c.election_id = e.election_id""",
    "sqlite:///camplinks.db",
)
```

## Adding a New Race Type

See USAGE.md for a walkthrough with examples.

## Migrating from Legacy CSV

If you have an existing `house_races_2024.csv` from the old wide-format pipeline:

```bash
python convert_to_tidy.py --csv house_races_2024.csv --db camplinks.db
```
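The wide-to-tidy conversion follows the usual melt pattern: one row per candidate with one column per link becomes one row per link. The sketch below uses hypothetical column names (`candidate_name`, `campaign_site`, `campaign_facebook`); the real converter handles the actual CSV layout:

```python
import csv
import io

# Hypothetical wide-format input: one column per link type.
wide_csv = io.StringIO(
    "candidate_name,campaign_site,campaign_facebook\n"
    "Jane Doe,https://example.com,https://facebook.com/janedoe\n"
)

tidy_rows = []
for row in csv.DictReader(wide_csv):
    name = row.pop("candidate_name")
    # Melt the remaining link columns into one row per (candidate, link)
    for link_type, url in row.items():
        if url:  # skip empty cells
            tidy_rows.append(
                {"candidate_name": name, "link_type": link_type,
                 "url": url, "source": "csv_import"}
            )

for r in tidy_rows:
    print(r)
```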

## Development

```bash
uv sync
uv run pytest tests/
uv run mypy camplinks/
uv run ruff check .
```

## Contributing

See CONTRIBUTING.md for setup instructions and guidelines.

## License

MIT
