This project builds a plain-language score called the Affordability Stress Index (ASI). The score combines a few housing and income signals into one number so you can quickly compare Canadian metro areas. Higher ASI means more pressure on renters.
The ASI is meant to:
- Compare metros using a consistent, repeatable score.
- Highlight places where rent is rising faster than incomes or vacancies are tight.
- Support policy, planning, and research discussions with transparent inputs.
What the ASI does not do:
- It does not prove cause and effect. It shows patterns, not why they happen.
- It does not replace local analysis or on-the-ground context.
- It does not measure homeownership affordability (mortgages and prices are not included yet).
We combine two trusted public sources:
- Statistics Canada (StatCan): income, population, unemployment, and inflation data.
- Canada Mortgage and Housing Corporation (CMHC): rents, vacancy rates, and housing starts.
See data_sources.md for exact table IDs, links, and download notes.
We take three stress signals, scale them so they can be compared, and then average them:
- Rent-to-income pressure: how much of a typical household's income goes to rent.
- Rent growth: how quickly rent is rising year over year.
- Vacancy stress: low vacancy means tight supply, so we invert the vacancy rate.
A higher combined score means more housing stress for renters. The score is meant to be compared across metros in the same run.
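The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in notebooks/04_features_index.ipynb: it assumes z-score standardization as the scaling step, negation of the standardized vacancy rate as the inversion, and equal weights in the average. The field names (annual_rent, median_income, rent_prev, vacancy_rate) and all numbers are hypothetical.

```python
from statistics import mean, pstdev

# Illustrative values only, not real StatCan or CMHC figures.
# Field names are hypothetical and may not match the project's datasets.
metros = {
    "Metro A": {"annual_rent": 24000, "median_income": 60000, "rent_prev": 21600, "vacancy_rate": 1.0},
    "Metro B": {"annual_rent": 18000, "median_income": 72000, "rent_prev": 17100, "vacancy_rate": 2.5},
    "Metro C": {"annual_rent": 15000, "median_income": 75000, "rent_prev": 14700, "vacancy_rate": 4.0},
}


def zscores(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All metros share the same value, so the signal carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def compute_asi(metros):
    """Return {metro: score} as the equal-weight mean of three scaled signals."""
    names = list(metros)
    rent_to_income = [metros[n]["annual_rent"] / metros[n]["median_income"] for n in names]
    rent_growth = [metros[n]["annual_rent"] / metros[n]["rent_prev"] - 1 for n in names]
    vacancy = [metros[n]["vacancy_rate"] for n in names]

    z_rti = zscores(rent_to_income)
    z_growth = zscores(rent_growth)
    # Assumption: "invert" means flipping the sign, so low vacancy scores high.
    z_vacancy_stress = [-z for z in zscores(vacancy)]

    return {
        n: mean([a, b, c])
        for n, a, b, c in zip(names, z_rti, z_growth, z_vacancy_stress)
    }


asi = compute_asi(metros)
for name, score in sorted(asi.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

Under this kind of scaling, each signal is measured relative to the other metros in the same input, which is one reason scores are compared within a run rather than across runs.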
├── data/
│   ├── raw/                          # Raw data files (gitignored)
│   │   ├── .gitignore                # Excludes raw data from version control
│   │   └── README.md                 # Download instructions for raw data
│   └── processed/                    # Cleaned and processed data
│       ├── metros_master.csv         # Master dataset of metropolitan areas
│       └── asi_scores.csv            # Affordability Stress Index scores
├── notebooks/
│   ├── 00_ingest_plan.ipynb          # Download inventory + ingestion checklist
│   ├── 01_ingest_statcan.ipynb       # StatCan metro ingest
│   ├── 02_ingest_cmhc.ipynb          # CMHC metro ingest
│   ├── 03_ingest_clean.ipynb         # Data ingestion and cleaning
│   ├── 04_features_index.ipynb       # Feature engineering and ASI calculation
│   ├── 05_pca.ipynb                  # PCA diagnostics + interpretability artifacts
│   └── 06_clustering.ipynb           # KMeans + HDBSCAN clustering, personas, and checks
├── src/                              # Source code (reusable functions)
├── report/
│   └── figures/                      # Generated visualizations and figures
├── README.md                         # Project documentation
└── requirements.txt                  # Python dependencies
- Clone the repository:

      git clone https://github.com/ajharris/housing-affordability.git
      cd housing-affordability

- Create a virtual environment and install dependencies:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
      pip install -r requirements.txt

- Download raw data:
  - See data/raw/README.md for download instructions.
  - Place downloaded files in data/raw/.

Run the notebooks in order:
- notebooks/00_ingest_plan.ipynb - list sources and check what is missing.
- notebooks/01_ingest_statcan.ipynb - bring in StatCan data.
- notebooks/02_ingest_cmhc.ipynb - bring in CMHC data.
- notebooks/03_ingest_clean.ipynb - clean and merge sources into a master file.
- notebooks/04_features_index.ipynb - build features and calculate ASI.
- notebooks/05_pca.ipynb - explain the features with PCA.
- notebooks/06_clustering.ipynb - group metros into similar profiles.

Start Jupyter and run each notebook:

    jupyter notebook

Main outputs:
- data/processed/asi_scores.csv: the main ASI results per metro.
- report/figures/asi_top15.png: quick view of the 15 most stressed metros.
- report/figures/pca_explained_variance.png: how much of the variation each principal component explains.
- report/figures/pca_clusters_kmeans.png and report/figures/pca_clusters_hdbscan.png: grouping maps.
- report/figures/kmeans_k_sweep.png: shows how the number of clusters was selected.
- report/final_report_1_4_26.md: short narrative write-up with the headline figures for sharing (as run on 4/1/26).
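Once the scores file exists, a ranking like the one behind asi_top15.png can be reproduced with the standard library. The sketch below is only an illustration: the column names metro and asi are assumptions (check the header of data/processed/asi_scores.csv for the real ones), and the sample rows are made up so the example runs without the project data.

```python
import csv
import io


def top_stressed(csv_file, n=15, name_col="metro", score_col="asi"):
    """Return the n highest-scoring (metro, score) pairs from an open CSV file.

    name_col and score_col are hypothetical column names; adjust them to
    match the actual header. Rows with a missing score are skipped.
    """
    rows = [row for row in csv.DictReader(csv_file) if row[score_col] not in ("", None)]
    rows.sort(key=lambda row: float(row[score_col]), reverse=True)
    return [(row[name_col], float(row[score_col])) for row in rows[:n]]


# Made-up sample standing in for the real file; Metro C has no score.
sample = io.StringIO("metro,asi\nMetro A,1.20\nMetro B,-0.10\nMetro C,\nMetro D,0.45\n")
top = top_stressed(sample, n=2)
for name, score in top:
    print(f"{name}: {score:+.2f}")
```

To use it on the real output, pass a file opened with open("data/processed/asi_scores.csv", newline="") and set name_col and score_col to the actual column names.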
You can infer:
- Which metros are under relatively higher renter pressure right now.
- Whether that pressure is driven more by rent growth, low vacancy, or rent-to-income.
- Which metros look similar to each other based on those signals.
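As a rough illustration of the last two points, the sketch below picks the largest scaled signal for a metro and finds its closest neighbour by Euclidean distance. It assumes each metro already has three scaled component values; the component names and numbers are hypothetical. The nearest-neighbour lookup is a simple stand-in for intuition, not the KMeans and HDBSCAN clustering used in notebooks/06_clustering.ipynb.

```python
import math

# Hypothetical component names; the project's column names may differ.
COMPONENTS = ("rent_to_income", "rent_growth", "vacancy_stress")

# Made-up scaled component values for three metros.
profiles = {
    "Metro A": {"rent_to_income": 1.3, "rent_growth": 0.2, "vacancy_stress": 0.9},
    "Metro B": {"rent_to_income": 1.1, "rent_growth": 0.1, "vacancy_stress": 1.0},
    "Metro C": {"rent_to_income": -0.8, "rent_growth": 1.4, "vacancy_stress": -0.5},
}


def dominant_driver(profile):
    """Return the component with the largest scaled value for one metro."""
    return max(COMPONENTS, key=lambda c: profile[c])


def most_similar(name, profiles):
    """Return the other metro whose component vector is closest to `name`."""
    target = [profiles[name][c] for c in COMPONENTS]
    others = (n for n in profiles if n != name)
    return min(
        others,
        key=lambda n: math.dist(target, [profiles[n][c] for c in COMPONENTS]),
    )


for metro, profile in profiles.items():
    print(metro, dominant_driver(profile), most_similar(metro, profiles))
```

When two components are nearly tied, the "dominant" label is not very informative, so treat it as a starting point for reading the component values rather than a conclusion.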
You should not infer:
- Why a metro is stressed (this needs local context and deeper analysis).
- That the score predicts future migration or health outcomes.
- That small differences in the score are meaningful on their own.
Known limitations:
- Data releases lag by months, so the score should be refreshed each cycle.
- The current focus is renters, not owners.
- Some metros have partial data or special cases (see the missingness summary).
Next steps include adding ownership metrics, automating refreshes, and pairing the ASI with migration outcomes.
Contributions are welcome. Please open a pull request with a clear description.
This project is licensed under the MIT License. See the LICENSE file for details.