GFiaMon/data-wrangling-project

🏠📈 Berlin Rent Index vs. Inflation Analysis

Project Banner

🔍 Project Overview

This project statistically analyzes the relationship between Berlin's official rent index (Mietspiegel) and Germany's inflation rate from 2005 to 2023. The Mietspiegel surveys and reports Kaltmiete (base rent excluding utilities) and serves as a legal benchmark for fair rents in Berlin.

Hypothesis Test: Does rent change mirror inflation?

Hypothesis

  • H₀: The biennial % change in Berlin's rent index equals Germany's inflation rate (μ_rent = μ_inflation)

  • H₁: The biennial % change in Berlin's rent index differs from Germany's inflation rate (μ_rent ≠ μ_inflation)

Core Question:
"Do biennial changes in Berlin's base rent systematically differ from inflation changes?"
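Comparing the two series requires both to be expressed as two-year percentage changes. The following is a minimal sketch of that step, assuming the rent index is available as one level per Mietspiegel edition and inflation as annual rates that must be compounded over two years. The function names and sample numbers are hypothetical and are not taken from the project's code or data.

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from one Mietspiegel edition to the next."""
    return (current / previous - 1.0) * 100.0


def compound_inflation(annual_rates_pct: list[float]) -> float:
    """Cumulative inflation (%) over consecutive years, compounding annual rates."""
    factor = 1.0
    for rate in annual_rates_pct:
        factor *= 1.0 + rate / 100.0
    return (factor - 1.0) * 100.0


# Hypothetical values for illustration only (not project data).
rent_change = pct_change(previous=6.00, current=6.30)
inflation_change = compound_inflation([2.0, 3.0])
difference = rent_change - inflation_change
print(f"rent {rent_change:.2f}% vs inflation {inflation_change:.2f}% "
      f"(difference {difference:+.2f} pp)")
```

Compounding matters because simply adding two annual rates slightly understates the two-year price change.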

📋 Key Deliverables

| File | Description |
| --- | --- |
| analysis.ipynb | Main analysis notebook |
| src/data_cleaning.py | Data processing functions |
| data/processed/ | Cleaned datasets |
| visuals/ | Generated plots and charts |

📊 Key Findings

1. Long-Term Rent vs. Inflation Trend

Trend Visualization

  • Observation: Rent consistently increased while inflation showed volatility
  • Divergence Peak: 2021-2023 inflation spike vs. moderate rent growth

2. Biennial Change Comparison

Point Plot Version

Point Plot

Bar Chart Version

Bar Chart

  • Key Insight: Rent exceeded inflation in 7/9 periods
  • Notable Divergence: 2021-2023 (rent +5.4% vs inflation +13.2%)

3. Statistical Power Analysis

Single Dataset

Power Analysis 1

Expanded Dataset

Power Analysis 2

  • Critical Finding:
    • Current power (9 periods): 16.6%
    • Required periods for 80% power: 59

What Statistical Power Means:

Statistical power measures our ability to detect a real difference when it exists. With only 9 data points:

  • Our current power is 16.6%, meaning we would miss a real difference 83.4% of the time

Surprisingly, when we added more data (from 3 to 9 periods):

  • Power increased only slightly, from 11.2% to 16.6%
  • But the periods required for 80% power jumped from 19 to 59, which corresponds to 118 years of biennial data

Why? The new data revealed higher variability in rent-inflation differences, making patterns harder to spot despite more data points.

  • This low power explains why we can't detect significance despite visible differences
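The power figures above can be roughly cross-checked with a normal approximation. This is only a sketch under stated assumptions: it treats the test as a two-sided paired comparison at α = 0.05 and replaces the t distribution with the standard normal, so it is not the calculation used in the notebook and will come out slightly more optimistic. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def approx_power(effect_size: float, n_periods: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power for a paired design (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * sqrt(n_periods)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def approx_required_periods(effect_size: float, power: float = 0.80,
                            alpha: float = 0.05) -> int:
    """Approximate number of paired periods needed to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(((z_crit + z_power) / effect_size) ** 2)


power_now = approx_power(effect_size=0.37, n_periods=9)
periods_needed = approx_required_periods(effect_size=0.37)
print(f"approximate power with 9 periods: {power_now:.1%}")
print(f"approximate periods for 80% power: {periods_needed} "
      f"({2 * periods_needed} years of biennial data)")
```

With d = 0.37 the approximation gives about 20% power and 58 periods, close to the reported 16.6% and 59. The gap is expected, because an exact t-based calculation penalizes small samples more heavily.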

Cohen's d Effect Size

Cohen's d = 0.37 quantifies the practical importance of differences:

  • Measures how many standard deviations separate rent and inflation changes
  • d=0.37 is a small-to-medium effect
  • In practical terms: when inflation changes by 5%, rent typically changes by about 3.2% or 6.8% (a difference of roughly ±1.8 percentage points)
  • Interpretation: The average difference is noticeable but not dramatic

| d Value | Effect Size | Housing Market Interpretation |
| --- | --- | --- |
| <0.2 | Negligible | Trivial difference |
| 0.2-0.5 | Small | Noticeable but minor |
| 0.5-0.8 | Medium | Substantial policy impact |
| >0.8 | Large | Market-altering difference |
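For readers who want to compute an effect size of this kind themselves, here is a minimal sketch. It assumes the paired variant of Cohen's d (mean of the period-by-period differences divided by their sample standard deviation); the README does not state which variant the notebook uses, so treat this as one plausible reading. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_effect_size(rent_changes, inflation_changes):
    """Return (Cohen's d, t statistic) for paired biennial changes."""
    if len(rent_changes) != len(inflation_changes):
        raise ValueError("both series must cover the same periods")
    differences = [r - i for r, i in zip(rent_changes, inflation_changes)]
    d = mean(differences) / stdev(differences)  # stdev uses the n - 1 denominator
    t = d * sqrt(len(differences))              # paired t statistic, df = n - 1
    return d, t


# Hypothetical changes in percent, for illustration only (not project data).
d, t = paired_effect_size([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
print(f"d = {d:.2f}, t = {t:.2f}")
```

Under this definition the t statistic is simply d scaled by the square root of the number of periods, which is why a moderate d can still be non-significant when only a few periods are available.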

4. Rent-Inflation Change Over Time

Change Timeline

  • Pattern Identification:
    • Pre-2019: Higher rent volatility
    • Post-2019: Stabilization despite economic shocks

📈 Statistical Conclusions

| Metric | Value | Interpretation |
| --- | --- | --- |
| p-value | 0.298 | > 0.05 → Fail to reject H₀ |
| Effect Size (d) | 0.37 | Small-medium effect |
| Key Conclusion | | No statistically significant systematic difference detected |

Interpretation:

While we observe meaningful differences in specific periods (especially before 2019), statistical tests show no consistent evidence that rent changes systematically differ from inflation over the 2005-2023 period.
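The decision can be reproduced from the reported summary numbers alone. The sketch below assumes a paired t-test on 9 biennial differences, an assumption that is consistent with the reported d and p-value but is not stated explicitly in this README. The critical value 2.306 is the standard two-sided 5% threshold for Student's t with 8 degrees of freedom.

```python
from math import sqrt

REPORTED_D = 0.37        # effect size reported in this README
N_PERIODS = 9            # number of biennial periods
T_CRITICAL_DF8 = 2.306   # two-sided 5% critical value, Student's t, 8 degrees of freedom

# Assumption: paired design, so t = d * sqrt(n).
t_statistic = REPORTED_D * sqrt(N_PERIODS)
reject_h0 = abs(t_statistic) > T_CRITICAL_DF8

print(f"t = {t_statistic:.2f}, critical value = {T_CRITICAL_DF8}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The statistic falls well short of the critical value, which matches the "fail to reject H₀" outcome. Given the low power, this is an absence of evidence for a difference rather than evidence that rent and inflation move together.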

🛠️ How to Reproduce

Step 1: Clone Repository

git clone https://github.com/GFiaMon/data-wrangling-project.git
cd data-wrangling-project

Step 2: Install Requirements

pip install -r requirements.txt

Step 3: Run Analysis

Execute Jupyter notebook:

jupyter notebook notebooks/01_mietspiegel_inflation_eda_full_dataset.ipynb

Data Processing Workflow

graph LR
    A[Raw PDFs/CSVs] --> B[Data Extraction]
    B --> C[Data Cleaning]
    C --> D[Analysis]
    D --> E[Visualization]

📂 Repository Structure

berlin-rent-inflation/
├── data/
│   ├── raw/                   # Original datasets
│   └── processed/             # Cleaned data files
├── notebooks/
│   └── analysis.ipynb         # Main analysis notebook
├── src/
│   └── data_cleaning.py       # Cleaning functions
├── visuals/                   # Generated plots
├── .gitignore
├── requirements.txt
└── README.md

📚 Data Sources

  1. Mietspiegel Reports

  2. Inflation Data

🔮 Future Research

  • District-level analysis
  • Integration of wage growth data
  • Predictive modeling of 2025 Mietspiegel
  • Impact assessment of rent control policies

👥 Contact

For questions or collaboration: Guillermo Fiallo Montero

LinkedIn

GitHub
