A machine learning web app that predicts crime risk for any location in San Francisco based on historical SFPD incident data (2003–2015).
👉 Currently runs locally only. Hosted deployment is in progress, held up by model file size constraints.
Due to GitHub file size limits, model files are hosted on Google Drive:
Download model.pkl, label_encoder.pkl, and crime_coords.json and place them in the root folder before running locally.
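
A quick way to confirm the three files are in place before launching the app is sketched below. It assumes the `.pkl` files were saved with Python's standard `pickle` module (they may have been written with `joblib` instead), and the helper names `missing_artifacts` and `load_artifacts` are hypothetical, not part of `app.py`.

```python
import json
import pickle
from pathlib import Path

# File names taken from the README; everything else here is illustrative.
REQUIRED_FILES = ("model.pkl", "label_encoder.pkl", "crime_coords.json")


def missing_artifacts(root="."):
    """Return the required files that are not present in `root`."""
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def load_artifacts(root="."):
    """Load the model, label encoder, and heatmap coordinates from `root`.

    Assumes the .pkl files are plain pickles. Only unpickle files you trust.
    """
    missing = missing_artifacts(root)
    if missing:
        raise FileNotFoundError(
            "Download these files first: " + ", ".join(missing)
        )
    root = Path(root)
    with open(root / "model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(root / "label_encoder.pkl", "rb") as f:
        label_encoder = pickle.load(f)
    with open(root / "crime_coords.json", "r", encoding="utf-8") as f:
        crime_coords = json.load(f)
    return model, label_encoder, crime_coords
```

Calling `missing_artifacts()` from the repository root returns an empty list once all three downloads are in the right place.
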
- Click any location on the San Francisco map
- Set the time of day, day of week, and police district
- Get an instant crime risk prediction — HIGH, MEDIUM, or LOW
- View a live crime density heatmap showing historically dangerous vs safer zones
- Dataset — San Francisco crime data (SFPD, 2003–2015) filtered for women-relevant crime categories: Assault, Prostitution, Sex Offenses
- Feature Engineering — Extracted hour, month, and year from timestamps; binary-encoded arrest status; one-hot encoded police districts
- Model — Random Forest Classifier trained on 60,000+ filtered incidents
- Hyperparameter Tuning — RandomizedSearchCV across n_estimators, max_depth, min_samples to find the best parameters
- Deployment — Streamlit frontend with Folium map
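
The feature engineering and tuning steps listed above could look roughly like the following sketch. It runs on a tiny synthetic table, not the real data. The column names (`Dates`, `Category`, `PdDistrict`, `Resolution`, `X`, `Y`), the rule for deriving arrest status, the use of `min_samples_split` for the "min_samples" setting, and the search ranges are all assumptions for illustration and may differ from what this project actually used.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the filtered incidents (column names are assumed).
categories = ["ASSAULT", "PROSTITUTION", "SEX OFFENSES FORCIBLE"]
districts = ["MISSION", "TENDERLOIN", "SOUTHERN"]
raw = pd.DataFrame(
    {
        "Dates": f"20{3 + i:02d}-0{1 + i % 9}-15 {i * 2 % 24:02d}:30:00",
        "Category": categories[i % 3],
        "PdDistrict": districts[(i // 3) % 3],
        "Resolution": "ARREST, BOOKED" if i % 2 == 0 else "NONE",
        "X": -122.42 + 0.01 * (i % 4),  # longitude
        "Y": 37.77 + 0.01 * (i % 5),    # latitude
    }
    for i in range(12)
)


def build_features(df):
    """Time parts, a binary arrest flag, coordinates, one-hot districts."""
    dates = pd.to_datetime(df["Dates"])
    base = pd.DataFrame(
        {
            "hour": dates.dt.hour,
            "month": dates.dt.month,
            "year": dates.dt.year,
            # Assumed rule: any resolution mentioning ARREST counts as 1.
            "arrested": df["Resolution"].str.contains("ARREST").astype(int),
            "X": df["X"],
            "Y": df["Y"],
        }
    )
    district_dummies = pd.get_dummies(df["PdDistrict"], prefix="district", dtype=int)
    return pd.concat([base, district_dummies], axis=1)


X = build_features(raw)
y = LabelEncoder().fit_transform(raw["Category"])

# Deliberately small search space so the sketch runs in seconds.
param_distributions = {
    "n_estimators": [10, 20, 30],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 4],  # assumed meaning of "min_samples"
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=3,
    cv=2,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

On the real dataset the search space and `n_iter` would be larger, and `search.best_estimator_` would be the model saved for the app.
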
| Metric | Value |
|---|---|
| Accuracy | 93% |
| Model | Random Forest (n_estimators=300) |
| Classes | ASSAULT · PROSTITUTION · SEX OFFENSES FORCIBLE |
Note: The model predicts the likelihood of each crime type from location and time, using historical SFPD data. Because the training data contains only crime incidents, predictions reflect where crimes have historically occurred rather than the absolute probability of a crime.
crimedensityseek/
├── app.py # Streamlit frontend
├── model.pkl # Trained Random Forest model (via Drive)
├── label_encoder.pkl # Label encoder (via Drive)
├── crime_coords.json # Crime coordinates for heatmap (via Drive)
├── requirements.txt # Dependencies
└── README.md
- Python — pandas, numpy, scikit-learn
- Model — Random Forest Classifier
- Frontend — Streamlit
- Maps — Folium + streamlit-folium
git clone https://github.com/parinaB/crimedensityseek
cd crimedensityseek
pip install -r requirements.txt
streamlit run app.py

- Data is from 2003–2015, so crime patterns may have changed
- Model trained only on women-relevant crime categories
- District must be manually selected to match clicked location
- All predictions reflect historical crime density, not real-time data
parinaB
