Can I have my bike back?

dataset

The dataset comes from the Toronto Police Service, where further details can be found.

workflow

preprocessing

Some columns are dropped because they duplicate information or are of poor quality.

Features such as season, speed tier, and time scope are derived from related columns.
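As a rough illustration of this kind of derivation, here is a minimal pandas sketch. The column names (`occurrence_time`, `bike_speed`), the bin edges, and the tier labels are all assumptions made for the example; they are not taken from `preprocessing.py`.

```python
import pandas as pd

# Toy frame with hypothetical column names; the real dataset may differ.
df = pd.DataFrame({
    "occurrence_time": pd.to_datetime([
        "2018-01-15 08:30", "2018-04-02 13:10",
        "2018-07-21 19:45", "2018-10-05 23:20",
    ]),
    "bike_speed": [1, 7, 18, 27],
})

def month_to_season(month: int) -> str:
    """Map a calendar month to a meteorological season."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"

# Season from the month of the occurrence.
df["season"] = df["occurrence_time"].dt.month.map(month_to_season)

# Speed tier from the number of speeds (bin edges are illustrative).
df["speed_tier"] = pd.cut(
    df["bike_speed"],
    bins=[0, 6, 21, float("inf")],
    labels=["low", "medium", "high"],
    include_lowest=True,
)

# Time scope from the hour of day, using left-closed 6-hour bins.
df["time_scope"] = pd.cut(
    df["occurrence_time"].dt.hour,
    bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"],
    right=False,
)

print(df[["season", "speed_tier", "time_scope"]])
```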

To assess how much each feature contributes, both feature importance and SHAP (Shapley) values are used. Measuring with more than one method allows cross-comparison and a more complete evaluation.

All feature importance and SHAP value plots indicate weak contributions from season, speed tier, and time scope.
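The idea of cross-checking two importance measures can be sketched as follows. To keep the example dependent on scikit-learn only, permutation importance stands in for SHAP values here; this is a substitution made for the sketch, not the method used in the project. The data is synthetic and the feature names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data: one feature drives the label, two are pure noise.
rng = np.random.default_rng(0)
n = 600
signal = rng.normal(size=n)
X = np.column_stack([signal, rng.normal(size=n), rng.normal(size=n)])
y = (signal > 0).astype(int)
feature_names = ["signal", "noise_a", "noise_b"]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Method 1: impurity-based importance built into the forest.
impurity = model.feature_importances_

# Method 2: permutation importance (computed on the training data for brevity;
# a held-out set would be the more careful choice).
perm = permutation_importance(
    model, X, y, n_repeats=5, random_state=0
).importances_mean

# Rank features from most to least important under each method.
ranking_impurity = [feature_names[i] for i in np.argsort(impurity)[::-1]]
ranking_perm = [feature_names[i] for i in np.argsort(perm)[::-1]]

for name, imp, p in zip(feature_names, impurity, perm):
    print(f"{name:8s} impurity={imp:.3f} permutation={p:.3f}")
```

A feature that ranks low under both measures, as season, speed tier, and time scope do in the plots, is a more convincing candidate for a weak contributor than one that ranks low under a single measure.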

modeling

The processed features are fed to the model. Records from 2019 are held out as the test set. CatBoost is the suggested model in this context. The resulting metrics are shown in metric.png.
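A year-based holdout like the one described can be sketched in a few lines of pandas. The column names `occurrence_year` and `recovered` and the toy values are assumptions for illustration; the actual split lives in `model.py` and may look different.

```python
import pandas as pd

# Toy frame with hypothetical column names and values.
df = pd.DataFrame({
    "occurrence_year": [2017, 2018, 2018, 2019, 2019],
    "cost_of_bike": [300.0, 850.0, 1200.0, 500.0, 950.0],
    "recovered": [0, 1, 0, 0, 1],
})

# Hold out every 2019 record as the test set; train on earlier years.
is_test = df["occurrence_year"] == 2019
train, test = df[~is_test], df[is_test]

# Separate features from the target; the year itself is not used as a feature.
drop_cols = ["recovered", "occurrence_year"]
X_train, y_train = train.drop(columns=drop_cols), train["recovered"]
X_test, y_test = test.drop(columns=drop_cols), test["recovered"]

print(len(X_train), "training rows,", len(X_test), "test rows")
```

Splitting on time rather than at random keeps future records out of training, which gives a more honest picture of how the model would behave on new reports.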

The model seems to fit the data well, but model decay may be inevitable because the training set is small and imbalanced.

highlight

KNNImputer worked better here than simply filling with the mean or median, or dropping null values
Non-linear correlation needs to be examined separately, because the Pearson correlation coefficient matrix only captures linear relationships
ADASYN is one effective method for handling imbalanced samples
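The first two points can be illustrated with a small scikit-learn sketch on toy data. The use of mutual information as the non-linear check is an assumption of this example; the project may have examined non-linear relationships in another way.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.impute import KNNImputer

# --- KNN imputation vs. the column mean ---
# The second column is twice the first; one value is missing.
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [8.0, 16.0],
])
X_filled = KNNImputer(n_neighbors=2).fit_transform(X)
column_mean = np.nanmean(X[:, 1])

# The two nearest rows (by the first column) have values 2 and 6,
# so the KNN fill is their average; the column mean is pulled up by the outlier.
print("KNN fill:", X_filled[1, 1], "mean fill:", column_mean)

# --- Pearson misses a purely non-linear dependence ---
x = np.linspace(-1, 1, 2001)
y = x ** 2  # y is fully determined by x, but not linearly

pearson = np.corrcoef(x, y)[0, 1]
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson r = {pearson:.4f}, mutual information = {mi:.2f}")
```

In this toy case the KNN fill lands on the value the underlying pattern implies, while the Pearson coefficient is essentially zero even though one variable completely determines the other.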

