I am predicting health insurance cost for individuals with a regression model, using scikit-learn for both the data preparation and the model.
I am attaching two approaches:
- Without preprocessing the numerical columns, and dropping rows with NaN: r2_score = 0.71
- With imputation and standardization of the numerical columns: r2_score = 0.18
This feels odd to me, because the second approach is supposed to be the good practice, right?
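
For reference, here is a minimal sketch of how I understand the two approaches should be set up so that they are comparable. It is not my actual notebook: the data is synthetic, the column meanings ("age", "bmi", "children") are hypothetical stand-ins, and I am assuming a plain `LinearRegression` with median imputation. The imputer and scaler are fitted on the training split only, and the target is left unscaled.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real insurance dataset).
# Hypothetical numerical columns: age, bmi, children.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(18, 65, n),               # age
    rng.normal(30, 6, n),                 # bmi
    rng.integers(0, 5, n).astype(float),  # children
])
# Made-up linear cost with noise, only for illustration.
y = 250 * X[:, 0] + 300 * X[:, 1] + 500 * X[:, 2] + rng.normal(0, 3000, n)

# Knock out roughly 5% of the bmi values to mimic missing data.
X[rng.random(n) < 0.05, 1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Approach 1: drop rows containing NaN, no scaling.
train_ok = ~np.isnan(X_train).any(axis=1)
test_ok = ~np.isnan(X_test).any(axis=1)
plain = LinearRegression().fit(X_train[train_ok], y_train[train_ok])
r2_plain = r2_score(y_test[test_ok], plain.predict(X_test[test_ok]))

# Approach 2: impute + standardize inside a pipeline, so both steps are
# fitted on the training split only. The target y is not scaled.
prep = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LinearRegression(),
)
prep.fit(X_train, y_train)
r2_prep = r2_score(y_test, prep.predict(X_test))

print(f"drop NaN rows:        r2 = {r2_plain:.2f}")
print(f"impute + standardize: r2 = {r2_prep:.2f}")
```

For ordinary least squares, standardizing the features does not change the predictions, so in a setup like this I would expect the two scores to be close. That is why the drop from 0.71 to 0.18 in my notebooks looks suspicious to me.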