I completed the 4 questions in the homework. #1
anjali-deshpande-hub
left a comment
Please correct this mistake:
Question 4: The knn object does not contain the best-neighbor model,
so your final accuracy calculation is not evaluating the optimized model.
You ran GridSearchCV:
wine_grid = GridSearchCV(estimator=knn, param_grid=ParameterGrid, cv=10)
wine_grid.fit(...)
GridSearchCV internally finds the best model and stores it in:
wine_grid.best_estimator_
Your original knn = KNeighborsClassifier(n_neighbors=5) never changes.
It will always have n_neighbors=5.
So when you run:
knn.score(wine_df_test..., wine_df_test['class'])
You are evaluating the non-optimized model (k=5), not the best model.
Instead you should do:
best_knn = KNeighborsClassifier(
    n_neighbors=wine_grid.best_params_['n_neighbors']
)
# Fit on training data
best_knn.fit(X, Y)
# Evaluate on test data
accuracy = best_knn.score( ...)
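To illustrate the point, here is a minimal sketch, not your notebook's code. It assumes scikit-learn's built-in load_wine data and a hypothetical train/test split in place of wine_df_train and wine_df_test, and the grid of K values is also made up. Because refit=True is the GridSearchCV default, best_estimator_ has already been refit on the full training data, so it can be scored directly instead of being rebuilt by hand.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: stand-in data and split, not the homework's DataFrames.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5)
param_grid = {"n_neighbors": list(range(1, 16))}  # hypothetical grid of K values

wine_grid = GridSearchCV(estimator=knn, param_grid=param_grid, cv=10)
wine_grid.fit(X_train, y_train)

# GridSearchCV works on clones, so the original object is left as it was.
print("original knn still uses k =", knn.n_neighbors)

# The tuned model, already refit on all of the training data (refit=True).
best_knn = wine_grid.best_estimator_
print("best k =", best_knn.n_neighbors)

# Evaluate the optimized model on the held-out test data.
accuracy = best_knn.score(X_test, y_test)
print("test accuracy of best model =", round(accuracy, 3))
```

Calling wine_grid.score(X_test, y_test) gives the same number, since with refit=True the grid object delegates to best_estimator_.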
Minor observations:
The call to the cross_validate function is not really required, since GridSearchCV already performs the cross-validation.
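As a rough sketch of why a separate cross-validation call adds nothing: the fitted grid object already holds the cross-validated score for every candidate in cv_results_. The data (load_wine) and the list of K values below are assumptions for illustration, not taken from your notebook.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: stand-in data; in the homework this would be the training set.
X, y = load_wine(return_X_y=True)

wine_grid = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},  # hypothetical K values
    cv=10,
)
wine_grid.fit(X, y)

# One entry per candidate K: the mean accuracy over the 10 folds.
ks = [params["n_neighbors"] for params in wine_grid.cv_results_["params"]]
mean_scores = wine_grid.cv_results_["mean_test_score"]
for k, score in zip(ks, mean_scores):
    print(f"k={k}: mean CV accuracy = {score:.3f}")

# The winning candidate and its cross-validated score.
print("best:", wine_grid.best_params_, round(wine_grid.best_score_, 3))
```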
I pushed a new file.
anjali-deshpande-hub
left a comment
The changes work.
Just a few observations about the changes:
- These lines were not required:
knn.fit(X,Y)
wine_df_test['prediction'] = knn.predict(wine_df_test[wine_df_train.columns[:-1]])
I actually did not understand the subtrain. Is it needed to find the optimal K value?
When you run
wine_grid.fit(wine_df_train[wine_df_train.columns[:-1]], wine_df_train['class'])
you are calling .fit() on the GridSearchCV object, which means that you’re training multiple KNN models with different K values, cross-validating each one on the training data, and selecting the best one.
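What that .fit() call does can be approximated by hand, which may help with the subtrain question. The sketch below is an assumption-laden illustration (load_wine data, a made-up list of K values), not GridSearchCV's actual implementation: it cross-validates one KNN model per K and keeps the best, with the folds carved out of the training data automatically.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: stand-in data; in the homework this would be the training set only.
X, y = load_wine(return_X_y=True)
k_values = [1, 3, 5, 7, 9]  # hypothetical candidates

# Manual version: cross-validate one model per K and record the mean accuracy.
manual_scores = {}
for k in k_values:
    fold_scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10)
    manual_scores[k] = fold_scores.mean()
manual_best_k = max(manual_scores, key=manual_scores.get)
print("manual search: best k =", manual_best_k)

# GridSearchCV runs the same kind of search in a single call.
wine_grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": k_values}, cv=10)
wine_grid.fit(X, y)
print("grid search:   best k =", wine_grid.best_params_["n_neighbors"])
```

Each fold's held-out part plays the role of a validation set, so a separate subtrain split is not needed just to pick K.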
- In the future, please add comments to the PR.
What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)
What did you learn from the changes you have made?
Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?
Were there any challenges? If so, what issue(s) did you face? How did you overcome it?
How were these changes tested?
A reference to a related issue in your repository (if applicable)
Checklist