-
Notifications
You must be signed in to change notification settings - Fork 761
Open
Description
Hi,
For Assignment 11, the .ipynb scales the data before train-test splits (screenshot below). However, this is incorrect; scaling and centering should be done after splitting and only on the training set (scaler.fit_transform(X_train)). The parameters derived from the train set should then be applied to the test set (scaler.transform(X_test)) to prevent data leakage and biasing the model. The test set should be treated as completely new/unseen data to the model, or else it's no longer generalizable.
Also wanted to bring up a super minor nitpick for variable conventions. I believe ML and linear algebra typically keep X uppercase and y lowercase, since X is a matrix, while y is (often) a vector.
Thank you for the fun semester so far,
Jerry
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels
