Assignment 11 - Incorrect ML Preprocessing Procedure

Hi,

For **Assignment 11**, the .ipynb scales the data before the train-test split (screenshot below). This is incorrect: scaling and centering should be done after splitting, with the scaler fit only on the training set (`scaler.fit_transform(X_train)`). The parameters derived from the training set should then be applied to the test set (`scaler.transform(X_test)`) to prevent data leakage, which biases the model's evaluation. The test set should be treated as completely new/unseen data; otherwise the test score no longer reflects how well the model generalizes.

![image](https://github.com/BME1478H/Fall2022class/assets/114617121/b2634a94-0b5c-4b45-9d1b-88c35cef90a6)
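For reference, here is a minimal sketch of the order I have in mind. It assumes scikit-learn's `StandardScaler` and `train_test_split`, and uses made-up random data; the notebook's actual dataset, scaler, and variable names may differ.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data purely for illustration (not the assignment's dataset)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # feature matrix
y = rng.integers(0, 2, size=100)                   # label vector

# 1. Split first, so the test set stays unseen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Fit the scaler on the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. Reuse the training-set mean/std on the test set (no refitting)
X_test_scaled = scaler.transform(X_test)
```

With this order, the test set's mean and standard deviation never influence the scaling parameters, so the scaled test features will generally not have exactly zero mean and unit variance, which is expected.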

Also, a super minor nitpick on variable conventions: I believe ML and linear algebra typically keep `X` uppercase and `y` lowercase, since `X` is a matrix while `y` is (often) a vector.

Thank you for the fun semester so far,
Jerry

Assignment 11 - Incorrect ML Preprocessing Procedure #748