Update Stratified Sampling.py
guochen-code authored Sep 20, 2021
1 parent fa19714 commit c1885ea
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions Stratified Sampling.py
@@ -1,5 +1,6 @@
from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(data_set, test_size=0.2, random_state=42)
# Alternatively, if features X and labels y are already separated: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
# Purely random sampling is fine given a large enough data set.
# Otherwise, it risks introducing significant sampling bias: the test set may not be representative of the whole population.
# Solution: stratified sampling. The population is divided into homogeneous subgroups called strata, and the right number of instances is sampled from each stratum so that the test set is representative of the overall population.
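The comments above describe stratified sampling but the file only shows a purely random split. A minimal sketch of a stratified split follows, assuming the strata are the class labels in `y` and using the `stratify` parameter of `train_test_split`. The synthetic data, the 90/10 class imbalance, and the helper `class_share` are hypothetical and serve only as illustration; they are not part of the commit.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data set: 90 instances of class 0, 10 of class 1.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# stratify=y asks train_test_split to preserve the class proportions of y
# in both the training set and the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)


def class_share(labels, cls=1):
    """Return the fraction of instances in `labels` that belong to `cls`."""
    return float(np.mean(labels == cls))


# Compare the minority-class share in the full data, train set, and test set.
print(class_share(y), class_share(y_train), class_share(y_test))
```

For a continuous attribute, one option is to bin it into categories first and pass the binned column to `stratify`; that variant is not shown here.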