@@ -414,7 +414,7 @@ <h2>Contents</h2>
   <nav aria-label="Page">
     <ul class="visible nav section-nav flex-column">
 <li class="toc-h1 nav-item toc-entry"><a class="reference internal nav-link" href="#">Exercises week 43</a></li>
-<li class="toc-h1 nav-item toc-entry"><a class="reference internal nav-link" href="#overarching-aims-of-the-exercises-weeks-43-and-44">Overarching aims of the exercises weeks 43 and 44</a><ul class="visible nav section-nav flex-column">
+<li class="toc-h1 nav-item toc-entry"><a class="reference internal nav-link" href="#overarching-aims-of-the-exercises-for-week-43">Overarching aims of the exercises for week 43</a><ul class="visible nav section-nav flex-column">
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#confusion-matrix">Confusion Matrix</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#roc-curve">ROC Curve</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#cumulative-gain">Cumulative Gain</a></li>
@@ -446,15 +446,15 @@ <h1>Exercises week 43<a class="headerlink" href="#exercises-week-43" title="Link
 <p><strong>October 20-24, 2025</strong></p>
 <p>Date: <strong>Deadline Friday October 24 at midnight</strong></p>
 </section>
-<section class="tex2jax_ignore mathjax_ignore" id="overarching-aims-of-the-exercises-weeks-43-and-44">
-<h1>Overarching aims of the exercises weeks 43 and 44<a class="headerlink" href="#overarching-aims-of-the-exercises-weeks-43-and-44" title="Link to this heading">#</a></h1>
+<section class="tex2jax_ignore mathjax_ignore" id="overarching-aims-of-the-exercises-for-week-43">
+<h1>Overarching aims of the exercises for week 43<a class="headerlink" href="#overarching-aims-of-the-exercises-for-week-43" title="Link to this heading">#</a></h1>
 <p>The aim of the exercises this week is to gain some confidence with
 ways to visualize the results of a classification problem. We will
 target three ways of setting up the analysis. The first and simplest
 one is the</p>
 <ol class="arabic simple">
-<li><p>so-called confusion matrix, and the next is the</p></li>
-<li><p>ROC curve and finally the</p></li>
+<li><p>so-called confusion matrix. The next one is the so-called</p></li>
+<li><p>ROC curve. Finally we have the</p></li>
 <li><p>Cumulative gain curve.</p></li>
 </ol>
 <p>We will use Logistic Regression as the method for the classification in
@@ -615,41 +615,41 @@ <h2>Exercises<a class="headerlink" href="#exercises" title="Link to this heading
 Feel free to use these functionalities (we don't expect you to write your own code for, say, the confusion matrix).</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
618- < div class ="highlight-none notranslate "> < div class ="highlight "> < pre > < span > </ span > % matplotlib inline
619-
620- import matplotlib.pyplot as plt
621- import numpy as np
622- from sklearn.model_selection import train_test_split
623- # from sklearn.datasets import fill in the data set
624- from sklearn.linear_model import LogisticRegression
625-
626- # Load the data, fill inn
627- mydata. data = ?
628-
629- X_train, X_test, y_train, y_test = train_test_split( mydata. data, cancer. target, random_state=0)
630- print( X_train. shape)
631- print( X_test. shape)
632- # Logistic Regression
633- # define which type of problem, binary or multiclass
634- logreg = LogisticRegression( solver= 'lbfgs')
635- logreg. fit( X_train, y_train)
636-
637- from sklearn.preprocessing import LabelEncoder
638- from sklearn.model_selection import cross_validate
639- #Cross validation
640- accuracy = cross_validate( logreg, X_test, y_test,cv=10)[ 'test_score']
641- print( accuracy)
642- print( "Test set accuracy with Logistic Regression: {:.2f}". format( logreg. score( X_test, y_test)))
643-
644- import scikitplot as skplt
645- y_pred = logreg. predict( X_test)
646- skplt. metrics. plot_confusion_matrix( y_test, y_pred, normalize= True)
647- plt. show()
648- y_probas = logreg. predict_proba( X_test)
649- skplt. metrics. plot_roc( y_test, y_probas)
650- plt. show()
651- skplt. metrics. plot_cumulative_gain( y_test, y_probas)
652- plt. show()
618+ < div class ="highlight-ipython3 notranslate "> < div class ="highlight "> < pre > < span > </ span > < span class =" o " > % </ span > < span class =" k " > matplotlib</ span > inline
619+
620+ < span class =" kn " > import</ span > < span class =" nn " > matplotlib.pyplot</ span > < span class =" k " > as </ span > < span class =" nn " > plt</ span >
621+ < span class =" kn " > import</ span > < span class =" nn " > numpy</ span > < span class =" k " > as </ span > < span class =" nn " > np </ span >
622+ < span class =" kn " > from</ span > < span class =" nn " > sklearn.model_selection</ span > < span class =" kn " > import</ span > < span class =" n " > train_test_split</ span >
623+ < span class =" c1 " > # from sklearn.datasets import fill in the data set</ span >
624+ < span class =" kn " > from</ span > < span class =" nn " > sklearn.linear_model</ span > < span class =" kn " > import</ span > < span class =" n " > LogisticRegression</ span >
625+
626+ < span class =" c1 " > # Load the data, fill inn</ span >
627+ < span class =" n " > mydata</ span > < span class =" o " > . </ span > < span class =" n " > data</ span > < span class =" o " > = </ span > < span class =" o " > ? </ span >
628+
629+ < span class =" n " > X_train</ span > < span class =" p " > , </ span > < span class =" n " > X_test</ span > < span class =" p " > , </ span > < span class =" n " > y_train</ span > < span class =" p " > , </ span > < span class =" n " > y_test</ span > < span class =" o " > = </ span > < span class =" n " > train_test_split</ span > < span class =" p " > ( </ span > < span class =" n " > mydata</ span > < span class =" o " > . </ span > < span class =" n " > data</ span > < span class =" p " > , </ span > < span class =" n " > cancer</ span > < span class =" o " > . </ span > < span class =" n " > target</ span > < span class =" p " > , </ span > < span class =" n " > random_state</ span > < span class =" o " > = </ span > < span class =" mi " > 0 </ span > < span class =" p " > ) </ span >
630+ < span class =" nb " > print</ span > < span class =" p " > ( </ span > < span class =" n " > X_train</ span > < span class =" o " > . </ span > < span class =" n " > shape</ span > < span class =" p " > ) </ span >
631+ < span class =" nb " > print</ span > < span class =" p " > ( </ span > < span class =" n " > X_test</ span > < span class =" o " > . </ span > < span class =" n " > shape</ span > < span class =" p " > ) </ span >
632+ < span class =" c1 " > # Logistic Regression</ span >
633+ < span class =" c1 " > # define which type of problem, binary or multiclass</ span >
634+ < span class =" n " > logreg</ span > < span class =" o " > = </ span > < span class =" n " > LogisticRegression</ span > < span class =" p " > ( </ span > < span class =" n " > solver</ span > < span class =" o " > = </ span > < span class =" s1 " > 'lbfgs'</ span > < span class =" p " > ) </ span >
635+ < span class =" n " > logreg</ span > < span class =" o " > . </ span > < span class =" n " > fit</ span > < span class =" p " > ( </ span > < span class =" n " > X_train</ span > < span class =" p " > , </ span > < span class =" n " > y_train</ span > < span class =" p " > ) </ span >
636+
637+ < span class =" kn " > from</ span > < span class =" nn " > sklearn.preprocessing</ span > < span class =" kn " > import</ span > < span class =" n " > LabelEncoder</ span >
638+ < span class =" kn " > from</ span > < span class =" nn " > sklearn.model_selection</ span > < span class =" kn " > import</ span > < span class =" n " > cross_validate</ span >
639+ < span class =" c1 " > #Cross validation</ span >
640+ < span class =" n " > accuracy</ span > < span class =" o " > = </ span > < span class =" n " > cross_validate</ span > < span class =" p " > ( </ span > < span class =" n " > logreg</ span > < span class =" p " > , </ span > < span class =" n " > X_test</ span > < span class =" p " > , </ span > < span class =" n " > y_test</ span > < span class =" p " > , </ span > < span class =" n " > cv </ span > < span class =" o " > = </ span > < span class =" mi " > 10 </ span > < span class =" p " > )[ </ span > < span class =" s1 " > 'test_score'</ span > < span class =" p " > ] </ span >
641+ < span class =" nb " > print</ span > < span class =" p " > ( </ span > < span class =" n " > accuracy</ span > < span class =" p " > ) </ span >
642+ < span class =" nb " > print</ span > < span class =" p " > ( </ span > < span class =" s2 " > "Test set accuracy with Logistic Regression: </ span > < span class =" si " > {:.2f}</ span > < span class =" s2 " > "</ span > < span class =" o " > . </ span > < span class =" n " > format</ span > < span class =" p " > ( </ span > < span class =" n " > logreg</ span > < span class =" o " > . </ span > < span class =" n " > score</ span > < span class =" p " > ( </ span > < span class =" n " > X_test</ span > < span class =" p " > , </ span > < span class =" n " > y_test</ span > < span class =" p " > )))</ span >
643+
644+ < span class =" kn " > import</ span > < span class =" nn " > scikitplot</ span > < span class =" k " > as </ span > < span class =" nn " > skplt</ span >
645+ < span class =" n " > y_pred</ span > < span class =" o " > = </ span > < span class =" n " > logreg</ span > < span class =" o " > . </ span > < span class =" n " > predict</ span > < span class =" p " > ( </ span > < span class =" n " > X_test</ span > < span class =" p " > ) </ span >
646+ < span class =" n " > skplt</ span > < span class =" o " > . </ span > < span class =" n " > metrics</ span > < span class =" o " > . </ span > < span class =" n " > plot_confusion_matrix</ span > < span class =" p " > ( </ span > < span class =" n " > y_test</ span > < span class =" p " > , </ span > < span class =" n " > y_pred</ span > < span class =" p " > , </ span > < span class =" n " > normalize</ span > < span class =" o " > = </ span > < span class =" kc " > True</ span > < span class =" p " > ) </ span >
647+ < span class =" n " > plt</ span > < span class =" o " > . </ span > < span class =" n " > show</ span > < span class =" p " > () </ span >
648+ < span class =" n " > y_probas</ span > < span class =" o " > = </ span > < span class =" n " > logreg</ span > < span class =" o " > . </ span > < span class =" n " > predict_proba</ span > < span class =" p " > ( </ span > < span class =" n " > X_test</ span > < span class =" p " > ) </ span >
649+ < span class =" n " > skplt</ span > < span class =" o " > . </ span > < span class =" n " > metrics</ span > < span class =" o " > . </ span > < span class =" n " > plot_roc</ span > < span class =" p " > ( </ span > < span class =" n " > y_test</ span > < span class =" p " > , </ span > < span class =" n " > y_probas</ span > < span class =" p " > ) </ span >
650+ < span class =" n " > plt</ span > < span class =" o " > . </ span > < span class =" n " > show</ span > < span class =" p " > () </ span >
651+ < span class =" n " > skplt</ span > < span class =" o " > . </ span > < span class =" n " > metrics</ span > < span class =" o " > . </ span > < span class =" n " > plot_cumulative_gain</ span > < span class =" p " > ( </ span > < span class =" n " > y_test</ span > < span class =" p " > , </ span > < span class =" n " > y_probas</ span > < span class =" p " > ) </ span >
652+ < span class =" n " > plt</ span > < span class =" o " > . </ span > < span class =" n " > show</ span > < span class =" p " > () </ span >
653653</ pre > </ div >
654654</ div >
655655</ div >
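The scikitplot calls in the template above can also be reproduced with scikit-learn's own plotting utilities. The following is only a sketch of one possible way to do it, assuming the Wisconsin breast cancer data set (one natural reading of the cancer name in the original template) and a freshly fitted logreg classifier; the cumulative gain curve is computed by hand, since scikit-learn has no built-in plot for it.

# Minimal sketch using scikit-learn's display utilities instead of scikitplot.
# Assumes the breast cancer data set; swap in any binary classification data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state=0)
logreg = LogisticRegression(solver='lbfgs', max_iter=10000)
logreg.fit(X_train, y_train)

# Confusion matrix, normalized over the true labels
ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test, normalize='true')
plt.show()

# ROC curve for the positive class
RocCurveDisplay.from_estimator(logreg, X_test, y_test)
plt.show()

# Cumulative gain: fraction of positives found versus fraction of samples inspected,
# when the test samples are ranked by predicted probability of the positive class
probs = logreg.predict_proba(X_test)[:, 1]
order = np.argsort(probs)[::-1]
gains = np.cumsum(y_test[order]) / y_test.sum()
fractions = np.arange(1, len(y_test) + 1) / len(y_test)
plt.plot(fractions, gains, label='Logistic Regression')
plt.plot([0, 1], [0, 1], '--', label='Baseline')
plt.xlabel('Fraction of samples')
plt.ylabel('Fraction of positives found (gain)')
plt.legend()
plt.show()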
@@ -664,9 +664,9 @@ <h3>Exercise b)<a class="headerlink" href="#exercise-b" title="Link to this head
 the MNIST data set and just specialize to two numbers. To do so you can use the following code lines</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
-<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>from sklearn.datasets import load_digits
-digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1
-X, y = digits.data, digits.target
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span>from sklearn.datasets import load_digits
+digits = load_digits(n_class=2)  # load only the first two classes, i.e. the digits 0 and 1
+X, y = digits.data, digits.target
 </pre></div>
 </div>
 </div>
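Note that load_digits(n_class=2) always returns the digits 0 and 1. If you would rather compare two other digits, one option is to load the full data set and filter on the labels yourself; the sketch below does this for the arbitrarily chosen pair 3 and 8.

# Sketch: pick two specific digits from the full digits data set.
# The choice of 3 and 8 is only an example; any pair of labels works.
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
mask = np.isin(digits.target, [3, 8])
X, y = digits.data[mask], digits.target[mask]
# Optionally relabel to 0/1 so that standard binary metrics apply directly
y = (y == 8).astype(int)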
@@ -678,8 +678,8 @@ <h3>Exercise b)<a class="headerlink" href="#exercise-b" title="Link to this head
 informative features, redundant features, and more.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
-<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>from sklearn.datasets import make_classification
-X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span>from sklearn.datasets import make_classification
+X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)
 </pre></div>
 </div>
 </div>
@@ -696,10 +696,10 @@ <h3>Exercise c) week 43<a class="headerlink" href="#exercise-c-week-43" title="L
 you can set it up using <strong>scikit-learn</strong>,</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
-<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>from sklearn.datasets import load_iris
-iris = load_iris()
-X = iris.data # Features
-y = iris.target # Target labels
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span>from sklearn.datasets import load_iris
+iris = load_iris()
+X = iris.data  # Features
+y = iris.target  # Target labels
 </pre></div>
 </div>
 </div>
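For the multiclass case in exercise c), scikitplot's plot_roc draws one ROC curve per class. If you prefer to stay within plain scikit-learn, a common approach is one-vs-rest: binarize the labels and draw one curve per class. The sketch below is one such possibility, assuming the iris features and labels loaded as above and a logistic regression classifier.

# Sketch: one-vs-rest ROC curves for a multiclass problem with plain scikit-learn.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(max_iter=10000)  # handles the multiclass case automatically
logreg.fit(X_train, y_train)
y_probas = logreg.predict_proba(X_test)

# One-vs-rest: one ROC curve per class
y_test_bin = label_binarize(y_test, classes=logreg.classes_)
for i, cls in enumerate(logreg.classes_):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_probas[:, i])
    plt.plot(fpr, tpr, label=f"class {cls} (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], '--', color='gray')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()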
@@ -775,7 +775,7 @@ <h3>Exercise c) week 43<a class="headerlink" href="#exercise-c-week-43" title="L
   <nav class="bd-toc-nav page-toc">
     <ul class="visible nav section-nav flex-column">
 <li class="toc-h1 nav-item toc-entry"><a class="reference internal nav-link" href="#">Exercises week 43</a></li>
-<li class="toc-h1 nav-item toc-entry"><a class="reference internal nav-link" href="#overarching-aims-of-the-exercises-weeks-43-and-44">Overarching aims of the exercises weeks 43 and 44</a><ul class="visible nav section-nav flex-column">
+<li class="toc-h1 nav-item toc-entry"><a class="reference internal nav-link" href="#overarching-aims-of-the-exercises-for-week-43">Overarching aims of the exercises for week 43</a><ul class="visible nav section-nav flex-column">
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#confusion-matrix">Confusion Matrix</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#roc-curve">ROC Curve</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#cumulative-gain">Cumulative Gain</a></li>