
Commit 6da2fbe

committed
typos
1 parent 6b9d1f8 commit 6da2fbe

File tree

7 files changed

+1087
-240
lines changed


doc/pub/week48/html/week48-bs.html

Lines changed: 84 additions & 69 deletions
Large diffs are not rendered by default.

doc/pub/week48/html/week48-reveal.html

Lines changed: 180 additions & 3 deletions
@@ -209,7 +209,7 @@ <h2 id="plan-for-week-47">Plan for week 47 </h2>
209209

210210
<p><li> Lab sessions at usual times.</li>
211211

212-
<p><li> For the week of December 2-6, lab sessions atart at 10am and end 4pm, room F&#216;434, Tuesday and Wednesday</li>
212+
<p><li> For the week of December 2-6, lab sessions start at 10am and end at 4pm, room F&#216;434, Tuesday and Wednesday</li>
213213
</ul>
214214
</div>
215215

@@ -222,8 +222,8 @@ <h2 id="plan-for-week-47">Plan for week 47 </h2>
222222
<p><li> Summary of course</li>
223223
<p><li> Readings and Videos:
224224
<ol type="a"></li>
225-
<p><li> These lecture notes at <a href="https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week47/ipynb/week48.ipynb" target="_blank"><tt>https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week47/ipynb/week48.ipynb</tt></a></li>
226-
<p><li> See also lecture notes from week 47 at <a href="https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week46/ipynb/week47.ipynb" target="_blank"><tt>https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week46/ipynb/week47.ipynb</tt></a>. The lecture on Monday starts with a repetition on AdaBoost before we move over to gradient boosting with examples
225+
<p><li> These lecture notes at <a href="https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week48/ipynb/week48.ipynb" target="_blank"><tt>https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week48/ipynb/week48.ipynb</tt></a></li>
226+
<p><li> See also lecture notes from week 47 at <a href="https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week47/ipynb/week47.ipynb" target="_blank"><tt>https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week47/ipynb/week47.ipynb</tt></a>. The lecture on Monday starts with a recap of AdaBoost before we move on to gradient boosting with examples
227227
<!-- o Video of lecture at <a href="https://youtu.be/RIHzmLv05DA" target="_blank"><tt>https://youtu.be/RIHzmLv05DA</tt></a> -->
228228
<!-- o Whiteboard notes at <a href="https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/NotesNovember25.pdf" target="_blank"><tt>https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/NotesNovember25.pdf</tt></a> --></li>
229229
<p><li> Video on Decision trees <a href="https://www.youtube.com/watch?v=RmajweUFKvM&ab_channel=Simplilearn" target="_blank"><tt>https://www.youtube.com/watch?v=RmajweUFKvM&ab_channel=Simplilearn</tt></a></li>
@@ -237,6 +237,183 @@ <h2 id="plan-for-week-47">Plan for week 47 </h2>
237237
</div>
238238
</section>
239239

240+
<section>
241+
<h2 id="random-forest-algorithm-reminder-from-last-week">Random Forest Algorithm, reminder from last week </h2>
242+
243+
<p>The algorithm described here can be applied to both classification and regression problems.</p>
244+
245+
<p>We will grow a forest of, say, \( B \) trees.</p>
246+
<ol>
247+
<p><li> For \( b=1:B \)
248+
<ol type="a"></li>
249+
<p><li> Draw a bootstrap sample from the training data organized in our \( \boldsymbol{X} \) matrix.</li>
250+
<p><li> We then grow a random forest tree \( T_b \) from the bootstrapped data by repeating the following steps until the maximum node size is reached</li>
251+
<ol>
252+
253+
<p><li> we select \( m \le p \) variables at random from the \( p \) predictors/features</li>
254+
255+
<p><li> pick the best split point among the \( m \) features using for example the CART algorithm and create a new node</li>
256+
257+
<p><li> split the node into daughter nodes</li>
258+
</ol>
259+
<p>
260+
</ol>
261+
<p>
262+
<p><li> Then output the ensemble of trees \( \{T_b\}_1^{B} \) and use it to make predictions for either a regression or a classification problem.</li>
263+
</ol>
264+
</section>
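The enumerated steps above can be sketched directly in code. The following is an illustrative implementation, not the course's reference code: it assumes scikit-learn's <tt>DecisionTreeClassifier</tt> (whose <tt>max_features</tt> option mimics drawing \( m \le p \) random features at each CART split) and aggregates the \( B \) trees by majority vote for classification.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

B = 100                        # number of trees in the forest
m = int(np.sqrt(X.shape[1]))   # features considered at each split, m <= p

forest = []
n = X_train.shape[0]
for b in range(B):
    # a) draw a bootstrap sample (sampling with replacement) from the training data
    idx = rng.integers(0, n, size=n)
    # b) grow a tree T_b; max_features=m selects m random features per split (CART)
    tree = DecisionTreeClassifier(max_features=m, random_state=b)
    tree.fit(X_train[idx], y_train[idx])
    forest.append(tree)

# Output the ensemble {T_b}: majority vote over the B trees for classification
votes = np.mean([t.predict(X_test) for t in forest], axis=0)
y_pred = (votes > 0.5).astype(int)
print("Forest test accuracy:", np.mean(y_pred == y_test))
```

For a regression problem one would instead average the \( B \) tree predictions rather than taking a majority vote.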
265+
266+
<section>
267+
<h2 id="random-forests-compared-with-other-methods-on-the-cancer-data">Random Forests Compared with other Methods on the Cancer Data </h2>
268+
269+
270+
<!-- code=python (!bc pycod) typeset with pygments style "perldoc" -->
271+
<div class="cell border-box-sizing code_cell rendered">
272+
<div class="input">
273+
<div class="inner_cell">
274+
<div class="input_area">
275+
<div class="highlight" style="background: #eeeedd">
276+
<pre style="font-size: 80%; line-height: 125%;"><span style="color: #8B008B; font-weight: bold">import</span> <span style="color: #008b45; text-decoration: underline">matplotlib.pyplot</span> <span style="color: #8B008B; font-weight: bold">as</span> <span style="color: #008b45; text-decoration: underline">plt</span>
277+
<span style="color: #8B008B; font-weight: bold">import</span> <span style="color: #008b45; text-decoration: underline">numpy</span> <span style="color: #8B008B; font-weight: bold">as</span> <span style="color: #008b45; text-decoration: underline">np</span>
278+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.model_selection</span> <span style="color: #8B008B; font-weight: bold">import</span> train_test_split
279+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.datasets</span> <span style="color: #8B008B; font-weight: bold">import</span> load_breast_cancer
280+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.svm</span> <span style="color: #8B008B; font-weight: bold">import</span> SVC
281+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.linear_model</span> <span style="color: #8B008B; font-weight: bold">import</span> LogisticRegression
282+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.tree</span> <span style="color: #8B008B; font-weight: bold">import</span> DecisionTreeClassifier
283+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.ensemble</span> <span style="color: #8B008B; font-weight: bold">import</span> BaggingClassifier
284+
285+
<span style="color: #228B22"># Load the data</span>
286+
cancer = load_breast_cancer()
287+
288+
X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=<span style="color: #B452CD">0</span>)
289+
<span style="color: #658b00">print</span>(X_train.shape)
290+
<span style="color: #658b00">print</span>(X_test.shape)
291+
<span style="color: #228B22">#define methods</span>
292+
<span style="color: #228B22"># Logistic Regression</span>
293+
logreg = LogisticRegression(solver=<span style="color: #CD5555">&#39;lbfgs&#39;</span>)
294+
<span style="color: #228B22"># Support vector machine</span>
295+
svm = SVC(gamma=<span style="color: #CD5555">&#39;auto&#39;</span>, C=<span style="color: #B452CD">100</span>)
296+
<span style="color: #228B22"># Decision Trees</span>
297+
deep_tree_clf = DecisionTreeClassifier(max_depth=<span style="color: #8B008B; font-weight: bold">None</span>)
298+
<span style="color: #228B22">#Scale the data</span>
299+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.preprocessing</span> <span style="color: #8B008B; font-weight: bold">import</span> StandardScaler
300+
scaler = StandardScaler()
301+
scaler.fit(X_train)
302+
X_train_scaled = scaler.transform(X_train)
303+
X_test_scaled = scaler.transform(X_test)
304+
<span style="color: #228B22"># Logistic Regression</span>
305+
logreg.fit(X_train_scaled, y_train)
306+
<span style="color: #658b00">print</span>(<span style="color: #CD5555">&quot;Test set accuracy Logistic Regression with scaled data: {:.2f}&quot;</span>.format(logreg.score(X_test_scaled,y_test)))
307+
<span style="color: #228B22"># Support Vector Machine</span>
308+
svm.fit(X_train_scaled, y_train)
309+
<span style="color: #658b00">print</span>(<span style="color: #CD5555">&quot;Test set accuracy SVM with scaled data: {:.2f}&quot;</span>.format(svm.score(X_test_scaled,y_test)))
310+
<span style="color: #228B22"># Decision Trees</span>
311+
deep_tree_clf.fit(X_train_scaled, y_train)
312+
<span style="color: #658b00">print</span>(<span style="color: #CD5555">&quot;Test set accuracy with Decision Trees and scaled data: {:.2f}&quot;</span>.format(deep_tree_clf.score(X_test_scaled,y_test)))
313+
314+
315+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.ensemble</span> <span style="color: #8B008B; font-weight: bold">import</span> RandomForestClassifier
316+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.preprocessing</span> <span style="color: #8B008B; font-weight: bold">import</span> LabelEncoder
317+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.model_selection</span> <span style="color: #8B008B; font-weight: bold">import</span> cross_validate
318+
<span style="color: #228B22"># Data set not specified</span>
319+
<span style="color: #228B22">#Instantiate the model with 500 trees and entropy as splitting criteria</span>
320+
Random_Forest_model = RandomForestClassifier(n_estimators=<span style="color: #B452CD">500</span>,criterion=<span style="color: #CD5555">&quot;entropy&quot;</span>)
321+
Random_Forest_model.fit(X_train_scaled, y_train)
322+
<span style="color: #228B22">#Cross validation</span>
323+
accuracy = cross_validate(Random_Forest_model,X_test_scaled,y_test,cv=<span style="color: #B452CD">10</span>)[<span style="color: #CD5555">&#39;test_score&#39;</span>]
324+
<span style="color: #658b00">print</span>(accuracy)
325+
<span style="color: #658b00">print</span>(<span style="color: #CD5555">&quot;Test set accuracy with Random Forests and scaled data: {:.2f}&quot;</span>.format(Random_Forest_model.score(X_test_scaled,y_test)))
326+
327+
328+
<span style="color: #8B008B; font-weight: bold">import</span> <span style="color: #008b45; text-decoration: underline">scikitplot</span> <span style="color: #8B008B; font-weight: bold">as</span> <span style="color: #008b45; text-decoration: underline">skplt</span>
329+
y_pred = Random_Forest_model.predict(X_test_scaled)
330+
skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=<span style="color: #8B008B; font-weight: bold">True</span>)
331+
plt.show()
332+
y_probas = Random_Forest_model.predict_proba(X_test_scaled)
333+
skplt.metrics.plot_roc(y_test, y_probas)
334+
plt.show()
335+
skplt.metrics.plot_cumulative_gain(y_test, y_probas)
336+
plt.show()
337+
</pre>
338+
</div>
339+
</div>
340+
</div>
341+
</div>
342+
<div class="output_wrapper">
343+
<div class="output">
344+
<div class="output_area">
345+
<div class="output_subarea output_stream output_stdout output_text">
346+
</div>
347+
</div>
348+
</div>
349+
</div>
350+
</div>
351+
352+
<p>Recall that the cumulative gains curve shows the percentage of the
353+
overall number of cases in a given category <em>gained</em> by targeting a
354+
percentage of the total number of cases.
355+
</p>
356+
357+
<p>Similarly, the receiver operating characteristic curve, or ROC curve,
358+
displays the diagnostic ability of a binary classifier system as its
359+
discrimination threshold is varied. It plots the true positive rate against the false positive rate.
360+
</p>
361+
</section>
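If the <tt>scikitplot</tt> package used above is not available, the ROC curve and its area under the curve can be computed directly with scikit-learn's <tt>roc_curve</tt> and <tt>roc_auc_score</tt>; the sketch below (an assumed stand-alone variant, not part of the lecture notes) sweeps the discrimination threshold over the predicted probabilities of the positive class.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
# Probability of the positive class; varying the threshold traces out the ROC curve
p = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, p)
print("AUC:", roc_auc_score(y_test, p))
```

Plotting <tt>tpr</tt> against <tt>fpr</tt> reproduces the ROC curve; a perfect classifier hugs the top-left corner (AUC of 1), while random guessing gives the diagonal (AUC of 0.5).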
362+
363+
<section>
364+
<h2 id="compare-bagging-on-trees-with-random-forests">Compare Bagging on Trees with Random Forests </h2>
365+
366+
<!-- code=python (!bc pycod) typeset with pygments style "perldoc" -->
367+
<div class="cell border-box-sizing code_cell rendered">
368+
<div class="input">
369+
<div class="inner_cell">
370+
<div class="input_area">
371+
<div class="highlight" style="background: #eeeedd">
372+
<pre style="font-size: 80%; line-height: 125%;">bag_clf = BaggingClassifier(
373+
DecisionTreeClassifier(splitter=<span style="color: #CD5555">&quot;random&quot;</span>, max_leaf_nodes=<span style="color: #B452CD">16</span>, random_state=<span style="color: #B452CD">42</span>),
374+
n_estimators=<span style="color: #B452CD">500</span>, max_samples=<span style="color: #B452CD">1.0</span>, bootstrap=<span style="color: #8B008B; font-weight: bold">True</span>, n_jobs=-<span style="color: #B452CD">1</span>, random_state=<span style="color: #B452CD">42</span>)
375+
</pre>
376+
</div>
377+
</div>
378+
</div>
379+
</div>
380+
<div class="output_wrapper">
381+
<div class="output">
382+
<div class="output_area">
383+
<div class="output_subarea output_stream output_stdout output_text">
384+
</div>
385+
</div>
386+
</div>
387+
</div>
388+
<!-- code=python (!bc pycod) typeset with pygments style "perldoc" -->
389+
<div class="cell border-box-sizing code_cell rendered">
390+
<div class="input">
391+
<div class="inner_cell">
392+
<div class="input_area">
393+
<div class="highlight" style="background: #eeeedd">
394+
<pre style="font-size: 80%; line-height: 125%;">bag_clf.fit(X_train, y_train)
395+
y_pred = bag_clf.predict(X_test)
396+
<span style="color: #8B008B; font-weight: bold">from</span> <span style="color: #008b45; text-decoration: underline">sklearn.ensemble</span> <span style="color: #8B008B; font-weight: bold">import</span> RandomForestClassifier
397+
rnd_clf = RandomForestClassifier(n_estimators=<span style="color: #B452CD">500</span>, max_leaf_nodes=<span style="color: #B452CD">16</span>, n_jobs=-<span style="color: #B452CD">1</span>, random_state=<span style="color: #B452CD">42</span>)
398+
rnd_clf.fit(X_train, y_train)
399+
y_pred_rf = rnd_clf.predict(X_test)
400+
np.sum(y_pred == y_pred_rf) / <span style="color: #658b00">len</span>(y_pred)
401+
</pre>
402+
</div>
403+
</div>
404+
</div>
405+
</div>
406+
<div class="output_wrapper">
407+
<div class="output">
408+
<div class="output_area">
409+
<div class="output_subarea output_stream output_stdout output_text">
410+
</div>
411+
</div>
412+
</div>
413+
</div>
414+
</div>
415+
</section>
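The two cells above depend on data loaded earlier in the slides; a self-contained version of the same comparison on the cancer data might look as follows (an illustrative sketch, with the same hyperparameters as above). Bagging trees that split on random features is essentially what <tt>RandomForestClassifier</tt> does internally, so the two ensembles should agree on most test points.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging of trees that pick random split features, as in the slide above
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16, random_state=42),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1, random_state=42)
bag_clf.fit(X_train, y_train)

# Equivalent random forest with matching tree size and ensemble size
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16,
                                 n_jobs=-1, random_state=42)
rnd_clf.fit(X_train, y_train)

# Fraction of test points where the two ensembles predict the same class
y_pred_bag = bag_clf.predict(X_test)
y_pred_rf = rnd_clf.predict(X_test)
agreement = np.mean(y_pred_bag == y_pred_rf)
print("Agreement between bagging and random forest:", agreement)
```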
416+
240417
<section>
241418
<h2 id="boosting-a-bird-s-eye-view">Boosting, a Bird's Eye View </h2>
242419
