updated pdf and html
dicook committed Feb 27, 2024
1 parent bbf75cc commit a13be89
Showing 8 changed files with 162 additions and 17 deletions.
93 changes: 92 additions & 1 deletion docs/search.json
@@ -333,7 +333,7 @@
"href": "week1/slides.html#confusion-misclassification-matrix-computing",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Confusion (misclassification) matrix: computing",
"text": "Confusion (misclassification) matrix: computing\n\n\nTwo classes\n\n# Write out the confusion matrix in standard form\ncm <- a2 %>% count(y, pred) |>\n group_by(y) |>\n mutate(cl_err = n[pred==y]/sum(n)) \ncm |>\n pivot_wider(names_from = pred, \n values_from = n) |>\n select(y, bilby, quokka, cl_err)\n\n# A tibble: 2 × 4\n# Groups: y [2]\n y bilby quokka cl_err\n <fct> <int> <int> <dbl>\n1 bilby 9 3 0.75 \n2 quokka 5 10 0.667\n\n\n\naccuracy(a2, y, pred) |> pull(.estimate)\n\n[1] 0.704\n\nbal_accuracy(a2, y, pred) |> pull(.estimate)\n\n[1] 0.708\n\nsens(a2, y, pred) |> pull(.estimate)\n\n[1] 0.75\n\nspecificity(a2, y, pred) |> pull(.estimate)\n\n[1] 0.667\n\n\n\nMore than two classes\n\n# Write out the confusion matrix in standard form\ncm3 <- a3 %>% count(y, pred) |>\n group_by(y) |>\n mutate(cl_err = n[pred==y]/sum(n)) \ncm3 |>\n pivot_wider(names_from = pred, \n values_from = n, values_fill=0) |>\n select(y, bilby, quokka, numbat, cl_err)\n\n# A tibble: 3 × 5\n# Groups: y [3]\n y bilby quokka numbat cl_err\n <fct> <int> <int> <int> <dbl>\n1 bilby 9 3 0 0.75 \n2 numbat 0 2 6 0.75 \n3 quokka 5 10 0 0.667\n\n\n\naccuracy(a3, y, pred) |> pull(.estimate)\n\n[1] 0.714\n\nbal_accuracy(a3, y, pred) |> pull(.estimate)\n\n[1] 0.783"
"text": "Confusion (misclassification) matrix: computing\n\n\nTwo classes\n\n# Write out the confusion matrix in standard form\n#| label: oconfusion-matrix-tidy\ncm <- a2 %>% count(y, pred) |>\n group_by(y) |>\n mutate(cl_err = n[pred==y]/sum(n)) \ncm |>\n pivot_wider(names_from = pred, \n values_from = n) |>\n select(y, bilby, quokka, cl_err)\n\n# A tibble: 2 × 4\n# Groups: y [2]\n y bilby quokka cl_err\n <fct> <int> <int> <dbl>\n1 bilby 9 3 0.75 \n2 quokka 5 10 0.667\n\n\n\naccuracy(a2, y, pred) |> pull(.estimate)\n\n[1] 0.704\n\nbal_accuracy(a2, y, pred) |> pull(.estimate)\n\n[1] 0.708\n\nsens(a2, y, pred) |> pull(.estimate)\n\n[1] 0.75\n\nspecificity(a2, y, pred) |> pull(.estimate)\n\n[1] 0.667\n\n\n\nMore than two classes\n\n# Write out the confusion matrix in standard form\ncm3 <- a3 %>% count(y, pred) |>\n group_by(y) |>\n mutate(cl_err = n[pred==y]/sum(n)) \ncm3 |>\n pivot_wider(names_from = pred, \n values_from = n, values_fill=0) |>\n select(y, bilby, quokka, numbat, cl_err)\n\n# A tibble: 3 × 5\n# Groups: y [3]\n y bilby quokka numbat cl_err\n <fct> <int> <int> <int> <dbl>\n1 bilby 9 3 0 0.75 \n2 numbat 0 2 6 0.75 \n3 quokka 5 10 0 0.667\n\n\n\naccuracy(a3, y, pred) |> pull(.estimate)\n\n[1] 0.714\n\nbal_accuracy(a3, y, pred) |> pull(.estimate)\n\n[1] 0.783"
},
{
"objectID": "week1/slides.html#receiver-operator-curves-roc",
@@ -733,5 +733,96 @@
"title": "Week 8: Support vector machines and nearest neighbours",
"section": "Assignments",
"text": "Assignments\n\nAssignment 3 is due on Friday 26 April."
},
{
"objectID": "week2/slides.html#next-re-sampling-and-regularisation",
"href": "week2/slides.html#next-re-sampling-and-regularisation",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Next: Re-sampling and regularisation",
"text": "Next: Re-sampling and regularisation\n\n\n\nETC3250/5250 Lecture 2 | iml.numbat.space"
},
{
"objectID": "week2/slides.html#overview",
"href": "week2/slides.html#overview",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Overview",
"text": "Overview\nThis week we will cover:\n\nConceptual framing for visualisation\nCommon methods: scatterplot matrix, parallel coordinates, tours\nDetails on using tours for examining clustering and class structure\nDimension reduction\n\nLinear: principal component analysis\nNon-linear: multidimensional scaling, t-distributed stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP)\n\nUsing tours to assess dimension reduction"
},
{
"objectID": "week2/slides.html#concepts",
"href": "week2/slides.html#concepts",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Concepts",
"text": "Concepts"
},
{
"objectID": "week2/slides.html#model-in-the-data-space",
"href": "week2/slides.html#model-in-the-data-space",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Model-in-the-data-space",
"text": "Model-in-the-data-space\n\n\n\n\n\nFrom XKCD\n\n\n\n We plot the model on the data to assess whether it fits or is a misfit!\n\n\nDoing this in high-dimensions is considered difficult!\n\n\nSo it is common to only plot the data-in-the-model-space."
},
{
"objectID": "week2/slides.html#data-in-the-model-space",
"href": "week2/slides.html#data-in-the-model-space",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Data-in-the-model-space",
"text": "Data-in-the-model-space"
},
{
"objectID": "week2/slides.html#how-do-you-visualise-beyond-2d",
"href": "week2/slides.html#how-do-you-visualise-beyond-2d",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "How do you visualise beyond 2D?",
"text": "How do you visualise beyond 2D?"
},
{
"objectID": "week2/slides.html#scatterplot-matrix",
"href": "week2/slides.html#scatterplot-matrix",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Scatterplot matrix",
"text": "Scatterplot matrix"
},
{
"objectID": "week2/slides.html#parallel-coordinate-plot",
"href": "week2/slides.html#parallel-coordinate-plot",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Parallel coordinate plot",
"text": "Parallel coordinate plot"
},
{
"objectID": "week2/slides.html#tours",
"href": "week2/slides.html#tours",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Tours",
"text": "Tours"
},
{
"objectID": "week2/slides.html#dimension-reduction",
"href": "week2/slides.html#dimension-reduction",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Dimension reduction",
"text": "Dimension reduction"
},
{
"objectID": "week2/slides.html#pca",
"href": "week2/slides.html#pca",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "PCA",
"text": "PCA"
},
{
"objectID": "week2/slides.html#t-sne",
"href": "week2/slides.html#t-sne",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "t-SNE",
"text": "t-SNE"
},
{
"objectID": "week2/slides.html#umap",
"href": "week2/slides.html#umap",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "UMAP",
"text": "UMAP"
}
]
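The two-class confusion matrix quoted in the search-index diff above can be reproduced with a short standalone sketch. This assumes the dplyr, tidyr, and yardstick packages; the tibble `a2` below is a stand-in constructed to match the counts shown in the slides (it is not loaded from the course data).

```r
# Sketch only: two-class confusion matrix in standard form, as in the slides.
# a2 is a stand-in: true class y, predicted class pred, matching the counts above.
library(dplyr)
library(tidyr)
library(yardstick)

a2 <- tibble(
  y    = factor(c(rep("bilby", 12), rep("quokka", 15))),
  pred = factor(c(rep("bilby", 9), rep("quokka", 3),
                  rep("bilby", 5), rep("quokka", 10)))
)

# Confusion matrix with per-class accuracy (cl_err)
cm <- a2 |>
  count(y, pred) |>
  group_by(y) |>
  mutate(cl_err = n[pred == y] / sum(n))
cm |>
  pivot_wider(names_from = pred, values_from = n) |>
  select(y, bilby, quokka, cl_err)

# Summary metrics from yardstick
accuracy(a2, y, pred)      # overall proportion correct
bal_accuracy(a2, y, pred)  # mean of per-class accuracies
```

With these counts, 19 of 27 predictions are correct, matching the 0.704 accuracy shown in the diff.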
2 changes: 1 addition & 1 deletion docs/site_libs/quarto-html/quarto-syntax-highlighting.css

Some generated files are not rendered by default.

6 changes: 5 additions & 1 deletion docs/sitemap.xml
@@ -26,7 +26,7 @@
</url>
<url>
<loc>https://iml.numbat.space/week1/slides.html</loc>
<lastmod>2024-02-23T03:19:07.461Z</lastmod>
<lastmod>2024-02-27T01:22:58.414Z</lastmod>
</url>
<url>
<loc>https://iml.numbat.space/resources.html</loc>
@@ -68,4 +68,8 @@
<loc>https://iml.numbat.space/week8/index.html</loc>
<lastmod>2024-02-05T21:28:13.682Z</lastmod>
</url>
<url>
<loc>https://iml.numbat.space/week2/slides.html</loc>
<lastmod>2024-02-27T01:01:22.011Z</lastmod>
</url>
</urlset>
21 changes: 11 additions & 10 deletions docs/week1/slides.html

Large diffs are not rendered by default.

Binary file added docs/week1/slides.pdf
28 changes: 26 additions & 2 deletions docs/week1/slides.qmd
@@ -83,6 +83,7 @@ Tutors:
2. **Unsupervised learning**: $y_i$ [unavailable]{.monash-orange2} for all $x_i$

```{r fig.width=6, fig.height=4, fig.align='center', echo=FALSE}
#| label: overview-methods
library(tidyverse)
library(gapminder)
library(gridExtra)
@@ -162,6 +163,7 @@ This is also considered the matrix of predictors, or explanatory or independent
::: {.column}

```{r}
#| label: generate-mv-data
library(mvtnorm)
vc <- matrix(c(1, 0.5, 0.2,
0.5, 1, -0.3,
@@ -180,6 +182,7 @@ What's the dimension of the data?
::: {.column}

```{r}
#| label: load-penguins
library(palmerpenguins)
p_tidy <- penguins |>
select(species, bill_length_mm:body_mass_g) |>
@@ -531,6 +534,7 @@ When data are reused for multiple tasks, instead of carefully *spent* from the f
::: {.column}

```{r}
#| label: balanced-data
d_bal <- tibble(y=c(rep("A", 6), rep("B", 6)),
x=c(runif(12)))
d_bal$y
@@ -544,6 +548,7 @@
::: {.column}

```{r}
#| label: unbalanced-data
d_unb <- tibble(y=c(rep("A", 2), rep("B", 10)),
x=c(runif(12)))
d_unb$y
@@ -557,6 +562,7 @@
Always [stratify splitting]{.monash-orange2} by sub-groups, especially response variable classes.

```{r}
#| label: unbalanced-split
d_unb_strata <- initial_split(d_unb, prop = 0.70, strata=y)
training(d_unb_strata)$y
testing(d_unb_strata)$y
@@ -625,6 +631,7 @@

```{r}
#| echo: false
#| label: predictive-class-example
a2 <- tibble(y = c(rep("bilby", 12),
rep("quokka", 15)),
pred = c(rep("bilby", 9),
@@ -660,6 +667,7 @@ a3 <- a3 |>
#| eval: false
#| echo: false
# tidymodels has it transposed
#| label: confusion-matrix
cm <- conf_mat(a2, y, pred)
autoplot(cm)
# Make it show in right direction
@@ -668,6 +676,7 @@ conf_mat(a2, pred, y, dnn=c("Truth", "Pred"))

```{r}
# Write out the confusion matrix in standard form
#| label: oconfusion-matrix-tidy
cm <- a2 %>% count(y, pred) |>
group_by(y) |>
mutate(cl_err = n[pred==y]/sum(n))
@@ -723,6 +732,7 @@ Need [predictive probabilities]{.monash-orange2}, probability of being each clas
::: {.column}

```{r}
#| label: roc-curve
a2 |> slice_head(n=3)
roc_curve(a2, y, bilby) |>
autoplot()
@@ -768,6 +778,7 @@
```{r}
#| echo: false
#| eval: false
#| label: sine-curve-data
# Generate the sine-curve data
set.seed(1259)
x1 <- runif(340)
@@ -790,6 +801,7 @@ write_csv(d, file="data/sine-curve-test.csv")
```

```{r}
#| label: sine-curve-plot
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -817,6 +829,7 @@ ggplot(w, aes(x=x1, y=x2, colour = cl)) +
::: {.column width=30%}

```{r}
#| label: sin-curve-models
#| echo: false
#| fig-width: 4
#| fig-height: 8
@@ -847,7 +860,8 @@ p2 <- ggplot(w_grid, aes(x=x1, y=x2, colour = prf)) +
ggtitle("Non-parametric") +
theme(legend.position = "none",
axis.text = element_blank())
p1 + p2 + plot_layout(ncol=1)
#p1 + p2 + plot_layout(ncol=1)
grid.arrange(p1, p2)
```

:::
@@ -861,6 +875,7 @@

::: {.column width=30%}
```{r}
#| label: model-errors1
#| echo: false
#| fig-width: 5
#| fig-height: 5
@@ -893,6 +908,7 @@ ggplot(w, aes(x=x1, y=x2,
::: {.column width=30%}

```{r}
#| label: model-errors2
#| echo: false
#| fig-width: 5
#| fig-height: 5
@@ -921,6 +937,7 @@ If the [model form is incorrect]{.monash-umber2}, the error (solid circles) may
## Flexible vs inflexible

```{r}
#| label: flexible-model
#| echo: false
#| fig-width: 9
#| fig-height: 3
@@ -957,7 +974,8 @@ p3 <- ggplot(w_grid, aes(x=x1, y=x2, colour = prf)) +
axis.text = element_blank(),
axis.title = element_blank())
p1 + p2 + p3 + plot_layout(ncol=3)
#p1 + p2 + p3 + plot_layout(ncol=3)
grid.arrange(p1, p2, p3, ncol=3)
```

[Parametric]{.monash-orange2} models tend to be [less flexible]{.monash-orange2} but [non-parametric]{.monash-blue2} models can be flexible or less flexible depending on [parameter settings]{.monash-blue2}.
@@ -995,6 +1013,7 @@ refers to how much your estimate would change if you had different training data
::: {.column}

```{r}
#| label: bias1
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -1008,6 +1027,7 @@
::: {.column}

```{r}
#| label: bias2
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -1026,6 +1046,7 @@
::: {.column}

```{r}
#| label: variance1
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -1039,6 +1060,7 @@
::: {.column}

```{r}
#| label: variance2
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -1063,6 +1085,7 @@ Goal: Without knowing what the true structure is, fit the signal and ignore the
## Trade-off between accuracy and interpretability

```{r}
#| label: tradeoff-acc-interp
#| echo: false
#| fig-width: 5
#| fig-height: 5
@@ -1111,6 +1134,7 @@ Compute and examine the [usual diagnostics]{.monash-blue2}, some methods have mo
[*Training - plusses; Test - dots*]{.smaller .center}

```{r}
#| label: training-test
#| echo: false
#| fig-width: 4
#| fig-height: 4
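One of the slide chunks labelled in the diff above (`unbalanced-split`) illustrates stratified train/test splitting. A minimal standalone sketch of the idea, assuming the dplyr and rsample packages (the tibble `d_unb` is a stand-in mirroring the slide example of 2 "A"s and 10 "B"s, not the course data):

```r
# Sketch only: stratified vs unstratified splitting of unbalanced classes.
library(dplyr)
library(rsample)

set.seed(1130)
d_unb <- tibble(y = c(rep("A", 2), rep("B", 10)),
                x = runif(12))

# Unstratified: the rare class "A" can land entirely in one partition
d_unb_split <- initial_split(d_unb, prop = 0.70)
table(training(d_unb_split)$y)

# Stratified on y: both classes are represented in training and test
d_unb_strata <- initial_split(d_unb, prop = 0.70, strata = y)
table(training(d_unb_strata)$y)
table(testing(d_unb_strata)$y)
```

Stratifying by the response (as the slides recommend) keeps the class proportions roughly equal across the two partitions.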
1 change: 1 addition & 0 deletions setup.R
@@ -6,6 +6,7 @@ library(colorspace)
library(patchwork)
library(MASS)
library(randomForest)
library(gridExtra)


# Locations
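The `library(gridExtra)` addition here supports the slide edits above, where patchwork's `p1 + p2 + plot_layout(ncol=1)` is commented out in favour of `grid.arrange()`. A minimal sketch of the two equivalent compositions, assuming ggplot2, patchwork, and gridExtra (the plots are placeholders built from `mtcars`, not the slide figures):

```r
# Sketch only: composing two ggplots, patchwork-style vs gridExtra-style.
library(ggplot2)
library(patchwork)
library(gridExtra)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# patchwork: overloads `+` on ggplot objects, layout set separately
p_patch <- p1 + p2 + plot_layout(ncol = 1)

# gridExtra: arranges the plots as grobs on a grid (the commit's approach)
grid.arrange(p1, p2, ncol = 1)
```

Both draw the same stacked pair; `grid.arrange()` draws immediately, while the patchwork object prints on demand.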