updated pdf and html
dicook committed Feb 27, 2024
1 parent bbf75cc commit a13be89
Showing 8 changed files with 162 additions and 17 deletions.
93 changes: 92 additions & 1 deletion docs/search.json
@@ -333,7 +333,7 @@
"href": "week1/slides.html#confusion-misclassification-matrix-computing",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Confusion (misclassification) matrix: computing",
"text": "Confusion (misclassification) matrix: computing\n\n\nTwo classes\n\n# Write out the confusion matrix in standard form\ncm <- a2 %>% count(y, pred) |>\n group_by(y) |>\n mutate(cl_err = n[pred==y]/sum(n)) \ncm |>\n pivot_wider(names_from = pred, \n values_from = n) |>\n select(y, bilby, quokka, cl_err)\n\n# A tibble: 2 × 4\n# Groups: y [2]\n y bilby quokka cl_err\n <fct> <int> <int> <dbl>\n1 bilby 9 3 0.75 \n2 quokka 5 10 0.667\n\n\n\naccuracy(a2, y, pred) |> pull(.estimate)\n\n[1] 0.704\n\nbal_accuracy(a2, y, pred) |> pull(.estimate)\n\n[1] 0.708\n\nsens(a2, y, pred) |> pull(.estimate)\n\n[1] 0.75\n\nspecificity(a2, y, pred) |> pull(.estimate)\n\n[1] 0.667\n\n\n\nMore than two classes\n\n# Write out the confusion matrix in standard form\ncm3 <- a3 %>% count(y, pred) |>\n group_by(y) |>\n mutate(cl_err = n[pred==y]/sum(n)) \ncm3 |>\n pivot_wider(names_from = pred, \n values_from = n, values_fill=0) |>\n select(y, bilby, quokka, numbat, cl_err)\n\n# A tibble: 3 × 5\n# Groups: y [3]\n y bilby quokka numbat cl_err\n <fct> <int> <int> <int> <dbl>\n1 bilby 9 3 0 0.75 \n2 numbat 0 2 6 0.75 \n3 quokka 5 10 0 0.667\n\n\n\naccuracy(a3, y, pred) |> pull(.estimate)\n\n[1] 0.714\n\nbal_accuracy(a3, y, pred) |> pull(.estimate)\n\n[1] 0.783"
"text": "Confusion (misclassification) matrix: computing\n\n\nTwo classes\n\n# Write out the confusion matrix in standard form\n#| label: oconfusion-matrix-tidy\ncm <- a2 %>% count(y, pred) |>\n group_by(y) |>\n mutate(cl_err = n[pred==y]/sum(n)) \ncm |>\n pivot_wider(names_from = pred, \n values_from = n) |>\n select(y, bilby, quokka, cl_err)\n\n# A tibble: 2 × 4\n# Groups: y [2]\n y bilby quokka cl_err\n <fct> <int> <int> <dbl>\n1 bilby 9 3 0.75 \n2 quokka 5 10 0.667\n\n\n\naccuracy(a2, y, pred) |> pull(.estimate)\n\n[1] 0.704\n\nbal_accuracy(a2, y, pred) |> pull(.estimate)\n\n[1] 0.708\n\nsens(a2, y, pred) |> pull(.estimate)\n\n[1] 0.75\n\nspecificity(a2, y, pred) |> pull(.estimate)\n\n[1] 0.667\n\n\n\nMore than two classes\n\n# Write out the confusion matrix in standard form\ncm3 <- a3 %>% count(y, pred) |>\n group_by(y) |>\n mutate(cl_err = n[pred==y]/sum(n)) \ncm3 |>\n pivot_wider(names_from = pred, \n values_from = n, values_fill=0) |>\n select(y, bilby, quokka, numbat, cl_err)\n\n# A tibble: 3 × 5\n# Groups: y [3]\n y bilby quokka numbat cl_err\n <fct> <int> <int> <int> <dbl>\n1 bilby 9 3 0 0.75 \n2 numbat 0 2 6 0.75 \n3 quokka 5 10 0 0.667\n\n\n\naccuracy(a3, y, pred) |> pull(.estimate)\n\n[1] 0.714\n\nbal_accuracy(a3, y, pred) |> pull(.estimate)\n\n[1] 0.783"
},
{
"objectID": "week1/slides.html#receiver-operator-curves-roc",
@@ -733,5 +733,96 @@
"title": "Week 8: Support vector machines and nearest neighbours",
"section": "Assignments",
"text": "Assignments\n\nAssignment 3 is due on Friday 26 April."
},
{
"objectID": "week2/slides.html#next-re-sampling-and-regularisation",
"href": "week2/slides.html#next-re-sampling-and-regularisation",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Next: Re-sampling and regularisation",
"text": "Next: Re-sampling and regularisation\n\n\n\nETC3250/5250 Lecture 2 | iml.numbat.space"
},
{
"objectID": "week2/slides.html#overview",
"href": "week2/slides.html#overview",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Overview",
"text": "Overview\nThis week we will cover:\n\nConceptual framing for visualisation\nCommon methods: scatterplot matrix, parallel coordinates, tours\nDetails on using tours for examining clustering and class structure\nDimension reduction\n\nLinear: principal component analysis\nNon-linear: multidimensional scaling, t-distributed stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP)\n\nUsing tours to assess dimension reduction"
},
{
"objectID": "week2/slides.html#concepts",
"href": "week2/slides.html#concepts",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Concepts",
"text": "Concepts"
},
{
"objectID": "week2/slides.html#model-in-the-data-space",
"href": "week2/slides.html#model-in-the-data-space",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Model-in-the-data-space",
"text": "Model-in-the-data-space\n\n\n\n\n\nFrom XKCD\n\n\n\n We plot the model on the data to assess whether it fits or is a misfit!\n\n\nDoing this in high-dimensions is considered difficult!\n\n\nSo it is common to only plot the data-in-the-model-space."
},
{
"objectID": "week2/slides.html#data-in-the-model-space",
"href": "week2/slides.html#data-in-the-model-space",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Data-in-the-model-space",
"text": "Data-in-the-model-space"
},
{
"objectID": "week2/slides.html#how-do-you-visualise-beyond-2d",
"href": "week2/slides.html#how-do-you-visualise-beyond-2d",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "How do you visualise beyond 2D?",
"text": "How do you visualise beyond 2D?"
},
{
"objectID": "week2/slides.html#scatterplot-matrix",
"href": "week2/slides.html#scatterplot-matrix",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Scatterplot matrix",
"text": "Scatterplot matrix"
},
{
"objectID": "week2/slides.html#parallel-coordinate-plot",
"href": "week2/slides.html#parallel-coordinate-plot",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Parallel coordinate plot",
"text": "Parallel coordinate plot"
},
{
"objectID": "week2/slides.html#tours",
"href": "week2/slides.html#tours",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Tours",
"text": "Tours"
},
{
"objectID": "week2/slides.html#dimension-reduction",
"href": "week2/slides.html#dimension-reduction",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "Dimension reduction",
"text": "Dimension reduction"
},
{
"objectID": "week2/slides.html#pca",
"href": "week2/slides.html#pca",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "PCA",
"text": "PCA"
},
{
"objectID": "week2/slides.html#t-sne",
"href": "week2/slides.html#t-sne",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "t-SNE",
"text": "t-SNE"
},
{
"objectID": "week2/slides.html#umap",
"href": "week2/slides.html#umap",
"title": "ETC3250/5250 Introduction to Machine Learning",
"section": "UMAP",
"text": "UMAP"
}
]
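The two-class confusion matrix quoted in the search-index diff above can be reproduced with a short standalone sketch. This assumes the dplyr, tidyr, and yardstick packages; the tibble `a2` below is a stand-in constructed to match the counts shown in the slides (it is not loaded from the course data).

```r
# Sketch only: two-class confusion matrix in standard form, as in the slides.
# a2 is a stand-in: true class y, predicted class pred, matching the counts above.
library(dplyr)
library(tidyr)
library(yardstick)

a2 <- tibble(
  y    = factor(c(rep("bilby", 12), rep("quokka", 15))),
  pred = factor(c(rep("bilby", 9), rep("quokka", 3),
                  rep("bilby", 5), rep("quokka", 10)))
)

# Confusion matrix with per-class accuracy (cl_err)
cm <- a2 |>
  count(y, pred) |>
  group_by(y) |>
  mutate(cl_err = n[pred == y] / sum(n))
cm |>
  pivot_wider(names_from = pred, values_from = n) |>
  select(y, bilby, quokka, cl_err)

# Summary metrics from yardstick
accuracy(a2, y, pred)      # overall proportion correct
bal_accuracy(a2, y, pred)  # mean of per-class accuracies
```

With these counts, 19 of 27 predictions are correct, matching the 0.704 accuracy shown in the diff.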
2 changes: 1 addition & 1 deletion docs/site_libs/quarto-html/quarto-syntax-highlighting.css

Some generated files are not rendered by default.

6 changes: 5 additions & 1 deletion docs/sitemap.xml
@@ -26,7 +26,7 @@
</url>
<url>
<loc>https://iml.numbat.space/week1/slides.html</loc>
<lastmod>2024-02-23T03:19:07.461Z</lastmod>
<lastmod>2024-02-27T01:22:58.414Z</lastmod>
</url>
<url>
<loc>https://iml.numbat.space/resources.html</loc>
@@ -68,4 +68,8 @@
<loc>https://iml.numbat.space/week8/index.html</loc>
<lastmod>2024-02-05T21:28:13.682Z</lastmod>
</url>
<url>
<loc>https://iml.numbat.space/week2/slides.html</loc>
<lastmod>2024-02-27T01:01:22.011Z</lastmod>
</url>
</urlset>
21 changes: 11 additions & 10 deletions docs/week1/slides.html

Large diffs are not rendered by default.

Binary file added docs/week1/slides.pdf
28 changes: 26 additions & 2 deletions docs/week1/slides.qmd
@@ -83,6 +83,7 @@ Tutors:
2. **Unsupervised learning**: $y_i$ [unavailable]{.monash-orange2} for all $x_i$

```{r fig.width=6, fig.height=4, fig.align='center', echo=FALSE}
#| label: overview-methods
library(tidyverse)
library(gapminder)
library(gridExtra)
@@ -162,6 +163,7 @@ This is also considered the matrix of predictors, or explanatory or independent
::: {.column}

```{r}
#| label: generate-mv-data
library(mvtnorm)
vc <- matrix(c(1, 0.5, 0.2,
0.5, 1, -0.3,
@@ -180,6 +182,7 @@ What's the dimension of the data?
::: {.column}

```{r}
#| label: load-penguins
library(palmerpenguins)
p_tidy <- penguins |>
select(species, bill_length_mm:body_mass_g) |>
@@ -531,6 +534,7 @@ When data are reused for multiple tasks, instead of carefully *spent* from the f
::: {.column}

```{r}
#| label: balanced-data
d_bal <- tibble(y=c(rep("A", 6), rep("B", 6)),
x=c(runif(12)))
d_bal$y
@@ -544,6 +548,7 @@
::: {.column}

```{r}
#| label: unbalanced-data
d_unb <- tibble(y=c(rep("A", 2), rep("B", 10)),
x=c(runif(12)))
d_unb$y
@@ -557,6 +562,7 @@
Always [stratify splitting]{.monash-orange2} by sub-groups, especially response variable classes.

```{r}
#| label: unbalanced-split
d_unb_strata <- initial_split(d_unb, prop = 0.70, strata=y)
training(d_unb_strata)$y
testing(d_unb_strata)$y
@@ -625,6 +631,7 @@

```{r}
#| echo: false
#| label: predictive-class-example
a2 <- tibble(y = c(rep("bilby", 12),
rep("quokka", 15)),
pred = c(rep("bilby", 9),
@@ -660,6 +667,7 @@ a3 <- a3 |>
#| eval: false
#| echo: false
# tidymodels has it transposed
#| label: confusion-matrix
cm <- conf_mat(a2, y, pred)
autoplot(cm)
# Make it show in right direction
@@ -668,6 +676,7 @@ conf_mat(a2, pred, y, dnn=c("Truth", "Pred"))

```{r}
# Write out the confusion matrix in standard form
#| label: oconfusion-matrix-tidy
cm <- a2 %>% count(y, pred) |>
group_by(y) |>
mutate(cl_err = n[pred==y]/sum(n))
@@ -723,6 +732,7 @@ Need [predictive probabilities]{.monash-orange2}, probability of being each clas
::: {.column}

```{r}
#| label: roc-curve
a2 |> slice_head(n=3)
roc_curve(a2, y, bilby) |>
autoplot()
@@ -768,6 +778,7 @@
```{r}
#| echo: false
#| eval: false
#| label: sine-curve-data
# Generate the sine-curve data
set.seed(1259)
x1 <- runif(340)
@@ -790,6 +801,7 @@ write_csv(d, file="data/sine-curve-test.csv")
```

```{r}
#| label: sine-curve-plot
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -817,6 +829,7 @@ ggplot(w, aes(x=x1, y=x2, colour = cl)) +
::: {.column width=30%}

```{r}
#| label: sin-curve-models
#| echo: false
#| fig-width: 4
#| fig-height: 8
@@ -847,7 +860,8 @@ p2 <- ggplot(w_grid, aes(x=x1, y=x2, colour = prf)) +
ggtitle("Non-parametric") +
theme(legend.position = "none",
axis.text = element_blank())
p1 + p2 + plot_layout(ncol=1)
#p1 + p2 + plot_layout(ncol=1)
grid.arrange(p1, p2)
```

:::
@@ -861,6 +875,7 @@

::: {.column width=30%}
```{r}
#| label: model-errors1
#| echo: false
#| fig-width: 5
#| fig-height: 5
@@ -893,6 +908,7 @@ ggplot(w, aes(x=x1, y=x2,
::: {.column width=30%}

```{r}
#| label: model-errors2
#| echo: false
#| fig-width: 5
#| fig-height: 5
@@ -921,6 +937,7 @@ If the [model form is incorrect]{.monash-umber2}, the error (solid circles) may
## Flexible vs inflexible

```{r}
#| label: flexible-model
#| echo: false
#| fig-width: 9
#| fig-height: 3
@@ -957,7 +974,8 @@ p3 <- ggplot(w_grid, aes(x=x1, y=x2, colour = prf)) +
axis.text = element_blank(),
axis.title = element_blank())
p1 + p2 + p3 + plot_layout(ncol=3)
#p1 + p2 + p3 + plot_layout(ncol=3)
grid.arrange(p1, p2, p3, ncol=3)
```

[Parametric]{.monash-orange2} models tend to be [less flexible]{.monash-orange2} but [non-parametric]{.monash-blue2} models can be flexible or less flexible depending on [parameter settings]{.monash-blue2}.
@@ -995,6 +1013,7 @@ refers to how much your estimate would change if you had different training data
::: {.column}

```{r}
#| label: bias1
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -1008,6 +1027,7 @@
::: {.column}

```{r}
#| label: bias2
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -1026,6 +1046,7 @@
::: {.column}

```{r}
#| label: variance1
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -1039,6 +1060,7 @@
::: {.column}

```{r}
#| label: variance2
#| echo: false
#| fig-width: 4
#| fig-height: 4
@@ -1063,6 +1085,7 @@ Goal: Without knowing what the true structure is, fit the signal and ignore the
## Trade-off between accuracy and interpretability

```{r}
#| label: tradeoff-acc-interp
#| echo: false
#| fig-width: 5
#| fig-height: 5
@@ -1111,6 +1134,7 @@ Compute and examine the [usual diagnostics]{.monash-blue2}, some methods have mo
[*Training - plusses; Test - dots*]{.smaller .center}

```{r}
#| label: training-test
#| echo: false
#| fig-width: 4
#| fig-height: 4
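One of the slide chunks labelled in the diff above (`unbalanced-split`) illustrates stratified train/test splitting. A minimal standalone sketch of the idea, assuming the dplyr and rsample packages (the tibble `d_unb` is a stand-in mirroring the slide example of 2 "A"s and 10 "B"s, not the course data):

```r
# Sketch only: stratified vs unstratified splitting of unbalanced classes.
library(dplyr)
library(rsample)

set.seed(1130)
d_unb <- tibble(y = c(rep("A", 2), rep("B", 10)),
                x = runif(12))

# Unstratified: the rare class "A" can land entirely in one partition
d_unb_split <- initial_split(d_unb, prop = 0.70)
table(training(d_unb_split)$y)

# Stratified on y: both classes are represented in training and test
d_unb_strata <- initial_split(d_unb, prop = 0.70, strata = y)
table(training(d_unb_strata)$y)
table(testing(d_unb_strata)$y)
```

Stratifying by the response (as the slides recommend) keeps the class proportions roughly equal across the two partitions.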
1 change: 1 addition & 0 deletions setup.R
@@ -6,6 +6,7 @@ library(colorspace)
library(patchwork)
library(MASS)
library(randomForest)
library(gridExtra)


# Locations
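The `library(gridExtra)` addition here supports the slide edits above, where patchwork's `p1 + p2 + plot_layout(ncol=1)` is commented out in favour of `grid.arrange()`. A minimal sketch of the two equivalent compositions, assuming ggplot2, patchwork, and gridExtra (the plots are placeholders built from `mtcars`, not the slide figures):

```r
# Sketch only: composing two ggplots, patchwork-style vs gridExtra-style.
library(ggplot2)
library(patchwork)
library(gridExtra)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# patchwork: overloads `+` on ggplot objects, layout set separately
p_patch <- p1 + p2 + plot_layout(ncol = 1)

# gridExtra: arranges the plots as grobs on a grid (the commit's approach)
grid.arrange(p1, p2, ncol = 1)
```

Both draw the same stacked pair; `grid.arrange()` draws immediately, while the patchwork object prints on demand.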