---
title: "Introduction to Neural Networks with PyTorch"
subtitle: "ICCS Summer School 2024"
bibliography: references.bib
format:
  revealjs:
    embed-resources: true
    slide-number: true
    chalkboard: false
    preview-links: auto
    history: false
    logo: https://iccs.cam.ac.uk/sites/iccs.cam.ac.uk/files/logo2_2.png
    theme: [dark, custom.scss]
    render-on-save: true
authors:
  # - name: Jack Atkinson
  #   orcid: 0000-0001-5001-4812
  #   affiliations: ICCS/Cambridge
  # - name: Jim Denholm
  #   affiliations: Cambridge
  #   orcid: 0000-0002-2389-3134
  - name: Matt Archer
    affiliations: ICCS/Cambridge
    orcid: 0009-0002-7043-6769
  - name: Surbhi Goel
    affiliations: ICCS/Cambridge
    orcid: 0009-0005-0237-756X
revealjs-plugins:
  - attribution
---
## Rough Schedule {.smaller}
:::: {.columns}
::: {.column width=50%}
* 9:00-9:30 - NN lecture
* 9:30-10:30 - Teaching/Code-along
* 10:30-11:00 - Coffee
* 11:00-12:00 - Teaching/Code-along
Lunch
* 12:00 - 13:30
::: {style="color: turquoise;"}
Helping Today:
* Person 1 - Cambridge RSE
:::
:::
::::
## Material {.smaller}
These slides can be viewed at:
- [https://cambridge-iccs.github.io/practical-ml-with-pytorch](https://cambridge-iccs.github.io/practical-ml-with-pytorch)
The HTML and source can be found [on GitHub](https://github.com/Cambridge-ICCS/practical-ml-with-pytorch). Follow this link:
- [https://tinyurl.com/ml-iccs-24](https://tinyurl.com/ml-iccs-24)
\
\
Based on the workshop developed by [Jack Atkinson](https://orcid.org/0000-0001-5001-4812) and [Jim Denholm](https://orcid.org/0000-0002-2389-3134):
- [github.com/Cambridge-ICCS/practical-ml-with-pytorch](https://github.com/Cambridge-ICCS/practical-ml-with-pytorch)
- [LICENSE](https://github.com/Cambridge-ICCS/practical-ml-with-pytorch/blob/main/LICENSE)
V1.0 released and JOSE paper accepted:
- [@atkinson2024practical]
<!--
## NCAS School (rough) Schedule {.smaller}
:::: {.columns}
::: {.column width=50%}
AM session - Fitzwilliam College
* 9:00-9:30 - ML lecture
* 9:30-10:30 - Teaching/Code-along
* 10:30-11:00 - Coffee
* 11:00-12:00 - Teaching/Code-along
* 12:00-12:30 - CNN Lecture
Lunch
* 12:30 - 13:30
:::
::: {.column width=50%}
PM session - Computer Lab
* 13:30-15:30 - CNN exercise in groups
* 15:30-16:00 - Tea, `GOTO SS03`
* 16:00-16:15 - CNN Solution recap
* 16:15-17:00 - Climate applications of ML
::: {style="color: turquoise;"}
Helping Today:
* Jack Atkinson - ICCS Climate RSE
* Dominic Orchard - Kent/Cambridge CompSci
* Matt Archer - Cambridge RSE
:::
:::
::::
-->
# Part 1: Neural-network basics -- and fun applications.
## Stochastic gradient descent (SGD)
- Generally speaking, most neural networks are fit/trained using SGD (or some variant of it).
- To understand how one might fit a function with SGD, let's start with a straight line: $$y=mx+c$$
## Fitting a straight line with SGD I {.smaller}
- **Question**---when we differentiate a function, what do we get?
::: {.fragment .fade-in}
- Consider:
$$y = mx + c$$
$$\frac{dy}{dx} = m$$
- $m$ is certainly $y$'s slope, but is there a (perhaps) more fundamental way to view a derivative?
:::
## Fitting a straight line with SGD II {.smaller}
- **Answer**---a function's derivative gives a _vector_ which points in the direction of _steepest ascent_.
::: {.fragment .fade-in}
:::: {.columns}
::: {.column width="50%"}
- Consider
$$y = x$$
$$\frac{dy}{dx} = 1$$
- What is the direction of _steepest descent?_
$$-\frac{dy}{dx}$$
:::
::::
:::
## Fitting a straight line with SGD III {.smaller}
- When fitting a function, we are essentially creating a model, $f$, which describes some data, $y$.
- We therefore need a way of measuring how well a model's predictions match our observations.
::: {.fragment .fade-in}
:::: {.columns}
::: {.column width="30%"}
- Consider the data:
| $x_{i}$ | $y_{i}$ |
|:--------:|:-------:|
| 1.0 | 2.1 |
| 2.0 | 3.9 |
| 3.0 | 6.2 |
:::
::: {.column width="70%"}
- We can measure the distance between $f(x_{i})$ and $y_{i}$.
- Normally we might consider the mean-squared error:
$$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$
:::
::::
:::
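- For example, an illustrative model $f(x) = 2x$ gives residuals $0.1$, $-0.1$ and $0.2$ on the data above, so $L_{\text{MSE}} = (0.01 + 0.01 + 0.04)/3 = 0.02$.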
::: {.fragment .fade-in}
- We can differentiate the loss function w.r.t. each parameter in the model $f$.
- We can use these directions of steepest descent to iteratively 'nudge' the parameters in a direction which will reduce the loss.
:::
## Fitting a straight line with SGD IV {.smaller}
:::: {.columns}
::: {.column width="45%"}
- Model: \ $f(x) = mx + c$
- Data: \ $\{x_{i}, y_{i}\}$
- Loss: \ $\frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}$
:::
::: {.column width="55%"}
$$
\begin{align}
L_{\text{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\\
&= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - mx_{i} - c)^{2}
\end{align}
$$
:::
::::
::: {.fragment .fade-in}
- We can iteratively minimise the loss by stepping the model's parameters in the direction of steepest descent:
::: {layout="[0.5, 1, 0.5, 1, 0.5]"}
:::: {#placeholder}
::::
$$m_{n + 1} = m_{n} - \frac{\partial L}{\partial m} \cdot l_{\text{r}}$$
:::: {#placeholder}
::::
$$c_{n + 1} = c_{n} - \frac{\partial L}{\partial c} \cdot l_{\text{r}}$$
:::: {#placeholder}
::::
:::
- where $l_{\text{r}}$ is a small constant known as the _learning rate_.
:::
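## Fitting a straight line with SGD: PyTorch sketch {.smaller}
- A minimal sketch of the update rule above, using PyTorch's autograd. With only three data points this is really full-batch gradient descent; names like `lr` mirror $l_{\text{r}}$ and are otherwise arbitrary.

```python
import torch

# The data from slide III
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.1, 3.9, 6.2])

# Parameters of f(x) = m*x + c, tracked by autograd
m = torch.zeros(1, requires_grad=True)
c = torch.zeros(1, requires_grad=True)

lr = 0.05  # learning rate, l_r
for _ in range(1000):
    loss = ((y - (m * x + c)) ** 2).mean()  # MSE loss
    loss.backward()                         # fills m.grad and c.grad
    with torch.no_grad():
        m -= lr * m.grad                    # step in the direction of steepest descent
        c -= lr * c.grad
        m.grad.zero_()
        c.grad.zero_()

print(m.item(), c.item())  # approaches the least-squares fit (~2.05, ~-0.03)
```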
## Quick recap {.smaller}
To fit a model we need:
- Some^[Well, a suitable amount of - often a lot.] data.
- A model.
- A loss function.
- An optimisation procedure (often SGD or another of its variants).
## What about neural networks? {.smaller}
- Neural networks are just functions.
- We can "train", or "fit", them as we would any other function:
- by iteratively nudging parameters to minimise a loss.
- With neural networks, differentiating the loss function is a bit more complicated
- but ultimately it's just the chain rule.
- We won't go through any more maths on the matter---learning resources on the topic are in no short supply.^[The term to search for is ['backpropagation'](https://en.wikipedia.org/wiki/Backpropagation).]
## Fully-connected neural networks {.smaller}
- The simplest neural networks in common use are called fully-connected neural nets, dense networks, multi-layer perceptrons, or artificial neural networks (ANNs).
:::: {.columns}
::: {.column width=40%}
- We map between the features at consecutive layers through matrix multiplication and the application of some non-linear activation function.
$$a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$$
- For common choices of activation function, see the [PyTorch](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) docs.
:::
::::
::: {.attribution}
Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
:::
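## Fully-connected networks in PyTorch {.smaller}
- A minimal sketch of the layer map $a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$ with `torch.nn`; the sizes (4, 16, 3) are arbitrary illustrative choices.

```python
import torch
from torch import nn

# Each nn.Linear computes W a + b; ReLU is one common choice of sigma.
model = nn.Sequential(
    nn.Linear(4, 16),  # 4 input features -> 16 hidden units
    nn.ReLU(),         # non-linear activation
    nn.Linear(16, 3),  # 16 hidden units -> 3 outputs
)

a = torch.randn(8, 4)  # a batch of 8 samples with 4 features each
print(model(a).shape)  # torch.Size([8, 3])
```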
## Uses: Classification and Regression {.smaller}
- Fully-connected neural networks are often applied to tabular data.
- i.e. where it makes sense to express the data in a table-like object (such as a `pandas` data frame).
- The input features and targets are represented as vectors.
::: {.fragment .fade-in}
- Neural networks are normally used for one of two things:
- **Classification**: assigning a semantic label to something -- i.e. is this a dog or cat?
- **Regression**: Estimating a continuous quantity -- e.g. mass or volume -- based on other information.
:::
# Python and PyTorch {.smaller}
- In this workshop, we will implement some straightforward neural networks in PyTorch, and use them for different classification and regression problems.
- PyTorch is a deep learning framework that can be used in both Python and C++.
- I have never met anyone actually training models in C++; I find it a bit weird.
- See the PyTorch website: [https://pytorch.org/](https://pytorch.org/)
# Resources
- [coursera.org/machine-learning-introduction](https://www.coursera.org/specializations/machine-learning-introduction/?utm_medium=coursera&utm_source=home-page&utm_campaign=mlslaunch2022IN)
- [uvadlc](https://uvadlc-notebooks.readthedocs.io/en/latest/)
- [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
# Exercises
## Penguins!
<!---->

::: {.attribution}
Image source: [Palmer Penguins by Alison Horst](https://allisonhorst.github.io/palmerpenguins)
:::
## Exercise 1 -- classification
- In this exercise, you will train a fully-connected neural network to [*classify the species*]{style="text-decoration: underline;"} of penguins based on certain physical features.
- [https://github.com/allisonhorst/palmerpenguins](https://github.com/allisonhorst/palmerpenguins)
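- As a rough starting point, a sketch of loading the data, assuming the pip-installable `palmerpenguins` package and its usual column names:

```python
import torch
from palmerpenguins import load_penguins  # assumed installed via pip

df = load_penguins().dropna()
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X = torch.tensor(df[features].values, dtype=torch.float32)
y = torch.tensor(df["species"].astype("category").cat.codes.values, dtype=torch.long)
print(X.shape, y.shape)  # roughly torch.Size([333, 4]) and torch.Size([333])
```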
## Exercise 2 -- regression
- In this exercise, you will train a fully-connected neural network to [*predict the mass*]{style="text-decoration: underline;"} of penguins based on other physical features.
- [https://github.com/allisonhorst/palmerpenguins](https://github.com/allisonhorst/palmerpenguins)
# Part 2: Fun with CNNs
## Convolutional neural networks (CNNs): why? {.smaller}
Advantages over simple ANNs:
- They require far fewer parameters per layer.
- The forward pass of a conv layer involves running a filter of fixed size over the inputs.
- The number of parameters per layer _does not_ depend on the input size.
- They are a much more natural choice of function for *image-like* data:
:::: {.columns}
::: {.column width=10%}
:::
::: {.column width=35%}

:::
::: {.column width=10%}
:::
::: {.column width=35%}

:::
::::
::: {.attribution}
Image source: [Machine Learning Mastery](https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-photos-of-dogs-and-cats/)
:::
## Convolutional neural networks (CNNs): why? {.smaller}
Some other points:
- Convolutional layers are translation invariant (strictly speaking, equivariant):
- i.e. they don't care _where_ the "dog" is in the image.
- Convolutional layers are _not_ rotationally invariant.
- e.g. a model trained to detect correctly-oriented human faces will likely fail on upside-down images.
- We can address this with data augmentation (explored in exercises).
## What is a (1D) convolutional layer? {.smaller}

See the [`torch.nn.Conv1d` docs](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html)
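- To make the fixed-parameter-count point concrete, a small sketch (channel counts and lengths are arbitrary):

```python
import torch
from torch import nn

conv = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3)
print(sum(p.numel() for p in conv.parameters()))  # 16: 4*(1*3) weights + 4 biases

short, longer = torch.randn(1, 1, 10), torch.randn(1, 1, 1000)
print(conv(short).shape)   # torch.Size([1, 4, 8])
print(conv(longer).shape)  # torch.Size([1, 4, 998]) -- same 16 parameters
```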
## 2D convolutional layer {.smaller}
- Same idea as in one dimension, but in two (funnily enough).

- Everything else proceeds in the same way as with the 1D case.
- See the [`torch.nn.Conv2d` docs](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html).
- As with Linear layers, Conv2d layers also have non-linear activations applied to them.
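- For example (shapes illustrative only):

```python
import torch
from torch import nn

# One 2D conv layer followed by a non-linearity
layer = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
)
img = torch.randn(1, 3, 28, 28)  # batch, channels, height, width
print(layer(img).shape)          # torch.Size([1, 8, 28, 28]); padding=1 preserves H and W
```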
## Typical CNN overview {.smaller}
::: {layout="[ 0.5, 0.5 ]"}

- Series of conv layers extract features from the inputs.
- Often called an encoder.
- Adaptive pooling layer:
- Image-like objects $\to$ vectors.
- Standardises size.
- [``torch.nn.AdaptiveAvgPool2d``](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html)
- [``torch.nn.AdaptiveMaxPool2d``](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveMaxPool2d.html)
- Classification (or regression) head.
:::
- For common CNN architectures see [``torchvision.models`` docs](https://pytorch.org/vision/stable/models.html).
::: {.attribution}
Image source: [medium.com - binary image classifier cnn using tensorflow](https://medium.com/techiepedia/binary-image-classifier-cnn-using-tensorflow-a3f5d6746697)
:::
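## Typical CNN: PyTorch sketch {.smaller}
- A minimal sketch of the encoder / adaptive-pool / head pattern above; the channel counts and 10-class head are arbitrary illustrative choices.

```python
import torch
from torch import nn

model = nn.Sequential(
    # Encoder: conv layers extract features from the inputs
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    # Adaptive pooling: image-like objects -> fixed-size vectors
    nn.AdaptiveAvgPool2d(1),  # any H x W -> 1 x 1
    nn.Flatten(),             # (batch, 32, 1, 1) -> (batch, 32)
    # Classification (or regression) head
    nn.Linear(32, 10),
)
print(model(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 10])
```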
# Exercises
## Exercise 1 -- classification
### MNIST hand-written digits.
::: {layout="[ 0.5, 0.5 ]"}

- In this exercise we'll train a CNN to classify hand-written digits in the MNIST dataset.
- See the [MNIST database wiki](https://en.wikipedia.org/wiki/MNIST_database) for more details.
:::
::: {.attribution}
Image source: [npmjs.com](https://www.npmjs.com/package/mnist)
:::
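- One common way to fetch the dataset is via `torchvision` (the exercise notebook may provide its own loader):

```python
from torchvision import datasets, transforms

# Downloads MNIST to ./data on first run; images become 1x28x28 tensors
mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
print(len(mnist))          # 60000 training images
image, label = mnist[0]
print(image.shape, label)  # torch.Size([1, 28, 28]) and an int in 0-9
```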
## Exercise 2 -- regression
### Random ellipse problem
- In this exercise, we'll train a CNN to estimate the centre $(x_{\text{c}}, y_{\text{c}})$ and the $x$ and $y$ radii of an ellipse defined by
$$
\frac{(x - x_{\text{c}})^{2}}{r_{x}^{2}} + \frac{(y - y_{\text{c}})^{2}}{r_{y}^{2}} = 1
$$
- The ellipse, and its background, will have random colours chosen uniformly on $\left[0,\ 255\right]^{3}$.
- In short, the model must learn to estimate $x_{\text{c}}$, $y_{\text{c}}$, $r_{x}$ and $r_{y}$.
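- A natural model shape (illustrative, not prescriptive): a CNN encoder followed by a regression head ending in e.g. ``nn.Linear(n_features, 4)``, where ``n_features`` is a placeholder for whatever the encoder produces.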
<!-- # Further information -->
<!-- ## Slides
These slides can be viewed at:
[https://cambridge-iccs.github.io/practical-ml-with-pytorch](https://cambridge-iccs.github.io/practical-ml-with-pytorch)
The html and source can be found [on GitHub](https://github.com/Cambridge-ICCS/practical-ml-with-pytorch). -->
## Contact {.smaller}
For more information we can be reached at:
:::: {.columns style="font-size: 60%"}
::: {.column width="25%"}
{{< fa pencil >}} \ Matt Archer
{{< fa solid person-digging >}} \ [ICCS/UoCambridge](https://iccs.cam.ac.uk/about-us/our-team)
{{< fa solid envelope >}} \ [ma595[AT]cam.ac.uk](mailto:ma595@cam.ac.uk)
{{< fa brands github >}} \ [ma595](https://github.com/ma595)
:::
::: {.column width="25%"}
{{< fa pencil >}} \ Surbhi Goel
{{< fa solid person-digging >}} \ [ICCS/UoCambridge](https://iccs.cam.ac.uk/about-us/our-team)
{{< fa solid envelope >}} \ [sg2147[AT]cam.ac.uk](mailto:sg2147@cam.ac.uk)
{{< fa brands github >}} \ [surbhigoel77](https://github.com/surbhigoel77)
:::
::: {.column width="25%"}
{{< fa pencil >}} \ Jack Atkinson
{{< fa solid person-digging >}} \ [ICCS/UoCambridge](https://iccs.cam.ac.uk/about-us/our-team)
{{< fa solid globe >}} \ [jackatkinson.net](https://jackatkinson.net)
{{< fa solid envelope >}} \ [jwa34[AT]cam.ac.uk](mailto:jwa34@cam.ac.uk)
{{< fa brands github >}} \ [jatkinson1000](https://github.com/jatkinson1000)
{{< fa brands mastodon >}} \ [\@jatkinson1000\@fosstodon.org](https://fosstodon.org/@jatkinson1000)
:::
::: {.column width="25%"}
{{< fa pencil >}} \ Jim Denholm
{{< fa solid person-digging >}} \ UoCambridge
{{< fa solid globe >}} \ [linkedin](https://uk.linkedin.com/in/jim-denholm-13043b189)
{{< fa solid envelope >}} \ [jd949[AT]cam.ac.uk](mailto:jd949@cam.ac.uk)
{{< fa brands github >}} \ [jdenholm](https://github.com/jdenholm)
:::
::::
You can also contact the ICCS, [make a resource allocation request](https://iccs.cam.ac.uk/resources-vesri-members/resource-allocation-process), or visit us at the [Summer School RSE Helpdesk](https://docs.google.com/spreadsheets/d/1WKZxp3nqpXrIRMRkfFzc71sos-UD-Uy1zeab0c1p7Xc/edit#gid=0).