use grid cards in readme
SkafteNicki committed Apr 10, 2024
1 parent 77e522e commit 551cca7
Showing 16 changed files with 308 additions and 138 deletions.
Binary file added figures/icons/gcp.png
4 changes: 2 additions & 2 deletions mkdocs.yml
@@ -106,9 +106,9 @@ nav:
- M14 - Boilerplate: s4_debugging_and_logging/boilerplate.md
- S5 - Continuous Integration ✔️:
- s5_continuous_integration/README.md
- M15 - Unittesting: s5_continuous_integration/unittesting.md
- M15 - Unit testing: s5_continuous_integration/unittesting.md
- M16 - Github Actions: s5_continuous_integration/github_actions.md
- M17 - Pre commit: s5_continuous_integration/pre_commit.md
- M17 - Pre-commit: s5_continuous_integration/pre_commit.md
- M18 - Continuous Containers: s5_continuous_integration/auto_docker.md
- M19 - Continuous Machine Learning: s5_continuous_integration/cml.md
- S6 - The cloud 🌐:
33 changes: 27 additions & 6 deletions s10_extra/README.md
@@ -3,9 +3,30 @@
The modules listed here are not part of the core course, but expand on some of the other topics.
Some of them may still be under construction and may in the future be moved into other sessions.

<p align="center">
<img src="../figures/icons/click.png" width="130">
<img src="../figures/icons/material.png" width="130">
<img src="../figures/icons/optuna.png" width="130">
<img src="../figures/icons/pbs.png" width="130">
</p>
<div class="grid cards" markdown>

- ![](../figures/icons/click.png){align=right : style="height:100px;width:100px"}

Learn how to set up a simple command line interface for your application

[:octicons-arrow-right-24: M30: Command Line Interfaces](click.md)

- ![](../figures/icons/material.png){align=right : style="height:100px;width:100px"}

Learn how to set up a simple documentation system for your application

[:octicons-arrow-right-24: M31: Documentation](documentation.md)

- ![](../figures/icons/optuna.png){align=right : style="height:100px;width:100px"}

Learn how to do hyperparameter optimization using Optuna

[:octicons-arrow-right-24: M32: Hyperparameter Optimization](hyperparameters.md)

- ![](../figures/icons/pbs.png){align=right : style="height:100px;width:100px"}

Learn how to use HPC systems that use PBS to do job scheduling

[:octicons-arrow-right-24: M33: High Performance Clusters](pbs.md)

</div>
4 changes: 2 additions & 2 deletions s10_extra/hyperparameters.md
@@ -8,9 +8,9 @@ Hyperparameter optimization is not a new idea within machine learning but have s
the rise of deep learning. This can mainly be attributed to the following:

* Trying to beat state-of-the-art often comes down to very small differences in performance, and hyperparameter
optimization can help squeeze out a bit more
optimization can help squeeze out a bit more
* Deep learning models are in general not that robust towards the choice of hyperparameters, so choosing the wrong set
may lead to a model that does not work
may lead to a model that does not work

However, the problem with doing hyperparameter optimization of deep learning models is that it can take over a
week to train a single model. In most cases we therefore cannot do a full grid search of all hyperparameter
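
To make the cost argument concrete, here is a minimal sketch in plain Python (it does not use Optuna) that compares the size of a full grid with a fixed random-search budget. The search space, the `grid_size` and `sample_config` helpers, and the budget of 20 trials are illustrative assumptions and are not taken from the course material.

```python
import random

# Hypothetical search space, chosen only for illustration.
search_space = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.0, 0.1, 0.3, 0.5],
    "num_layers": [2, 4, 6, 8],
}


def grid_size(space):
    """Number of configurations a full grid search would have to train."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def sample_config(space, rng):
    """Draw one random configuration, as random search would."""
    return {name: rng.choice(values) for name, values in space.items()}


rng = random.Random(0)  # fixed seed so the sampled trials are reproducible
budget = 20  # assumed number of models we can afford to train
trials = [sample_config(search_space, rng) for _ in range(budget)]

print(f"full grid: {grid_size(search_space)} runs, random search: {len(trials)} runs")
```

Even this small space yields 5 × 4 × 4 × 4 = 320 configurations, so if a single model takes about a week to train, a full grid is out of reach, while a sampled budget stays fixed no matter how many hyperparameters are added.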
44 changes: 33 additions & 11 deletions s1_development_environment/README.md
@@ -1,16 +1,38 @@
# Getting started - Setting up a development environment
# Setting up a development environment

[Slides](../slides/DeepLearningSoftware.pdf){ .md-button }

<p align="center">
<img src="../figures/icons/terminal.png" width="130">
<img src="../figures/icons/conda.png" width="130">
<img src="../figures/icons/vscode.png" width="130">
<img src="../figures/icons/pytorch.png" width="130">
</p>
<div class="grid cards" markdown>

Today we start our journey into the world of machine learning operations (MLOps). However, before we can really get
started, we need to make sure that you have a basic understanding of a couple of topics, as we will be using these
- ![](../figures/icons/terminal.png){align=right : style="height:100px;width:100px"}

Learn the basics of the command line, and how to use it to navigate your file system and run programs.

[:octicons-arrow-right-24: M1: Command line](command_line.md)

- ![](../figures/icons/conda.png){align=right : style="height:100px;width:100px"}

Learn how package managers work in Python and how to create reproducible virtual environments using
`conda` and `pip`.

[:octicons-arrow-right-24: M2: Package Manager](package_manager.md)

- ![](../figures/icons/vscode.png){align=right : style="height:100px;width:100px"}

Learn how to use a modern editor for code development.

[:octicons-arrow-right-24: M3: Editor](editor.md)

- ![](../figures/icons/pytorch.png){align=right : style="height:100px;width:100px"}

Refresh your PyTorch skills and implement a simple deep-learning model.

[:octicons-arrow-right-24: M4: Deep Learning Software](deep_learning_software.md)

</div>

Today we start our journey into the world of machine learning operations (MLOps). However, before we can get started, we
need to make sure that you have a basic understanding of a couple of topics, as we will be using these
throughout the course. In particular, today is all about getting set up with a proper development environment that can
support your journey. Most of you probably already have experience with these topics, and it will be mostly repetition.

@@ -25,5 +47,5 @@ you check out [The Missing Semester of Your CS Education](https://missing.csail.

* Understand the basics of the command line.
* Being able to create reproducible virtual environments.
* Able to use a modern IDE / editor for code development
* Write and run a Python program, implementing a simple deep learning model
* Able to use a modern editor for code development
* Write and run a Python program, implementing a simple deep-learning model
39 changes: 30 additions & 9 deletions s2_organisation_and_version_control/README.md
@@ -1,22 +1,43 @@
# Getting started with MLOps - Organization and version control
# Organization and version control

[Slides](../slides/IntroToMLOps.pdf){ .md-button }

<p align="center">
<img src="../figures/icons/git.png" width="130">
<img src="../figures/icons/cookiecutter.png" width="130">
<img src="../figures/icons/pep8.png" width="130">
<img src="../figures/icons/dvc.png" width="130">
</p>
<div class="grid cards" markdown>

- ![](../figures/icons/git.png){align=right : style="height:100px;width:100px"}

Learn the basics of version control and how to use `git` to track changes to your code and collaborate with others.

[:octicons-arrow-right-24: M5: Git](git.md)

- ![](../figures/icons/cookiecutter.png){align=right : style="height:100px;width:100px"}

Learn how to organize Python code into a library, package it and how to use templates to create new projects.

[:octicons-arrow-right-24: M6: Code Structure](code_structure.md)

- ![](../figures/icons/pep8.png){align=right : style="height:100px;width:100px"}

Learn different coding practices and how to use them to improve the quality of your code.

[:octicons-arrow-right-24: M7: Good Coding Practice](good_coding_practice.md)

- ![](../figures/icons/dvc.png){align=right : style="height:100px;width:100px"}

Learn how to version control data using `dvc`.

[:octicons-arrow-right-24: M8: Data Version Control](dvc.md)

</div>

Today we take our first steps into the world of MLOps. The set of modules in this session focuses on getting organized
and making sure that you are familiar with good development practices. While many of the practices you will learn about
in these modules do not seem that important when you are a single person working on a project, it is crucial when
working in large groups that the differences in how people organize and write their code are minimized.
The topics in this session will focus on:

* Version control for helping tracking and managing changes to your code and data
* Coding practices for staying organized in large projects
- Version control to help track and manage changes to your code and data
- Coding practices for staying organized in large projects

<figure markdown>
![Image](../figures/wtf.jpeg){ width="700" }
4 changes: 2 additions & 2 deletions s2_organisation_and_version_control/git.md
@@ -17,7 +17,7 @@ For a full explanation please see this [page](https://git-scm.com/book/en/v2/Get

Secondly, it is important to note that GitHub is not git! GitHub is the dominant player when it comes to
hosting repositories, but that does not mean that it is the only one providing free repository hosting
(see [bitbucket](https://bitbucket.org/product/) or [gitlab](https://about.gitlab.com/)) for some other examples).
(see [bitbucket](https://bitbucket.org/product/) or [gitlab](https://about.gitlab.com/) for some other examples).

That said, we will be using git and GitHub throughout this course. It is a requirement for passing this course that
you create a public repository with your code and use git to upload any code changes. How much you choose to integrate
@@ -101,7 +101,7 @@ Of course, the real power of version control is the ability to make branches, as
Each branch can contain code that is not present on other branches. This is useful when many developers are
working together on the same project.

## ❔ Exercises
### ❔ Exercises

1. In your GitHub account create a repository, where the intention is that you upload the code from the final
exercise from yesterday
25 changes: 18 additions & 7 deletions s3_reproducibility/README.md
@@ -2,10 +2,21 @@

[Slides](../slides/ReproducibilityAndSoftware.pdf){ .md-button }

<p align="center">
<img src="../figures/icons/docker.png" width="130">
<img src="../figures/icons/hydra.png" width="130">
</p>
<div class="grid cards" markdown>

- ![](../figures/icons/docker.png){align=right : style="height:100px;width:100px"}

Learn how to create reproducible computing environments using `docker` and how to use them to run your code.

[:octicons-arrow-right-24: M9: Docker](docker.md)

- ![](../figures/icons/hydra.png){align=right : style="height:100px;width:100px"}

Learn how to use `hydra` to manage configuration files and how to integrate it with your code.

[:octicons-arrow-right-24: M10: Config Files](config_files.md)

</div>

Today is all about reproducibility - one of those concepts that everyone agrees is very important and something should
be done about, but the reality is that it is very hard to secure full reproducibility. The last sessions have already
@@ -30,9 +41,9 @@ of making sure that machine learning is **trustworthy**.
<figcaption>
Many different aspects are needed if trustworthy machine learning is ever going to be a reality. We need robustness of
our pipelines so we can trust that they do not fail under heavy load. We need integrity to make sure that pipelines are
deployed if they are of high quality. We need explainability to make sure that we understand what our machine learning models
are doing, so it is not just a black box. We need reproducibility to make sure that the results of our models can be
reproduced over and over again. Finally, we need fairness to make sure that our models are not biased toward specific
deployed if they are of high quality. We need explainability to make sure that we understand what our machine learning
models are doing, so it is not just a black box. We need reproducibility to make sure that the results of our models can
be reproduced over and over again. Finally, we need fairness to make sure that our models are not biased toward specific
populations. Figure inspired by this <a href="https://arxiv.org/abs/2209.06529">paper</a>.
</figcaption>
</figure>
64 changes: 42 additions & 22 deletions s4_debugging_and_logging/README.md
@@ -2,33 +2,53 @@

[Slides](../slides/DebuggingML.pdf){ .md-button }

<p align="center">
<img src="../figures/icons/debugger.png" width="130">
<img src="../figures/icons/profiler.png" width="130">
<img src="../figures/icons/w&b.png" width="130">
<img src="../figures/icons/lightning.png" width="130">
</p>
<div class="grid cards" markdown>

Today we are initially going to go over three different topics that are all fundamentally necessary for any data
scientist or DevOps engineer:
- ![](../figures/icons/debugger.png){align=right : style="height:100px;width:100px"}

Learn how to use the debugger in your editor to find bugs in your code.

[:octicons-arrow-right-24: M11: Debugging](debugging.md)

- ![](../figures/icons/profiler.png){align=right : style="height:100px;width:100px"}

Learn how to use a profiler to identify bottlenecks in your code and from those profiles optimize the runtime of
your programs.

[:octicons-arrow-right-24: M12: Profiling](profiling.md)

* Debugging
* Profiling
* Logging
- ![](../figures/icons/w&b.png){align=right : style="height:100px;width:100px"}

All three topics can be characterized by something you probably already are familiar with. Since you started programming,
you have done debugging as nobody can write perfect code in the first try. Similarly, while you have not directly
profiled your code, I bet that you at some point have had some very slow code and optimized it to run faster.
Identifying and improving is the fundamentals of profiling code. Finally, logging is a very broad term and basically
refers to any kind of output from your applications that help you at a later point identify the "performance" of
you application.
Learn how to systematically log experiments and hyperparameters to make your code reproducible.

However, while we expect you to already be familiar with these topics, we do not expect all of you to be expects in
this as it is very rarely topics that are focused on. Today we are going to introduce some best practices and tools to
help you overcome each and everyone of these three important topics.
[:octicons-arrow-right-24: M13: Logging](logging.md)

- ![](../figures/icons/lightning.png){align=right : style="height:100px;width:100px"}

Learn how to use the `pytorch-lightning` framework to minimize boilerplate code and structure deep learning models.

[:octicons-arrow-right-24: M14: Boilerplate](boilerplate.md)

</div>

Today we are initially going to go over three different topics that are all fundamentally necessary for any data
scientist or DevOps engineer:

As the final topic for today we are going to learn about how we can *minimize* boilerplate and focus on coding what
actually matters for our project instead of all the boilerplate to get it working.
- Debugging
- Profiling
- Logging

All three topics can be characterized by something you probably already are familiar with. Since you started
programming, you have done debugging as nobody can write perfect code on the first try. Similarly, while you have not
directly profiled your code, I bet that you at some point have had some very slow code and optimized it to run faster.
Identifying and improving are the fundamentals of profiling code. Finally, logging is a very broad term and refers to
any kind of output from your applications that helps you at a later point identify the "performance" of your application.

However, while we expect you to already be familiar with these topics, we do not expect all of you to be experts, as it
is very rare that these topics are focused on. Today we are going to introduce some best practices and tools to
help you master each of these three important topics. As the final topic for today, we are going to learn about
how we can *minimize* boilerplate and focus on coding what matters for our project instead of all the boilerplate to get
it working.
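
As a small taste of two of these topics, the sketch below profiles a deliberately naive function with the standard library's `cProfile` and reports the result through `logging`. The function `slow_sum_of_squares` and the helper `profile` are hypothetical names invented for this illustration. The modules themselves may use other tools.

```python
import cProfile
import io
import logging
import pstats

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def slow_sum_of_squares(n):
    """Deliberately naive example: builds a full list before summing."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def profile(func, *args):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # Sort by cumulative time and keep only the five most expensive entries.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile(slow_sum_of_squares, 100_000)
logger.info("slow_sum_of_squares returned %d", result)
print(report)
```

The report lists where time is spent per function call, which is the starting point for deciding what to optimize. The log line shows the kind of output you may want to keep from a run.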

!!! tip "Learning objectives"

43 changes: 33 additions & 10 deletions s5_continuous_integration/README.md
@@ -2,21 +2,44 @@

[Slides](../slides/ContinuousIntegration.pdf){ .md-button }

<p align="center">
<img src="../figures/icons/pytest.png" width="130">
<img src="../figures/icons/actions.png" width="130">
<img src="../figures/icons/precommit.png" width="130">
<img src="../figures/icons/dockerhub.png" width="130">
<img src="../figures/icons/cml.png" width="130">
</p>
<div class="grid cards" markdown>

- ![](../figures/icons/pytest.png){align=right : style="height:100px;width:100px"}
Learn how to write unit tests that cover both data and models in your ML pipeline.

[:octicons-arrow-right-24: M15: Unit testing](unittesting.md)

- ![](../figures/icons/actions.png){align=right : style="height:100px;width:100px"}
Learn how to implement CI using GitHub Actions such that tests are automatically executed on code changes.

[:octicons-arrow-right-24: M16: Github Actions](github_actions.md)

- ![](../figures/icons/precommit.png){align=right : style="height:100px;width:100px"}
Learn how to use pre-commit to ensure that code that is not up to standard does not get committed.

[:octicons-arrow-right-24: M17: Pre-commit](pre_commit.md)

- ![](../figures/icons/dockerhub.png){align=right : style="height:100px;width:100px"}

Learn how to implement CI for continuous building of containers.

[:octicons-arrow-right-24: M18: Continuous Containers](auto_docker.md)

- ![](../figures/icons/cml.png){align=right : style="height:100px;width:100px"}

Learn how to implement continuous machine learning pipelines in GitHub Actions.

[:octicons-arrow-right-24: M19: Continuous Machine Learning](cml.md)

</div>
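
To illustrate what testing both data and models can look like, here is a minimal sketch written as plain `test_` functions of the kind a runner such as `pytest` collects. The `normalize` preprocessing step and the tiny linear `predict` model are hypothetical stand-ins, not code from the course.

```python
import math


def normalize(values):
    """Hypothetical preprocessing step: scale to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]


def predict(weights, features):
    """Hypothetical tiny linear model: a dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def test_normalize_has_zero_mean_and_unit_variance():
    # Data test: checks a statistical property of the processed data.
    out = normalize([1.0, 2.0, 3.0, 4.0])
    assert math.isclose(sum(out) / len(out), 0.0, abs_tol=1e-9)
    assert math.isclose(sum(v * v for v in out) / len(out), 1.0, rel_tol=1e-9)


def test_predict_matches_hand_computed_value():
    # Model test: 0.5 * 2.0 + (-1.0) * 1.0 == 0.0
    assert predict([0.5, -1.0], [2.0, 1.0]) == 0.0
```

Saved in a file whose name starts with `test_`, these functions would be discovered and run automatically by the test runner, which is what a CI workflow then triggers on every code change.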

Continuous integration is a sub-discipline of the general field of *Continuous X*. Continuous X is one of the core
elements of modern DevOps, and by extension MLOps. Continuous X assumes that we have a (long) developer pipeline
(see image below) where we want to make some changes to our code, e.g.:

* Update our training data or data processing
* Update our model architecture
* Something else...
- Update our training data or data processing
- Update our model architecture
- Something else...

Basically, we expect any code change to have an influence on the final result. The problem with
making changes to the start of our pipeline is that we want the change to propagate all the way through