
This repository contains all documents relevant to the Machine Learning courses of the Erasmus+ network.


Introduction

Probability theory and statistical methods play a central role in science. Nowadays we are surrounded by huge amounts of data. For example, there are about one trillion web pages; more than one hour of video is uploaded to YouTube every second, amounting to years of content every day; and the genomes of thousands of people, each more than a billion base pairs long, have been sequenced by various labs. This deluge of data calls for automated methods of data analysis, which is exactly what machine learning aims to provide.

Learning outcomes

This course aims at giving you insights and knowledge about many of the central algorithms used in Data Analysis and Machine Learning. The course is project based and, through various numerical projects (normally three), you will be exposed to fundamental research problems in these fields, with the aim of reproducing state-of-the-art scientific results. Both supervised and unsupervised methods will be covered. You will learn to develop and structure large codes for studying the different systems to which Machine Learning is applied, get acquainted with computing facilities and learn to handle large scientific projects. Good scientific and ethical conduct is emphasized throughout the course. More specifically, after this course you will

  • Learn about basic data analysis, data optimization and machine learning;
  • Be capable of extending the acquired knowledge to other systems and cases;
  • Have an understanding of central algorithms used in data analysis and machine learning;
  • Understand linear methods for regression and classification, from ordinary least squares, via Lasso and Ridge, to Logistic regression (see the sketch following this list);
  • Learn about various neural networks and deep learning methods for supervised and unsupervised learning;
  • Learn about decision trees, random forests and boosting;
  • Learn about support vector machines and kernel transformations;
  • Learn about reduction of data sets;
  • Work on numerical projects to illustrate the theory. The projects play a central role and you are expected to know modern programming languages like Python or C++.
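
To give a concrete flavour of the linear methods mentioned above, here is a minimal sketch that fits ordinary least squares and Ridge regression to synthetic polynomial data. The use of scikit-learn, the data-generating function, the polynomial degree and the regularization strength are illustrative assumptions, not part of the course material.

```python
# Minimal sketch: ordinary least squares versus Ridge regression.
# The synthetic data, polynomial degree and alpha value are illustrative
# assumptions, not taken from the course material.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2020)
x = np.linspace(0, 1, 100).reshape(-1, 1)
y = 2.0 + 3.0 * x[:, 0] - 5.0 * x[:, 0] ** 2 + 0.1 * rng.standard_normal(100)

# Build a polynomial design matrix from the single input feature
X = PolynomialFeatures(degree=5, include_bias=False).fit_transform(x)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1e-3).fit(X, y)   # alpha is the regularization strength

print("OLS   MSE:", mean_squared_error(y, ols.predict(X)))
print("Ridge MSE:", mean_squared_error(y, ridge.predict(X)))
```

Lasso regression can be tried in the same way by swapping in sklearn.linear_model.Lasso.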

The course has two central parts

  1. Statistical analysis and optimization of data
  2. Machine learning

These topics will be scattered throughout the course and may not necessarily be taught separately. Rather, we will often take an integrated approach during the lectures and project/exercise sessions, where elements from statistical data analysis are mixed with specific Machine Learning algorithms.

Statistical analysis and optimization of data

The following topics will be covered

  • Basic concepts, expectation values, variance, covariance, correlation functions and errors;
  • Simpler models, binomial distribution, the Poisson distribution, simple and multivariate normal distributions;
  • Gradient methods for data optimization
  • Linear methods for regression and classification;
  • Estimation of errors using cross-validation, blocking, bootstrapping and jackknife methods;
  • Practical optimization using singular-value decomposition and least squares for parameterizing data (see the sketch below).
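
As a small taste of the last item, the sketch below parameterizes synthetic data with a least-squares fit computed through the singular-value decomposition in NumPy. The data, the quadratic design matrix and the random seed are illustrative choices, not taken from the course material.

```python
# Minimal sketch: least-squares fit via the singular-value decomposition (SVD).
# The synthetic data and the quadratic design matrix are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(-1, 1, 50)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.05 * rng.standard_normal(x.size)

# Design matrix for a second-order polynomial fit
A = np.column_stack([np.ones_like(x), x, x**2])

# Solve min ||A beta - y|| using the economy-size SVD of A
U, s, Vt = np.linalg.svd(A, full_matrices=False)
beta = Vt.T @ ((U.T @ y) / s)

print("fitted coefficients:", beta)          # close to [1.0, -2.0, 0.5]
print("numpy lstsq check:  ", np.linalg.lstsq(A, y, rcond=None)[0])
```

Solving the normal equations directly can be numerically fragile for ill-conditioned design matrices; the SVD route above is the standard, more robust alternative.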

Machine learning

The following topics will be covered

  • Linear Regression and Logistic Regression;
  • Neural networks and deep learning;
  • Decision trees, random forests, boosting and bagging;
  • Support vector machines;
  • Dimensionality reduction, mainly Principal Component Analysis (see the sketch below).
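
To illustrate the last bullet, here is a minimal sketch of Principal Component Analysis with scikit-learn. The iris data set, the standard scaling step and the choice of two components are illustrative assumptions only.

```python
# Minimal sketch: dimensionality reduction with Principal Component Analysis.
# The iris data set and the choice of two components are illustrative only.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scales

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("original shape:", X.shape)              # (150, 4)
print("reduced shape: ", X_reduced.shape)      # (150, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The explained-variance ratios indicate how much of the total variance each retained component captures, which is the usual guide for choosing the number of components.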

Hands-on demonstrations, exercises and projects aim at deepening your understanding of these topics.

Computational aspects play a central role and you are expected to work on numerical examples and projects which illustrate the theory and methods. We strongly recommend forming small project groups of 2-3 participants.

Prerequisites

Basic knowledge of programming and mathematics, with an emphasis on linear algebra. Knowledge of Python and/or C++ as programming languages is strongly recommended, and experience with Jupyter notebooks is recommended as well.

Practicalities

  1. Lectures are in the morning, from 10am-12pm.
  2. Four hours of laboratory sessions for work on computational projects, from 2pm to 6pm;
  3. Lectures and lab sessions will all be at GANIL, starting January 20 at 9am.
  4. Grading scale: Grades are awarded on a scale from A to F, where A is the best grade and F is a fail. We aim at having two projects to be handed in. These will be graded and should be finalized no later than two weeks after the course is over. Each project counts for 50% of the final grade. We plan to make the grades available no later than March 1, and hopefully earlier.

Lecture material

The link https://compphysics.github.io/MLErasmus/doc/web/course.html gives you direct access to the learning material, with lecture slides and Jupyter notebooks. Videos of the lectures will be added.

Possible textbooks

Recommended textbooks:

  • Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer
  • Aurelien Geron, Hands‑On Machine Learning with Scikit‑Learn and TensorFlow, O'Reilly

General books on statistical analysis:

  • Christian Robert and George Casella, Monte Carlo Statistical Methods, Springer
  • Peter Hoff, A First Course in Bayesian Statistical Methods, Springer

General Machine Learning Books:

  • Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press
  • Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer
  • David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press
  • David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press

Teaching schedule, topics and teachers

Teachers: Stian Bilek (SB), Lucas Charpentier (LC), Morten Hjorth-Jensen (MHJ), and Hanna Svennevik (HS)

Week 4, January 20-24, 2020

Week 5, January 27-31, 2020
