Probability theory and statistical methods play a central role in science. Nowadays we are surrounded by huge amounts of data. For example, there are about one trillion web pages; more than one hour of video is uploaded to YouTube every second, amounting to years of content every day; the genomes of thousands of people, each more than a billion base pairs long, have been sequenced by various labs; and so on. This deluge of data calls for automated methods of data analysis, which is precisely what machine learning aims to provide.
This course aims to give you insight into and knowledge of many of the central algorithms used in data analysis and machine learning. The course is project based, and through a number of numerical projects, normally three, you will be exposed to fundamental research problems in these fields, with the aim of reproducing state-of-the-art scientific results. Both supervised and unsupervised methods will be covered. You will learn to develop and structure large codes for studying the systems to which machine learning is applied, become acquainted with computing facilities, and learn to handle large scientific projects. Good scientific and ethical conduct is emphasized throughout the course. More specifically, after this course you will
- Learn about basic data analysis, data optimization and machine learning;
- Be capable of extending the acquired knowledge to other systems and cases;
- Have an understanding of central algorithms used in data analysis and machine learning;
- Understand linear methods for regression and classification, from ordinary least squares via Lasso and Ridge regression to logistic regression;
- Learn about various neural networks and deep learning methods for supervised and unsupervised learning;
- Learn about decision trees, random forests and boosting;
- Learn about support vector machines and kernel transformations;
- Learn about dimensionality reduction of data sets;
- Work on numerical projects that illustrate the theory. The projects play a central role, and you are expected to know a modern programming language such as Python or C++.
The course has two central parts:
- Statistical analysis and optimization of data
- Machine learning
These topics are scattered throughout the course and are not necessarily taught separately. Rather, during the lectures and the project/exercise sessions, we will often mix elements from statistical data analysis with specific machine learning algorithms.
For the statistical analysis and optimization part, the following topics will be covered:
- Basic concepts, expectation values, variance, covariance, correlation functions and errors;
- Simpler models, binomial distribution, the Poisson distribution, simple and multivariate normal distributions;
- Gradient methods for data optimization;
- Linear methods for regression and classification;
- Estimation of errors using cross-validation, blocking, bootstrapping and jackknife methods;
- Practical optimization using singular value decomposition and least squares for parameterizing data (see the code sketch after this list).
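To give a concrete taste of this part, here is a minimal sketch (our own illustration with synthetic data, not part of the course material) that fits a second-degree polynomial with SVD-based least squares and uses the bootstrap to estimate the spread of the fitted coefficients:

```python
import numpy as np

rng = np.random.default_rng(2020)

# Synthetic data: y = 2 + 3x + x^2 plus Gaussian noise
n = 100
x = np.linspace(-1, 1, n)
y = 2.0 + 3.0 * x + x**2 + rng.normal(0.0, 0.3, n)

# Design matrix for a second-degree polynomial in x
X = np.column_stack((np.ones(n), x, x**2))

# Ordinary least squares via the SVD: beta = V diag(1/s) U^T y
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta = Vt.T @ ((U.T @ y) / s)
print("OLS coefficients:", beta)

# Bootstrap: refit on resampled (x, y) pairs to estimate coefficient spread
n_boot = 1000
betas = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    betas[b] = np.linalg.pinv(X[idx]) @ y[idx]  # pinv is also SVD based
print("Bootstrap standard errors:", betas.std(axis=0))
```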
For the machine learning part, the following topics will be covered:
- Linear Regression and Logistic Regression;
- Neural networks and deep learning;
- Decision trees, random forests, boosting and bagging;
- Support vector machines;
- Dimensionality reduction, mainly principal component analysis (a short code sketch follows this list).
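Similarly, the short sketch below (again our own illustration, using scikit-learn on synthetic data) trains a logistic regression classifier and applies principal component analysis for dimensionality reduction:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic classification data: 500 samples, 10 features
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Logistic regression as a baseline classifier
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))

# PCA: project onto the two directions of largest variance
pca = PCA(n_components=2).fit(X_train)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```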
Hands-on demonstrations, exercises and projects aim at deepening your understanding of these topics.
Computational aspects play a central role, and you are expected to work on numerical examples and projects that illustrate the theory and methods. We strongly recommend forming small project groups of two to three participants.
Basic knowledge of programming and mathematics, with an emphasis on linear algebra, is expected. Knowledge of Python and/or C++ as programming languages is strongly recommended, and experience with Jupyter notebooks is recommended as well.
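As a quick way to verify your setup, something along these lines may help; the package list here is only our suggested baseline, not an official course requirement:

```python
# Sanity check: try importing the scientific Python packages commonly used
# in this kind of course; the exact list is a suggestion, not a requirement.
import importlib

for pkg in ("numpy", "scipy", "matplotlib", "pandas", "sklearn"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(mod, '__version__', 'unknown version')}")
    except ImportError:
        print(f"{pkg}: missing -- install with pip or conda")
```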
- Lectures are in the morning, from 10am to 12pm;
- Laboratory sessions for work on the computational projects run for four hours, from 2pm to 6pm;
- Lectures and lab sessions will all be at GANIL, starting January 20 at 9am.
- Grading scale: grades are awarded on a scale from A to F, where A is the best grade and F is a fail. We aim to have two projects handed in; these will be graded and should be finalized no later than two weeks after the course is over. Each project counts for 50% of the final grade. We plan to make the grades available no later than March 1, and hopefully earlier.
The link https://compphysics.github.io/MLErasmus/doc/web/course.html gives you direct access to the learning material, with lecture slides and Jupyter notebooks. Videos of the lectures will be added.
Recommended textbooks:
- Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, O'Reilly
General books on statistical analysis:
- Christian Robert and George Casella, Monte Carlo Statistical Methods, Springer
- Peter Hoff, A First Course in Bayesian Statistical Methods, Springer
General machine learning books:
- Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press
- Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer
- David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press
- David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press
Teachers: Stian Bilek (SB), Lucas Charpentier (LC), Morten Hjorth-Jensen (MHJ), and Hanna Svennevik (HS)
- Monday Lecture 10am-12pm: Introduction to Machine Learning and linear regression (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureJan20.mp4
- Monday Laboratory 2pm-6pm: Getting familiar with Git, GitHub, installing Python packages and Computational Exercises (SB, LC and HS)
- Tuesday Lecture 10am-12pm: Linear Regression and Logistic Regression (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureTue21.mp4
- Tuesday Laboratory 2pm-6pm: Computational Exercises (SB, LC and HS), exercise set 2
- Wednesday Lecture 10am-12pm: Regression and Bias-Variance Tradeoff (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureWedn22.mp4
- Wednesday Laboratory 2pm-6pm: Computational Exercises (SB, LC and HS), exercise sets 2 and 3
- Thursday Lecture 10am-12pm: Bias-Variance tradeoff, Logistic Regression and Optimization (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/FirstLectureThurs23.mp4
- Video: https://folk.uio.no/mhjensen/MLErasmus/SecondLectureThurs23.mp4
- Thursday Laboratory 2pm-6pm: Computational Exercises (SB, LC and HS), exercise sets 2 and 3
- Friday Lecture 10am-12pm: Logistic Regression and begin Neural Networks (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureFri24.mp4
- Friday Laboratory 2pm-6pm: Using and installing TensorFlow and Computational Exercises (SB, LC and HS), exercise sets 2 and 3 and first project
- Monday Lecture 10am-12pm: Neural Networks (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureJan27.mp4
- Monday Laboratory 2pm-6pm: Computational Exercises (SB, LC and HS) and work on project 1
- Tuesday Lecture 10am-12pm: Neural Networks, back propagation and examples of classification and regression problems (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureTue28.mp4
- Tuesday Laboratory 2pm-6pm: Computational Exercises (SB, LC and HS) and work on project 1
- Wednesday Lecture 10am-12pm: Decision Trees, Random Forests and Boosting (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureWedn29.mp4
- Wednesday Laboratory 2pm-6pm: Computational Exercises (SB, LC and HS), work on project 1
- Thursday Lecture 10am-12pm: Decision Trees, Random Forests and Boosting (MHJ)
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureThurs30.mp4
- Thursday Laboratory 2pm-6pm: Computational Exercises (SB, LC and HS), work on project 1
- Friday Lecture 10am-12pm: Boosting and XGBoost, summary of the course (MHJ), and presentation of project 2
- Video: https://folk.uio.no/mhjensen/MLErasmus/LectureFri31.mp4
- Friday Laboratory 2pm-6pm: Computational Exercises (SB, LC and HS), work on projects 1 and 2
Handwritten notes for the first and second weeks: https://folk.uio.no/mhjensen/MLErasmus/HandwrittenNotes.pdf