This project started as a university team project with the goal of predicting the outcomes of games in the German Bundesliga. Because the code is written in a modular and abstract manner, it can easily be extended to predict other sports, provided they offer a quantifiable way to track scores. One example of this flexibility is our already implemented adaptation for basketball, specifically the NBA in the US.
In more detail: we wanted to use past match data to build algorithms and machine learning models that predict future outcomes as accurately as possible.
With more features, however, came more complex user interaction. To counteract this, we built a streamlined, web-based user interface (UI) that guides the user through the extensive selection process. Advanced settings in the UI are pre-selected, so new users can navigate easily without detailed knowledge of the algorithms' capabilities, while more experienced users have direct access to fine-tuned predictions.
The resulting predictions depend on the algorithm used: simpler models merely return the probabilities of a draw or a win for either team, whereas the more complex machine learning models may also predict a concrete score for the match.
There is a lot more to be discovered, so let's get started.
To give you a glimpse of our user interface and how to navigate it, we show two screenshots. The first shows our start screen, where you select the sport you would like predictions for. As you can see, the selections take place over multiple pages, which lets us hide unnecessary menus and display only those needed for the selected sport and algorithm. To prevent incomplete inputs, the button to continue to the next page only appears once a selection has been made.
After you choose the sport, general data and model, the advanced selection appears. This page lets you fine-tune, down to the day, which data the model trains on, as well as which teams or matches you want to predict. It is followed by the results page displayed below, which shows the predicted probability of each outcome and, courtesy of the Poisson model, a prediction for the resulting score.
Requirements:
- Python 3.9
- pip
- Chrome, Chromium or Microsoft Edge browser
Then download the whole repository or clone it with git:
git clone https://github.com/stedavkle/ML-Bundesliga/
Now navigate into the project folder and install all dependencies:
cd ML-Bundesliga
pip install -e .
After the package and all dependencies are installed, you can execute the program as follows:
python3 teamproject
What was our focus while developing?
Our project stands out in terms of user experience. We realised our user interface with Eel, a Python library for creating offline user interfaces that are structured like websites. Because of this implementation, the project is cross-platform, which makes it available to a wider audience.
We also focused on robustness and accessibility. It is therefore possible to jump back to any previous step of the program, and if you accidentally refresh the page, you stay at the same step of the process. Furthermore, all images have alt text, the UI can be operated with the keyboard alone or the mouse alone, and the simple design emphasizes usability.
Outside of the user interface, the focus on flexibility led us to design the crawler and prediction algorithms as abstract classes. This allows for a modular build in which additional algorithms or crawlers are automatically integrated by the UI, reducing the work needed to add new sports, leagues or prediction models.
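As a rough illustration of this plug-in idea, the sketch below uses an abstract base class whose concrete subclasses can be listed automatically, for example to fill a drop-down in the UI. The names `PredictionModel`, `CoinFlip` and `available_models` are hypothetical and chosen only for this example; the project's real classes and discovery mechanism may differ.

```python
from abc import ABC, abstractmethod


class PredictionModel(ABC):
    """Hypothetical base class; the project's real class names may differ."""

    @abstractmethod
    def predict(self, home: str, away: str) -> dict:
        """Return outcome probabilities for a single match."""


class CoinFlip(PredictionModel):
    """Toy model that exists only to demonstrate the plug-in mechanism."""

    def predict(self, home: str, away: str) -> dict:
        return {"home_win": 1 / 3, "draw": 1 / 3, "away_win": 1 / 3}


def available_models() -> dict:
    """Map the name of every direct subclass to its class.

    __subclasses__() only lists direct subclasses, which is enough
    for this flat, illustrative hierarchy.
    """
    return {cls.__name__: cls for cls in PredictionModel.__subclasses__()}


if __name__ == "__main__":
    for name, model_cls in available_models().items():
        print(name, model_cls().predict("Team A", "Team B"))
```

With a layout like this, adding a new model means writing one more subclass; the listing function needs no changes.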
Which algorithms did we use?
We implemented four different algorithms. The first is a short and sweet algorithm that we called MostWins. It filters the given dataset for the selected match-up and counts all past results. From these counts it makes a simple prediction and, as the name implies, returns the probabilities of a win for either team or a draw.
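The counting idea behind MostWins can be sketched in a few lines. This is a minimal illustration, not the project's code: the tuple layout of `matches`, the function name `most_wins`, and the decision to count fixtures in both home/away orders are all assumptions.

```python
from collections import Counter


def most_wins(matches, home, away):
    """Estimate outcome probabilities from past results of one match-up.

    matches: iterable of (home_team, away_team, home_goals, away_goals)
    tuples (an assumed layout). Fixtures between the two teams are
    counted regardless of which side played at home (also an assumption).
    """
    counts = Counter()
    for h, a, h_goals, a_goals in matches:
        if {h, a} != {home, away}:
            continue  # a different pairing, ignore it
        if h_goals == a_goals:
            counts["draw"] += 1
        else:
            winner = h if h_goals > a_goals else a
            counts["home_win" if winner == home else "away_win"] += 1
    total = sum(counts.values())
    if total == 0:
        return None  # no history for this match-up
    return {key: counts[key] / total for key in ("home_win", "draw", "away_win")}


if __name__ == "__main__":
    history = [
        ("A", "B", 2, 1),
        ("B", "A", 0, 0),
        ("A", "B", 0, 3),
        ("A", "C", 1, 0),
        ("B", "A", 1, 2),
    ]
    print(most_wins(history, "A", "B"))
    # {'home_win': 0.5, 'draw': 0.25, 'away_win': 0.25}
```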
The second uses the Poisson model to predict the number of goals (or any other quantifiable scoring value). The predicted values are displayed as a matrix in which each entry represents one possible outcome of the game, with the X and Y axes representing the two teams' scores.
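A score matrix of this kind could be built as sketched below. How the project estimates each team's scoring rate is not described here, so the rates are simply passed in. Treating the two scores as independent, the cut-off `max_goals`, and all function names are assumptions made for the example.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals for a Poisson variable with the given rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def score_matrix(home_rate, away_rate, max_goals=10):
    """matrix[i][j] is the probability that home scores i and away scores j.

    Assumes independent scores; results above max_goals are cut off.
    """
    home = [poisson_pmf(i, home_rate) for i in range(max_goals + 1)]
    away = [poisson_pmf(j, away_rate) for j in range(max_goals + 1)]
    return [[h * a for a in away] for h in home]


def outcome_probabilities(matrix):
    """Collapse the score matrix into home win, draw and away win probabilities."""
    n = len(matrix)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "home_win": sum(matrix[i][j] for i, j in cells if i > j),
        "draw": sum(matrix[i][j] for i, j in cells if i == j),
        "away_win": sum(matrix[i][j] for i, j in cells if i < j),
    }


if __name__ == "__main__":
    matrix = score_matrix(home_rate=1.5, away_rate=1.1)  # illustrative rates
    print(outcome_probabilities(matrix))
```

The most likely exact score is the largest entry of the matrix, and the three outcome probabilities are the sums below, on and above its diagonal.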
The third is the Dixon-Coles model, an improvement upon the Poisson model that iterates over the given data repeatedly to increase accuracy at the expense of computing time. You can choose between weighting all data equally and weighting the data progressively according to its age.
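In its published form, the Dixon-Coles model extends the independent Poisson model in two ways: a correction factor for low-scoring results and an exponential down-weighting of older matches during fitting. The sketch below shows both pieces in isolation. Whether the project implements them exactly like this is an assumption, and the function names and the values of `rho` and `xi` are purely illustrative.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals for a Poisson variable with the given rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def time_weight(age_days, xi=0.0):
    """Weight of a past match in the likelihood.

    xi = 0 weights all matches equally; xi > 0 makes older matches count less.
    """
    return math.exp(-xi * age_days)


def tau(home_goals, away_goals, lam, mu, rho):
    """Dixon-Coles correction for the four lowest-scoring results.

    lam and mu are the expected home and away goals; rho is the
    dependence parameter (rho = 0 gives back the plain Poisson model).
    """
    if home_goals == 0 and away_goals == 0:
        return 1 - lam * mu * rho
    if home_goals == 0 and away_goals == 1:
        return 1 + lam * rho
    if home_goals == 1 and away_goals == 0:
        return 1 + mu * rho
    if home_goals == 1 and away_goals == 1:
        return 1 - rho
    return 1.0


def score_probability(home_goals, away_goals, lam, mu, rho):
    """Corrected probability of one exact score."""
    return (tau(home_goals, away_goals, lam, mu, rho)
            * poisson_pmf(home_goals, lam)
            * poisson_pmf(away_goals, mu))


if __name__ == "__main__":
    print(time_weight(365, xi=0.005))  # a year-old match counts less than a new one
    print(score_probability(0, 0, lam=1.5, mu=1.1, rho=-0.1))
```

During fitting, each match's log-likelihood term would be multiplied by its `time_weight`, and the parameters are found by numerical optimisation, which is where the extra computing time comes from.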
The last standard model is the logistic regression model. It calculates a table of factors using the logistic regression function, each entry representing a modifier to the game, such as home-field advantage or simply the teams' overall success in the past.
There are thorough tests written with pytest for both the models and crawler functions. These can be found in the 'tests' folder and executed with the following commands:
python -m pytest tests/test_crawler.py
python -m pytest tests/test_models.py
The authors are Stephan Amann, Cornelius Bopp, David Kleindiek and Amelie Schäfer. This project started as a university assignment; we therefore thank our tutors Felix Dangel, Thomas Gläßle and Frank Schneider for their feedback and generous help.
This project is licensed under the permissive open source MIT license.