Repo for our machine learning exercises in SS2020.
Authors:
- Alexander Leitner
- Aleksander Hadzhiyski
- Peter Holzner
The others:
- Cedrik and Co: Group 7
- Jan+Fabian: Group 33
Submission deadline: 25.03.2020, 23:59
Regression: Moneyball
Classification: Carvana - Don't Get Kicked!
Submission deadline: 4.05.2020, 23:59
Choose 4 techniques from:
a) linear regression
b) polynomial regression
c) logarithmic regression
d) kNN
e) Lasso
f) Ridge
g) Regression tree
h) ...?
You need to choose a total of 4 datasets:
- 1 from exercise 0
- 1 from Kaggle/UCI ML Repository (or other repositories)
- 2 from list below
Choose 2 from:
Bias correction of numerical prediction model temperature forecast Data Set
http://archive.ics.uci.edu/ml/datasets/Bias+correction+of+numerical+prediction+model+temperature+forecast
Data Folder: http://archive.ics.uci.edu/ml/machine-learning-databases/00514/
Some info:
> Number of Instances: 7750
> Number of Attributes: 25
> Missing Values: Yes
Verdict:
> # of samples: med
> # of dimensions: med-high
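Since this dataset is listed as having missing values, they need to be handled before most of the regression techniques above can be fitted. A minimal sketch of one simple option, mean imputation, is shown below. It uses only the standard library; the function name `impute_column_means` and the convention of marking missing entries with `None` are assumptions made for illustration, not something the dataset or the exercise prescribes.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    `rows` is a list of equally long lists of floats, with None marking a
    missing value (an assumed convention for this sketch).
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    # Build new rows instead of modifying the input in place.
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


if __name__ == "__main__":
    toy = [[1.0, None], [3.0, 4.0], [None, 8.0]]  # made-up values
    print(impute_column_means(toy))
```

To avoid leaking information, the column means would normally be computed on the training split only and then reused for the test split.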
QSAR fish toxicity Data Set
http://archive.ics.uci.edu/ml/datasets/QSAR+fish+toxicity
Data Folder: http://archive.ics.uci.edu/ml/machine-learning-databases/00504/
Some info:
> Number of Instances: 908
> Number of Attributes: 7
Verdict:
> # of samples: low
> # of dimensions: low
Metro Interstate Traffic Volume Data Set
http://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume
Data Folder: http://archive.ics.uci.edu/ml/machine-learning-databases/00492/
Some info:
> Number of Instances: 48204
> Number of Attributes: 9
Verdict:
> # of samples: high
> # of dimensions: low
Real estate valuation data set Data Set
http://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set
Data Folder: http://archive.ics.uci.edu/ml/machine-learning-databases/00477/
Some info:
> Number of Instances: 414
> Number of Attributes: 7
Verdict:
> # of samples: low
> # of dimensions: low
Meeting 1: Fr., 24.04.2020
Next meeting: Monday evening, 17:00
Chosen:
a) linear regression
b) kNN
c) Lasso
d) random forest --> regression tree?
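As some code to play around with, here is a minimal sketch of two of the chosen techniques, linear regression and kNN, compared by RMSE on a held-out split. It uses only the standard library and a single feature so that it runs anywhere; in the notebooks a library such as scikit-learn would be the more usual choice. The data is synthetic (not one of the course datasets), and all function names are made up for this illustration.

```python
import math
import random


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_linear(model, x):
    slope, intercept = model
    return slope * x + intercept


def predict_knn(train_xs, train_ys, x, k=5):
    """kNN regression: average the targets of the k nearest training points."""
    nearest = sorted(zip(train_xs, train_ys), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def make_toy_data(n=200, seed=0):
    """Synthetic data: y = 3x + 2 plus Gaussian noise (standard deviation 1)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def compare(n=200, seed=0, k=5):
    """Fit both models on the first 80% and score them on the remaining 20%."""
    xs, ys = make_toy_data(n, seed)
    split = int(0.8 * n)
    x_train, y_train = xs[:split], ys[:split]
    x_test, y_test = xs[split:], ys[split:]
    model = fit_linear(x_train, y_train)
    lin_pred = [predict_linear(model, x) for x in x_test]
    knn_pred = [predict_knn(x_train, y_train, x, k) for x in x_test]
    return {"linear": rmse(y_test, lin_pred), "knn": rmse(y_test, knn_pred)}


if __name__ == "__main__":
    for name, score in compare().items():
        print(f"{name}: RMSE = {score:.3f}")
```

Because the toy data is truly linear, linear regression has an easy job here; on the real datasets the ranking may well differ, which is what the pros/cons discussion is for.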
How:
> Jupyter notebook
> a few plots
> some code to play around with
> some results/conclusions
> some pros/cons
Who & what (on Moneyball):
a) linear regression --> Alex
b) kNN --> Aleks
c) Lasso --> code by Alex
d) random forest --> Peter
Some info:
> Number of Instances: 1230
> Number of Attributes: 15
Verdict:
> # of samples: low
> # of dimensions: low-med
Dev: Peter
Video Games Sales with Metacritic ratings
Dimensionality analysis:
> Rows: 16719
> Columns: 16
> Column names: ['Name', 'Platform', 'Year_of_Release', 'Genre', 'Publisher', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales', 'Critic_Score', 'Critic_Count', 'User_Score', 'User_Count', 'Developer', 'Rating']
Possible targets:
> Sales: so either of ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']
> Rating: so either of ['Critic_Score', 'User_Score']
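The choice of target also decides which columns can still serve as features. The sketch below shows one way to split the rows into features and a target using only the standard library. The column names are the ones listed above; the two sample rows are made up for illustration, and the function names are hypothetical. It also assumes that the other columns from the same group as the target (for example the regional sales when predicting `Global_Sales`) should be left out of the features, since they would likely leak the target.

```python
import csv
import io

# Header taken from the column list above; both data rows are made up.
SAMPLE_CSV = """Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
Game A,PC,2010,Action,Pub X,1.0,0.5,0.1,0.2,1.8,80,20,7.5,100,Dev X,E
Game B,PS3,2012,Sports,Pub Y,0.3,0.2,0.0,0.1,0.6,,,,,Dev Y,T
"""

SALES_TARGETS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]
RATING_TARGETS = ["Critic_Score", "User_Score"]


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def split_features_target(rows, target):
    """Return (features, targets), skipping rows where the target is missing.

    Assumption: every column in the same group as the target (sales or
    rating) is dropped from the features to avoid leaking the target.
    """
    if target in SALES_TARGETS:
        group = SALES_TARGETS
    elif target in RATING_TARGETS:
        group = RATING_TARGETS
    else:
        raise ValueError(f"{target!r} is not one of the possible targets")
    features, targets = [], []
    for row in rows:
        if row[target] == "":
            continue  # no label, so the row cannot be used for training
        targets.append(float(row[target]))
        features.append({k: v for k, v in row.items() if k not in group})
    return features, targets


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print("Rows:", len(rows), "Columns:", len(rows[0]))
    X, y = split_features_target(rows, "Global_Sales")
    print("Feature columns:", list(X[0]))
    print("Targets:", y)
```

With a rating column as target, rows without a score are dropped, so the usable sample count may end up noticeably lower than the full row count.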
http://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume
Data Folder: http://archive.ics.uci.edu/ml/machine-learning-databases/00492/
Dev: Alex L
Some info:
> Number of Instances: 48204
> Number of Attributes: 9
Verdict:
> # of samples: high
> # of dimensions: low
http://archive.ics.uci.edu/ml/datasets/Bias+correction+of+numerical+prediction+model+temperature+forecast
Data Folder: http://archive.ics.uci.edu/ml/machine-learning-databases/00514/
Dev: Aleks H
Some info:
> Number of Instances: 7750
> Number of Attributes: 25
Verdict:
> # of samples: med
> # of dimensions: med-high
Free books on Data Science by Springer: https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189
Here is a GitHub repo with cheat sheets on various Data Science/Machine Learning topics:
https://github.com/abhat222/Data-Science--Cheat-Sheet
Three videos on linear regression: https://www.youtube.com/watch?v=tODN7x3BO_E
Other cheat sheets are saved in the folder of the same name.