Data science beginner projects

Description

Study projects developed during data science courses.

module_0

A function that guesses a number and prints how many attempts it took.
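The README does not say how the guessing works. A common approach, assumed here, is binary search over a range of 1 to 100 (the range is also an assumption). A minimal sketch:

```python
def guess_number(target: int, low: int = 1, high: int = 100) -> int:
    """Guess `target` within [low, high] by binary search and return the number of attempts."""
    if not low <= target <= high:
        raise ValueError("target must lie within [low, high]")
    attempts = 0
    while True:
        attempts += 1
        guess = (low + high) // 2  # always try the midpoint of the remaining range
        if guess == target:
            return attempts
        if guess < target:
            low = guess + 1   # the target is in the upper half
        else:
            high = guess - 1  # the target is in the lower half


if __name__ == "__main__":
    for number in (1, 50, 100):
        print(f"{number} guessed in {guess_number(number)} attempts")
```

Halving the remaining range on every attempt means no number between 1 and 100 needs more than 7 guesses.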

module_1

Exploring the provided dataset with pandas.

EDA: preparing the data for machine learning

  • Filter outliers
  • Perform correlation analysis of the quantitative variables
  • Analyze the nominal (categorical) variables
  • Select columns for the machine learning step
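The README does not name the outlier rule that was used. One common choice, assumed here, is the interquartile-range (Tukey fence) rule. A dependency-free sketch; the `ages` values are made up for illustration:

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, 'exclusive' method by default
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filter_outliers(values, k=1.5):
    """Keep only the values that lie inside the fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


if __name__ == "__main__":
    ages = [22, 25, 27, 29, 30, 31, 33, 35, 38, 240]  # made-up data; 240 mimics an input error
    print(filter_outliers(ages))
```

With pandas the same fences can be computed per column with `Series.quantile(0.25)` and `Series.quantile(0.75)`; note that pandas interpolates quantiles differently from `statistics.quantiles`, so the bounds may differ slightly.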

Predicting TripAdvisor restaurant ratings

  • Data cleaning
  • Filling missing values (NA)
  • Outlier removal
  • Feature Engineering
  • EDA
  • First use of ML, with default model parameters
    First end-to-end data preprocessing, combining EDA and feature engineering.
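How the NA values were filled is not stated. A widely used option, assumed here, is median imputation plus a 0/1 "was missing" flag, which is also a simple engineered feature. A sketch; the review counts are hypothetical:

```python
from statistics import median


def fill_na_with_median(values):
    """Replace None with the median of the observed values.

    Also returns a 0/1 indicator list marking which entries were missing,
    so a model can still learn from the fact that a value was absent.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill_value = median(observed)
    filled = [fill_value if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return filled, was_missing


if __name__ == "__main__":
    n_reviews = [120, None, 45, 300, None]  # hypothetical "number of reviews" column
    print(fill_na_with_median(n_reviews))
```

In pandas the equivalent is roughly `df[col].isna().astype(int)` for the flag followed by `df[col].fillna(df[col].median())`.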

Bank score prediction project

  • Data cleaning
  • Filling missing values (NA)
  • Outlier removal
  • Feature Engineering
  • EDA
  • ML
  • Naive baseline model
  • PCA and SVD for dimensionality reduction
  • Hyperparameter tuning
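The README does not show how PCA and SVD were applied; in practice a library routine such as `sklearn.decomposition.PCA` or `numpy.linalg.svd` would normally be used. To illustrate the idea without dependencies, the sketch below finds the first principal component by power iteration on the covariance matrix (this direction equals the first right singular vector of the centred data) and projects each row onto it. The toy data are made up:

```python
import math
import random


def first_principal_component(rows, n_iter=200, seed=0):
    """Approximate the first principal component of `rows` (equal-length lists, at least 2 rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix C = X^T X / (n - 1) of the centred data X
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive starting vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]  # w = C v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data has zero variance")
        v = [x / norm for x in w]
    return v, means


def project(rows, component, means):
    """Reduce each row to a single coordinate along `component`."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component))) for r in rows]


if __name__ == "__main__":
    data = [[x, 2 * x] for x in range(1, 6)]  # toy points on the line y = 2x
    pc, mu = first_principal_component(data)
    print(pc, project(data, pc, mu))
```

Keeping only the top few components turns an n × d matrix into an n × k one with k < d, which is the "smaller matrix" the list item refers to.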

Predicting car classes from images using deep learning

  • Six types of image augmentation
  • Different input image sizes, from 512 down to 224 px
  • Different numbers of epochs
  • Different batch sizes
  • All model architectures available in tf.keras.applications
  • Fine-tuning and transfer learning
  • Learning rate adjusted with the ReduceLROnPlateau callback
  • Different optimizers
  • Batch Normalization
  • Various Keras callbacks
  • Test-time augmentation (TTA)
  • Different classifier head architectures
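Test-time augmentation predicts on several augmented copies of the same image and averages the class probabilities. The sketch below is framework-free: images are plain nested lists, the model is any callable returning probabilities, and only identity and horizontal flip are used. In the project itself the model would be a Keras model and the augmentations presumably its own pipeline, so treat these names as hypothetical stand-ins:

```python
from typing import Callable, List, Sequence

Image = List[List[float]]                 # a grayscale image as rows of pixel values
Model = Callable[[Image], List[float]]    # returns one probability per class


def identity(image: Image) -> Image:
    """Return an unchanged copy of the image."""
    return [list(row) for row in image]


def hflip(image: Image) -> Image:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def predict_with_tta(model: Model, image: Image,
                     augmentations: Sequence[Callable[[Image], Image]] = (identity, hflip)) -> List[float]:
    """Average the model's class probabilities over augmented copies of one image."""
    predictions = [model(augment(image)) for augment in augmentations]
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / len(predictions) for c in range(n_classes)]
```

Averaging smooths out predictions that depend on incidental properties such as orientation; the cost is one extra forward pass per augmentation.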

Analysis of HeadHunter job vacancies using SQL queries in a Jupyter notebook
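The README does not describe the database. As an illustration only, the sketch below runs an aggregate query from Python against an in-memory SQLite database with a hypothetical `vacancies(name, area, salary_from)` table; the real project's schema, database engine, and queries may differ, and the rows are made up:

```python
import sqlite3

# Hypothetical schema: the real HeadHunter tables and columns are not described in this README.
SCHEMA = "CREATE TABLE vacancies (id INTEGER PRIMARY KEY, name TEXT, area TEXT, salary_from INTEGER)"

QUERY = """
    SELECT area,
           COUNT(*)         AS n_vacancies,
           AVG(salary_from) AS avg_salary_from  -- AVG skips NULL salaries
    FROM vacancies
    GROUP BY area
    ORDER BY n_vacancies DESC, area
"""


def make_demo_db(rows):
    """Build an in-memory SQLite database from (name, area, salary_from) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO vacancies (name, area, salary_from) VALUES (?, ?, ?)", rows)
    return conn


def vacancies_per_area(conn):
    """Return (area, n_vacancies, avg_salary_from) tuples, busiest areas first."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    demo = make_demo_db([
        ("Data Analyst", "Moscow", 100000),   # made-up rows for illustration
        ("Data Scientist", "Moscow", None),
        ("BI Analyst", "Kazan", 60000),
    ])
    for row in vacancies_per_area(demo):
        print(row)
```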

Property price prediction
The data contain many outliers, mistakes, input errors, and slang abbreviations, so the project was split into two notebooks: data_cleaning.ipynb and eda_ml.ipynb.
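A typical cleaning step for such data is pulling numbers out of free-text fields and expanding abbreviations. The sketch below assumes values like "1,650 sqft" or "3.5 Baths"; both the input format and the `ABBREVIATIONS` map are hypothetical, since the README does not list the actual slang:

```python
import re

# Hypothetical abbreviation map for illustration; the dataset's actual slang is not listed here.
ABBREVIATIONS = {"ba": "baths", "bd": "beds", "sqft": "sq ft"}

NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")  # matches e.g. "1,650", "3.5", "1,200,000"


def parse_number(raw):
    """Return the first number in a messy string ('1,650 sqft' -> 1650.0), or None if there is none."""
    if raw is None:
        return None
    match = NUMBER.search(str(raw))
    return float(match.group().replace(",", "")) if match else None


def normalize_abbreviation(token):
    """Lower-case a token, strip a trailing dot and expand it if it is a known abbreviation."""
    cleaned = token.strip().lower().rstrip(".")
    return ABBREVIATIONS.get(cleaned, cleaned)
```

Returning None for unparseable values such as "--" keeps them as missing data, so they can be handled later by the NA-filling step.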

  • Data cleaning
  • Data Enrichment
  • EDA
  • Feature Engineering
  • ML
  • Outlier removal using different models: IsolationForest, EllipticEnvelope, LocalOutlierFactor
  • Feature selection using different methods: RFE, SelectFromModel, feature importances
  • Linear models tested as a baseline
  • Five advanced models tested: Random Forest, CatBoost, Gradient Boosting, XGBoost, LightGBM; bagging and stacking were also tested
  • Hyperparameter tuning
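Bagging fits the same kind of model on many bootstrap resamples of the training set and averages their predictions. The project presumably used library implementations (for example `sklearn.ensemble.BaggingRegressor`); the dependency-free sketch below only shows the mechanism, with a one-feature least-squares line as a stand-in base learner:

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: all x values identical
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Fit one line per bootstrap sample and average their predictions at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(a + b * x_new)
    return sum(predictions) / len(predictions)
```

Averaging reduces variance, so bagging helps most with unstable base learners such as deep decision trees; stacking differs in that a second-level model learns how to combine the base models' predictions instead of taking a plain average.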