This library collects the functions most useful for Kaggle and other machine learning competitions. The module is divided into the parts usually found in a machine learning pipeline: Preprocessing -> Feature engineering -> Feature selection -> Model tuning -> Model training -> Model prediction -> Building an ensemble of models
All dependencies are listed in requirements.txt
- clone repository
cd KaggleLib
pip install -r requirements.txt
pip install .
- check installation by running example:
python examples/example.py
The library contains the following parts:
- Model - generic class for machine learning models
  - type: model type (XGBoost, LightGBM, Keras or Scikit-Learn)
  - params: dictionary of model parameters
  - model: instance of model object
  - cv_score: cross-validation score
- Preprocessing
  - hash_data: hashing of categorical columns (one-hot)
  - normalize_data: numerical data normalization
- Feature engineering
  - make_numerical_interactions: second- and third-order feature interactions; operations: sum, multiplication, division
  - make_categorical_interactions: second- and third-order categorical feature interactions
  - categorical_target_encoding
  - logarithm: log feature transformation
  - exponent: exponent feature transformation
  - sigmoid: sigmoid feature transformation
  - trgonometry: sin, cos, tan feature transformations
- Feature selection
  - genetic_feature_selection: selects the subset of features with the best cross-validation metric using a genetic algorithm (evolutionary changes to feature subsets)
- Model tuning
  - cross_validation: calculate cross-validation score of a model
  - tune_lgbm: find best LightGBM parameters with HyperOpt
  - tune_xgb: find best XGBoost parameters with HyperOpt
- Model training
  - train_keras: train Keras model
  - train_lgbm: train LightGBM model
  - train_xgb: train XGBoost model
  - train_sklearn: train Scikit-Learn model
- Model predicting
  - predict_keras: prediction by Keras model
  - predict_lgbm: prediction by LightGBM model
  - predict_xgb: prediction by XGBoost model
  - predict_sklearn: prediction by Scikit-Learn model
- Model ensembles
  - stacking: builds a stack of models using the out-of-fold predictions technique
- Utils
  - make_folds: split data into folds
  - generate_keras_model: generate Keras model from a dictionary
  - HistoryCallback: callback to preserve Keras training information on every epoch
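
The out-of-fold predictions technique named under Model ensembles can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the library's own implementation: `split_into_folds`, `out_of_fold_predictions`, `MeanModel` and `LineModel` are hypothetical names invented for this example, and the real `make_folds` and `stacking` functions may have different signatures and behavior.

```python
import random


def split_into_folds(n_samples, n_folds, seed=0):
    """Hypothetical helper: shuffle indices and deal them into n_folds parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


class MeanModel:
    """Toy base model: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LineModel:
    """Toy base model: least-squares line fitted on the first feature."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        mean_x, mean_y = sum(xs) / len(xs), sum(y) / len(y)
        var = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
        self.slope_ = cov / var if var else 0.0
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]


def out_of_fold_predictions(model_factory, X, y, folds):
    """Predict every sample with a model that did not see it during training."""
    oof = [None] * len(X)
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = model_factory().fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        for i, pred in zip(fold, model.predict([X[i] for i in fold])):
            oof[i] = pred
    return oof


# Toy data following y = 2x + 1.
X = [[float(i)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
folds = split_into_folds(len(X), n_folds=5)

# One column of out-of-fold predictions per base model.
base_models = [LineModel, MeanModel]
oof_columns = [out_of_fold_predictions(m, X, y, folds) for m in base_models]

# Rows of level-one features on which a second-level model would be trained.
stack_features = [list(row) for row in zip(*oof_columns)]
```

Because each row of `stack_features` comes from models that never saw that sample, a second-level model trained on it does not learn from leaked targets. In practice the toy models would be replaced by the XGBoost, LightGBM, Keras or Scikit-Learn models the library wraps.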