- Permutation importance and Feature importance are now two different plotting methods.
- Model.test_estimators now takes a feature_pipeline argument
- Fixed a bug where FillNA did not create a _is_na column if the column didn't have a missing value
- Implemented Bayesian Search for hyperparameter optimization
- Added a read_file convenience method to FileDataset to read
- Fixed a bug where copy_to failed between two instances of SQLite-based SQLDatasets
- Fixed a bug where ClassificationVisualize.confusion_matrix would fail on multi-class problems due to wrong defaults
- Added repr to demodataset
- Lift curve can now plot multi-class
- Precision-Recall curve can now plot multi-class
- ROC AUC curve can now plot multi-class
- Fixed Binner to have a default value
- Fixed FuncTransform to have a default value
- load_estimator now uses default storage if nothing is passed
- Model.bayessearc is now Model.bayesiansearch
- Added load_demo_dataset function
- If the dataset has no train set, score_estimator will now run create_train_test with default configurations
- Model.make_prediction now takes a threshold argument when making a binary classification
- All ML-tooling logging messages now go to stdout instead of stderr
- Can pass a feature pipeline to Model, which will then automatically generate a combined feature_pipeline + estimator Pipeline
- Can pass a feature pipeline to Dataset.plot methods, to apply preprocessing before visualization
- New config implementation. If you need to reset the configuration, you should use Model.config.reset_config()
- Fixed typehints in Dataset
- Dataset.create_train_test now takes a boolean stratify parameter.
- Added default local filestorage when using save_estimator
- The dataframe returned by .make_prediction now labels the columns in a more human-friendly manner
- Dataset now verifies that load_training_data and load_prediction_data do not return empty data
- Added a missing data visualization to Dataset.plot
- FillNA now accepts an is_nan flag which adds a flag indicating that a value was missing
- Model.make_prediction now accepts a use_cache flag to score everything in cached.x
- Added a new Transformer: RareFeatureEncoder
- Fixed type inferences from data to sql in _load_data
- Added idx arg to load_prediction_data abstract method in SQLDataset
- Added caching of loaded data in SQLDataset
- Added .copy_to functionality to SQLDataset and FileDataset, allowing copying between datasets
- Bug fix for calculating feature importance when passing large amounts of data
- Bug fix when using default metric in test_estimators
- Bug fix when gridsearching, only applying last change
- Add nicer error message when passing incorrect dtypes to FillNA
- Storage .save method now only takes filename as parameter
- Handles storage loading of paths outputted from the Storage .get_list method
- Handles case when Dataset does not have a y value
- Added plot_learning_curve and corresponding result.plot.learning_curve
- Added plot_validation_curve and corresponding result.plot.validation_curve
- Replaced permutation_importance with scikit-learn's implementation
- Added target_correlation plots to Dataset.plot
- Bug fix for logging when feature unions (DFFeatureUnion) had tuples
- Hotfix Python version to 3.7
- Breaking change - Model methods load_estimator and save_estimator now take a Storage class that defines how and where to store estimators.
- Added the ability to declare that a saved model should be a production estimator.
- Added corresponding .load_production_estimator to Model
- Removed gitpython as a dependency
- Replaced custom feature permutation importance with sklearn's implementation from v0.22
- Breaking change - Dataset is now a separate object that has to be instantiated outside ModelData
- Breaking change - ModelData is now renamed to Model
- Added new properties is_estimator and is_regressor which check what type of estimator is used
- Joblib is now a dependency, instead of being vendored with scikit-learn
- Updated requirements
- Breaking change - BaseClassModel renamed to ModelData.
- Breaking change - model renamed to estimator
- Added Precision-Recall Curve
- Added option to give custom file name to .save_estimator()
- Instantiating with estimator is now optional - set estimator later using .init_estimator
- We have a logo! Added ML Tooling logo to docs
- Now issues a warning when git is not installed.
- Data for a class is changed from instance variable to class variable
- Grid search only copies data to workers once and reuses them across grid and folds.
- The Data Class now takes a random seed which it will receive from the BaseClass
- Disabled mem-mapping in feature importance
- Added license file to package
- Updated requirements
- Feature importances changed to use permutation instead of built-in for better estimates.
- .train_estimator will now reset the result attribute to None, in order to prevent users from mistakenly assuming the result is from the training
- Fixed bug in lift_score when using dataframes
- Fixed bug when training model and then scoring model
- Fixed bug where users could not save models if no result had been created, as would happen if the user only called .train_estimator before saving.
- Default_metric is now the same metric as the one specified for the model in .config
- Each class inheriting from ModelData has an individual config
- Changed get_scorer_func to wrap sklearn's get_scorer
- Fixed bug when gridsearching twice
- Added Binarize Transformer
- Added ability to use keywords in FuncTransformer
- .predict now returns a dataframe indexed on input
- Updated dependencies
- Added gridsearch method to BaseClass. Gridsearch your model and return a list of results for inspection
- Added ResultGroup - any method that returns a list of results now returns a ResultGroup instead.
- Added logging
- Added ability to record runs as yaml
- Another bugfix release
- Fixed bug that prevented DFRowFunc from pickling properly
- Added DFRowFunc Transformer
- Updated FillNA to handle categorical values
- Allow user to choose whether score_model uses cv or not
- Plot_feature_importance now takes a top_n and bottom_n argument
- Fix for error in setup wheels
- Implemented new FillNA Transformer
- Refactored to use flat structure
- Renamed project to ml_tooling
- Initial release