Implement Random Forest as a sub case to EnsembleLearner #410
Conversation
…straint on predictors (or features) to be used. Introduce Dataset bootstrap implementation with indices denoted as '_with_indices' methods.
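The idea of bootstrapping "with indices" can be illustrated with a minimal, self-contained sketch. This is not linfa's actual API: the `Lcg` generator and `bootstrap_with_indices` signature below are hypothetical stand-ins showing why returning the drawn indices alongside the sample is useful (the caller can later map predictions or feature choices back to the original dataset).

```rust
// Hypothetical stand-in RNG so the sketch has no external dependencies;
// a real implementation would use a proper RNG crate.
struct Lcg(u64);

impl Lcg {
    // Linear congruential step; adequate for a sketch, not for production.
    fn next_below(&mut self, n: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) as usize) % n
    }
}

/// Draw `k` rows with replacement from `data`, returning both the
/// bootstrapped sample and the row indices that were drawn.
fn bootstrap_with_indices(
    data: &[Vec<f64>],
    k: usize,
    rng: &mut Lcg,
) -> (Vec<Vec<f64>>, Vec<usize>) {
    let mut indices = Vec::with_capacity(k);
    let mut sample = Vec::with_capacity(k);
    for _ in 0..k {
        let i = rng.next_below(data.len());
        indices.push(i);
        sample.push(data[i].clone());
    }
    (sample, indices)
}

fn main() {
    let data = vec![vec![1.0], vec![2.0], vec![3.0], vec![4.0]];
    let mut rng = Lcg(42);
    let (sample, indices) = bootstrap_with_indices(&data, 3, &mut rng);
    assert_eq!(sample.len(), 3);
    assert_eq!(indices.len(), 3);
    // Every sampled row matches the original row at its recorded index.
    for (row, &i) in sample.iter().zip(&indices) {
        assert_eq!(row, &data[i]);
    }
    println!("drawn indices: {:?}", indices);
}
```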
Thanks for your contribution. I would like to mention the prior art on RandomForest. Did you see it? Otherwise, you need to make clippy happy (I think you can go with …
Hi, thanks for taking the time to review the code!
Yes, I saw the previous implementation and, for that reason, I tried to stick with the already working implementation of …. I pushed a new commit in which I added a type alias for …
Codecov Report

❌ Patch coverage is …

```diff
@@            Coverage Diff             @@
##           master     #410      +/-   ##
==========================================
+ Coverage   36.05%   36.16%   +0.11%
==========================================
  Files         100      100
  Lines        6549     6592      +43
==========================================
+ Hits         2361     2384      +23
- Misses       4188     4208      +20
```
Thanks! The implementation looks good. And yes, it would be great to adapt the documentation in …. You also have to add some automated tests in lib.rs for random forest, as well as in …
For now, just to keep this PR from growing, I added minimal documentation with an example (and fixed some typos in the `EnsembleLearner` documentation). Later on, I'd guess a rewrite of this part of the documentation is needed to be consistent with the other sub-crates.
Added! I don't know if they are enough or if you have something more in mind; just ask.
Premise: a Random Forest is just Bagging with the additional constraint that each weak predictor knows only a subset of the features of the dataset. In this PR, we updated the code of `EnsembleLearner` to account for that key constraint.

Implementation details:
- Added 3 new methods to `src/dataset/impl_dataset.rs`: `bootstrap_with_indices`, `bootstrap_samples_with_indices`, `bootstrap_features_with_indices`. Knowing the indices is fundamental to determining which features each weak classifier should use.
- Added `feature_proportion` to `EnsembleLearnerValidParams`.
- Added `model_features` to `EnsembleLearner` to keep track of which features each weak learner should use.
- Renamed `examples/bagging_iris.rs` -> `examples/ensemble.rs`.

Hope this implementation meets the library standards. If any further modification is required, just ping me or feel free to modify it :)
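The feature-subset constraint described above can be sketched in a few lines. This is a hedged illustration, not linfa's `EnsembleLearner` code: the names `feature_proportion` and `model_features` mirror the PR's additions, but the `select_features` helper and the fixed index choice are hypothetical, standing in for random sampling without replacement.

```rust
// Project a row onto the feature indices assigned to one weak learner.
fn select_features(row: &[f64], feature_idx: &[usize]) -> Vec<f64> {
    feature_idx.iter().map(|&j| row[j]).collect()
}

fn main() {
    let n_features = 4;
    // Hypothetical parameter mirroring the PR's `feature_proportion`:
    // each weak learner sees this fraction of the features.
    let feature_proportion = 0.5;
    let k = ((n_features as f64) * feature_proportion).ceil() as usize;

    // In a real forest these indices would be sampled per model without
    // replacement; a fixed choice keeps the sketch deterministic.
    let model_features: Vec<usize> = vec![0, 2];
    assert_eq!(model_features.len(), k);

    // At both training and prediction time, the same stored indices are
    // used to cut the row down to the learner's feature subset.
    let row = [10.0, 20.0, 30.0, 40.0];
    let reduced = select_features(&row, &model_features);
    assert_eq!(reduced, vec![10.0, 30.0]);
    println!("reduced row: {:?}", reduced);
}
```

Storing `model_features` per weak learner is what lets prediction apply the same column selection that was used during training.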