
Conversation

@ladezai commented Oct 17, 2025

Premise: a Random Forest is just Bagging with the additional constraint that each weak predictor sees only a subset of the dataset's features. This PR updates the code of EnsembleLearner to account for that key constraint.

Implementation details:

  • Adds three new methods to src/dataset/impl_dataset.rs: bootstrap_with_indices, bootstrap_samples_with_indices, and bootstrap_features_with_indices. Knowing the sampled indices is essential to track which features each weak classifier should use.
  • Adds the parameter feature_proportion to EnsembleLearnerValidParams.
  • Adds model_features to EnsembleLearner to keep track of which features each weak learner should use.
  • Renames examples/bagging_iris.rs to examples/ensemble.rs and updates it accordingly.

Hope this implementation meets the library standards; a rough usage sketch is included below. If any further modification is required, just ping me or feel free to modify it :)
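To make the intended usage concrete, here is a rough sketch rather than the exact code in examples/ensemble.rs: it assumes the existing EnsembleLearnerParams builder methods (ensemble_size, bootstrap_proportion) together with the feature_proportion parameter introduced in this PR, so the names may differ from what finally lands.

```rust
use linfa::prelude::*;
use linfa_ensemble::EnsembleLearnerParams;
use linfa_trees::DecisionTree;
use rand::{rngs::SmallRng, SeedableRng};

fn main() {
    let mut rng = SmallRng::seed_from_u64(42);
    let (train, valid) = linfa_datasets::iris()
        .shuffle(&mut rng)
        .split_with_ratio(0.8);

    // Each weak tree is fit on a bootstrap sample of the rows and, with the
    // constraint added in this PR, on a random subset of the columns -- which
    // is what turns plain bagging into a random forest.
    let forest = EnsembleLearnerParams::new(DecisionTree::params())
        .ensemble_size(100)
        .bootstrap_proportion(0.7)
        .feature_proportion(0.7) // hypothetical name: parameter added by this PR
        .fit(&train)
        .unwrap();

    let cm = forest
        .predict(&valid)
        .confusion_matrix(&valid)
        .unwrap();
    println!("accuracy: {}", cm.accuracy());
}
```

Setting feature_proportion below 1.0 is what distinguishes the random forest from plain bagging: each tree trains on its own random column subset.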

…straint on predictors (or features) to be used. Introduce Dataset bootstrap implementation with indices denoted as '_with_indices' methods.
@relf (Member) commented Oct 19, 2025

Thanks for your contribution.

I would like to mention the prior art on RandomForest. Did you see it?
In this comment, I was suggesting an implementation.
As it is a well-known algorithm, I think random forest deserves a struct named after it. What do you think?

Otherwise, you need to make clippy happy (I think you can go with #[allow(clippy::type_complexity)]).
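For reference, a minimal standalone illustration (not code from this PR) of where the attribute goes; it suppresses clippy's type_complexity lint for that item only:

```rust
// The attribute silences clippy's `type_complexity` warning for this item,
// which is useful when a return type is unavoidably verbose.
#[allow(clippy::type_complexity)]
fn grouped_scores() -> Vec<(String, Vec<(usize, Option<f64>)>)> {
    Vec::new()
}

fn main() {
    let _ = grouped_scores();
}
```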

@ladezai (Author) commented Oct 20, 2025

> Thanks for your contribution.

Hi, thanks for taking the time to review the code!

> I would like to mention the prior art on RandomForest. Did you see it? In this comment, I was suggesting an implementation. As it is a well-known algorithm, I think random forest deserves a struct named after it. What do you think?

Yes, I saw the previous implementation, and for that reason I tried to stick with the already working EnsembleLearner implementation, with only minor edits.

I pushed a new commit that adds a type alias RandomForest<F, L> = EnsembleLearner<DecisionTree<F, L>>. Do you think that is enough, or should I add more documentation to the type alias, or provide a different solution altogether?
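For reference, the alias looks roughly like this (as proposed in the new commit, not necessarily the final merged form):

```rust
use linfa_ensemble::EnsembleLearner;
use linfa_trees::DecisionTree;

/// A random forest is just an ensemble of decision trees in which each tree
/// sees a bootstrap sample of the rows and a random subset of the features,
/// so a type alias over EnsembleLearner is enough to expose it under its
/// usual name.
pub type RandomForest<F, L> = EnsembleLearner<DecisionTree<F, L>>;
```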

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 61.53846% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.16%. Comparing base (3f2c202) to head (7b3bdb2).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| algorithms/linfa-ensemble/src/algorithm.rs | 47.05% | 9 Missing ⚠️ |
| algorithms/linfa-ensemble/src/hyperparams.rs | 25.00% | 6 Missing ⚠️ |
| src/dataset/impl_dataset.rs | 81.48% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #410      +/-   ##
==========================================
+ Coverage   36.05%   36.16%   +0.11%     
==========================================
  Files         100      100              
  Lines        6549     6592      +43     
==========================================
+ Hits         2361     2384      +23     
- Misses       4188     4208      +20     

☔ View full report in Codecov by Sentry.

@relf (Member) commented Oct 21, 2025

Thanks! The implementation looks good. And yes, it would be great to adapt the documentation in lib.rs and add some for the RandomForest and EnsembleLearner types.

You also have to add some automated tests in lib.rs for the random forest, as well as in dataset/mod.rs for the new bootstrap functions.

@ladezai (Author) commented Oct 21, 2025

> Thanks! The implementation looks good. And yes, it would be great to adapt the documentation in lib.rs and add some for the RandomForest and EnsembleLearner types.

For now, to avoid increasing the size of this PR, I added minimal documentation with an example (and fixed some typos in the EnsembleLearner documentation). Later on, I'd guess a rewrite of this part of the documentation is needed to be consistent with the other sub-crates.

> You also have to add some automated tests in lib.rs for the random forest, as well as in dataset/mod.rs for the new bootstrap functions.

Added! I don't know if they are enough; if you have something more in mind, just ask.
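For illustration, a rough sketch of the kind of test this refers to, not the exact test that was pushed; it reuses the hypothetical feature_proportion parameter from this PR alongside the existing builder methods:

```rust
#[test]
fn random_forest_predicts_iris() {
    use linfa::prelude::*;
    use linfa_ensemble::EnsembleLearnerParams;
    use linfa_trees::DecisionTree;

    let (train, valid) = linfa_datasets::iris().split_with_ratio(0.8);

    let forest = EnsembleLearnerParams::new(DecisionTree::params())
        .ensemble_size(10)
        .bootstrap_proportion(0.7)
        .feature_proportion(0.5) // hypothetical name: parameter added by this PR
        .fit(&train)
        .unwrap();

    // The ensemble should comfortably beat random guessing on the held-out split.
    let cm = forest.predict(&valid).confusion_matrix(&valid).unwrap();
    assert!(cm.accuracy() > 0.7);
}
```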
