feat(linear): Add ensemble tree model and solver-aware scoring #18

Open · wants to merge 6 commits into master

Conversation

@shenkha shenkha commented Jul 14, 2025

What does this PR do?

This pull request introduces two major enhancements to the linear tree-based models:

  1. Ensemble Tree Model: Implements an ensemble of tree models to improve prediction accuracy and robustness over a single tree.
  2. Solver-Aware Scoring: Fixes a critical bug in the beam search scoring logic. The logic now correctly calculates path probabilities based on whether an SVM or a Logistic Regression solver is used.

Key Changes:

1. Ensemble of Trees

  • A new EnsembleTreeModel class in libmultilabel/linear/tree.py now manages multiple tree models.
  • The train_ensemble_tree function handles the training of n separate tree models, each with a different random seed for diversity.
  • The ensemble's final predictions are an average of the scores from each tree, providing a more stable and accurate result.
  • This functionality is exposed via a new CLI argument --tree_ensemble_models in main.py and integrated into linear_trainer.py.

Example usage:

python main.py --training_file data/eurlex_raw_texts_train.txt \
               --test_file data/eurlex_raw_texts_test.txt \
               --linear \
               --linear_technique tree \
               --tree_ensemble_models 3
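The ensemble mechanics described above can be sketched roughly as follows. The names `EnsembleTreeModel` and `train_ensemble_tree` come from this PR, but the bodies below are an illustrative assumption about the logic, not the actual libmultilabel implementation:

```python
import numpy as np


class EnsembleTreeModel:
    """Sketch: wraps several tree models and averages their score matrices."""

    def __init__(self, models):
        self.models = models

    def predict_values(self, x):
        # Average the per-label scores produced by each tree, giving a
        # more stable estimate than any single tree.
        all_scores = [model.predict_values(x) for model in self.models]
        return np.mean(all_scores, axis=0)


def train_ensemble_tree(y, x, n_trees=3, train_tree=None):
    """Sketch: train n_trees tree models, varying the random seed so the
    label-space partitions differ across trees."""
    models = []
    for seed in range(n_trees):
        np.random.seed(seed)  # seed drives the randomness inside train_tree
        models.append(train_tree(y, x))
    return EnsembleTreeModel(models)
```

Averaging raw scores (rather than, say, majority voting over predicted label sets) keeps the output in the same score space as a single tree, so downstream thresholding and evaluation code need not change.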

2. Corrected Scoring Logic

  • The _is_lr method in TreeModel now correctly identifies all of LIBLINEAR's Logistic Regression solvers (0, 6, and 7).
  • The _get_scores method has been updated to use the correct scoring function based on the solver type:
    • For Logistic Regression, it now uses log_expit to correctly accumulate log-probabilities along a path in the tree.
    • For SVM-based solvers, it continues to use the existing calculation based on squared hinge loss.

  This fix is crucial for the beam search to find the optimal labels, as the previous implementation incorrectly applied the SVM scoring logic to LR models.
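A standalone sketch of the solver-aware scoring described above. The real logic lives in TreeModel._get_scores; the free function below, including the squared-hinge form of the SVM branch, is an assumption for illustration:

```python
import numpy as np
from scipy.special import log_expit  # numerically stable log(sigmoid(x))

# LIBLINEAR's Logistic Regression solver types, per this PR's _is_lr fix.
LR_SOLVERS = {"0", "6", "7"}


def get_scores(decision_values, parent_score, solver_type):
    """Accumulate a path score for a tree node's children during beam search."""
    if solver_type in LR_SOLVERS:
        # LR: log_expit of the margin is a log-probability, so scores
        # along a root-to-leaf path accumulate additively.
        return parent_score + log_expit(decision_values)
    # SVM: penalize margin violations with the squared hinge loss;
    # margins of at least 1 contribute no penalty.
    return parent_score - np.square(np.maximum(0.0, 1.0 - decision_values))
```

Applying the LR branch to an SVM model (or vice versa) distorts the relative ordering of candidate paths, which is why the beam search could previously miss the optimal labels.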

Test CLI & API (bash tests/autotest.sh)

Test APIs used by main.py.

  • Test Pass
    • (Copy and paste the last outputted line here.)
  • Not Applicable (i.e., the PR does not include API changes.)

Check API Document

If any new APIs are added, please check that their descriptions are added to the API document.

  • API document is updated (linear, nn)
  • Not Applicable (i.e., the PR does not include API changes.)

Test quickstart & API (bash tests/docs/test_changed_document.sh)

If any APIs in quickstarts or tutorials are modified, please run this test to check if the current examples can run correctly after the modified APIs are released.

@shenkha shenkha requested review from cjlin1 and a team as code owners July 14, 2025 14:45
next_level.extend(zip(node.children, children_score.tolist()))

cur_level = sorted(next_level, key=lambda pair: -pair[1])[:beam_width]
next_level = []

num_labels = len(self.root.label_map)
- scores = np.zeros(num_labels)
+ scores = np.full(num_labels, 0.0)
Contributor

Why do we need to modify this line?

Author

My mistake; I have just checked and will revert right away.

return solver_type in ["0", "6", "7"]
return False

def _get_scores(self, pred, parent_score=0.0):
Contributor

We should specify the parameter type. Please see other functions.

@Eleven1Liu Eleven1Liu self-requested a review July 17, 2025 02:13
Contributor

@Eleven1Liu Eleven1Liu left a comment

For the formatting issues mentioned above, please use the black formatter.
