Update build_tree function with SparseKmeans implementation #19

khoinpd0411 · 2025-07-14T20:13:38Z

What does this PR do?

Update the build_tree function to use the SparseKmeans implementation, with an adaptive clustering method that chooses between Elkan's and Lloyd's algorithms based on the number of samples.

Improvements:

Sped up tree construction.
Resolved convergence issues caused by duplicate samples during clustering.
Introduced an adaptive clustering strategy that dynamically switches between Elkan's algorithm (for large sample sizes) and Lloyd's algorithm (for smaller or dense datasets).
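
For readers following along, the switching rule can be sketched as a small selection function. This is only an illustration: the helper name `choose_kmeans_algorithm`, the string labels, and the default threshold of 10,000 samples are assumptions (the threshold is borrowed from the review discussion in this PR), not the code in `tree.py`.

```python
# Hypothetical helper: the name, the returned labels, and the default
# threshold are illustrative assumptions, not the PR's actual code.
def choose_kmeans_algorithm(num_samples: int, threshold: int = 10_000) -> str:
    """Return "elkan" for inputs larger than `threshold`, otherwise "lloyd"."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    # Elkan's algorithm skips distance computations via the triangle
    # inequality but keeps extra per-sample bounds in memory, so it is
    # only selected once the input is large.
    return "elkan" if num_samples > threshold else "lloyd"


if __name__ == "__main__":
    print(choose_kmeans_algorithm(500))      # small node
    print(choose_kmeans_algorithm(250_000))  # large node
```

In a recursive tree build, the number of label representations shrinks at each level, so a rule like this would pick Elkan's algorithm near the root and fall back to Lloyd's algorithm in deeper nodes.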

Test CLI & API (`bash tests/autotest.sh`)

Test APIs used by main.py.

Test Pass
- (Copy and paste the last outputted line here.)
Not Applicable (i.e., the PR does not include API changes.)

Check API Document

If any new APIs are added, please check that their descriptions have been added to the API document.

API document is updated (linear, nn)
Not Applicable (i.e., the PR does not include API changes.)

Test quickstart & API (`bash tests/docs/test_changed_document.sh`)

If any APIs in quickstarts or tutorials are modified, please run this test to check if the current examples can run correctly after the modified APIs are released.


Eleven1Liu · 2025-07-14T23:28:42Z

requirements.txt

@@ -6,3 +6,4 @@ scikit-learn
 scipy<1.14.0
 tqdm
 psutil
+sparsekmeans


Please also add sparsekmeans to install_requires

LibMultiLabel/setup.cfg

Lines 27 to 35 in a0bef91

install_requires =
    liblinear-multicore>=2.49.0
    numba
    pandas>1.3.0
    PyYAML
    scikit-learn
    scipy<1.14.0
    tqdm
    psutil

~~Please bump version to 0.8.0~~

LibMultiLabel/setup.cfg

Line 3 in a0bef91

version = 0.7.4

sparsekmeans requires Python >= 3.10, whereas LibMultiLabel supports Python >= 3.8.
This causes installation issues when users try to install LibMultiLabel on Python 3.8 or 3.9.
There are two approaches to this issue:

Update LibMultiLabel to require Python >= 3.10.

Or detect the user's environment and apply a corresponding workaround.
This is still open for discussion.
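
To make the second option concrete, here is one possible shape for the environment check. It is a sketch under assumptions: the helper name `can_use_sparsekmeans` is hypothetical, the import name is assumed to be `sparsekmeans`, and the fallback clustering path for older interpreters is left out because it has not been decided.

```python
import importlib.util
import sys

# Minimum interpreter version stated for sparsekmeans in this discussion.
MIN_SPARSEKMEANS_PYTHON = (3, 10)


def can_use_sparsekmeans(version_info=sys.version_info) -> bool:
    """Hypothetical check: True only if the interpreter is new enough
    and a module named "sparsekmeans" (assumed import name) is installed."""
    if tuple(version_info[:2]) < MIN_SPARSEKMEANS_PYTHON:
        return False
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec("sparsekmeans") is not None


if __name__ == "__main__":
    print(can_use_sparsekmeans())
```

A check like this only helps at runtime. On the packaging side, the dependency would also need to be conditional, for example with an environment marker such as `sparsekmeans; python_version >= "3.10"` in `install_requires`, otherwise installation on Python 3.8 or 3.9 would still fail.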

@khoinpd0411 No need to bump version now, we will release with #20.

chcwww · 2025-07-15T18:02:03Z

libmultilabel/linear/tree.py

+                tol=0.0001,
+                random_state=np.random.randint(2**31 - 1),
+                verbose=True
+                )


I'm just wondering why the indentation isn't aligned with line 287.
BTW, should we pass the verbose flag through _build_tree() so that we can control the output during training?
And would it be better to do something like this (I'm not sure):

    if label_representation.shape[0] > 10000:
        kmeans_algo = ElkanKmeans
    else:
        kmeans_algo = LloydKmeans
    kmeans = kmeans_algo(those_params)

For the formatting issue, please use black formatter.

maclin726 · 2025-07-16T13:11:04Z

libmultilabel/linear/tree.py

+            child = _build_tree(child_representation, child_map, d + 1, K, dmax)
+        else:
+            child = Node(label_map=child_map, children=[])
+


Maybe it's better to have

    num_unique_labels = len(np.unique(metalabels))
    # Recurse only when clustering produced all K clusters.
    if num_unique_labels == K:
        children = []
        for i in range(K):
            child_representation = label_representation[metalabels == i]
            child_map = label_map[metalabels == i]
            child = _build_tree(child_representation, child_map, d + 1, K, dmax)
            children.append(child)
    else:
        # Fewer than K clusters: stop splitting and create leaf nodes.
        children = [
            Node(label_map=label_map[metalabels == i], children=[])
            for i in range(num_unique_labels)
        ]

Update build_tree function with sparsekmeans and utilize an adaptive …

e119c79

…clustering method mixing Elkan's and Lloyd's algorithm based on the number of samples

khoinpd0411 requested review from cjlin1 and a team as code owners July 14, 2025 20:13

Eleven1Liu requested changes Jul 14, 2025

View reviewed changes

Eleven1Liu added model/linear release PyPI release tag is in this PR labels Jul 14, 2025

chcwww reviewed Jul 15, 2025

View reviewed changes

maclin726 reviewed Jul 16, 2025

View reviewed changes

Eleven1Liu mentioned this pull request Jul 17, 2025

[WIP] Upgrade minimum python version from 3.8 to 3.10 #20

Open

4 tasks

Eleven1Liu removed the release PyPI release tag is in this PR label Jul 17, 2025

khoinpd0411 commented Jul 14, 2025

Eleven1Liu Jul 14, 2025

Eleven1Liu Jul 14, 2025 • edited

zhi-bao Jul 15, 2025

Eleven1Liu Jul 17, 2025

chcwww Jul 15, 2025

Eleven1Liu Jul 17, 2025

maclin726 Jul 16, 2025
