Update build_tree function with SparseKmeans implementation #19
…clustering method mixing Elkan's and Lloyd's algorithm based on the number of samples
@@ -6,3 +6,4 @@ scikit-learn
scipy<1.14.0
tqdm
psutil
sparsekmeans
Please also add sparsekmeans to install_requires
Lines 27 to 35 in a0bef91
install_requires =
    liblinear-multicore>=2.49.0
    numba
    pandas>1.3.0
    PyYAML
    scikit-learn
    scipy<1.14.0
    tqdm
    psutil
Please bump version to 0.8.0
Line 3 in a0bef91
version = 0.7.4
Sparsekmeans requires Python >= 3.10, whereas LibMultiLabel supports Python >= 3.8.
This causes installation issues when users try to install LibMultiLabel in Python 3.8 or 3.9.
There are two approaches to this issue:
- Update LibMultiLabel to require Python >= 3.10.
- Or, detect the user's environment and apply the corresponding workaround.
There is much room for discussion.
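The second option could look roughly like the sketch below. It is only an illustration: `pick_kmeans_backend` and the backend names are hypothetical, and falling back to scikit-learn is an assumption, not something decided in this thread.

```python
import importlib.util
import sys

# Minimum Python version required by sparsekmeans, as stated in this thread.
SPARSEKMEANS_MIN_PYTHON = (3, 10)


def pick_kmeans_backend(version_info=None, has_sparsekmeans=None):
    """Return "sparsekmeans" when it is usable, else "sklearn".

    Hypothetical helper: both arguments default to probing the running
    interpreter, and can be overridden to make the decision testable.
    """
    if version_info is None:
        version_info = sys.version_info
    if has_sparsekmeans is None:
        # find_spec only checks that the package is importable; it does not import it.
        has_sparsekmeans = importlib.util.find_spec("sparsekmeans") is not None
    if tuple(version_info[:2]) >= SPARSEKMEANS_MIN_PYTHON and has_sparsekmeans:
        return "sparsekmeans"
    return "sklearn"  # assumed fallback for Python 3.8 and 3.9


print(pick_kmeans_backend())
```

On the packaging side, a standard environment marker such as `sparsekmeans; python_version >= "3.10"` in `install_requires` would let installation succeed on Python 3.8 and 3.9, at the cost of maintaining two code paths.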
@khoinpd0411 No need to bump version now, we will release with #20.
tol=0.0001,
random_state=np.random.randint(2**31 - 1),
verbose=True
)
I'm just wondering why the indentation isn't aligned with line 287.
BTW, should we pass the verbose flag through _build_tree() so that we can control the output when training?
And would it be better to do something like this (I'm not sure):
    if label_representation.shape[0] > 10000:
        kmeans_algo = ElkanKmeans
    else:
        kmeans_algo = LloydKmeans
    kmeans = kmeans_algo(those_params)
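A self-contained sketch of both suggestions in this comment (selecting the algorithm by sample count and forwarding a `verbose` flag) might look like this. The two classes are stand-ins defined here so the example runs on its own; they are not the sparsekmeans API, and `make_kmeans` and `ELKAN_THRESHOLD` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class _StubKmeans:
    """Stand-in for a k-means estimator; NOT the sparsekmeans API."""
    n_clusters: int
    verbose: bool = False


class ElkanKmeans(_StubKmeans):
    pass


class LloydKmeans(_StubKmeans):
    pass


# Threshold taken from the suggestion above; the right value is undecided.
ELKAN_THRESHOLD = 10000


def make_kmeans(n_samples, n_clusters, verbose=False, threshold=ELKAN_THRESHOLD):
    """Pick the algorithm by sample count and forward `verbose` to it."""
    kmeans_algo = ElkanKmeans if n_samples > threshold else LloydKmeans
    return kmeans_algo(n_clusters=n_clusters, verbose=verbose)


kmeans = make_kmeans(n_samples=20000, n_clusters=100, verbose=False)
print(type(kmeans).__name__)
```

Keeping the selection in one small factory means the caller only has to thread `verbose` through once, rather than duplicating the constructor arguments in each branch.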
For the formatting issue, please use the black formatter.
    child = _build_tree(child_representation, child_map, d + 1, K, dmax)
else:
    child = Node(label_map=child_map, children=[])
Maybe it's better to have
    num_unique_labels = len(np.unique(metalabels))
    if num_unique_labels == K:
        children = []
        for i in range(K):
            child_representation = label_representation[metalabels == i]
            child_map = label_map[metalabels == i]
            child = _build_tree(child_representation, child_map, d + 1, K, dmax)
            children.append(child)
    else:
        children = [
            Node(label_map=label_map[metalabels == i], children=[])
            for i in range(num_unique_labels)
        ]
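The fallback proposed in this comment can be tried out in isolation with the simplified sketch below. It uses plain lists instead of NumPy arrays, a minimal stand-in `Node`, and a caller-supplied `cluster_fn`; all of these, and the stopping condition at the top of `build_tree`, are assumptions made for illustration rather than the project's code. It iterates over the cluster ids that actually occur instead of `range(num_unique_labels)`, so non-contiguous ids are also handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified stand-in for the project's Node class."""
    label_map: list
    children: list = field(default_factory=list)


def build_tree(label_map, cluster_fn, d, K, dmax):
    """Recurse while clustering yields K clusters; otherwise stop with leaves."""
    # Assumed stopping rule: maximum depth reached or too few labels to split.
    if d >= dmax or len(label_map) <= K:
        return Node(label_map=label_map)
    metalabels = cluster_fn(label_map, K)  # one cluster id per label
    cluster_ids = sorted(set(metalabels))
    groups = [
        [lab for lab, m in zip(label_map, metalabels) if m == i]
        for i in cluster_ids
    ]
    if len(cluster_ids) == K:
        children = [build_tree(g, cluster_fn, d + 1, K, dmax) for g in groups]
    else:
        # Degenerate clustering: stop recursing and emit one leaf per cluster.
        children = [Node(label_map=g) for g in groups]
    return Node(label_map=label_map, children=children)


def round_robin(labels, K):
    """Toy clustering used only to exercise the recursion."""
    return [i % K for i in range(len(labels))]


tree = build_tree(list(range(8)), round_robin, d=0, K=2, dmax=10)
print(len(tree.children))
```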
What does this PR do?
Update the build_tree function to use the SparseKmeans implementation, with an adaptive clustering method that chooses between Elkan's and Lloyd's algorithms based on the number of samples.
Improvements:
- Test CLI & API (bash tests/autotest.sh): Test APIs used by main.py.
- Check API Document: If any new APIs are added, please check if the description of the APIs is added to API document.
- Test quickstart & API (bash tests/docs/test_changed_document.sh): If any APIs in quickstarts or tutorials are modified, please run this test to check if the current examples can run correctly after the modified APIs are released.