[Bug]: The behavior of kmeans in SPU does not match kmeans in sklearns #536
Comments
Hello, thanks for reporting this. The reason is that sklearn runs the clustering algorithm several times (with different initial centers) and keeps the best result. However, sml runs the procedure only once for efficiency, so it may produce different outputs. BTW, it's pretty straightforward to support this functionality; would you mind taking this on? |
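For readers following along, the "run several times and keep the best" behavior described above can be sketched roughly as follows. This is a plain NumPy illustration, not the sml or sklearn code: the function names `lloyd` and `kmeans_best_of` are hypothetical, "best" is assumed to mean lowest inertia (sum of squared distances to the nearest center), and the initial centers are assumed to be supplied by the caller as an array of shape `(n_init, k, d)`.

```python
import numpy as np


def lloyd(X, init_centers, n_iter=20):
    """Run plain Lloyd iterations from the given centers.

    Returns (centers, inertia). Hypothetical helper, for illustration only.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)  # copy, so the input is not mutated
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(centers.shape[0]):
            mask = labels == j
            if mask.any():  # leave an empty cluster's center where it is
                centers[j] = X[mask].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, float(d2.min(axis=1).sum())


def kmeans_best_of(X, inits, n_iter=20):
    """Run Lloyd once per initialization and keep the lowest-inertia result."""
    results = [lloyd(X, init, n_iter) for init in inits]
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    inits = np.array([
        [[0.0, 0.0], [0.0, 1.0]],    # poor start: both centers in the left cluster
        [[0.0, 0.5], [10.0, 0.5]],   # good start: one center per cluster
    ])
    centers, inertia = kmeans_best_of(X, inits)
    print(centers, inertia)
```

With a single run, the poor start in this toy data gets stuck in a worse local optimum, which is one way two implementations can disagree on the same input.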
Thanks for your response! |
I observed that another factor behind this difference is that the default method for generating initial centers is different. I will also try adding a new initialization method. |
Exactly, sklearn uses kmeans++ to choose the initial centroids. If you want to add this method, you MUST generate the random values before running in SPU. |
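A minimal sketch of what "generate the random values before running" could look like for kmeans++ seeding: the uniform draws are produced up front on the host, and the seeding routine itself is a deterministic function of the data and those draws. This is plain NumPy, not SPU-ready code; the name `kmeans_plusplus_init` and its `uniforms` parameter are hypothetical, and it implements only the basic D² sampling variant (sklearn's version additionally tries several candidates per step).

```python
import numpy as np


def kmeans_plusplus_init(X, uniforms):
    """Basic kmeans++ seeding driven by pre-drawn uniform numbers.

    X        : (n, d) data.
    uniforms : (k,) values in [0, 1), drawn in advance by the caller.
    No random number is generated inside this function.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # First center: a uniformly chosen point.
    first = min(int(uniforms[0] * n), n - 1)
    centers = [X[first]]
    # Squared distance from each point to its nearest chosen center.
    d2 = ((X - X[first]) ** 2).sum(-1)
    for u in uniforms[1:]:
        # Sample an index with probability proportional to d2 via the inverse CDF.
        cdf = np.cumsum(d2)
        idx = int(np.searchsorted(cdf, u * cdf[-1], side="right"))
        idx = min(idx, n - 1)  # guard for the degenerate case where all d2 are zero
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(-1))
    return np.stack(centers)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    # All randomness is drawn here, outside the seeding routine.
    uniforms = np.random.default_rng(0).uniform(size=2)
    print(kmeans_plusplus_init(X, uniforms))
```

Because the draws are an explicit input, the same `uniforms` always yield the same centers, which also makes the initialization easy to test.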
Sorry, I didn't catch the point. Do you mean I cannot use functions from jax.random in my implementation of kmeans++? |
You can't generate random values in the SPU runtime; you can refer to #80. Lines 39 to 41 in 0988eec
|
I think I got your point. You mean that I need to generate all the random values I need in the init function, and that using jax.random.xxx in any other function will cause unexpected behavior? |
There are two cases in SML:
|
So the values of all the attributes in the init function are public, right? |
Yes.
I think so. |
Thanks for your reminder! It really helps a lot. |
When I implemented the selection of the best centers, I encountered behavior that looks like a bug.
Here is a sample with n_init = 2; it raises the following runtime error.
The strange thing is that the runtime error occurs only when the first dimension (n_init) of centers is 2. All other settings of n_init execute as expected. Is this a bug or expected behavior? |
Please try with the latest main branch code. This bug is fixed by PR #532. |
Sorry for not testing with the latest commit and for disturbing you. The code executes as expected on the latest commit. |
Solved with #546. |
Issue Type
Currentness/Accuracy
Modules Involved
Others
Have you reproduced the bug with SPU HEAD?
Yes
Have you searched existing issues?
Yes
SPU Version
spu 0.7.0b0
OS Platform and Distribution
Linux Ubuntu 22.04
Python Version
3.10
Compiler Version
No response
Current Behavior?
Hello, sorry for disturbing you.
When I use the KMeans implemented in sml/cluster, I find that its behavior does not match the kmeans implemented in sklearn for some inputs. I will include the code below.
Standalone code to reproduce the issue
Relevant log output