New Algorithm: DBSCAN clustering #16

@tkellogg

I like the clustering approach, but I don't like that k-means makes you say up front how many clusters there are going to be (I'm discovering too, it's a new day, I don't know yet, right??). I want to experiment with other clustering algorithms that make different assumptions and trade-offs about the data.

DBSCAN seems interesting because it finds clusters based on density. Instead of a cluster count, you have to say what the expected density should be: the threshold that defines a cluster (in scikit-learn, the eps and min_samples parameters).
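To make the density threshold concrete, here is a minimal sketch using scikit-learn's DBSCAN. The toy 2-D points stand in for post embeddings, and the eps and min_samples values are assumptions picked for this toy data, not recommendations for any real embedding model. For real embeddings, passing metric="cosine" may be a better fit than the default Euclidean distance.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-ins for post embeddings: two dense groups plus one isolated point.
embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [20.0, 20.0],
])

# eps: neighborhood radius.
# min_samples: neighbors (including the point itself) needed to form a dense core.
# Both values are assumptions chosen for this toy data.
model = DBSCAN(eps=0.5, min_samples=3)
labels = model.fit_predict(embeddings)

# The number of clusters is discovered, not specified; -1 marks noise.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

Note that nothing here says how many clusters to expect; that number falls out of the density settings.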

I expect that there will be a lot of tweaking to make it work for a given embedding model, but once it works it should be a lot more dynamic and robust.
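One common heuristic that might cut down the tweaking is the k-distance curve: compute each point's distance to its k-th nearest neighbor, sort those distances, and look for the "elbow" where they jump. The sketch below is an assumption about how that could be tried here, not part of the existing code. The helper name k_distances is hypothetical, and it uses plain Euclidean distance on toy points.

```python
import math

def k_distances(points, k):
    """Return each point's distance to its k-th nearest other point, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Toy stand-ins for post embeddings: two dense groups plus one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (20.0, 20.0),
]

# scikit-learn's min_samples counts the point itself, so k = min_samples - 1.
kd = k_distances(points, k=2)
for d in kd:
    print(round(d, 3))
```

On this toy data the sorted distances stay small for the dense points and jump sharply for the isolated one; a candidate eps would sit somewhere just before that jump. This brute-force version is quadratic in the number of posts, so it is only meant for small experiments.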

Note: DBSCAN doesn't assign all posts to a cluster (posts in low-density regions are left as noise), so you might not be able to use toot_clusters.html on its own. You'll probably need an offshoot of it. Feel free to skip this part on the first pass of the PR; we might even be able to get someone else to do this part.
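As a rough sketch of what an offshoot template might need as input, the snippet below splits posts into clusters plus a separate "unclustered" bucket. It assumes scikit-learn's convention of labeling noise as -1; the function name group_posts and the post IDs are hypothetical and not taken from the existing code.

```python
from collections import defaultdict

NOISE_LABEL = -1  # scikit-learn's DBSCAN marks unclustered points with -1

def group_posts(post_ids, labels):
    """Split posts into {cluster_label: [post_id, ...]} and a list of noise posts."""
    clusters = defaultdict(list)
    unclustered = []
    for post_id, label in zip(post_ids, labels):
        if label == NOISE_LABEL:
            unclustered.append(post_id)
        else:
            clusters[label].append(post_id)
    return dict(clusters), unclustered

# Hypothetical post IDs with the labels a DBSCAN run might return.
post_ids = ["a", "b", "c", "d", "e"]
labels = [0, 0, 1, -1, 1]

clusters, unclustered = group_posts(post_ids, labels)
print(clusters)
print(unclustered)
```

A template could then render the clusters as before and show the unclustered posts in their own section, or drop them.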

Metadata

    Labels

    datascience: Probably involves knowledge of scikit-learn, LLMs, etc.
    htmx: Probably involves mostly changes to HTML or HTMX
    python: Probably involves a lot of changes to Python code
