Inter-Class Clustering of Text Data Using Dimensionality Reduction and BERT

boorism/medium-clustering

medium-clustering

Files and data accompanying the Medium article on clustering text with UMAP and BERT.

All the code is in the clustering notebook. The precomputed embeddings are stored in the abstract_embeddings pickle object, and the already clustered data can be read either from dict_list or directly from the clusters CSV file.
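If you only want to inspect the precomputed artifacts without rerunning the notebook, something like the following minimal sketch may help. It uses only the Python standard library. The helper names load_embeddings and load_clusters are hypothetical, as are the file names in the usage comment, and the internal layout of the pickle and the CSV columns are not documented here, so check the notebook for the real ones.

```python
import csv
import pickle
from pathlib import Path


def load_embeddings(path):
    """Unpickle the precomputed embeddings.

    Only unpickle files you trust: pickle can execute arbitrary code on load.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def load_clusters(path):
    """Read a clusters CSV into a list of dicts keyed by the header row."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


# Example usage (hypothetical file names; substitute the paths in this repo):
#   embeddings = load_embeddings("abstract_embeddings.pkl")
#   clusters = load_clusters("clusters.csv")
#   print(type(embeddings), len(clusters))
```

If the pickle holds NumPy arrays or pandas objects, those libraries need to be installed for unpickling to succeed.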

Refer to the code and Medium article for all the steps taken and the general idea!

The method is heavily inspired by Top2Vec; this is my own attempt at an implementation, with one small change to the pipeline.
