medium-clustering

Inter-Class Clustering of Text Data Using Dimensionality Reduction and BERT

Files and data for the Medium article on clustering with UMAP and BERT.

All necessary code is in the clustering notebook (clustering.ipynb). The precomputed embeddings are stored in the abstract_embeddings pickle object, and the already clustered data can be accessed either from dict_list (dict_list.pkl) or directly from the clusters CSV file.
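For readers who only want the precomputed artifacts without rerunning the notebook, here is a minimal loading sketch using the standard library. The helper names `load_pickle` and `load_cluster_rows` are hypothetical, and the structure of the pickled objects and the CSV columns are not documented here, so the code returns whatever the files contain. Pass the actual locations of the files in your checkout.

```python
import csv
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickled Python object. Only unpickle files you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def load_cluster_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical usage; adjust the paths to where the files live in the repo:
# clustered = load_pickle("dict_list.pkl")
# rows = load_cluster_rows("path/to/clusters.csv")
```

Note that `csv.DictReader` returns every value as a string, so numeric columns such as cluster ids (if present) need an explicit conversion.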

Refer to the code and Medium article for all the steps taken and the general idea!

The method is heavily inspired by Top2Vec; this is my own attempt at an implementation, with a small change introduced in the pipeline.
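As a rough orientation, a Top2Vec-style pipeline embeds the documents, reduces the embeddings with UMAP, clusters the reduced points with HDBSCAN, and then summarizes each cluster by the centroid of its members in the original embedding space. The sketch below illustrates that general flow only; it is not the notebook's code and does not reproduce the small change mentioned above. The function names and parameter values are assumptions chosen for illustration, and `reduce_and_cluster` requires the `umap-learn` and `hdbscan` packages.

```python
import numpy as np


def reduce_and_cluster(embeddings, n_components=5, min_cluster_size=15):
    """UMAP reduction followed by HDBSCAN; returns one label per row (-1 = noise)."""
    # Imported lazily so the centroid helper below works without these packages.
    import hdbscan
    import umap

    reduced = umap.UMAP(
        n_neighbors=15, n_components=n_components, metric="cosine"
    ).fit_transform(embeddings)
    return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(reduced)


def cluster_centroids(embeddings, labels):
    """Mean embedding per cluster in the original space, skipping noise (-1)."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    return {
        int(label): embeddings[labels == label].mean(axis=0)
        for label in np.unique(labels)
        if label != -1
    }
```

Given BERT embeddings `X` of shape `(n_documents, dim)`, `labels = reduce_and_cluster(X)` followed by `cluster_centroids(X, labels)` yields one representative vector per cluster, which can then be compared against document or word vectors.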

Repository contents:

datasets
.gitignore
README.md
clustering.ipynb
clustering_plot.png
dict_list.pkl

sarinasabharwal19/medium-clustering
