Files and data for the Medium article on clustering using UMAP and BERT.
All the necessary code is in the clustering notebook. The precomputed embeddings are stored in the abstract_embeddings pickle object, and the already clustered data can be accessed either from dict_list or directly from the clusters CSV file.
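For readers who want to inspect the data without opening the notebook, here is a minimal loading sketch. It uses only the standard library. The helper names `load_embeddings` and `load_clusters` are hypothetical and not part of this repository, and the exact file names and CSV columns are assumptions; check the files in the repo and adjust the paths accordingly.

```python
import csv
import pickle
from pathlib import Path


def load_embeddings(path):
    """Load a pickled object (assumed here to hold the precomputed embeddings)."""
    # Only unpickle files you trust: unpickling can execute arbitrary code.
    with Path(path).open("rb") as f:
        return pickle.load(f)


def load_clusters(path):
    """Read a CSV file into a list of dicts, one dict per row."""
    # csv.DictReader uses the header row as keys; all values come back as strings.
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Usage would look something like `embeddings = load_embeddings("abstract_embeddings.pkl")` and `rows = load_clusters("clusters.csv")`, where both file names are guesses based on the description above. The notebook may load these files differently, for example with pandas.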
Refer to the code and the Medium article for all the steps taken and the general idea.
The method is heavily inspired by Top2Vec. This is my own attempt at an implementation, with one small change to the pipeline.