clustering

This repository contains the C code that clusters data (and finds the number of clusters in the data), the R code that processes the results of clustering, and an example of the data together with its clustering results.

The C code implements a parallelised version of the efficient population dynamics algorithm developed for model-based Bayesian clustering in https://arxiv.org/abs/1810.02627, which assumes a Gaussian distribution of the data. To compile on a multiprocessor (Linux) machine, the command "gcc -Wall -fopenmp PopulDynamClustV5v4.c -lm -O3 -o populdynam;" was used in the terminal. To run on the same machine, the command "date; ./populdynam<parameters.in>parameters.out; date;" was used in the terminal. Here the parameters are read from standard input and the results are written to standard output; the two "date" calls only record the start and end times of the run.
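The run command above can also be scripted. The following is a minimal sketch in Python, not part of this repository: the function name `run_clustering` and the script name in the usage comment are hypothetical, and it assumes only what the command above shows, namely that the binary reads its parameters from standard input and writes to standard output.

```python
import subprocess
import sys
import time


def run_clustering(command, in_path, out_path):
    """Run `command` with stdin read from in_path and stdout written to out_path.

    Returns the elapsed wall-clock time in seconds (replacing the two `date` calls).
    """
    start = time.perf_counter()
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        # Equivalent to the shell redirection: command < in_path > out_path
        subprocess.run(command, stdin=fin, stdout=fout, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical usage: python run_clustering.py 10dL2c0K10.in 10dL2c0K10.out
    # "./populdynam" is the binary name produced by the gcc command above.
    elapsed = run_clustering(["./populdynam"], sys.argv[1], sys.argv[2])
    print(f"finished in {elapsed:.1f} s")
```

With `check=True` the script raises an error if the program exits with a non-zero status, so a failed run is not mistaken for a finished one.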

The data has to be in the tab-separated csv format (and transformed beforehand if needed). 10dL2c0.csv is a sample of correlated data (with the same mean vectors and different random covariance matrices) in 10 dimensions and with 2 clusters. 10dL2c0K10.in is the *.in file for this data. The meaning of the numbers "99191 20000 10 1 10 100 1000 10dL2c0.csv 0 nofile" in this file is given in the first line of the 10dL2c0K10.out file: "#seed=99191 N=20000 d=10 K1=1 K2=10 rest.=100 t_max=1000 data-file=10dL2c0.csv". Here "seed=99191" is the seed for the random number generator, "N=20000" is the sample size, "d=10" is the dimension of the data, "K1=1 K2=10" is the range for the number of clusters to consider, "rest.=100" is the number of restarts of the algorithm, "t_max=1000" is the maximum allowed "time" parameter, "data-file=10dL2c0.csv" is the name of the data file, and the last two entries "0 nofile" are always the same. The upper bound on the numerical complexity is proportional to $(K2-K1)\times t_{max}\times \mbox{rest.}\times N$.
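To make the parameter line and the complexity bound concrete, here is a small illustrative Python sketch. It is not part of the repository: the names `Params`, `parse_params` and `complexity_bound` are hypothetical, the field order is taken from the description above, and the two trailing entries ("0 nofile") are simply ignored.

```python
from typing import NamedTuple


class Params(NamedTuple):
    seed: int       # seed for the random number generator
    n: int          # sample size N
    d: int          # dimension of the data
    k1: int         # smallest number of clusters to consider
    k2: int         # largest number of clusters to consider
    restarts: int   # number of restarts ("rest.")
    t_max: int      # maximum allowed "time" parameter
    data_file: str  # name of the data file


def parse_params(line):
    """Split one whitespace-separated parameter line into named fields."""
    fields = line.split()
    ints = [int(x) for x in fields[:7]]
    return Params(*ints, fields[7])


def complexity_bound(p):
    """Quantity the upper bound is proportional to: (K2-K1) * t_max * rest. * N."""
    return (p.k2 - p.k1) * p.t_max * p.restarts * p.n


if __name__ == "__main__":
    p = parse_params("99191 20000 10 1 10 100 1000 10dL2c0.csv 0 nofile")
    print(p)
    print(complexity_bound(p))
```

For the example file this gives (10-1) x 1000 x 100 x 20000 = 18,000,000,000, i.e. about 1.8e10 elementary steps up to a constant factor, which helps when choosing "rest." and "t_max" for larger data sets.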

The clustering algorithm produces files which can be processed by the R code. For the data file "10dL2c0.csv" the R scripts are "10dL2c0.csv.cluster.statistics.r" and "10dL2c0.csv.clustering.statistics.r". The R code takes the results of clustering and produces a number of *.tex and image files. One then has to open the "10dL2c0.csv.cluster.statistics.tex" and "10dL2c0.csv.clustering.statistics.tex" files produced by the R code and compile them in a TeX editor. This gives the presentations "10dL2c0.csv.cluster.statistics.pdf" and "10dL2c0.csv.clustering.statistics.pdf" with the statistics of, respectively, the clusters and the clustering.
