Efficient Density-Based Partitional Clustering Algorithm

Zareen Alamgir

Computer Science Department, National University of Computer and Emerging Sciences, Lahore, Pakistan

Hina Naveed

Computer Science Department, National University of Computer and Emerging Sciences, Lahore, Pakistan

keywords: Clustering, K-means, density-based K-means, EDK-means, partitional clustering

Clustering is an important data mining technique that helps to detect hidden structures and patterns in data. The K-means algorithm is one of the most popular and widely used partitional clustering algorithms. It is a simple and efficient method but has several shortcomings. One major drawback of traditional K-means is that it selects the initial centroids randomly, which can result in low-quality clusters. Various K-means extensions have been designed to address the problem of random initial centroids. A novel density-based K-means (DK-means) algorithm was recently proposed that uses density parameters to select the initial centroids. It outperforms K-means in terms of accuracy, at the cost of longer execution time. In this research, we present an efficient density-based K-means algorithm (EDK-means) that uses advanced data structures and significantly reduces the execution time of the DK-means algorithm. Furthermore, we rigorously evaluate the performance of density-based K-means on different challenging real-world datasets and compare it with traditional K-means. The experimental results are promising and show that density-based K-means outperforms K-means: it converges more rapidly than basic K-means and works well for datasets with different cluster sizes.

mathematics subject classification 2000: 91C20, 11Y16

reference: Vol. 40, 2021, No. 6, pp. 1322–1344

doi: 10.31577/cai_2021_6_1322

Computing and Informatics

formerly Computers and Artificial Intelligence
