MST-Based Semi-Supervised Clustering Using M-Labeled Objects

Keywords: Data mining, semi-supervised learning, clustering, label propagation, MST
Most existing semi-supervised clustering algorithms depend on pairwise constraints, and they usually need a large amount of a priori knowledge to improve their accuracy. In this paper, we use a different semi-supervised technique, label propagation, to help detect clusters. We propose two new semi-supervised algorithms, named K-SSMST and M-SSMST, both of which aim to discover clusters of diverse density and arbitrary shape. Based on a variant of the Minimum Spanning Tree (MST) algorithm, K-SSMST can automatically find the natural clusters in a dataset using K labeled data objects, where K is the number of clusters. M-SSMST can detect new clusters even when the semi-supervised information is insufficient. Our algorithms have been tested on various artificial and UCI datasets. The results demonstrate that their accuracy is better than that of other supervised and semi-supervised approaches.
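To illustrate how MST construction and label propagation can be combined, the sketch below runs a Kruskal-style pass over all point pairs in order of increasing distance, but never merges two components that already carry different labels. Each unlabeled point therefore inherits the label of the seed it reaches through short edges. This is an illustrative sketch under stated assumptions, not the K-SSMST or M-SSMST procedure from the paper: the function name `seeded_kruskal`, the use of Euclidean distance on a complete graph, and the rule of blocking merges between differently labeled components are all hypothetical choices made here.

```python
import math


def seeded_kruskal(points, seeds):
    """Hypothetical seeded Kruskal pass (illustration only, not the paper's algorithm).

    points: list of coordinate tuples.
    seeds:  dict mapping point index -> cluster label (at least one seed required).
    Returns one label per point.
    """
    n = len(points)
    parent = list(range(n))
    # label[root] is the cluster label of that component, or None if unlabeled
    label = [seeds.get(i) for i in range(n)]

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Assumption: complete graph with Euclidean edge weights, shortest edges first
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same component
        li, lj = label[ri], label[rj]
        # Block merges that would join components holding different labels
        if li is not None and lj is not None and li != lj:
            continue
        parent[ri] = rj
        label[rj] = li if li is not None else lj
    return [label[find(i)] for i in range(n)]


# Two vertical strips of points, one labeled seed in each
pts = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 2), (10, 2)]
print(seeded_kruskal(pts, {0: "a", 2: "b"}))  # ['a', 'a', 'b', 'b', 'a', 'b']
```

Because a component that still lacks a label can always be merged with a neighbor, every point ends up labeled as long as at least one seed is given. Several seeds may share the same label, loosely mirroring the idea of using more than one labeled object per cluster.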
Mathematics Subject Classification 2000: 62H30, 91C20
Reference: Vol. 31, 2012, No. 6+, pp. 1557–1574