A Novel Scheme for Accelerating Support Vector Clustering
keywords: Support vector clustering, noise elimination, centroid, semi-supervised clustering
Support vector clustering (SVC) based algorithms perform poorly on large datasets because of two time-consuming steps: solving the optimization problem and assigning cluster labels to the data points. This paper presents a novel scheme that addresses both steps in order to accelerate SVC. First, a new definition of noise data points is proposed; it is used to design a noise elimination step that reduces the size of a data set and improves its separability without destroying its profile. Second, for cluster labeling, a double centroids (DBC) labeling method is presented, which represents each cell of a cluster by two centroids, one of shape and one of density. This method is intended to accelerate the labeling procedure and to address the problem of labeling an original data set with an irregular or imbalanced distribution. Experimental results show that, compared with state-of-the-art algorithms, the proposed method significantly reduces the computational resources required and improves accuracy. Further analysis and experiments on semi-supervised cluster labeling confirm that the proposed DBC model is suitable for representing cells in clustering.
mathematics subject classification 2000: 62H30, 68T30, 94A17
reference: Vol. 31, 2012, No. 3, pp. 613–638
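The abstract does not define the two centroids of the DBC model, so the following is only an illustrative sketch of the general idea of representing a cell by a shape centroid and a density centroid. Everything in it is an assumption rather than the paper's method: the shape centroid is taken to be the midpoint of the cell's bounding box, the density centroid is taken to be a mean weighted by a Gaussian kernel density estimate, and the names `shape_centroid`, `density_centroid`, `label_point` and the `bandwidth` parameter are hypothetical.

```python
import math


def shape_centroid(points):
    """Midpoint of the axis-aligned bounding box of a cell.

    Assumption: a simple stand-in for a "centroid of shape".
    """
    dims = range(len(points[0]))
    return tuple(
        (min(p[d] for p in points) + max(p[d] for p in points)) / 2
        for d in dims
    )


def density_centroid(points, bandwidth=1.0):
    """Mean of the points, weighted by a Gaussian kernel density estimate.

    Assumption: a simple stand-in for a "centroid of density"; points in
    crowded regions pull the centroid toward themselves.
    """
    weights = [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        for p in points
    ]
    total = sum(weights)
    dims = range(len(points[0]))
    return tuple(
        sum(w * p[d] for w, p in zip(weights, points)) / total for d in dims
    )


def label_point(x, cells):
    """Assign x to the cell whose nearer centroid (shape or density) is closest.

    cells maps a label to a (shape_centroid, density_centroid) pair.
    Assumption: this nearest-centroid rule is illustrative only.
    """
    return min(cells, key=lambda k: min(math.dist(x, c) for c in cells[k]))


# Hypothetical toy data: cell "a" is imbalanced (dense corner plus one far point).
cell_a = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 4)]
cell_b = [(10, 10), (11, 10), (10, 11), (11, 11)]
cells = {
    "a": (shape_centroid(cell_a), density_centroid(cell_a)),
    "b": (shape_centroid(cell_b), density_centroid(cell_b)),
}
```

In this toy setting the two centroids of the imbalanced cell differ noticeably: the shape centroid sits at the middle of the cell's extent, while the density centroid stays near the dense corner. Keeping both gives a labeling rule two reference points per cell, which is one plausible reading of why such a pair would help with irregular or imbalanced distributions.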