Towards an Unsupervised Method for Network Anomaly Detection in Large Datasets

keywords: Cluster, unsupervised, cluster stability, ensemble, anomaly detection
In this paper, we present an effective tree based subspace clustering technique (TreeCLUSS) for finding clusters in network intrusion data and for detecting known as well as unknown attacks without using any labelled traffic or signatures or training. To establish its effectiveness in finding the appropriate number of clusters, we perform a cluster stability analysis. We also introduce an effective cluster labelling technique (CLUSSLab) to label each cluster based on the stable cluster set obtained from TreeCLUSS. CLUSSLab is a multi-objective technique that employs an ensemble approach for labelling each stable cluster generated by TreeCLUSS to achieve high detection rate. We also introduce an effective unsupervised feature clustering technique to identify the dominating feature set from each cluster. We evaluate the performance of both TreeCLUSS and CLUSSLab using several real world intrusion datasets to identify known as well as unknown attacks and find that results are excellent.
reference: Vol. 33, 2014, No. 1, pp. 1–34