Efficiently Using Prime-Encoding for Mining Frequent Itemsets in Sparse Data

keywords: Mining frequent itemsets, prime-block encoding, sparse data
In data mining, data representation is one of the major factors affecting the scalability of mining algorithms. Mining frequent itemsets (MFI) is a data mining problem that is heavily affected by this factor. The vertical approach is one of the successful data representations adopted for the MFI problem. Its main advantage is support for fast frequency counting via join operations. Recently, an encoding method called prime-encoding was proposed as an enhancement to the vertical approach \cite{Karam10:JCSS}. The performance study in \cite{Karam10:JCSS} showed that prime-encoding based vertical mining of frequent sequences outperforms other vertical and horizontal approaches in terms of space and time. Although sequence mining is more general than itemset mining, this paper presents a prime-encoding based vertical method for mining frequent itemsets, with new optimizations and a new re-encoding method that further improve memory usage and speed. The experimental results show that prime-encoding based vertical itemset mining is suitable for high-dimensional sparse data.
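As a rough illustration of the idea behind vertical frequency counting with a prime-based encoding, the following minimal Python sketch encodes each item's transaction-id list as blocks of prime products, joins two items with a greatest common divisor, and counts support from the surviving prime factors. It is a sketch under stated assumptions, not the algorithm of this paper: the block size of 8, the helper names (`encode_block`, `build_vertical`, `join`, `support`), and the toy transactions are all hypothetical, and the optimizations and re-encoding method mentioned above are not modeled.

```python
from math import gcd

# Assumption: blocks of 8 transactions, one prime per position in a block.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]


def encode_block(bits):
    """Encode 0/1 flags as the product of the primes at the set positions."""
    value = 1
    for prime, bit in zip(PRIMES, bits):
        if bit:
            value *= prime
    return value


def build_vertical(transactions):
    """Map each item to a list of block codes (its vertical representation)."""
    block = len(PRIMES)
    items = sorted(set().union(*transactions))
    vertical = {}
    for item in items:
        codes = []
        for start in range(0, len(transactions), block):
            chunk = transactions[start:start + block]
            codes.append(encode_block([item in t for t in chunk]))
        vertical[item] = codes
    return vertical


def join(codes_a, codes_b):
    """Intersect two encoded transaction-id lists block by block via gcd."""
    return [gcd(a, b) for a, b in zip(codes_a, codes_b)]


def support(codes):
    """Count transactions: one per prime that divides a block code."""
    return sum(1 for code in codes for p in PRIMES if code % p == 0)


# Hypothetical toy data: ten transactions over items a, b, c.
transactions = [
    {"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}, {"a", "b"},
    {"c"}, {"a"}, {"b", "c"}, {"a", "b"}, {"a", "b", "c"},
]
vertical = build_vertical(transactions)
ab = join(vertical["a"], vertical["b"])
abc = join(ab, vertical["c"])
print(support(ab), support(abc))
```

Because each block code is square-free, the gcd of two codes keeps exactly the primes, and hence the transactions, that both items share. In this sketch an empty block collapses to the code 1, which gives some intuition for why such an encoding can be compact on sparse data.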
reference: Vol. 32, 2013, No. 5, pp. 1079–1099