TY - GEN
T1 - Mining approximate frequent itemsets from noisy data
AU - Liu, Jinze
AU - Paulsen, Susan
AU - Wang, Wei
AU - Nobel, Andrew
AU - Prins, Jan
PY - 2005
Y1 - 2005
N2 - Frequent itemset mining is a popular and important first step in analyzing data sets across a broad range of applications. The traditional, "exact" approach for finding frequent itemsets requires that every item in the itemset occurs in each supporting transaction. However, real data is typically subject to noise, and in the presence of such noise, traditional itemset mining may fail to detect relevant itemsets, particularly those large itemsets that are more vulnerable to noise. In this paper we propose approximate frequent itemsets (AFI), as a noise-tolerant itemset model. In addition to the usual requirement for sufficiently many supporting transactions, the AFI model places constraints on the fraction of errors permitted in each item column and the fraction of errors permitted in a supporting transaction. Taken together, these constraints winnow out the approximate itemsets that exhibit systematic errors. In the context of a simple noise model, we demonstrate that AFI is better at recovering underlying data patterns, while identifying fewer spurious patterns than either the exact frequent itemset approach or the existing error tolerant itemset approach of Yang et al. [11].
AB - Frequent itemset mining is a popular and important first step in analyzing data sets across a broad range of applications. The traditional, "exact" approach for finding frequent itemsets requires that every item in the itemset occurs in each supporting transaction. However, real data is typically subject to noise, and in the presence of such noise, traditional itemset mining may fail to detect relevant itemsets, particularly those large itemsets that are more vulnerable to noise. In this paper we propose approximate frequent itemsets (AFI), as a noise-tolerant itemset model. In addition to the usual requirement for sufficiently many supporting transactions, the AFI model places constraints on the fraction of errors permitted in each item column and the fraction of errors permitted in a supporting transaction. Taken together, these constraints winnow out the approximate itemsets that exhibit systematic errors. In the context of a simple noise model, we demonstrate that AFI is better at recovering underlying data patterns, while identifying fewer spurious patterns than either the exact frequent itemset approach or the existing error tolerant itemset approach of Yang et al. [11].
UR - http://www.scopus.com/inward/record.url?scp=33746082673&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746082673&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2005.93
DO - 10.1109/ICDM.2005.93
M3 - Conference contribution
AN - SCOPUS:33746082673
SN - 0769522785
SN - 9780769522784
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 721
EP - 724
BT - Proceedings - Fifth IEEE International Conference on Data Mining, ICDM 2005
T2 - 5th IEEE International Conference on Data Mining, ICDM 2005
Y2 - 27 November 2005 through 30 November 2005
ER -