Mining approximate frequent itemsets from noisy data

Jinze Liu, Susan Paulsen, Wei Wang, Andrew Nobel, Jan Prins

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Scopus citations

Abstract

Frequent itemset mining is a popular and important first step in analyzing data sets across a broad range of applications. The traditional, "exact" approach for finding frequent itemsets requires that every item in the itemset occur in each supporting transaction. However, real data is typically subject to noise, and in the presence of such noise, traditional itemset mining may fail to detect relevant itemsets, particularly large itemsets, which are more vulnerable to noise. In this paper we propose approximate frequent itemsets (AFI) as a noise-tolerant itemset model. In addition to the usual requirement for sufficiently many supporting transactions, the AFI model places constraints on the fraction of errors permitted in each item column and the fraction of errors permitted in a supporting transaction. Taken together, these constraints winnow out the approximate itemsets that exhibit systematic errors. In the context of a simple noise model, we demonstrate that AFI is better at recovering underlying data patterns, while identifying fewer spurious patterns, than either the exact frequent itemset approach or the existing error-tolerant itemset approach of Yang et al. [11].
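The two error constraints described in the abstract can be illustrated with a small membership check. The following Python sketch is not the authors' mining algorithm; it only tests whether a given set of transactions and items satisfies the kind of constraints the abstract describes. The function name `is_afi`, the parameter names `eps_row`, `eps_col`, and `min_support`, and the encoding of the data as a 0/1 matrix are assumptions made for illustration; the paper's formal definition may differ in detail.

```python
from typing import Iterable, Sequence


def is_afi(
    data: Sequence[Sequence[int]],
    rows: Iterable[int],
    items: Iterable[int],
    eps_row: float,
    eps_col: float,
    min_support: int,
) -> bool:
    """Check AFI-style constraints on a candidate (transactions, itemset) pair.

    Assumed encoding (hypothetical, not taken from the paper):
      data[r][c] == 1 means transaction r contains item c, 0 means it does not.
      eps_row     = max fraction of missing items allowed per transaction.
      eps_col     = max fraction of missing transactions allowed per item.
      min_support = minimum number of supporting transactions.
    """
    rows = list(rows)
    items = list(items)
    if not rows or not items:
        return False

    # Usual support requirement: enough supporting transactions.
    if len(rows) < min_support:
        return False

    # Row constraint: each supporting transaction may miss only a
    # bounded fraction of the items in the itemset.
    for r in rows:
        missing = sum(1 for c in items if data[r][c] == 0)
        if missing > eps_row * len(items):
            return False

    # Column constraint: each item may be absent from only a bounded
    # fraction of the supporting transactions.
    for c in items:
        missing = sum(1 for r in rows if data[r][c] == 0)
        if missing > eps_col * len(rows):
            return False

    return True


# Scattered noise: every row and every column has exactly one missing entry.
noisy = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
# Systematic error: the fourth item is missing from every transaction.
systematic = [[1, 1, 1, 0] for _ in range(4)]

print(is_afi(noisy, range(4), range(4), 0.25, 0.25, 4))       # True
print(is_afi(noisy, range(4), range(4), 0.0, 0.0, 4))         # False
print(is_afi(systematic, range(4), range(4), 0.25, 0.25, 4))  # False
```

With both tolerances set to zero the check reduces to the exact model, which rejects the scattered-noise example. The systematic example passes the per-transaction test but fails the per-item test, which is the kind of pattern the combined constraints are meant to filter out.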

Original language: English
Title of host publication: Proceedings - Fifth IEEE International Conference on Data Mining, ICDM 2005
Pages: 721-724
Number of pages: 4
State: Published - 2005
Event: 5th IEEE International Conference on Data Mining, ICDM 2005 - Houston, TX, United States
Duration: Nov 27 2005 → Nov 30 2005

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 5th IEEE International Conference on Data Mining, ICDM 2005
Country/Territory: United States
City: Houston, TX
Period: 11/27/05 → 11/30/05

ASJC Scopus subject areas

  • General Engineering
