TY - GEN
T1 - Mining approximate order preserving clusters in the presence of noise
AU - Zhang, Mengsheng
AU - Wang, Wei
AU - Liu, Jinze
PY - 2008
Y1 - 2008
N2 - Subspace clustering has attracted great attention due to its capability of finding salient patterns in high dimensional data. Order preserving subspace clusters have been proven to be important in high throughput gene expression analysis, since functionally related genes are often co-expressed under a set of experimental conditions. Such co-expression patterns can be represented by consistent orderings of attributes. Existing order preserving cluster models require all objects in a cluster have identical attribute order without deviation. However, real data are noisy due to measurement technology limitation and experimental variability which prohibits these strict models from revealing true clusters corrupted by noise. In this paper, we study the problem of revealing the order preserving clusters in the presence of noise. We propose a noise-tolerant model called approximate order preserving cluster (AOPC). Instead of requiring all objects in a cluster have identical attribute order, we require that (1) at least a certain fraction of the objects have identical attribute order; (2) other objects in the cluster may deviate from the consensus order by up to a certain fraction of attributes. We also propose an algorithm to mine AOPC. Experiments on gene expression data demonstrate the efficiency and effectiveness of our algorithm.
AB - Subspace clustering has attracted great attention due to its capability of finding salient patterns in high dimensional data. Order preserving subspace clusters have been proven to be important in high throughput gene expression analysis, since functionally related genes are often co-expressed under a set of experimental conditions. Such co-expression patterns can be represented by consistent orderings of attributes. Existing order preserving cluster models require all objects in a cluster have identical attribute order without deviation. However, real data are noisy due to measurement technology limitation and experimental variability which prohibits these strict models from revealing true clusters corrupted by noise. In this paper, we study the problem of revealing the order preserving clusters in the presence of noise. We propose a noise-tolerant model called approximate order preserving cluster (AOPC). Instead of requiring all objects in a cluster have identical attribute order, we require that (1) at least a certain fraction of the objects have identical attribute order; (2) other objects in the cluster may deviate from the consensus order by up to a certain fraction of attributes. We also propose an algorithm to mine AOPC. Experiments on gene expression data demonstrate the efficiency and effectiveness of our algorithm.
UR - http://www.scopus.com/inward/record.url?scp=52649099915&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=52649099915&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2008.4497424
DO - 10.1109/ICDE.2008.4497424
M3 - Conference contribution
AN - SCOPUS:52649099915
SN - 9781424418374
T3 - Proceedings - International Conference on Data Engineering
SP - 160
EP - 168
BT - Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
T2 - 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Y2 - 7 April 2008 through 12 April 2008
ER -