TY - GEN
T1 - Combined data distortion strategies for privacy-preserving data mining
AU - Peng, Bo
AU - Geng, Xingyu
AU - Zhang, Jun
PY - 2010
Y1 - 2010
N2 - The problem of privacy-preserving data mining has become increasingly important in recent years, and many successful and efficient techniques have been developed. However, in collaborative data analysis, parts of the datasets may come from different data owners and may be processed using different data distortion methods. Thus, combinations of datasets processed using different methods are of practical interest. In this paper, a class of novel data distortion strategies is proposed. Four schemes based on attribute partition, with different combinations of singular value decomposition (SVD), nonnegative matrix factorization (NMF), and discrete wavelet transformation (DWT), are designed to perturb submatrices of the original datasets for privacy protection. We use several metrics to measure the performance of the proposed strategies. Data utility is examined using binary classification based on the support vector machine. Our experimental results indicate that, in comparison with the individual data distortion techniques, the proposed schemes are very efficient in achieving a good trade-off between data privacy and data utility, and provide a feasible solution for collaborative data analysis.
AB - The problem of privacy-preserving data mining has become increasingly important in recent years, and many successful and efficient techniques have been developed. However, in collaborative data analysis, parts of the datasets may come from different data owners and may be processed using different data distortion methods. Thus, combinations of datasets processed using different methods are of practical interest. In this paper, a class of novel data distortion strategies is proposed. Four schemes based on attribute partition, with different combinations of singular value decomposition (SVD), nonnegative matrix factorization (NMF), and discrete wavelet transformation (DWT), are designed to perturb submatrices of the original datasets for privacy protection. We use several metrics to measure the performance of the proposed strategies. Data utility is examined using binary classification based on the support vector machine. Our experimental results indicate that, in comparison with the individual data distortion techniques, the proposed schemes are very efficient in achieving a good trade-off between data privacy and data utility, and provide a feasible solution for collaborative data analysis.
KW - Data distortion
KW - Data mining
KW - Privacy preservation
UR - http://www.scopus.com/inward/record.url?scp=78149330224&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78149330224&partnerID=8YFLogxK
U2 - 10.1109/ICACTE.2010.5578952
DO - 10.1109/ICACTE.2010.5578952
M3 - Conference contribution
AN - SCOPUS:78149330224
SN - 9781424465408
T3 - ICACTE 2010 - 2010 3rd International Conference on Advanced Computer Theory and Engineering, Proceedings
SP - V1572-V1576
BT - ICACTE 2010 - 2010 3rd International Conference on Advanced Computer Theory and Engineering, Proceedings
T2 - 2010 3rd International Conference on Advanced Computer Theory and Engineering, ICACTE 2010
Y2 - 20 August 2010 through 22 August 2010
ER -