TY - GEN
T1 - Addressing accuracy issues in privacy preserving data mining through matrix factorization
AU - Wang, Jie
AU - Zhang, Jun
PY - 2007
Y1 - 2007
N2 - Maintaining data mining accuracy on distorted datasets is an important issue in privacy preserving data mining. Using matrix approximation, we propose several efficient and flexible techniques to address this issue, and utilize unique characteristics of matrix factorization to maintain data patterns. We use support vector machine classification to compare accuracy maintenance after data distortion by different methods. With better performance than some classical data perturbation approaches, nonnegative matrix factorization and singular value decomposition are considered to be promising techniques for privacy preserving data mining. Experimental results demonstrate that mining accuracy on data distorted by these methods is almost as good as that on the original data, with the added property of privacy preservation. This indicates that the matrix factorization-based data distortion schemes perturb only confidential attributes to meet privacy requirements while preserving general data patterns for knowledge extraction.
AB - Maintaining data mining accuracy on distorted datasets is an important issue in privacy preserving data mining. Using matrix approximation, we propose several efficient and flexible techniques to address this issue, and utilize unique characteristics of matrix factorization to maintain data patterns. We use support vector machine classification to compare accuracy maintenance after data distortion by different methods. With better performance than some classical data perturbation approaches, nonnegative matrix factorization and singular value decomposition are considered to be promising techniques for privacy preserving data mining. Experimental results demonstrate that mining accuracy on data distorted by these methods is almost as good as that on the original data, with the added property of privacy preservation. This indicates that the matrix factorization-based data distortion schemes perturb only confidential attributes to meet privacy requirements while preserving general data patterns for knowledge extraction.
KW - Data mining
KW - Matrix factorization
KW - Nonnegative matrix factorization
KW - Privacy
UR - http://www.scopus.com/inward/record.url?scp=34748877717&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34748877717&partnerID=8YFLogxK
U2 - 10.1109/isi.2007.379474
DO - 10.1109/isi.2007.379474
M3 - Conference contribution
AN - SCOPUS:34748877717
SN - 1424413303
SN - 9781424413300
T3 - ISI 2007: 2007 IEEE Intelligence and Security Informatics
SP - 217
EP - 220
BT - ISI 2007
T2 - ISI 2007: 2007 IEEE Intelligence and Security Informatics
Y2 - 23 May 2007 through 24 May 2007
ER -