TY - GEN
T1 - NNMF-based factorization techniques for high-accuracy privacy protection on non-negative-valued datasets
AU - Wang, Jie
AU - Zhong, Weijun
AU - Zhang, Jun
PY - 2006
Y1 - 2006
N2 - The challenge in preserving data privacy is how to protect attribute values without jeopardizing the similarity between data objects under analysis. In this paper, we extend our previous work on applying matrix techniques to protect privacy and present a novel algebraic technique based on iterative methods for non-negative-valued data distortion. As an unsupervised learning method for uncovering latent features in high-dimensional data, a low-rank non-negative matrix factorization (NNMF) is used to preserve the natural non-negativity of the data and to avoid the subtractive basis vector and encoding interactions present in techniques such as principal component analysis. This paper is the first in privacy-preserving data mining to combine non-negative matrix decomposition with distortion processing. Two iterative methods for solving the bound-constrained optimization problem in NNMF are compared experimentally on the Wisconsin Breast Cancer Dataset. The overall performance of NNMF in terms of distortion level and data utility is compared with our previously proposed SVD-based distortion strategies and other popular data perturbation methods. Data utility is examined by cross-validation of a binary classification using a support vector machine. Our experimental results on data mining benchmark datasets indicate that, in comparison with standard data distortion techniques, the proposed NNMF-based method is very efficient in balancing data privacy and data utility, and offers a feasible and promising solution for high-accuracy privacy-preserving data mining.
AB - The challenge in preserving data privacy is how to protect attribute values without jeopardizing the similarity between data objects under analysis. In this paper, we extend our previous work on applying matrix techniques to protect privacy and present a novel algebraic technique based on iterative methods for non-negative-valued data distortion. As an unsupervised learning method for uncovering latent features in high-dimensional data, a low-rank non-negative matrix factorization (NNMF) is used to preserve the natural non-negativity of the data and to avoid the subtractive basis vector and encoding interactions present in techniques such as principal component analysis. This paper is the first in privacy-preserving data mining to combine non-negative matrix decomposition with distortion processing. Two iterative methods for solving the bound-constrained optimization problem in NNMF are compared experimentally on the Wisconsin Breast Cancer Dataset. The overall performance of NNMF in terms of distortion level and data utility is compared with our previously proposed SVD-based distortion strategies and other popular data perturbation methods. Data utility is examined by cross-validation of a binary classification using a support vector machine. Our experimental results on data mining benchmark datasets indicate that, in comparison with standard data distortion techniques, the proposed NNMF-based method is very efficient in balancing data privacy and data utility, and offers a feasible and promising solution for high-accuracy privacy-preserving data mining.
KW - Iterative method
KW - Non-negative matrix factorization
KW - Privacy
UR - http://www.scopus.com/inward/record.url?scp=49549118860&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49549118860&partnerID=8YFLogxK
U2 - 10.1109/icdmw.2006.123
DO - 10.1109/icdmw.2006.123
M3 - Conference contribution
AN - SCOPUS:49549118860
SN - 0769527027
SN - 9780769527024
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 513
EP - 517
BT - Proceedings - ICDM Workshops 2006 - 6th IEEE International Conference on Data Mining - Workshops
ER -