TY - GEN
T1 - Constrained nonnegative matrix factorization based data distortion techniques: Study of data privacy and utility
AU - Thapa, Nirmal
AU - Lin, Peng Peng
AU - Liu, Lian
AU - Wang, Jie
AU - Zhang, Jun
PY - 2012
Y1 - 2012
N2 - With the rise of data mining techniques came the problem of privacy disclosure, which is why privacy has become one of the top priorities in the design of data mining techniques. In this paper, we briefly discuss Nonnegative Matrix Factorization (NMF) and the motivation behind using NMF for data representation. We provide the mathematical derivation for NMF with some additional constraints. Based on these derivations, we propose two novel data distortion strategies. The first technique is called Constrained Nonnegative Matrix Factorization (CNMF) and the second is Sparsified CNMF. We compare the distortion level of each of these algorithms with that of other matrix-based techniques such as SVD and NMF. K-means is used to study the data utility of the two proposed methods. Our experimental results show that, in comparison with standard data distortion techniques, the proposed schemes are very effective in achieving a good tradeoff between data privacy and data utility, and afford a feasible solution for protecting sensitive information while promising higher accuracy in decision making. We investigate the utility of the perturbed data based on the results from the original data.
AB - With the rise of data mining techniques came the problem of privacy disclosure, which is why privacy has become one of the top priorities in the design of data mining techniques. In this paper, we briefly discuss Nonnegative Matrix Factorization (NMF) and the motivation behind using NMF for data representation. We provide the mathematical derivation for NMF with some additional constraints. Based on these derivations, we propose two novel data distortion strategies. The first technique is called Constrained Nonnegative Matrix Factorization (CNMF) and the second is Sparsified CNMF. We compare the distortion level of each of these algorithms with that of other matrix-based techniques such as SVD and NMF. K-means is used to study the data utility of the two proposed methods. Our experimental results show that, in comparison with standard data distortion techniques, the proposed schemes are very effective in achieving a good tradeoff between data privacy and data utility, and afford a feasible solution for protecting sensitive information while promising higher accuracy in decision making. We investigate the utility of the perturbed data based on the results from the original data.
KW - Constrained NMF
KW - Data distortion
KW - NMF
KW - Nonnegative matrix factorization
KW - SVD
UR - http://www.scopus.com/inward/record.url?scp=84868629167&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84868629167&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84868629167
SN - 9789898565181
T3 - DATA 2012 - Proceedings of the International Conference on Data Technologies and Applications
SP - 51
EP - 56
BT - DATA 2012 - Proceedings of the International Conference on Data Technologies and Applications
T2 - 1st International Conference on Data Technologies and Applications, DATA 2012
Y2 - 25 July 2012 through 27 July 2012
ER -