TY - GEN
T1 - Generalized random rotation perturbation for vertically partitioned data sets
AU - Lin, Zhenmin
AU - Wang, Jie
AU - Liu, Lian
AU - Zhang, Jun
PY - 2009
Y1 - 2009
N2 - Random rotation is one of the common perturbation approaches for privacy-preserving data classification, in which the data matrix is multiplied by a random rotation matrix before publishing in order to preserve data privacy. One distinct advantage of this approach is that it maintains the geometric properties of the data matrix, so several categories of classifiers that are based on the geometric properties of the data can achieve accuracy on the transformed data similar to that on the original data. In this paper, we generalize this idea to the situation where the data matrix is assumed to be vertically partitioned into several sub-matrices held by different owners. Each data holder can choose a rotation matrix randomly and independently to perturb their individual data. They then all send the transformed data to a third party, who collects it and forms a whole data set for data mining or other analysis purposes. We show that under such a scheme the geometric properties of the data set are also preserved, and thus the accuracy of many classifiers and clustering techniques applied to the transformed data matches that on the original data. This method enables us to develop efficient centralized data mining algorithms instead of distributed algorithms to preserve privacy. Experiments on real data sets show that such generalization is effective for vertically partitioned data sets.
AB - Random rotation is one of the common perturbation approaches for privacy-preserving data classification, in which the data matrix is multiplied by a random rotation matrix before publishing in order to preserve data privacy. One distinct advantage of this approach is that it maintains the geometric properties of the data matrix, so several categories of classifiers that are based on the geometric properties of the data can achieve accuracy on the transformed data similar to that on the original data. In this paper, we generalize this idea to the situation where the data matrix is assumed to be vertically partitioned into several sub-matrices held by different owners. Each data holder can choose a rotation matrix randomly and independently to perturb their individual data. They then all send the transformed data to a third party, who collects it and forms a whole data set for data mining or other analysis purposes. We show that under such a scheme the geometric properties of the data set are also preserved, and thus the accuracy of many classifiers and clustering techniques applied to the transformed data matches that on the original data. This method enables us to develop efficient centralized data mining algorithms instead of distributed algorithms to preserve privacy. Experiments on real data sets show that such generalization is effective for vertically partitioned data sets.
KW - Data mining
KW - Data perturbation
KW - Matrix rotation
KW - Privacy-preserving
UR - http://www.scopus.com/inward/record.url?scp=67650469160&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650469160&partnerID=8YFLogxK
U2 - 10.1109/CIDM.2009.4938644
DO - 10.1109/CIDM.2009.4938644
M3 - Conference contribution
AN - SCOPUS:67650469160
SN - 9781424427659
T3 - 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Proceedings
SP - 159
EP - 162
BT - 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Proceedings
T2 - 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009
Y2 - 30 March 2009 through 2 April 2009
ER -