Addressing accuracy issues in privacy preserving data mining through matrix factorization

Jie Wang, Jun Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Maintaining data mining accuracy on distorted datasets is an important issue in privacy preserving data mining. Using matrix approximation, we propose several efficient and flexible techniques to address this issue, and utilize unique characteristics of matrix factorization to maintain data patterns. We use support vector machine classification to compare how well different distortion methods maintain accuracy. With better performance than some classical data perturbation approaches, nonnegative matrix factorization and singular value decomposition are considered promising techniques for privacy preserving data mining. Experimental results demonstrate that mining accuracy on data distorted by these methods is almost as good as that on the original data, with the added property of privacy preservation. This indicates that the matrix factorization-based data distortion schemes perturb only confidential attributes to meet privacy requirements while preserving the general data pattern for knowledge extraction.
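
The abstract names singular value decomposition as one of the factorizations used for distortion but does not give the exact scheme. As a rough illustration only, the sketch below assumes the distorted dataset is a truncated-SVD (low-rank) approximation of the original matrix. The function name `svd_distort`, the chosen rank, and the synthetic data are hypothetical and not taken from the paper; the nonnegative matrix factorization variant and the support vector machine evaluation are not shown.

```python
import numpy as np


def svd_distort(data: np.ndarray, rank: int) -> np.ndarray:
    """Return the rank-`rank` truncated-SVD approximation of `data`.

    Assumption: this low-rank approximation plays the role of the
    "distorted" dataset; the paper's actual scheme may differ.
    """
    # Thin SVD: data = U @ diag(s) @ Vt, singular values in descending order.
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    # Keep only the leading `rank` singular triplets; dropping the rest
    # perturbs individual entries while retaining the dominant structure.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Synthetic stand-in for a confidential table: 100 records, 6 attributes,
# built from 2 latent factors plus small noise (purely illustrative).
rng = np.random.default_rng(0)
original = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6))
original += 0.05 * rng.normal(size=original.shape)

distorted = svd_distort(original, rank=2)

# Relative Frobenius-norm difference: one simple way to quantify distortion.
relative_change = np.linalg.norm(original - distorted) / np.linalg.norm(original)
print(f"relative change: {relative_change:.4f}")
```

In a comparison like the one the abstract describes, a classifier would then be trained separately on `original` and on `distorted`, and the two accuracies compared.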

Original language: English
Title of host publication: ISI 2007
Subtitle of host publication: 2007 IEEE Intelligence and Security Informatics
Pages: 217-220
Number of pages: 4
State: Published - 2007
Event: ISI 2007: 2007 IEEE Intelligence and Security Informatics - New Brunswick, NJ, United States
Duration: May 23 2007 → May 24 2007

Publication series

Name: ISI 2007: 2007 IEEE Intelligence and Security Informatics

Conference

Conference: ISI 2007: 2007 IEEE Intelligence and Security Informatics
Country/Territory: United States
City: New Brunswick, NJ
Period: 5/23/07 → 5/24/07

Keywords

  • Data mining
  • Matrix factorization
  • Nonnegative matrix factorization
  • Privacy

ASJC Scopus subject areas

  • General Computer Science
  • Control and Systems Engineering
