A comparative study on data perturbation with feature selection

Pengpeng Lin, Jun Zhang, Ingrid St. Omer, Huanjing Wang, Jie Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Privacy preservation is a major concern in designing data mining applications and has become a critical component that seeks a trade-off between mining utility and the protection of sensitive information. Data perturbation, or distortion, is a widely used approach to privacy protection. Many algorithms, based either on adding noise or on matrix decomposition, have been developed by simulating attackers' behavior. Most of them are complicated and computationally infeasible on datasets with a very large attribute space. In addition, real-world data tend to be inconsistent and redundant, and to contain parts irrelevant to the target information. Executing algorithms on such data is costly and ineffective. Data preprocessing routines attempt to smooth out noise, identify outliers, and correct inconsistencies in the data. One of the most important data preprocessing techniques is feature selection. In this paper, we study in depth a Singular Value Decomposition (SVD) based data distortion strategy together with feature selection techniques, and conduct experiments to explore how feature selection should be used to better serve privacy preservation. Sparsified Singular Value Decomposition (SSVD) and filter-based feature selection are used to distort the data and reduce the feature space. We propose a modified version of the Exponential Threshold Strategy (ETS) as the threshold function for matrix sparsification. Several metrics are used to measure the level of data distortion. We also propose a novel algorithm to compute rank and give a lower bound on its running time. The mining utility of the distorted data is tested with a well-known classifier, the Support Vector Machine (SVM).
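The general idea of SVD-based distortion with sparsification can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the modified ETS threshold function or the distortion metrics, so the sketch assumes a single fixed threshold and a relative Frobenius-norm error, and the names `sparsified_svd` and `relative_distortion` are hypothetical.

```python
import numpy as np


def sparsified_svd(data, rank, threshold):
    """Return a rank-`rank` SVD reconstruction with small factor entries zeroed."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    # Keep only the leading `rank` singular triplets (truncated SVD).
    u_k, s_k, vt_k = u[:, :rank], s[:rank], vt[:rank, :]
    # Sparsification: zero entries whose magnitude is below the threshold.
    # Assumption: one fixed threshold; the paper uses a modified ETS instead.
    u_k = np.where(np.abs(u_k) < threshold, 0.0, u_k)
    vt_k = np.where(np.abs(vt_k) < threshold, 0.0, vt_k)
    return u_k @ np.diag(s_k) @ vt_k


def relative_distortion(original, distorted):
    """Frobenius-norm error relative to the original (an assumed metric)."""
    return np.linalg.norm(original - distorted) / np.linalg.norm(original)


# Synthetic data stands in for a real dataset: 50 records, 8 attributes.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 8))
distorted = sparsified_svd(data, rank=4, threshold=0.05)
print(distorted.shape, relative_distortion(data, distorted))
```

Lowering `rank` or raising `threshold` increases the distortion, which is the privacy side of the trade-off; the utility side would be checked by training a classifier such as an SVM on the distorted data.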

Original language: English
Title of host publication: IMECS 2011 - International MultiConference of Engineers and Computer Scientists 2011
Pages: 454-459
Number of pages: 6
State: Published - 2011
Event: International MultiConference of Engineers and Computer Scientists 2011, IMECS 2011 - Kowloon, Hong Kong
Duration: Mar 16 2011 → Mar 18 2011

Publication series

Name: IMECS 2011 - International MultiConference of Engineers and Computer Scientists 2011
Volume: 1

Conference

Conference: International MultiConference of Engineers and Computer Scientists 2011, IMECS 2011
Country/Territory: Hong Kong
City: Kowloon
Period: 3/16/11 → 3/18/11

Keywords

  • Feature selection
  • Perturbation
  • SSVD
  • SVD
  • SVM

ASJC Scopus subject areas

  • General Computer Science
  • General Engineering
