Feature selection: A preprocess for data perturbation

Pengpeng Lin, Nirmal Thapa, Ingrid St. Omer, Lian Liu, Jun Zhang

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Privacy preservation is a major concern in the design of data mining applications, where it requires a trade-off between mining performance and the protection of sensitive information. Data perturbation, or distortion, is a widely used approach to privacy protection. Many privacy preservation approaches have been developed, based either on adding noise or on matrix decomposition. In this paper, we study a Singular Value Decomposition (SVD) based data distortion strategy together with feature selection techniques, and conduct experiments to explore how feature selection can be used to better serve privacy preservation. Sparsified Singular Value Decomposition (SSVD) is used for data distortion, and filter-based feature selection is used to reduce the feature space. We design a modified version of the Exponential Threshold Strategy (ETS) as the threshold function for the matrix sparsification process, and implement several metrics to measure the level of data perturbation. We also propose a novel algorithm to compute rank and analyze the lower bound on its running time. The mining utility of the distorted data is tested with a well-known classifier, the Support Vector Machine (SVM).
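
The pipeline described in the abstract (filter-based feature selection followed by sparsified SVD distortion) can be illustrated with a minimal sketch. It is not the authors' implementation. The variance filter, the fixed magnitude threshold (a stand-in for the paper's modified ETS, which the abstract does not specify), the Frobenius-norm distortion measure, and all function names are assumptions made for illustration only.

```python
import numpy as np

def variance_filter(A, k):
    """Keep the k columns with the largest variance (an assumed filter criterion)."""
    top = np.argsort(A.var(axis=0))[-k:]
    return A[:, np.sort(top)]  # preserve the original column order

def sparsified_svd_distort(A, rank, threshold):
    """Distort A with a rank-truncated SVD whose singular vectors are sparsified."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    U_k, s_k, Vt_k = U[:, :rank].copy(), s[:rank], Vt[:rank, :].copy()
    # Sparsify: zero entries whose magnitude falls below a fixed threshold
    # (hypothetical stand-in for the paper's modified ETS).
    U_k[np.abs(U_k) < threshold] = 0.0
    Vt_k[np.abs(Vt_k) < threshold] = 0.0
    return (U_k * s_k) @ Vt_k

def relative_distortion(A, A_bar):
    """Relative Frobenius-norm difference between original and distorted data."""
    return np.linalg.norm(A - A_bar, "fro") / np.linalg.norm(A, "fro")

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 8))           # synthetic data, 50 records x 8 features
A_sel = variance_filter(A, k=6)        # reduce the feature space first
A_bar = sparsified_svd_distort(A_sel, rank=3, threshold=0.05)
vd = relative_distortion(A_sel, A_bar)
print(f"relative distortion: {vd:.3f}")
```

In a fuller experiment the distorted matrix would then be passed to a classifier such as an SVM to check how much mining utility survives the perturbation.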

Original language: English
Pages (from-to): 168-175
Number of pages: 8
Journal: IAENG International Journal of Computer Science
Volume: 38
Issue number: 2
State: Published - May 25 2011

Keywords

  • Feature selection
  • Perturbation
  • SSVD
  • SVD
  • SVM

ASJC Scopus subject areas

  • General Computer Science
