TY - JOUR
T1 - A data recipient centered de-identification method to retain statistical attributes
AU - Gal, Tamas S.
AU - Tucker, Thomas C.
AU - Gangopadhyay, Aryya
AU - Chen, Zhiyuan
PY - 2014/8
Y1 - 2014/8
N2 - Privacy has always been a major concern for patients and medical service providers. As a result of recent advances in information technology and the government's push for the use of Electronic Health Record (EHR) systems, a large amount of medical data is collected and stored electronically. These data need to be made available for analysis, but at the same time patient privacy must be protected through de-identification. Although biomedical researchers often describe their research plans when they request anonymized data, most existing anonymization methods do not use this information when de-identifying the data. As a result, the anonymized data may not be useful for the planned research project. This paper proposes a data-recipient-centered approach that tailors the de-identification method based on input from the recipient of the data. We demonstrate our approach through an anonymization project for biomedical researchers, with the specific goal of improving the utility of the anonymized data for the statistical models used in their research project. The selected algorithm improves on Condensation, a privacy protection method proposed by Aggarwal et al. Our methods were tested and validated on real cancer surveillance data provided by the Kentucky Cancer Registry.
AB - Privacy has always been a major concern for patients and medical service providers. As a result of recent advances in information technology and the government's push for the use of Electronic Health Record (EHR) systems, a large amount of medical data is collected and stored electronically. These data need to be made available for analysis, but at the same time patient privacy must be protected through de-identification. Although biomedical researchers often describe their research plans when they request anonymized data, most existing anonymization methods do not use this information when de-identifying the data. As a result, the anonymized data may not be useful for the planned research project. This paper proposes a data-recipient-centered approach that tailors the de-identification method based on input from the recipient of the data. We demonstrate our approach through an anonymization project for biomedical researchers, with the specific goal of improving the utility of the anonymized data for the statistical models used in their research project. The selected algorithm improves on Condensation, a privacy protection method proposed by Aggarwal et al. Our methods were tested and validated on real cancer surveillance data provided by the Kentucky Cancer Registry.
KW - Privacy
KW - Statistical analysis
KW - Utility based privacy preserving data mining
UR - http://www.scopus.com/inward/record.url?scp=84905190808&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905190808&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2014.01.001
DO - 10.1016/j.jbi.2014.01.001
M3 - Article
C2 - 24412834
AN - SCOPUS:84905190808
SN - 1532-0464
VL - 50
SP - 32
EP - 45
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -