Abstract
In the era of big data, researchers interested in developing statistical models are challenged with how to achieve parsimony. Usually, some sort of dimension reduction strategy is employed. Classic strategies are often in the form of traditional inference procedures, such as hypothesis testing; however, the increase in computing capabilities has led to the development of more sophisticated methods. In particular, sufficient dimension reduction has emerged as an area of broad and current interest. While these dimension reduction strategies have been employed for numerous data problems, they are scarcely discussed in the context of analyzing survey data. This paper provides an overview of some classic and modern dimension reduction methods, followed by a discussion of how to use the transformed variables in the context of analyzing survey data. We highlight some of these methods with an analysis of health insurance coverage using the US Census Bureau's 2015 Planning Database.
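As a rough illustration of the workflow the abstract describes (dimension reduction followed by modeling on the transformed variables), the minimal R sketch below reduces a set of predictors with principal component analysis and then fits a regression model on the leading component scores. This is not the authors' code or analysis: the data are simulated, the variable names are hypothetical, and the choice of two retained components is an assumption for illustration only; it also does not replicate the survey-weighted modeling of the 2015 Planning Database discussed in the paper.

```r
## Minimal sketch (not the paper's code): PCA-based dimension reduction
## followed by a model fit on the transformed variables.
set.seed(1)

n <- 500                                   # hypothetical number of observations
X <- matrix(rnorm(n * 10), n, 10)          # 10 toy predictors (simulated data)
colnames(X) <- paste0("x", 1:10)
y <- rbinom(n, 1, plogis(0.5 * X[, 1] - 0.3 * X[, 2]))  # toy binary outcome

## Step 1: dimension reduction via principal component analysis.
pc <- prcomp(X, center = TRUE, scale. = TRUE)
summary(pc)                                # inspect variance explained per component

## Step 2: keep a small number of components (2 here, purely for illustration)
## and use their scores as predictors in a downstream model.
k <- 2
scores <- as.data.frame(pc$x[, 1:k])
dat <- cbind(y = y, scores)
fit <- glm(y ~ ., data = dat, family = binomial)
summary(fit)
```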
Original language | English |
---|---|
Article number | 43 |
Journal | Journal of Big Data |
Volume | 4 |
Issue number | 1 |
DOIs | |
State | Published - Dec 1 2017 |
Bibliographical note
Author contributions: JW identified and prepared background material on the dimension reduction procedures discussed in the manuscript, wrote all R scripts used for the analysis, and assembled summaries of all results. DSY identified and provided context for this applied problem, and was responsible for interpreting the results, identifying appropriate summaries, and preparing the manuscript. Both authors read and approved the final manuscript.
Authors' information: JW is a Ph.D. student in the Department of Statistics at the University of Kentucky. Her Ph.D. research focuses on novel sufficient dimension reduction methods. DSY is an Assistant Professor in the Department of Statistics at the University of Kentucky. His research interests include mixture modeling, tolerance regions, statistical computing, and applied survey data analysis. Prior to joining the faculty at the University of Kentucky, he spent 3.5 years as a Senior Statistician working on data problems for the Naval Nuclear Propulsion Program and 3 years as a Research Mathematical Statistician at the US Census Bureau working on big data problems, some of which utilized older versions of the Planning Database. DSY is also an Accredited Professional Statistician™ of the American Statistical Association.
Acknowledgements: We would like to thank Professor Xiangrong Yin of the University of Kentucky for many helpful comments on an earlier draft of this manuscript. We would also like to thank five anonymous reviewers who provided a number of important comments that helped improve the overall quality of this manuscript.
Competing interests: The authors declare that they have no competing interests.
Availability of data and materials: The 2015 PDB is a publicly available Census Bureau dataset located at http://goo.gl/LlcwY7. All R code used to analyze the data is available as Additional files 1 and 2.
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Funding Information:
JW was supported as a Research Assistant by NSF Grant SES-1562503 throughout the duration of this research. The funding body did not have any role in the design of the study or the collection, analysis, and interpretation of data.
Publisher Copyright:
© 2017, The Author(s).
Keywords
- Big data
- Central mean subspace
- Flexible models
- Official statistics
- Principal component analysis
- Sufficient dimension reduction
ASJC Scopus subject areas
- Information Systems
- Hardware and Architecture
- Computer Networks and Communications
- Information Systems and Management