Abstract
As new technologies permit the generation of unprecedented volumes of data (e.g. genome-wide association study data), researchers struggle to keep up with the added complexity and time commitment required for its analysis. For this reason, model selection commonly relies on machine learning and data-reduction techniques, which tend to yield models that are difficult to interpret. Even in cases with straightforward explanatory variables, the so-called ‘best’ model produced by a given model-selection technique may fail to capture information of vital importance to the domain-specific questions at hand. Herein we propose a new concept for model selection, feasibility, for use in identifying multiple models that are in some sense optimal and may together provide a wider range of information relevant to the topic of interest, including (but not limited to) interaction terms. We further provide an R package and associated Shiny applications for identifying or validating feasible models, the performance of which we demonstrate on both simulated and real-life data.
Original language | English |
---|---|
Pages (from-to) | 2022-2041 |
Number of pages | 20 |
Journal | Journal of Applied Statistics |
Volume | 48 |
Issue number | 11 |
State | Published - 2021 |
Bibliographical note
Publisher Copyright: © 2020 Informa UK Limited, trading as Taylor & Francis Group.
Keywords
- data analysis
- feasibility
- model selection
- model validation
- regression
- statistical model
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty