Abstract
A central goal in statistics and data science is the construction of linear regression models for continuous variables of interest. Often, our objective is to examine the impact of one or more explanatory variables after adjusting for demographic covariates or other known or relevant factors. While the traditional approach is to use hypothesis testing to determine statistical significance, the resulting p-values depend heavily on sample size. This is particularly problematic for large datasets or "overpowered" studies, where even the tiniest effects will appear highly significant. Computing capabilities and cloud-enhanced data sharing have revolutionized the way we use data worldwide, from healthcare and investments to manufacturing and retail. While machine learning and artificial intelligence are improving predictive analytics, we need better statistical inference to help us understand our models and translate them into meaningful, actionable insights. The coefficient of partial determination (or partial R²) is widely used in applied science to supplement hypothesis testing, but little work has been done to understand its statistical properties. In this work, we derive the complete distribution of partial R² and perform simulated and real-world data analyses to show the advantages of adding it to your next analysis of Big Data.
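To make the abstract's two quantities concrete, here is a minimal Python sketch (standard library only) under stated assumptions: ordinary least squares, nested models, and, for the t-based helpers, a single variable of interest. Partial R² is computed with the usual textbook definition, (SSE_reduced - SSE_full) / SSE_reduced, where the reduced model contains only the adjustment covariates and the full model adds the variable of interest. For a single coefficient this equals t² / (t² + df), with df the residual degrees of freedom of the full model. The function names are hypothetical, the p-value uses a normal approximation rather than the exact t distribution, and none of this reproduces the paper's derived distribution of partial R².

```python
import math


def partial_r2_from_sse(sse_reduced: float, sse_full: float) -> float:
    """Partial R^2 = (SSE_reduced - SSE_full) / SSE_reduced.

    sse_reduced: residual sum of squares of the covariates-only model.
    sse_full: residual sum of squares after adding the variable(s) of interest.
    """
    if sse_reduced <= 0:
        raise ValueError("sse_reduced must be positive")
    if sse_full > sse_reduced:
        raise ValueError("a nested OLS full model cannot have a larger SSE than the reduced model")
    return (sse_reduced - sse_full) / sse_reduced


def partial_r2_from_t(t: float, df_resid: int) -> float:
    """Partial R^2 of a single coefficient from its t-statistic: t^2 / (t^2 + df_resid)."""
    return t * t / (t * t + df_resid)


def t_for_partial_r2(pr2: float, df_resid: int) -> float:
    """Inverse of partial_r2_from_t: the |t| implied by a fixed partial R^2."""
    if not 0.0 <= pr2 < 1.0:
        raise ValueError("partial R^2 must lie in [0, 1)")
    return math.sqrt(df_resid * pr2 / (1.0 - pr2))


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value under a standard normal approximation to the t distribution.

    This is only an approximation; it is close to the exact t-based p-value
    when df_resid is large, which is the large-sample setting of interest here.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hold the effect size fixed at a tiny partial R^2 and let the sample size grow.
    pr2 = 0.001
    n_predictors = 5  # hypothetical: variable of interest plus four covariates
    for n in (100, 10_000, 1_000_000):
        df_resid = n - n_predictors - 1  # intercept plus n_predictors slopes
        t = t_for_partial_r2(pr2, df_resid)
        print(f"n={n:>9,}  partial R^2={pr2}  |t|={t:8.3f}  approx p={approx_two_sided_p(t):.3g}")
```

Running the script holds partial R² fixed at 0.001 while n grows. The implied |t| scales roughly with the square root of the residual degrees of freedom, so the same negligible effect moves from clearly non-significant at n = 100 to an extremely small p-value at n = 1,000,000, which is the sample-size dependence the abstract describes, while partial R² itself stays constant.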
Original language | English |
---|---|
Pages (from-to) | 115-128 |
Number of pages | 14 |
Journal | Journal of Statistical Theory and Applications |
Volume | 23 |
Issue number | 2 |
DOIs | |
State | Published - Jun 2024 |
Bibliographical note
Publisher Copyright: © The Author(s) 2024.
Keywords
- Big data
- Coefficient of partial determination
- Linear regression
- Partial R²
- R²
ASJC Scopus subject areas
- Statistics and Probability
- Computer Science Applications
- Applied Mathematics