Outliers and extreme values are common in the era of big data, especially in the collection of survey data and in real-data analysis. Care must therefore be taken with how such values are treated when calculating statistical summaries, such as those involving the sample mean and sample variance. Robust alternatives based on trimming or Winsorization are often employed to mitigate the effect of outlying points. A critical aspect of these methods, however, is the determination of the cutoff locations. One classic approach is g-and-g-times trimming/Winsorization, which removes a proportion g from each tail. However, this method does not carry any confidence statement of the kind found with statistical intervals. We propose applying nonparametric statistical tolerance intervals, which capture a specified proportion of the sampled population at a given confidence level, to determine cutoff locations for trimming and Winsorization. Extensive simulation studies show that this approach yields better coverage than the g-and-g-times method, which was not designed as a confidence procedure. Census of Agriculture data since 1982 are analyzed to highlight the impact on statistical summaries regarding farm land. Supplementary materials accompanying this paper appear online.
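The idea of using a nonparametric tolerance interval as a cutoff rule can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the textbook order-statistic (Wilks-type) construction, in which the interval between the r-th and s-th order statistics covers at least a proportion `content` of the population with confidence at least `conf` whenever P(Binomial(n, content) <= s - r - 1) >= conf. The function names, the symmetric choice of ranks, the inclusive treatment of the cutoffs, and the default values are all assumptions made for illustration only.

```python
import math
import random


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    total = 0.0
    for j in range(min(k, n) + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def tolerance_ranks(n, content=0.90, conf=0.95):
    """Return 1-based ranks (r, s) for a two-sided nonparametric tolerance interval.

    Hypothetical helper: picks the smallest gap m = s - r that meets the
    confidence requirement, then places the ranks roughly symmetrically.
    """
    for m in range(1, n):
        if binom_cdf(m - 1, n, content) >= conf:
            r = (n + 1 - m) // 2
            return r, r + m
    raise ValueError("sample too small for the requested content and confidence")


def trim_and_winsorize(values, content=0.90, conf=0.95):
    """Use the tolerance limits as cutoffs (assumed inclusive) for both methods."""
    xs = sorted(values)
    r, s = tolerance_ranks(len(xs), content, conf)
    lo, hi = xs[r - 1], xs[s - 1]
    trimmed = [x for x in values if lo <= x <= hi]        # drop points outside the limits
    winsorized = [min(max(x, lo), hi) for x in values]    # pull points in to the limits
    return (lo, hi), trimmed, winsorized


if __name__ == "__main__":
    random.seed(1)
    # Simulated skewed data with a few extreme values (illustrative only).
    sample = [random.lognormvariate(0, 1) for _ in range(200)] + [500.0, 900.0]
    (lo, hi), trimmed, winsorized = trim_and_winsorize(sample)
    print("cutoffs:", lo, hi)
    print("trimmed mean:", sum(trimmed) / len(trimmed))
    print("Winsorized mean:", sum(winsorized) / len(winsorized))
```

Unlike a fixed-proportion rule, the ranks here depend on the sample size through the binomial condition, so the cutoffs come with an explicit content and confidence statement. For small samples no such pair of ranks exists, and the sketch raises an error rather than returning cutoffs that fail the requirement.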
|Journal|Journal of Agricultural, Biological, and Environmental Statistics|
|State|Accepted/In press - 2023|
Bibliographical note
Funding Information:
This project has been made possible in part by grant number 2020-225193 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation.
© 2023, International Biometric Society.
Keywords
- Convex hull
- Data depth
- Influence function
- Order statistics
- Survey analysis
- Tolerance limits
ASJC Scopus subject areas
- Statistics and Probability
- Environmental Science (all)
- Agricultural and Biological Sciences (miscellaneous)
- Agricultural and Biological Sciences (all)
- Statistics, Probability and Uncertainty
- Applied Mathematics