An Approach for Specifying Trimming and Winsorization Cutoffs

Kedai Cheng, Derek S. Young

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Outliers and extreme values are common in the era of big data, especially in the collection of survey data and real analysis. Clearly, care needs to be taken with how such values are treated in the calculation of statistical summaries, such as those involving the sample mean and sample variance. Robust alternatives based on trimming or Winsorization are often employed to mitigate the effect of those outlying points. A critical aspect of these methods, however, is the determination of the cutoff locations. One classic approach is g-and-g-times trimming/Winsorization, which removes a proportion g from both tails. However, this method does not carry any confidence statement, such as one finds with the calculation of statistical intervals. We propose the application of nonparametric statistical tolerance intervals, which capture a specified proportion of the sampled population at a given confidence level, to determine cutoff locations for trimming and Winsorization. Extensive simulation studies show that this approach yields better coverage than the g-and-g-times method, even though the latter was not designed as a confidence procedure. Census of Agriculture data since 1982 are analyzed to highlight the impact on statistical summaries regarding farm land. Supplementary materials accompanying this paper appear online.
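
As a rough illustration of the idea in the abstract, the sketch below derives trimming and Winsorization cutoffs from a distribution-free two-sided tolerance interval built on symmetric order statistics (the classical Wilks-type construction). It is a minimal sketch under stated assumptions and not necessarily the procedure used in the paper: the function names (binom_cdf, tolerance_cutoffs, trim, winsorize), the symmetric choice of order statistics, and the default content and confidence levels are all illustrative. It relies on the standard result that, for a continuous population, the interval from the k-th smallest to the k-th largest of n observations contains at least a proportion P of the population with confidence equal to the probability that a Binomial(n, P) variable is at most n - 2k.

```python
from math import comb


def binom_cdf(m, n, p):
    """P(X <= m) for X ~ Binomial(n, p); direct sum, fine for moderate n."""
    if m < 0:
        return 0.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(min(m, n) + 1))


def tolerance_cutoffs(data, content=0.90, confidence=0.95):
    """Return (lower, upper) cutoffs from a symmetric nonparametric
    tolerance interval (X_(k), X_(n-k+1)).

    Picks the largest k whose interval still covers at least `content`
    of the population with probability at least `confidence`.
    (Hypothetical helper; the symmetric choice is an assumption.)
    """
    x = sorted(data)
    n = len(x)
    best_k = None
    for k in range(1, n // 2 + 1):
        # Confidence of (X_(k), X_(n-k+1)) is P(Binomial(n, content) <= n - 2k),
        # which decreases as k grows, so stop at the first failure.
        if binom_cdf(n - 2 * k, n, content) >= confidence:
            best_k = k
        else:
            break
    if best_k is None:
        raise ValueError("sample too small for the requested content/confidence")
    return x[best_k - 1], x[n - best_k]


def trim(data, lower, upper):
    """Drop observations that fall outside [lower, upper]."""
    return [v for v in data if lower <= v <= upper]


def winsorize(data, lower, upper):
    """Pull observations outside [lower, upper] back to the nearest cutoff."""
    return [min(max(v, lower), upper) for v in data]


if __name__ == "__main__":
    sample = list(range(1, 101))  # illustrative data only
    lo, hi = tolerance_cutoffs(sample, content=0.90, confidence=0.95)
    print(lo, hi, len(trim(sample, lo, hi)))
```

Unlike a fixed proportion taken off each tail, the cutoffs here come with an explicit content and confidence pair, and the function refuses to return cutoffs when the sample is too small to support that pair.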

Original language: English
Pages (from-to): 299-323
Number of pages: 25
Journal: Journal of Agricultural, Biological, and Environmental Statistics
Volume: 28
Issue number: 2
State: Published - Jun 2023

Bibliographical note

Publisher Copyright:
© 2023, International Biometric Society.

Funding

This project has been made possible in part by grant number 2020-225193 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation.

Funders

• Silicon Valley Community Foundation
• Chan Zuckerberg Initiative

    Keywords

    • Convex hull
    • Data depth
    • Influence function
    • Order statistics
    • Survey analysis
    • Tolerance limits

    ASJC Scopus subject areas

    • Statistics and Probability
    • General Environmental Science
    • Agricultural and Biological Sciences (miscellaneous)
    • General Agricultural and Biological Sciences
    • Statistics, Probability and Uncertainty
    • Applied Mathematics
