Indirect cross-validation for density estimation

Olga Y. Savchuk, Jeffrey D. Hart, Simon J. Sheather

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

A new method of bandwidth selection for kernel density estimators is proposed. The method, termed indirect cross-validation (ICV), makes use of so-called selection kernels. Least-squares cross-validation (LSCV) is used to select the bandwidth of a selection-kernel estimator and this bandwidth is appropriately rescaled for use in a Gaussian kernel estimator. The proposed selection kernels are linear combinations of two Gaussian kernels and need not be unimodal or positive. A theory is developed showing that the relative error of ICV bandwidths can converge to 0 at a rate of n^{-1/4}, which is substantially better than the n^{-1/10} rate of LSCV. Interestingly, the selection kernels that are best for purposes of bandwidth selection are very poor if used to actually estimate the density function. This property appears to be part of the larger and well-documented paradox to the effect that "the harder the estimation problem, the better cross-validation performs." The ICV method uniformly outperforms LSCV in a simulation study, a real data example, and a simulated example in which bandwidths are chosen locally. Supplemental materials for the article are available online.
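The two-step procedure described in the abstract (run LSCV with a selection kernel, then rescale the result for a Gaussian kernel) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The abstract only says the selection kernel is a linear combination of two Gaussian kernels, so the parameterization L(u) = (1 + alpha) * phi(u) - (alpha / sigma) * phi(u / sigma) is assumed here, as are the values of `ALPHA` and `SIGMA`. The rescaling constant uses the standard ratio of asymptotically optimal bandwidths for two second-order kernels. For simplicity the sketch minimizes the LSCV criterion over a plain grid. All function names are hypothetical.

```python
import numpy as np

# Illustrative parameter values (an assumption, not taken from the abstract).
ALPHA, SIGMA = 2.42, 5.06


def norm_pdf(u, s=1.0):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def selection_kernel(u, alpha=ALPHA, sigma=SIGMA):
    """Assumed form: L(u) = (1 + alpha) phi(u) - (alpha / sigma) phi(u / sigma)."""
    return (1.0 + alpha) * norm_pdf(u) - alpha * norm_pdf(u, sigma)


def selection_kernel_conv(u, alpha=ALPHA, sigma=SIGMA):
    """Closed-form convolution (L * L)(u), using Gaussian-Gaussian convolution."""
    return ((1.0 + alpha) ** 2 * norm_pdf(u, np.sqrt(2.0))
            - 2.0 * alpha * (1.0 + alpha) * norm_pdf(u, np.sqrt(1.0 + sigma ** 2))
            + alpha ** 2 * norm_pdf(u, sigma * np.sqrt(2.0)))


def rescale_constant(alpha=ALPHA, sigma=SIGMA):
    """C = (R(phi) mu2(L)^2 / (R(L) mu2(phi)^2))^(1/5), with mu2(phi) = 1."""
    r_phi = 1.0 / (2.0 * np.sqrt(np.pi))           # integral of phi^2
    r_l = selection_kernel_conv(0.0, alpha, sigma)  # integral of L^2 (L is symmetric)
    mu2_l = 1.0 + alpha - alpha * sigma ** 2        # second moment of L
    return (r_phi * mu2_l ** 2 / r_l) ** 0.2


def lscv(b, x, alpha=ALPHA, sigma=SIGMA):
    """Least-squares cross-validation criterion for the selection-kernel estimator."""
    n = x.size
    d = (x[:, None] - x[None, :]) / b               # all pairwise scaled differences
    term1 = selection_kernel_conv(d, alpha, sigma).sum() / (n ** 2 * b)
    pair_sum = selection_kernel(d, alpha, sigma).sum()
    pair_sum -= n * selection_kernel(0.0, alpha, sigma)  # drop the i == j terms
    return term1 - 2.0 * pair_sum / (n * (n - 1) * b)


def icv_bandwidth(x, alpha=ALPHA, sigma=SIGMA, h_grid=None):
    """Grid-search LSCV for the selection kernel, reported on the Gaussian scale."""
    x = np.asarray(x, dtype=float)
    if h_grid is None:
        # Candidate Gaussian-kernel bandwidths (a hypothetical default grid).
        h_grid = np.std(x, ddof=1) * np.linspace(0.05, 1.5, 60)
    c = rescale_constant(alpha, sigma)
    # Each Gaussian-scale candidate h corresponds to selection bandwidth b = h / c.
    scores = [lscv(h / c, x, alpha, sigma) for h in h_grid]
    return float(h_grid[int(np.argmin(scores))])


if __name__ == "__main__":
    sample = np.random.default_rng(0).standard_normal(200)
    print("ICV bandwidth for a Gaussian kernel:", icv_bandwidth(sample))
```

Searching directly over Gaussian-scale candidates and dividing by the constant keeps the default grid interpretable in units of the sample standard deviation; an optimizer over the selection-kernel bandwidth would serve equally well.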

Original language: English
Pages (from-to): 415-423
Number of pages: 9
Journal: Journal of the American Statistical Association
Volume: 105
Issue number: 489
DOIs
State: Published - Mar 2010

Bibliographical note

Funding Information:
Olga Y. Savchuk is Visiting Assistant Professor, Binghamton University, Binghamton, NY 13902-6000 (E-mail: osavchuk@math.binghamton.edu). Jeffrey D. Hart is Professor, Department of Statistics, Texas A&M University, College Station, TX 77843-3143 (E-mail: hart@stat.tamu.edu). Simon J. Sheather is Professor and Head, Department of Statistics, Texas A&M University, College Station, TX 77843-3143 (E-mail: sheather@stat.tamu.edu). The authors are grateful to David Scott and George Terrell for providing valuable insight about cross-validation, and to three referees and an associate editor, whose comments led to a much improved final version of our paper. The research of Savchuk and Hart was supported in part by NSF grant DMS-0604801.

Keywords

  • Bandwidth selection
  • Kernel density estimation
  • Local cross-validation
  • Simulation of Bayes risk

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
