Applications of robust regression to “big” data problems

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Robust regression methods have many potential applications in big data problems. In this paper, we consider two such applications using publicly available data. The first application looks at modeling taxi fares based on the trip distance of n = 49,800 taxi rides in New York City on Tuesday, January 15, 2013. The second application focuses on modeling the airfare from the miles flown of n = 78,905 round trip itineraries for single passengers which consisted of 2 direct one-way flights within the contiguous domestic US market on Southwest Airlines in the fourth quarter of 2014. The robust estimates were obtained for both applications using PROC ROBUSTREG in SAS 9.4. In both cases, we find that the confidence intervals around the robust estimates of the parameters in the regression models are very narrow, typically $0.01 or lower. With these confidence intervals being so narrow, one is left with the impression that these robust estimates differ in some meaningful way across at least some of the robust methods. Finally, utilizing findings in Cox (Biometrika, 102:712–716, 2015) we argue that in such applications it is not surprising that the confidence intervals around the robust estimates are very narrow, thus producing the illusion of apparently very high precision.

Original language: English
Title of host publication: Robust Rank-Based and Nonparametric Methods - Selected, Revised, and Extended Contributions
Editors: Joseph W. McKean, Regina Y. Liu
Pages: 101-120
Number of pages: 20
State: Published - 2016
Event: International Conference on Robust Rank-Based and Nonparametric Methods, 2015 - Kalamazoo, United States
Duration: Apr 9 2015 → Apr 10 2015

Publication series

Name: Springer Proceedings in Mathematics and Statistics
Volume: 168
ISSN (Print): 2194-1009
ISSN (Electronic): 2194-1017

Conference

Conference: International Conference on Robust Rank-Based and Nonparametric Methods, 2015
Country/Territory: United States
City: Kalamazoo
Period: 4/9/15 → 4/10/15

Bibliographical note

Publisher Copyright:
© Springer International Publishing Switzerland 2016.

Keywords

  • Big data
  • Illusion of very high precision
  • Robust regression
  • SAS

ASJC Scopus subject areas

  • General Mathematics
