Identifying Pareto-based solutions for regression subset selection via a feasible solution algorithm

Joshua W. Lambert, Gregory S. Hawk

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

The concept of Pareto optimality has been used in fields such as engineering and economics to understand fluid dynamics and consumer behavior. In machine learning, Pareto optimality has been used to identify tuning parameters that best optimize a set of m criteria (multi-objective optimization). During regression model selection, data scientists are often concerned with choosing the model that has the best value of a single criterion (e.g., the Akaike information criterion (AIC) or R-squared (R²)) before going on to check a number of other model characteristics (e.g., model size, form, diagnostics, and interpretability). This strategy is multi-objective in nature but single-objective in its numeric execution. This paper first introduces a feasible solution algorithm (FSA) and explains how it can be applied to multi-objective problems in regression subset selection. We then introduce the general framework of Pareto optimality in the regression setting. Next, we apply the algorithm in a simulation study in which we seek to estimate the first four Pareto boundaries for regression models using two model fit criteria. Finally, we present an application to a US communities and crime dataset.
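The abstract combines two ideas: a feasible solution algorithm for subset selection and the sorting of candidate models into successive Pareto boundaries under two criteria. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. `fsa_search` assumes an exchange-based search in the spirit of feasible solution algorithms: from a random subset of k predictors, repeatedly make the single swap (one selected predictor out, one unselected predictor in) that most lowers the criterion, and stop when no swap helps. `pareto_fronts` peels off successive non-dominated sets with every criterion minimized (a criterion such as R² would be negated first). All function names, the additive toy criteria `f1` and `f2`, and the choice to pool every size-2 subset as a candidate are hypothetical stand-ins for real regression fits such as AIC.

```python
import itertools
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Subset = FrozenSet[int]  # indices of the selected predictors


def fsa_search(p: int, k: int, criterion: Callable[[Subset], float],
               n_starts: int = 10, seed: int = 0) -> List[Subset]:
    """Hypothetical exchange-based search over size-k subsets of p predictors.

    Lower criterion values are better. Each start picks a random subset, then
    repeatedly applies the single swap that most lowers the criterion. It stops
    when no swap helps, i.e. at a subset no single swap can improve. The
    distinct end points over all starts are returned in a deterministic order.
    """
    rng = random.Random(seed)
    solutions = set()
    for _ in range(n_starts):
        current = frozenset(rng.sample(range(p), k))
        current_score = criterion(current)
        while True:
            best, best_score = current, current_score
            for out_var in current:
                for in_var in range(p):
                    if in_var in current:
                        continue
                    candidate = (current - {out_var}) | {in_var}
                    score = criterion(candidate)
                    if score < best_score:
                        best, best_score = candidate, score
            if best == current:  # no improving swap: a feasible solution
                break
            current, current_score = best, best_score
        solutions.add(current)
    return sorted(solutions, key=sorted)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b on every criterion and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_fronts(scores: Dict[Subset, Tuple[float, ...]],
                  n_fronts: int) -> List[List[Subset]]:
    """Return up to n_fronts successive Pareto boundaries (all criteria minimized)."""
    remaining = dict(scores)
    fronts: List[List[Subset]] = []
    while remaining and len(fronts) < n_fronts:
        # A model is on the current front if no remaining model dominates it.
        front = [m for m, s in remaining.items()
                 if not any(dominates(t, s) for o, t in remaining.items() if o != m)]
        fronts.append(sorted(front, key=sorted))
        for m in front:
            del remaining[m]
    return fronts


# Hypothetical toy criteria standing in for real fit criteria such as AIC.
a = [5, 1, 4, 2, 3]
b = [1, 4, 2, 5, 3]


def f1(s: Subset) -> float:
    return sum(a[i] for i in s)


def f2(s: Subset) -> float:
    return sum(b[i] for i in s)


if __name__ == "__main__":
    print("FSA solution for f1:", [sorted(s) for s in fsa_search(5, 2, f1)])
    print("FSA solution for f2:", [sorted(s) for s in fsa_search(5, 2, f2)])
    pool = [frozenset(c) for c in itertools.combinations(range(5), 2)]
    scores = {s: (f1(s), f2(s)) for s in pool}
    for rank, front in enumerate(pareto_fronts(scores, n_fronts=4), start=1):
        print(f"Front {rank}:", [sorted(s) for s in front])
```

Because each toy criterion is additive, any subset that no single swap can improve is also the global optimum, so each single-criterion FSA solution lands on the first front as one of its endpoints. With real fit criteria, different random starts can stop at different feasible solutions, which is why the sketch runs several starts.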

Original language: English
Pages (from-to): 277-284
Number of pages: 8
Journal: International Journal of Data Science and Analytics
Volume: 10
Issue number: 3
State: Published - Sep 1 2020

Bibliographical note

Publisher Copyright:
© 2020, Springer Nature Switzerland AG.

Funding

This work was partially supported by the Kentucky Biomedical Research Infrastructure and Institutional Development Award of Biomedical Research Excellence Grant [P20 RR16481] and the Multiple Sclerosis Society [PP-1609-25975].

Funders (funder number)
Kentucky Biomedical Research Infrastructure Network Bioinformatics Core: P20 RR16481
National Multiple Sclerosis Society: PP-1609-25975

    Keywords

    • Feasible solution
    • Multiple
    • Objective
    • Optimal
    • Pareto
    • Regression
    • Subset selection

    ASJC Scopus subject areas

    • Information Systems
    • Modeling and Simulation
    • Computer Science Applications
    • Computational Theory and Mathematics
    • Applied Mathematics
