Quantifying overfitting potential in drug binding datasets

Brian Davis, Kevin Mcloughlin, Jonathan Allen, Sally R. Ellingson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

In this paper, we investigate potential biases in datasets used to make drug binding predictions using machine learning. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify this bias and detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting training data, and that they produce models with greater predictive value.

Original language: English
Title of host publication: Computational Science – ICCS 2020 - 20th International Conference, Proceedings
Editors: Valeria V. Krzhizhanovskaya, Gábor Závodszky, Michael H. Lees, Peter M.A. Sloot, Jack J. Dongarra, Sérgio Brissos, João Teixeira
Pages: 585-598
Number of pages: 14
State: Published - 2020
Event: 20th International Conference on Computational Science, ICCS 2020 - Amsterdam, Netherlands
Duration: Jun 3 2020 → Jun 5 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12139 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Computational Science, ICCS 2020
Country/Territory: Netherlands
City: Amsterdam
Period: 6/3/20 → 6/5/20

Bibliographical note

Funding Information:
Supported by Lawrence Livermore National Laboratory and the University of Kentucky Markey Cancer Center.

Publisher Copyright:
© Springer Nature Switzerland AG 2020.

Keywords

  • Data overfitting
  • Drug discovery
  • Machine learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
