Quantifying overfitting potential in drug binding datasets

Brian Davis, Kevin Mcloughlin, Jonathan Allen, Sally R. Ellingson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation


In this paper, we investigate potential biases in datasets used to make drug binding predictions with machine learning. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify this bias and detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting training data, and that they produce models with greater predictive value.

Original language: English
Title of host publication: Computational Science – ICCS 2020 - 20th International Conference, Proceedings
Editors: Valeria V. Krzhizhanovskaya, Gábor Závodszky, Michael H. Lees, Peter M.A. Sloot, Jack J. Dongarra, Sérgio Brissos, João Teixeira
Number of pages: 14
State: Published - 2020
Event: 20th International Conference on Computational Science, ICCS 2020 - Amsterdam, Netherlands
Duration: Jun 3 2020 → Jun 5 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12139 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 20th International Conference on Computational Science, ICCS 2020

Bibliographical note

Publisher Copyright:
© Springer Nature Switzerland AG 2020.


Keywords

  • Data overfitting
  • Drug discovery
  • Machine learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

