In this paper, we investigate potential biases in datasets used to make drug binding predictions with machine learning. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify this bias and detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting training data, and that they produce models with greater predictive value.
|Title of host publication||Computational Science – ICCS 2020 - 20th International Conference, Proceedings|
|Editors||Valeria V. Krzhizhanovskaya, Gábor Závodszky, Michael H. Lees, Peter M.A. Sloot, Jack J. Dongarra, Sérgio Brissos, João Teixeira|
|Number of pages||14|
|State||Published - 2020|
|Event||20th International Conference on Computational Science, ICCS 2020 - Amsterdam, Netherlands|
Duration: Jun 3 2020 → Jun 5 2020
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Conference||20th International Conference on Computational Science, ICCS 2020|
|Period||6/3/20 → 6/5/20|
Bibliographical note
Funding Information:
Supported by Lawrence Livermore National Laboratory and the University of Kentucky Markey Cancer Center.
© Springer Nature Switzerland AG 2020.
Keywords
- Data overfitting
- Drug discovery
- Machine learning
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)