Abstract
In this paper, we investigate potential biases in datasets used to make drug binding predictions with machine learning. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify this bias and detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting training data, and that they produce models with greater predictive value.
Original language | English |
---|---|
Title of host publication | Computational Science – ICCS 2020 - 20th International Conference, Proceedings |
Editors | Valeria V. Krzhizhanovskaya, Gábor Závodszky, Michael H. Lees, Peter M.A. Sloot, Jack J. Dongarra, Sérgio Brissos, João Teixeira |
Pages | 585-598 |
Number of pages | 14 |
State | Published - 2020 |
Event | 20th International Conference on Computational Science, ICCS 2020 - Amsterdam, Netherlands; Duration: Jun 3 2020 → Jun 5 2020 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12139 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 20th International Conference on Computational Science, ICCS 2020 |
---|---|
Country/Territory | Netherlands |
City | Amsterdam |
Period | 6/3/20 → 6/5/20 |
Bibliographical note
Publisher Copyright: © Springer Nature Switzerland AG 2020.
Keywords
- Data overfitting
- Drug discovery
- Machine learning
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science