Abstract
In this paper, we investigate potential biases in datasets used to make drug binding predictions with machine learning. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify this bias and detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting the training data, and that they produce models with greater predictive value.
| Original language | English |
|---|---|
| Title of host publication | Computational Science – ICCS 2020 - 20th International Conference, Proceedings |
| Editors | Valeria V. Krzhizhanovskaya, Gábor Závodszky, Michael H. Lees, Peter M.A. Sloot, Jack J. Dongarra, Sérgio Brissos, João Teixeira |
| Pages | 585-598 |
| Number of pages | 14 |
| State | Published - 2020 |
| Event | 20th International Conference on Computational Science, ICCS 2020 - Amsterdam, Netherlands. Duration: Jun 3 2020 → Jun 5 2020 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 12139 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 20th International Conference on Computational Science, ICCS 2020 |
|---|---|
| Country/Territory | Netherlands |
| City | Amsterdam |
| Period | 6/3/20 → 6/5/20 |
Bibliographical note
Publisher Copyright: © Springer Nature Switzerland AG 2020.
Funding
Supported by Lawrence Livermore National Laboratory and the University of Kentucky Markey Cancer Center.
| Funders | Funder number |
|---|---|
| University of Kentucky Markey Comprehensive Cancer Center | |
Keywords
- Data overfitting
- Drug discovery
- Machine learning
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science