Abstract
Computational prediction of ligand binding is a difficult problem, and the more accurate methods are extremely computationally expensive. Machine learning for drug binding prediction could leverage biomedical big data in place of time-intensive simulations. This paper reviews current trends in the use of machine learning for drug binding prediction, the data sources available for developing machine learning algorithms, and potential problems that may lead to overfitting and ungeneralizable models. A few popular datasets that can be used to develop virtual high-throughput screening models are characterized using spatial statistics to quantify potential biases. Evaluating some common benchmarks shows that good performance correlates with high predicted bias scores, whereas models with low bias scores have little predictive power. A better understanding of the limits of available data sources, and of how to address them, will lead to more generalizable models and, in turn, to novel drug discovery.
Original language | English |
---|---|
Article number | 129545 |
Journal | Biochimica et Biophysica Acta - General Subjects |
Volume | 1864 |
Issue number | 6 |
State | Published - Jun 2020 |
Bibliographical note
Funding Information: This research was supported by the Cancer Research Informatics Shared Resource Facility of the University of Kentucky Markey Cancer Center (P30CA177558), the University of Kentucky CCTS KL2TR000116 and 1KL2TR001996-01 grants, and the Markey Women Strong – philanthropy grant. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher Copyright:
© 2020 Elsevier B.V.
Keywords
- Drug binding
- Drug discovery
- Machine learning
- Overfitting
ASJC Scopus subject areas
- Biophysics
- Biochemistry
- Molecular Biology