Using diverse potentials and scoring functions for the development of improved machine-learned models for protein-ligand affinity and docking pose prediction

© 2021. The Author(s), under exclusive licence to Springer Nature Switzerland AG..

The advent of computational drug discovery holds the promise of significantly reducing the effort of experimentalists, along with monetary cost. More generally, predicting the binding of small organic molecules to biological macromolecules has far-reaching implications for a range of problems, including metabolomics. However, problems such as predicting the bound structure of a protein-ligand complex along with its affinity have proven to be an enormous challenge. In recent years, machine learning-based methods have proven to be more accurate than older methods, many based on simple linear regression. Nonetheless, there remains room for improvement, as these methods are often trained on a small set of features, with a single functional form for any given physical effect, and often with little mention of the rationale behind choosing one functional form over another. Moreover, it is not entirely clear why one machine learning method is favored over another. In this work, we endeavor to undertake a comprehensive effort towards developing high-accuracy, machine-learned scoring functions, systematically investigating the effects of machine learning method and choice of features, and, when possible, providing insights into the relevant physics using methods that assess feature importance. Here, we show synergism among disparate features, yielding adjusted R2 with experimental binding affinities of up to 0.871 on an independent test set and enrichment for native bound structures of up to 0.913. When purely physical terms that model enthalpic and entropic effects are used in the training, we use feature importance assessments to probe the relevant physics and hopefully guide future investigators working on this and other computational chemistry problems.

Medienart:

E-Artikel

Erscheinungsjahr:

2021

Erschienen:

2021

Enthalten in:

Zur Gesamtaufnahme - volume:35

Enthalten in:

Journal of computer-aided molecular design - 35(2021), 11 vom: 28. Nov., Seite 1095-1123

Sprache:

Englisch

Beteiligte Personen:

Demerdash, Omar N A [VerfasserIn]

Links:

Volltext

Themen:

Binding affinity
Binding-pose prediction
Docking
Journal Article
Ligands
Machine learning
Proteins
Rescoring
Research Support, U.S. Gov't, Non-P.H.S.

Anmerkungen:

Date Completed 07.03.2022

Date Revised 07.03.2022

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1007/s10822-021-00423-4

funding:

Förderinstitution / Projekttitel:

PPN (Katalog-ID):

NLM33246606X