Explainable Supervised Machine Learning Model To Predict Solvation Gibbs Energy

Many challenges persist in developing accurate computational models for predicting solvation free energy (ΔGsol). Although recent Machine Learning (ML) methodologies have outperformed traditional quantum mechanical models, several issues remain concerning explanatory insights for broad chemical predictions with an acceptable speed-accuracy trade-off. To address this, we present a novel supervised ML model to predict ΔGsol for an array of solvent-solute pairs. Using two different ensemble regressor algorithms, we made fast and accurate property predictions from open-source chemical features that encode complex electronic, structural, and surface area descriptors for every solvent and solute. By integrating molecular properties and chemical interaction features, we analyzed individual descriptor importance and optimized our model through explanatory information from feature groups. On aqueous and organic solvent databases, the ML models revealed the predictive relevance of solutes with increasing polar surface area and decreasing polarizability, yielding better results than state-of-the-art benchmark Neural Network methods (without complex quantum mechanical or molecular dynamics simulations). Both algorithms outperformed previous ΔGsol prediction methods, with a maximum absolute error of 0.22 ± 0.02 kcal mol⁻¹, and were further validated on an external benchmark database and with solvent hold-out tests. These explanatory and statistical insights allow a thoughtful application of this method to predicting other thermodynamic properties, stressing the relevance of ML modeling for further complex computational chemistry problems.

Media type:

E-article

Year of publication:

2024

Published:

2024

Contained in:

Journal of chemical information and modeling - 64(2024), 7, 8 Apr., pages 2250-2262

Language:

English

Contributors:

Ferraz-Caetano, José [Author]
Teixeira, Filipe [Author]
Cordeiro, M Natália D S [Author]

Subjects:

059QF0KO0R
Journal Article
Solutions
Solvents
Water

Notes:

Date Completed 09.04.2024

Date Revised 25.04.2024

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1021/acs.jcim.3c00544

PPN (catalog ID):

NLM36102293X