A look inside the black box : Using graph-theoretical descriptors to interpret a Continuous-Filter Convolutional Neural Network (CF-CNN) trained on the global and local minimum energy structures of neutral water clusters

We describe a method for the post-hoc interpretation of a neural network (NN) trained on the global and local minima of neutral water clusters. We use the structures recently reported in a newly published database containing over 5 × 106 unique water cluster networks (H2O)N of size N = 3-30. The structural properties were first characterized using chemical descriptors derived from graph theory, identifying important trends in topology, connectivity, and polygon structure of the networks associated with the various minima. The code to generate the molecular graphs and compute the descriptors is available at https://github.com/exalearn/molecular-graph-descriptors, and the graphs are available alongside the original database at https://sites.uw.edu/wdbase/. A Continuous-Filter Convolutional Neural Network (CF-CNN) was trained on a subset of 500 000 networks to predict the potential energy, yielding a mean absolute error of 0.002 ± 0.002 kcal/mol per water molecule. Clusters of sizes not included in the training set exhibited errors of the same magnitude, indicating that the CF-CNN protocol accurately predicts energies of networks for both smaller and larger sizes than those used during training. The graph-theoretical descriptors were further employed to interpret the predictive power of the CF-CNN. Topological measures, such as the Wiener index, the average shortest path length, and the similarity index, suggested that all networks from the test set were within the range of values as the ones from the training set. The graph analysis suggests that larger errors appear when the mean degree and the number of polygons in the cluster lie further from the mean of the training set. This indicates that the structural space, and not just the chemical space, is an important factor to consider when designing training sets, as predictive errors can result when the structural composition is sufficiently different from the bulk of those in the training set. To this end, the developed descriptors are quite effective in explaining the results of the CF-CNN (a.k.a. the "black box") model.

Medienart:

E-Artikel

Erscheinungsjahr:

2020

Erschienen:

2020

Enthalten in:

Zur Gesamtaufnahme - volume:153

Enthalten in:

The Journal of chemical physics - 153(2020), 2 vom: 14. Juli, Seite 024302

Sprache:

Englisch

Beteiligte Personen:

Bilbrey, Jenna A [VerfasserIn]
Heindel, Joseph P [VerfasserIn]
Schram, Malachi [VerfasserIn]
Bandyopadhyay, Pradipta [VerfasserIn]
Xantheas, Sotiris S [VerfasserIn]
Choudhury, Sutanay [VerfasserIn]

Links:

Volltext

Themen:

Journal Article

Anmerkungen:

Date Revised 16.07.2020

published: Print

Citation Status PubMed-not-MEDLINE

doi:

10.1063/5.0009933

funding:

Förderinstitution / Projekttitel:

PPN (Katalog-ID):

NLM312453701