Quantum Mechanics and Machine Learning Synergies: Graph Attention Neural Networks to Predict Chemical Reactivity

03/24/2021 ∙ by Mohammadamin Tavakoli, et al. ∙ University of California, Irvine 70

There is a lack of scalable quantitative measures of reactivity for functional groups in organic chemistry. Measuring reactivity experimentally is costly and time-consuming and does not scale to the astronomical size of chemical space. In previous quantum chemistry studies, we have introduced Methyl Cation Affinities (MCA*) and Methyl Anion Affinities (MAA*), using a solvation model, as quantitative measures of reactivity for organic functional groups over the broadest range. Although MCA* and MAA* offer good estimates of reactivity parameters, their calculation through Density Functional Theory (DFT) simulations is time-consuming. To circumvent this problem, we first use DFT to calculate MCA* and MAA* for more than 2,400 organic molecules thereby establishing a large dataset of chemical reactivity scores. We then design deep learning methods to predict the reactivity of molecular structures and train them using this curated dataset in combination with different representations of molecular structures. Using ten-fold cross-validation, we show that graph attention neural networks applied to informative input fingerprints produce the most accurate estimates of reactivity, achieving over 91 predicting the MCA* plus-minus 3.0 or MAA* plus-minus 3.0, over 50 orders of magnitude. Finally, we demonstrate the application of these reactivity scores to two tasks: (1) chemical reaction prediction; (2) combinatorial generation of reaction mechanisms. The curated dataset of MCA* and MAA* scores is available through the ChemDB chemoinformatics web portal at www.cdb.ics.uci.edu.



There are no comments yet.


page 2

page 10

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In general terms, the chemical reactivity of an atom in a molecule is its propensity towards being an electron donor or acceptor in a chemical reaction. Being able to assign reactivity scores to atoms and molecules can be useful to better understand chemical reactions and their mechanisms in different areas such as chemical synthesis, atmospheric chemistry, drug design, and materials sciences. Reaction rates have been measured experimentally for a long time, but this is typically a time-consuming process, which becomes increasingly costly as one tries to explore the most challenging reactions. Moreover, true solution-phase reaction rates are bounded by the rates of molecular diffusion, complicating the quantifying the extremes of reactivity. Herbert Mayr and his colleagues have pioneered the empirical study of chemical reactivity by laboriously measuring the reactivity of the main organic functional groups and deriving corresponding scales of reactivity [mayr2008general, mayr2015quantitative]. However, due to the experimental limitations, their scale covers only a limited range of electrophiles and nucleophiles. An alternative approach to derive reactivity scores is to use quantum mechanical (QM) simulations.

Recently, we used QM with Density Functional Theory (DFT) simulations to investigate this problem for electron donor and electron acceptor functional groups [mood2020methyl, mca2021]. Specifically, we applied DFT to over 100 diverse molecular structures and showed that in general methyl ion affinities, using a solvation model, are highly correlated to the Mayr reactivity scale. However, while using QM is faster than running laboratory experiments and can potentially cover a larger range of electrophiles and nucleophiles, the underlying DFT calculations still take up several hours for a molecule with only twenty atoms. Therefore, here we develop a more efficient approach leveraging the synergies between QM and machine learning [sadowski2016synergies]

, where we first use DFT to produce a substantial training set of chemical reactivity scores and then develop and train machine learning methods to interpolate and extrapolate chemical reactivity scores in real-time. These machine learning methods, in particular deep learning methods, have already been successfully applied to a variety of chemoinformatics problems, including the prediction of molecular properties

[ralaivola05b, lusci2013deep, lusci2015accurate] and reactions [kayala2011learning, kayala2012reactionpredictor, coley2017prediction, fooshee2018deep, de2018molgan, you2018graph, zhou2019optimization, coley2019graph, tavakoli2020continuous]. Training these models may take some time but, once trained, they are orders of magnitude faster than QM calculations and can generalize, making them a suitable complement to time-consuming DFT simulations (Figure 1).

Figure 1: Typical speed differences in computing a given molecular property for a database of 1M molecules using DFT versus machine learning. Assuming QM simulations take approximately five hours per molecular structure, the total processing time for 1M structures is . This processing time can be reduced to a few hours, after training a suitable neural architecture, assuming that inference time in a trained deep learning model is in the range of milliseconds: .

In what follows, we first explain the concept of reactivity and how it correlates with methyl ion affinities. Then we describe the curated dataset we produce using DFT calculations. We then develop different machine learning methods tuned to different molecular representations and interpretations of molecular reactivity and proceed to train, test, and compare them using the curated data set. Finally, we discuss the potential applications of this approach for estimating the relative reactivity of atoms for the tasks of chemical reaction prediction and combinatorial generation of organic mechanisms.

2 Reactivity Scales

Until the pioneering work of Herbert Mayr and his team, the very idea that nucleophilicity or electrophilicity might be quantified on independent scales had eluded chemists. Through a massive experimental and theoretical undertaking, Mayr and his collaborators have made comprehensive and systematic measurements of reaction rates of various electrophiles and nucleophiles in the laboratory. Furthermore, they have shown that the solution-phase nucleophilicity and electrophilicity can be independently quantified using a logarithmic scale that correlates with the free energy of activation, allowing useful predictions of reaction rate constants, below diffusion control, using the now-famous equation: , where is a nucleophile-dependent parameter typically near unity and is the rate constant. The success of this equation centers around a focus on reactions that form bonds to carbon atoms; given the importance of solution-phase organic chemistry in biochemistry and medicine, this is a reasonable restriction [mayr2008general, mayr2015quantitative].

Methyl cation affinity (MCA) is defined as the energy difference resulting from combining a methyl cation with a nucleophile; similarly, methyl anion affinity (MAA) is defined as the energy difference resulting from combining a methyl anion with an electrophile (Figure (a)a). Mayr, Ofial, Zipse, and others have shown that MCA and MAA correlate with the experimentally measured nucleophilicities and electrophilicities on a relatively small range of reactivity [schindele2002relationships, seeliger2008reactions, troshin2011electrophilicities, allgauer2017quantification, bottger2000electrophilic]( orders of magnitude [pellerite1983intrinsic, wei2008methyl]).

Recently, Van Vranken and his collaborators have shown that calculated methyl cation affinities and methyl anion affinities, with the inclusion of a solvation model, (MCA*) and (MAA*) where * denotes the solvation model, are highly correlated with measured nucleophilicity and electrophilicity over a broad range of molecules containing first and second-row atoms [mood2020methyl, mca2021]. They used this correlation to expand the lower and upper ends of the nucleophilicity and electrophilicity scale produced by Mayr. Their work introduced MCA* and MAA* as new metrics to estimate reactivity parameters without carrying laboratory experiments. Additionally, they showed that MCA* and MAA* provide useful reactivity scores in a much broader range of chemical reactivity than previous work (up to orders of magnitude). Since calculating MCA* or MAA* only requires quantum chemistry simulations, this approach is relatively fast compared to experimental approaches. However, it still requires extensive computational resources and time-consuming processes and cannot be used in high-throughput mode, hence the need to develop faster approaches.

The resulting chemical reactivity scores (MCA* and MAA*), formulated as the difference between the energy of reactants and products of the reactions shown in Figure (a)a, can be interpreted in slightly different ways, depending on the entity to which the score is attributed to. Van Vranken et al. interpreted these quantities as the electrophilicity and nucleophilicity of the reacting functional groups [mood2020methyl, mca2021]. However, in cases where only one functional group is going to react with methyl ions, the corresponding values can be attributed to the reactivity of the entire molecule. Another possibility is to attribute the MCA* and MAA* scores to individual atoms. As shown in Figure (a)a, during the reaction with methyl ions, the ion is attaching to one particular atom of the electrophilic or nucleophilic molecule. This atom is labeled as the most reactive atom within the electrophilic or the nucleophilic group, so one can assign the MCA* or MAA* as the relative reactivity of that specific atom. Although all these interpretations are valid and will be further examined, in most of what follows we will use the third attribution at the level of atoms. Throughout the rest of this paper, MCA* and MAA* are referred to as the nucleophilicity (electron donor) and electrophilicity scores (electron acceptor).

3 Curated Dataset of Chemical Reactivities

Following the method for calculating chemical reactivity in mood2020methyl, mca2021, we compute the methyl cation affinities (MCA*) of 1232 nucleophiles and the methyl anion affinities (MAA*) of 1189 electrophiles. These molecules contain simple carbon skeleton structural variations to improve the generalizability of the trained models such as the ones shown in Figure (b)b. The nucleophilic functional groups include: amines, ethers, amide anions, alkyl carbanions, aldehydes, ketones, esters, carboxylic acids, amides, enolates, nitronate anions, diazo compounds, cyanoalkyl anions, imines, nitriles, isonitriles, and bis(cyano)alkyl anions. The electrophilic functional groups include: iminium ions, imines, oxonium ions, aldehydes, ketones, esters, amides, benzyl cations, allyl cations, alkyl cations, carbonyl Michael acceptors, nitrile Michael acceptors, and nitro Michael acceptors. Methyl ion affinities (MCA* and MAA*) were calculated using TURBOMOLE V7.3 at the PBE0(disp)/DEF2-TZVP COSMO() level of theory [li2018kinetics, grimme2011effect]. It must be noted that the MCA* and MAA* scores are correct within a range of three orders of magnitude [mood2020methyl, mca2021], e.g., the reported number for C[CH:1]=[O+]C is 3.29 which means the actual reactivity score is within the range of 0.29 to 6.29.

Figure 2: (a): Schematic definition of MCA and MAA. Nuc = Nucleophile. E = Electrophile. MCA = The negative of the energy difference resulting from combining a methyl cation with a nucleophile. MAA = The negative of the energy difference resulting from combining a methyl anion with an electrophile. (b): A few molecules from the curated dataset and the corresponding reactivity scores shown on two separate scales for nucleophilicity and electrophilicity.

4 Deep Learning to Predict Chemical Reactivity

Deep learning [baldi2021deep] methods have many applications in chemoinformatics, from molecular property prediction [lusci2013deep, lusci2015accurate, feinberg2018potentialnet, tavakoli2020continuous] and optimization [zhou2019optimization, olivecrona2017molecular, de2018molgan, fmodel], to reaction prediction [kayala2011learning, kayala2012reactionpredictor, coley2017prediction, fooshee2018deep, de2018molgan, you2018graph, zhou2019optimization, coley2019graph, schwaller2019molecular], to the acceleration of QM calculations [PhysRevLett.108.253002, hermann2020deep]. Although it takes time to train deep learning methods, at inference time they are fast and tend to generalize well. Deep learning methods in chemoinformatics must be developed in tandem with the underlying chemical representations: while feedforward neural network can be applied to vectorial or tensorial representations of fixed size, recursive or graph neural network must be used in the case of variable size, structured, representations [urban2018inner]. Next, we describe the representations and architectures we use for the problem of predicting electrophilicity and nucleophilicity.

4.1 Representation of Molecular Structures

As previously mentioned, there are several ways of attributing the reactivity score introduced in mood2020methyl, mca2021. This score is defined as the difference between the free energy of the reactants and products in the reactions shown in Figure (a)a

. It is also calculated for the most reactive atom of the most reactive functional group in each molecule. Therefore, one can interpret this as: (1) the reactivity of the atom which is bonding to the methyl ion during the reaction (atomic reactivity); (2) the reactivity of the functional group which contains the atom that is bonding to the methyl ion (group reactivity); or (3) the reactivity of the molecule which is reacting with the methyl ion (molecular reactivity). Using our know-how and some preliminary exploration to avoid looking at all possible combinations of attribution and representations, we converged on the following list of possible representations: (1) informative fingerprint vector representation to learn the atomic reactivity; (2) extended connectivity fingerprint (ECFP) to learn the molecular reactivity; (3) SMILES string text representation to learn the molecular reactivity; and (4) graph representations to learn atomic, functional group, and molecular reactivity. Next, we develop deep learning models congruous with each representation.

4.2 Deep Learning Models

4.2.1 Informative Fingerprint Vector Representation and Model

To perform an atom level prediction, each atom has to be mapped to a vector. The first method to find this mapping is to use chemical features associated with each atom in a molecule. These features fall into two categories: 1) graph-topological; and 2) physical-chemical. Graph-topological features reflect patterns of connectivity and the neighborhood of atoms, as in standard molecular fingerprint representations. The physical-chemical features instead capture properties of the atom itself. Examples of physical-chemical features are the presence and type of filled and unfilled orbitals, electronegativity, and the location of the atom in the periodic table. In this work, we associate a feature vector of length 52 to each atom. This vector corresponds to the concatenation of 44 graph topological features and 8 physical-chemical features. We refer to this vector as the informative fingerprint vector representation of the corresponding atom. Then we train two separate neural networks, one for electrophilicity prediction, and one for nucleophilicity prediction. After a hyperparameter optimization phase carried using Sherpa

[hertel2018sherpa], both resulting networks comprise two hidden layers with 32 and 16 units and one output linear unit for the regression task using the mean squared error. We use dropout [srivastava2014dropout, baldi_dropout_2014]

on the hidden layers with a rate of 0.4. We also use SPLASH activation functions initialized to ReLU activations

[tavakoli2021splash] in the hidden layers. The results of this experiment for both electrophilicity and nucleophilicity are shown in Table 1.

4.2.2 Extended Connectivity Fingerprint Representation and Model

When the reactivity score is attributed to the molecule (molecular reactivity), we use molecular fingerprints to represent the entire molecule as a binary or integer (count) vector. In this work, we use the well-known extended-connectivity fingerprints (ECFP) [rogers2010extended] to train two separate networks for electrophilicity and nucleophilicity prediction. The length of the fingerprints must be adjusted as there is a basic trade-off: longer fingerprints capture more information, however, they increase the risk of overfitting. Considering the length of the fingerprints and the radius as hyperparameters, after some experimentation, we converge on a size of 512 with a radius of four. The same architecture as the one used in the previous section is employed here for both networks, with 64 and 32 units in the hidden layers, followed by a similar linear output unit. We apply regularization (with to the weights.

4.2.3 SMILES String Representation and Model

We also use canonical SMILES strings [weininger1988smiles] to represent the molecules. In order to apply deep learning methods, the SMILES strings must be converted to a numerical format. The most straightforward and widely-used embedding is the character level embedding [liu2017retrosynthetic, ikebata2017bayesian, schwaller2019molecular]. However, it is more efficient to use atomic symbols as the embedding units. Not only does this reduce the extra computations required by atoms represented with multiple characters, but it also provides a clearer separation between pairs of atoms versus atoms represented by multiple characters (e.g. Sc can be seen as either a Sulfur atom connected to an aromatic carbon or a Scandium atom). In this embedding, each atom and special character in a SMILES string is mapped into a high dimensional vector. This can be done through an embedding layer whose weights are learned during the training process. However, because we are using a relatively small training dataset of reactivity scores, we avoid adding extra embedding parameters to the model by using a pre-trained embedding of atoms. We use the trained atom embedding vectors used in fooshee2018deep for an atom classification task within a reaction prediction pipeline. In this case, atoms are mapped onto a 10-dimensional vector space. Figure (a)a shows the t-SNE 2D visualization of the embedded atoms preserving some of their physical-chemical properties and separating them from special characters.

To predict the reactivity as the molecular property (i.e. molecular reactivity), the molecules are represented as SMILES strings. Two neural networks with the same architecture are employed to predict electrophilicity and nucleophilicity. The architecture of the neural networks consists of a pre-trained atom embedding layer followed by a one-dimensional convolution layer with a window size of five. This is followed by a bidirectional LSTM layer with 16 hidden units. Then the output is computed by a linear unit fully connected to the previous layer. Due to the increase in the number of parameters, regularization with is applied for each weight of the convolution and LSTM layers. The results of this experiment are shown in Table 1.

4.2.4 Graph Representation and Model

Finally, there are deep learning methods that can be applied directly to graph-structured data, such as knowledge graphs, social networks, parse trees, and molecules

[lusci2013deep, duvenaud2015convolutional, urban2018inner, schlichtkrull2018modeling, velivckovic2017graph, kearnes2016molecular]. Within these methods, the graph convolutional network (GCN), or outer recursive neural network approach, is particularly suited for processing molecular graphs through an iterative message passing mechanism that aggregates information about each atom’s over increasing larger neighborhoods. Here we follow an approach similar to schlichtkrull2018modeling to model a molecule as a relational graph structure. Each molecule is first represented as an undirected labeled graph , where the nodes in correspond to the atoms and the edges in to the bonds. The labels for the vertices in correspond to atom types (e.g. C, O). The labels for the edges in

correspond to edge types (single, double, triple, aromatic). In the beginning, each node of the graph is associated with a vector representation (initial mapping) carrying information about the node. The edges of the graph can be treated in different ways; they can be mapped to real-valued vectors using learnable embedding weights, or they can be mapped to binary vectors using the one-hot encoding of the bond types. Since we are focusing on organic structures, there are only four types of bonds in the data set, and therefore the one-hot encoding approach is both reasonable and economical. Thus a molecule with

atoms and a feature vector of length for each atom (initial mapping) can be represented by an node-feature matrix together with an

adjacency tensor

(here ). Each row in the node-feature matrix is the vector representation of the corresponding atom. In the adjacency tensor, for a pair of vertices and , the corresponding vector of length four represents the one-hot-encoding of the corresponding bond type. For atoms that are not bonded, the corresponding vector is

. Using this representation, we can apply a graph convolutional neural network to recursively update the representation of each atom. The recursive propagation of information in the convolutional neural network is given by:


Here is the vector representation associated with node at level . denotes the shared weights of the convolution applied from level to level . denotes the neighborhood of vertex , consisting of its immediate neighbors and

is a non-linear transformation. In order to take different bond types into account, we use the approach in

schlichtkrull2018modeling using a different set of weights for each bond type, resulting in the form:


Where denotes the set of nodes connected to node through the edge type

. To train this graph neural network, we optimize a multi-objective loss function. In addition to minimizing the error between the predicted reactivity and the actual value of MCA* or MAA* for the node of interest, we penalize the network whenever the predicted reactivity of other nodes is greater than the predicted reactivity of the node of interest. This can be formulated as follows:


where is the index of the atom which reacts with methyl ions, is the atom representation produced by the last convolutional layer of the GCN (), and is a fully connected linear function from to which outputs the reactivity prediction. The first loss term is the standard least square regression loss for predicting the target value of the MCA* or MAA* masked to the node of interest . The second term looks at all the other nodes and the corresponding reactivity prediction , computes the maximum reactivity difference with respect to node , retains it only in the unwanted case where this difference is positive (through the ReLU function), and applies a square loss with a weighting hyperparameter . In other words, the reactions shown in Figure (a)a have the highest reaction rate among all other plausible reactions with the same set of reactants. For the initial mapping of each atom () we use a one-hot vector encoding of the atom type concatenated with eight physical-chemical features. As can be seen from Equation 2, applying the first level of graph convolution updates the node representation using information from its immediate (one-hop) neighbors. To incorporate information from nodes up to three hops away into an atom’s representation, we apply three layers of graph convolution with node representations of length 38, 24, 16, and 8 chosen after some exploratory experimentation. We also concatenate a count vector of 10 predefined molecular graph connectivity patterns to the output of the top convolutional layer. These predefined patterns have graph lengths greater than three and thus cannot be captured by the three-layer GCN. ReLU activation functions are used within the GCN layers and, after some experimentation, the hyperparameter is set to 0.2. An regularization with hyperparameter is applied to all the weights to avoid any overfitting. The GCN performance is shown in Table 1.

There are a number of graph pooling mechanisms, such as those described in lee2019self, murphy2019relational, that can be used to predict molecular reactivity and functional group reactivity after the graph convolutions. Because of our relatively small training dataset, to avoid the risk of overfitting, we try simple pooling mechanisms that require minimal additional learning steps. In addition to the function

described above, we use an element-wise max-pooling, average-pooling, and the pooling mechanism introduced in

duvenaud2015convolutional which is described in Equation 4, where is the final representation of the entire graph and is the number of input nodes to the pooling operation. When predicting molecular reactivity, ; when predicting group reactivity, is the number of atoms within the functional group which reacts with the methyl ions. As previously seen, is a vector of size representing the node in the top convolutional layer , and is the shared pooling weight matrix of size . Since this pooling is known to work better with relatively longer node representations [duvenaud2015convolutional], we set . In our experiments, however, we found that for the prediction of molecular reactivity and group reactivity max-pooling and average-pooling work slightly better than the mechanism introduced in duvenaud2015convolutional.


4.2.5 Graph Attention Networks

Another recently introduced type of neural network which can operate on graph data is the Graph Attention Network (GAT) [velivckovic2017graph]. In this approach, node representations are updated by a weighted message passing scheme between neighbors, where the weights are calculated through an attention mechanism as follows:


Here is the number of attention mechanisms (also known as attention heads [velivckovic2017graph]), ranges from 1 to , denote the shared weights of layer and attention mechanism , and represents a shallow neural network with a leaky-ReLU output activation mapping the input vector into a scalar. The symbol represents the concatenation operation and , as the updated node representation, is the concatenation of the output of different attention mechanisms. Through our experiments, we find that using a larger node representation at the last layer of the GAT together with averaging (instead of concatenation) works better. We use three GAT layer representations of size 38, 24, 16, and 12 and concatenate the representation of the top layer with the same count vector of predefined patterns used with the GCN. The GAT is trained using the same loss function as described in Equation 3. We experimented with several variations in terms of the number of attentions heads and bond representations and report only the best results (), although the difference observed were minor. The GAT results for the prediction of reactivity are shown in Table 1.

5 Results

5.1 Data

The full curated data set of 2421 molecules and their scores, covering 53 orders of magnitude of chemical reactivity, is available for download from the UCI ChemDB chemoinformatics web portal at www.cdb.ics.uci.edu. Figure 3 illustrates the range of electrophilicity and nucleophilicity in the curated dataset.

Figure 3: The range of electrophilicity and nucleophilicity in the curated dataset, covering orders of magnitude in each case, together with five examples of electrophiles and nucleophiles.

5.2 Comparative Analysis and Predictions

Table 1 shows the results obtained with four different optimized neural networks with the corresponding molecular representations. Each number in Table 1

corresponds to the average ten-fold cross-validation accuracy, together with the corresponding standard deviation, for both electrophilicity and nucleophilicity predictions. As previously mentioned, the reactivity numbers are valid within three orders of magnitude. Thus, a prediction is considered to be correct if the predicted reactivity is within three orders of magnitude of the actual MCA* or MAA*. Although the informative vector representation of atoms shows a good performance, it has several downsides and limitations. The informative features are specifically tailored for our dataset and are not generalizable to unseen molecular structures. Also, there is no guarantee that the extracted features are the best representation of atoms. Since human chemists have designed these features based on their experience, another chemist might come up with a different set of features. On the other hand, for the SMILES representation and the corresponding networks, although there are no manually tailored features, these networks do not show comparable performance in comparison to other representations. The reasons for this poor performance might be the implicit rules of writing SMILES string which convert the molecular graphs into text representations. For instance, the long dependency between two parentheses that specify a branch of atoms might not be captured with recurrent neural networks such as LSTMs. Finally, the graph representation of molecular structure does not have any of the aforementioned drawbacks and the corresponding models produce the best results. Figure


illustrates the performance of the best graph attention model for both electrophilicity and nucleophilicity predictions.

Representation Atomic Group Molecular
Electrophilicity Informative fingerprint 91.04 - -
ECFP - - 80.21
SMILES - - 73.66
GCN 90.03 86.97 86.24
GAT 91.17 87.66 87.93
Nulecophilicity Informative fingerprint 92.01 - -
ECFP - - 81.04
SMILES - - 72.59
GCN 90.91 86.68 87.42
GAT 92.14 87.03 87.24
Table 1: Accuracy results for the prediction of both electrophilicity and nucleophilicity. All accuracy figures and error bars (standard deviations) are obtained by averaging over a ten-fold cross validation experiment. The attention graph neural network outperforms the other models and representations.
Figure 4: The best performance of GAT models for (a) electrophilicity and (b) nucleophilicity predictions. The red curve is the actual electrophilicity and nucleophilicity of the test samples and the blue stripe is the predicted reactivity, plus or minus three orders of magnitude.
Figure 5: (a): A two-dimensional t-SNE visualization of the numerical embedding for atom symbols and special characters found in SMILES strings. Special characters are clustered together and far from the atom symbols. The embeddings of atoms with similar properties are close to each other. (b): An example of a chemical reaction that can be problematic for an automated reaction prediction system. This reaction has two potential electron donors, marked in red. While both donors are reasonable, the nitrogen’s reactivity score is considerably smaller than the carbon’s reactivity score.

These results provide at least a proof-of-concept that the high-throughput prediction of the MCA* and MAA* scores is feasible. Next, we demonstrate how such a system could enable other tasks, such as chemical reaction prediction and combinatorial generation of chemical reaction mechanisms.

5.3 Chemical Reaction Prediction

In recent years, several deep-learning-based methods have been introduced to predict the outcome of chemical reactions under certain conditions. These systems operate primarily either on string (SMIRKS) representations [schwaller2019molecular], or molecular graph representations [kayala2011learning, kayala2012reactionpredictor, fooshee2018deep, coley2017prediction, do2019graph]. All the molecular graph-based methods seek to identify the most reactive atoms within the reactants. For instance, coley2017prediction

identify the most reactive atoms (i.e. reaction centers) using a binary classifier that classify atoms as members of the reaction center or not.

fooshee2018deep focus on reaction mechanisms by identifying electron sources and sinks and ranking the corresponding pairings. Although these methods yield reasonably successful reaction predictors, they are not always able to accurately identify the most reactive atoms, especially when multiple such atoms appear in one reaction. The method presented here for estimating reactivity scores for electrophiles and nucleophiles could be used to address some of these problems by using the scores to rank the atoms. Such ranking may accurately identify reaction centers and electron donors and acceptors, bypassing or complementing the complex methods described, for instance, in coley2017prediction, fooshee2018deep. An example of this approach is depicted in Figure (b)b. For this example, a chemical reaction predictor such as fooshee2018deep might predict both the nitrogen (amine group) and the highlighted carbon as the potential electron donors (since both atoms are labeled as electron donors in the training set of such system). However, our predicted nucleophilicity score for the carbon is higher than the reactivity score predicted for the nitrogen (+21.64 vs +9.71), leading to the correct identification of the electron donor and the corresponding reaction.

Figure 6: (a): The four possible reactions corresponding to the first row of Table 2 [; ; ; ]. (b): Likewise, the four possible reactions corresponding to the second row of Table 2. In each figure, the reaction predicted using the reactivity scores is depicted in green, whereas the remaining three less plausible reactions are depicted in red. These implausible reactions can be automatically filtered out during the process of combinatorial reaction generation.
Templates Substituents
(donor) (acceptor) (donor) (acceptor)
N[R] C[CH+][R] O=C([R])NC [R]C(OC)=O +11.47 +28.21
C=C([R])[O-] C/[N+](C)=C\[R] N#C[R] O=C([R])N(C)C +7.6 + 18.48
Table 2: Two examples of combinatorial reaction mechanisms. and correspond to electron donor and electron acceptor templates respectively. Similarly, and correspond to electron donor and electron acceptor substituents respectively. and denote the predicted electrophilicity and nucleophilicity values. The last two columns show the difference between the reactivity of templates and substituents for both electron donors and electron acceptors. The positive numbers indicate that the predicted reactivity of the templates is higher than the predicted reactivity of the substituents (see text).

5.4 Combinatorial Generation of Chemical Reaction Mechanisms

fooshee2018deep introduced a method to combinatorially augment a training set of reactions. This method uses two fixed scaffolds respectively containing one electron donor group and one electron acceptor group (also called templates) and then combinatorially varies the decorative atoms (also called substituents) attached to the templates within realistic chemical constraints. Following this process, one can generate large numbers of elementary reactions covering a range of fundamental reaction classes. The most important constraint to enable this method to generate plausible reactions is that the atoms within the attaching substituent functional groups must be less reactive than the template atoms. Automatically enforcing this constraint requires a ranking of atoms based on their reactivity within different functional groups, and this can be done using the scales proposed here. A demonstration is given in Table 2 and Figure 6 showing that the proposed scales can be used to filter out the non-plausible combinatorially generated mechanisms. The first row of Table 2 and Figure (a)a consider four functional groups in two reactants. The higher nucleophilicity of the amino group over the carboxamide and the higher electrophilicity of the carbocation over the ester carbonyl correctly predict the attack of the amino group on the carbocation (the green reaction in Figure (a)a). Similarly, the second row of Table 2 and Figure (b)b consider four other functional groups in two more reactants. The higher nucleophilicity of the enolate relative to the nitrile and the higher electrophilicity of the iminium ion over the carboxamide carbonyl group correctly predicts a Mannich reaction involving the addition of the enolate to the iminium ion (the green reaction in Figure (b)b).

6 Conclusion

Methyl cation affinity and methyl anion affinity have been shown to be highly correlated to the reactivity of atoms in functional groups over a broad range of organic chemistry. Leveraging this correlation, we used DFT calculations to curate a dataset of relative reactivity scores for 2,421 electrophilic and nucleophilic functional groups covering 53 orders of magnitude of chemical reactivity. This curated dataset is available to the community and was used to train several deep neural networks, with different representations, to estimate reactivity. Through experiments, we have shown that graph attention neural networks outperform other methods and representations and can accurately estimate the reactivity with a ten-fold cross-validation accuracy of 92% showcasing another synergistic application of QM and Machine Learning methods

In the future, it may be useful to incorporate the experimentally validated reactivity parameters from the Mayr database and to further expand the curated dataset of DFT-derived reactivity scores (MCA* and MAA*) to a broader range of molecular structures. All the proposed methods in this work are trained using the 2421 structures with decent coverage of basic functional groups over 53 orders of magnitude of chemical reactivity. However, the method in mca2021, mood2020methyl for calculating chemical reactivity is applicable to molecular structures with a chemical reactivity covering 180 orders of magnitude. Thus, there is significant room for curating larger data sets and training more refined machine learning systems to predict chemical reactivity, as well as using the machine learning results to guide the DFT calculations towards the most informative regions of chemical reaction space.

7 Acknowledgement

Work in part supported by NSF grants 1633631 and 195811 to DVV and PB.