An Atomistic Fingerprint Algorithm for Learning Ab Initio Molecular Force Fields

09/26/2017 ∙ by Yu-Hang Tang, et al. ∙ 0

Molecular fingerprints, i.e. feature vectors describing atomistic neighborhood configurations, is an important abstraction and a key ingredient for data-driven modeling of potential energy surface and interatomic force. In this paper, we present the Density-Encoded Canonically Aligned Fingerprint (DECAF) fingerprint algorithm, which is robust and efficient, for fitting per-atom scalar and vector quantities. The fingerprint is essentially a continuous density field formed through the superimposition of smoothing kernels centered on the atoms. Rotational invariance of the fingerprint is achieved by aligning, for each fingerprint instance, the neighboring atoms onto a local canonical coordinate frame computed from a kernel minisum optimization procedure. We show that this approach is superior over PCA-based methods especially when the atomistic neighborhood is sparse and/or contains symmetry. We propose that the `distance' between the density fields be measured using a volume integral of their pointwise difference. This can be efficiently computed using optimal quadrature rules, which only require discrete sampling at a small number of grid points. We also experiment on the choice of weight functions for constructing the density fields, and characterize their performance for fitting interatomic potentials. The applicability of the fingerprint is demonstrated through a set of benchmark problems.

READ FULL TEXT VIEW PDF

Authors

page 14

page 17

page 26

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Molecular Dynamics (MD) simulations have been widely used for studying atomistic systems, e.g. proteins and catalysts, due to their ability to precisely capture transient events and to predict macroscopic properties from microscopic details [1, 2]. In its most prevalent implementation, the trajectory of an atomistic system is integrated in time according to the Newton’s law of motion using forces calculated as the negative gradient of a Hamiltonian, whose functional form and parameters are collectively referred to as a force field [3, 4, 5]. Traditionally, the pairwise and many-body terms that comprise a force field are derived empirically by fitting to quantum mechanical calculations and experimental data.

Three properties directly relate to the applicability of a force field: accuracy, transferrability, and complexity [6, 7]

. Over the years, a large number of force fields have been developed, each carrying a particular emphasis over these three properties. However, the combinatorial complexity of atomistic systems can easily outpace force field development efforts, the difficulty of which explodes following the curse of dimensionality 

[8]. A deceptively simple system that can demonstrate the situation is water, a triatomic molecule with a well-characterized molecular structure. In fact, all common water models, such as SPC-E, TIP3P, and TIP4P, have only succeeded in reproducing a small number of structural and dynamical properties of water due to the difficulty in modeling strong intermolecular many-body effects such as hydrogen bonding and polarization [9, 10].

In lieu of a force field, quantum mechanical (QM) calculations can be employed straightforwardly to drive molecular dynamics simulations. The method achieves significantly better accuracy and transferrability by solving for the electronic structure of the system. However, the computational complexity of QM methods is at least cubic in the number of electrons, and consequently the time and length scales accessible by QM-driven molecular dynamics are severely constrained.

Assuming that there is smoothness in the potential energy surface of the atomistic system, one possible strategy to accelerate QM-driven molecular dynamics is to use QM calculations on only a subset of the time steps, and to interpolate for similar atomic configurations 

[11, 12, 13, 14]. A schematic overview of the process is given in Figure 1, which is enabled by the recent development of high-dimensional nonlinear statistical learning and regression techniques such as Gaussian process regression [15]

and artificial neural networks 

[16].

Figure 1:

In the pipeline of machine learning-driven molecular computations, atomistic neighborhood configurations are transformed into feature vectors, called fingerprints, and used to train non-linear regression models.

This paper focuses on a particular aspect of the machine-learning-driven molecular computation pipeline, i.e. fingerprint algorithms, whose importance arises naturally from the aforementioned regression protocol. A fingerprint is an encoding of an atomistic configuration that can facilitate regression tasks such as similarity comparison across structures consisting of variable numbers of atoms and elements. As has been pointed out previously [12], a good fingerprint should possess the following properties:

  1. It can be encoded as a fixed-length vector so as to facilitate regression (particularly for artificial neural networks).

  2. It is complete, i.e. different atomistic neighborhood configurations lead to different fingerprints and vice versa, and the ‘distance’ between the fingerprints should be proportional to the intrinsic difference between the atomistic neighborhood configurations.

  3. It is continuous with regard to atomistic coordinates, and the change in fingerprint should be approximately proportional to the structural variation as characterized by, for example, some internal coordinates.

  4. It is invariant under permutation, rotation, and translation.

  5. It is computationally feasible and straightforward to implement.

Before we proceed to the details of our work, we will first briefly review several fingerprints that are closely related to our work, i.e. the Smooth Overlap of Atomic Positions (SOAP) kernel [12], the Coulomb matrix [17], and the Graph Approximated Energy (GRAPE) kernel [18].

Smooth Overlap of Atomic Positions (SOAP):

The SOAP kernel is built on the idea of representing atomistic neighborhoods as smoothed density fields using Gaussian kernels each centered at a neighbor atom. Similarity is measured as the inner product between density fields, while rotational invariance is achieved by integrating over all possible 3D rotations, which can be performed analytically using the power spectrum of the density field. In fact, our fingerprint algorithm is inspired by this idea of treating atoms as smoothed density fields. However, we take a different approach to endorse the fingerprint with rotational invariance, and use the Euclidean distance instead of inner product as a distance metric.

Coulomb Matrix:

The practice of using graphs to represent atomistic neighbor configurations was first implied by the Coulomb matrix, and later further formulated in the GRAPE kernel, where the diffusion distance was proposed as a similarity measure between different local chemical environments [19]. The idea is to construct an undirected, unlabeled graph with atoms serving as the vertices and pairwise interactions weighting the edges. For example, the Coulomb matrix can be treated as a physically-inspired Laplacian matrix [20]

(1)
(2)
(3)

where the degree matrix encodes a polynomial fit of atomic energies to the nuclear charge, while the adjacency matrix

corresponds to the Coulombic interactions between all pairs of atoms. Due to the use of only relative positions between atoms in the adjacency matrix, the Coulomb matrix is automatically invariant under translation and rotation. However, the matrix itself is not invariant under permutation, as swapping the order of two atoms will result in an exchange of the corresponding columns and the rows. To address this, the sorted list of eigenvalues of the Coulomb matrix can be used instead as a feature vector, while an

norm can be used as a distance metric. In practice, due to the fact that the number of neighbor atoms may change, the shorter eigenvalue list is padded with zeros in a distance computation.

Graph Approximated Energy (GRAPE):

The GRAPE kernel evaluates the simultaneous random walks on the direct product of the two graphs representing two atomistic neighborhood configurations. Permutational invariance is achieved by choosing a uniform starting and stopping distribution across nodes of both graphs. However, the cost of distance computation between two graphs scales as with a one-time per-graph diagonalization cost of .

In the sections below, we present our new fingerprint algorithm, namely the Density-Encoded Canonically Aligned Fingerprint (DECAF). The paper is organized as follows: in Section 2, we introduce a robust algorithm that can determine canonical coordinate frames for obtaining symmetry-invariant projections; in Section 3, we present numerical recipes to use smoothed atomistic density fields as a fingerprint for molecular configuration; in Section 4, we demonstrate the capability of the fingerprint via examples involving the regression of atomistic potential energy surfaces; in Section 5, we discuss the connection between our algorithm and previously proposed ones; we conclude with a discussion in Section 6.

2 Localized Canonical Coordinate Frame for Rotationally Invariant Description of Atomistic Neighborhood

2.1 Kernel Minisum Approach

To improve model generalization while minimizing data redundancy, a fingerprint algorithm should be able to recognize atomistic structures that differ only by a rigid-body transformation or a permutation of atoms of the same element, and to extract feature vectors invariant under these transformations. As summarized in Table 1, a variety of strategies have been successfully employed by common fingerprint algorithms to achieve rotational invariance.

However, these approaches do not provide a means for the acquisition of vector-valued quantities in a rotational invariant form. One approach is to only acquire and interpolate the potential energy, a scalar quantity, and then take the derivative of the regression model. This approach, however, triggers the need for methods to decompose the total energy among the atoms, which is a property of the entire system rather than individual atoms [21].

Another approach proposed by Li et al. [13, 22] is to project vector quantities onto a potentially overcomplete set of non-orthogonal basis vectors obtained from a weighted sum of the atomic coordinate vectors:

(4)

However, the approach may suffer from robustness issues. For example, all of the generated with different will point in the same direction if the radial distance of the atoms are all equal. Further, the configuration with 4 atoms at leads to

(5)
(6)

Thus, if gets close to zero, will always point toward either or , even if the vector quantity of interest may point in other directions.

Fingerprint Invariance
Translation Permutation Rotation
Coulomb matrix[17] relative distance sorting eigenvalues all vs. all graph
Behler[11] relative distance summation ignoring angular information
SOAP[12] relative distance summation integrating over all rotations
GRAPE[18] relative distance uniform distribution uniform distribution
Table 1: Comparison of strategies used by fingerprint algorithms to obtain feature vectors which are invariant under translation, permutation, and rotation.

Here, we present a robust kernel PCA-inspired algorithm for the explicit determination of a canonical coordinate frame, within which the projection of the atomistic neighborhood is invariant under rigid-body rotation. Furthermore, the canonical coordinate frame can be directly used to capture vector-valued quantities in a rotational-invariant form. Given atoms with position , we first formulate the PCA algorithm as an optimization problem where we seek a unit vector that maximizes the sum of the projections:

(7)
(8)

where is the distance from the origin to atom , is the unit vector pointing toward atom , respectively. The optimization process can only uniquely determine the orientation of a projection vector up to a line, because

. As a consequence, further heuristics are needed to identify a specific direction for the PCA vectors.

To overcome this difficulty, we generalize the term into a weight function of radial distance and the term into a bivariate kernel function between two vectors. We then attempt to seek a unit vector that minimizes the kernel summation:

(9)

In particular, we have found a square angle (SA) kernel and an exponentiated cosine (EC) kernel that perform well in practice:

(10)
(11)

As shown in Figure 2, both kernels are minimal when and are parallel, and monotonically reach maximum when and are antiparallel. Intuitively, optimizing the minisum objective function generated by the SA kernel will yield a vector that, loosely speaking, bisects the sector occupied by the atoms. The EC kernel exhibits very similar behavior but leads to a smoother objective function. As shown in Figure 2, this allows for the determination of a projection vector without ambiguity, even if the atom configuration contains perfect symmetry.

Figure 2: Shown here is an illustration of the minisum algorithm that determines projection vectors for rotational invariant description of atomistic neighbor configurations. Black dots represent atoms which all carry equal importance. The vectors that point from the origin to the atoms are used as the input to bivariate kernels to compute the minisum objective function, which are drawn in solid lines. For reference, the negated values of the PCA objective function are drawn in dashed lines. Projection vectors are obtained by finding the unit vector that minimizes the objective function.

A major advantage of the kernel minisum approach versus norm-based PCA, lies in its 1) robustness in the presence of structural symmetry; and 2) continuity of the resulting principal axes with respect to angular movement of the input data. As shown in Figure 3, kernel minisum is particularly suitable for atomistic systems where strong symmetries are common and the continuity against angular movement is desired. The minisum framework can also be used with other customized kernels to suit for the characteristics of specific application scenarios.

Figure 3:

A comparison of the orthogonal bases obtained using principal components analysis (PCA) and kernel minisum with the square-angle (MSA) kernel.

(A) The PCA algorithm, used in conjunction with the norm, fails to extract a principal axis that rotates with a system that exhibits planar symmetry. Both MSA and PCA with the norm can accommodate this scenario. (B) Both and principal axes change orientation abruptly when the system undergoes a slight angular motion. In contrast, the MSA output is continuous with regard to this movement as it always bisects the angle formed by the atoms and the origin. (C) Loosely speaking, the MSA axis points at the majority direction of the atoms if only a single cluster is present within the cutoff distance, but bisects the angle between two atom clusters. This is different from PCA-based results and should deliver robust rotational invariance as well as continuity. Arrows are drawn at different lengths to improve visual clarity in situations of overlapping. They are to be understood as unit vectors. De-trending is not performed because the radial distances of the atoms carry physically significant information.

2.2 Solving the Kernel Minisum Optimization Problems

The optimization problem can be solved very efficiently using a gradient descent algorithm as detailed below.

Square Angle:

The objective function of the minisum problem using the square angle (SA) kernel is

(12)

The gradient of with respect to is

(13)

Note that is singular when . This can be treated numerically by replacing the removable singularities at with the left-limit , while truncating the gradient at a finite threshold near the poles at .

A local minimum can be iteratively searched for with gradient descent while renormalizing after each iteration. Moreover, due to the locally quadratic nature of the objective function, we have found that the Barzilai-Borwein algorithm [23] can significantly accelerate the convergence at a minimal cost. The algorithm is presented in Alg. 1.

1:function MinSquareAngle(, , )
2:     
3:     repeat
4:         Compute gradient using Eq. 13.
5:         Obtain tangential component of gradient
6:         if at step 0 then
7:               Small initial step size for bootstrapping
8:         else
9:               Adaptive subsequent steps by Barzilai-Borwein
10:         end if
11:         Save as , save as
12:         Update and normalize to unit length
13:     until 
14:     return
15:end function
Algorithm 1 Gradient descent for solving the square angle minisum problem.

Exponentiated Cosine:

The objective function of the minisum problem using the exponentiated cosine (EC) kernel is:

(14)

The gradient of with respect to is

(15)

The gradient contains no singularity. However, it is not always locally quadratic or convex. This can cause the Barzilai-Borwein algorithm to generate negative step sizes and consequently divert the search towards a maximum. Luckily, this can be easily overcome by always using the absolute value of the step size generated by the Barzilai-Borwein algorithm. Such enforcement prevents the minimization algorithm from going uphill. The complete algorithm is given in Alg. 2.

1:function MinExpCosine(, , )
2:     
3:     repeat
4:         Compute gradient using Eq. 15.
5:         Obtain tangential component of gradient
6:         if at step 0 then
7:               Small initial step size for bootstrapping
8:         else
9:               Adaptive subsequent steps by Barzilai-Borwein
10:         end if
11:         Save as , save as
12:         Update and normalize to unit length
13:     until 
14:     return
15:end function
Algorithm 2 Gradient descent for solving the exponentiated cosine minisum problem.

As shown in Table 2, both Alg. 1 and Alg. 2 converge quickly and consistently across a wide range of representative point configurations commonly found in molecular systems. However, the gradient descent method can only find local optima. Thus, multiple trials should be performed using different initial guesses to ensure that a global minimum can be located.

Kernel Square Angle Exponentiated Cosine
Configuration Itrs./Guess Guesses Itrs./Guess Guesses
Single point 8.8 1 7.9 1
Two points, angle 7.1 1 7.7 1
Two points, angle 6.1 1 6.3 1
Two points, angle 6.0 1 5.6 1
Planar 6.2 1 6.3 1
Planar 5.6 1 6.9 1
Tetrahedra 10.6 1 7.7 1
Octahedra 11.7 1 9.4 1
Improper 17.9 1 23.1 1
Improper 14.7 1 18.0 1
2D random 10 points 7.2 1.1 8.9 1
3D random 50 points 16.9 1.2 14.4 1
Table 2: Listed here is the number of iterations and initial guesses used by the gradient descent algorithm to find an local optimum of the kernel minisum problems. The numbers are averaged over 500 repetitions, and the convergence criterion is . In cases where the iterative algorithm does not converge within 64 iterations, the optimization will be restarted with a new guess.

2.3 Complete Set of Orthogonal Projection Vectors as A Canonical Coordinate Frame

In 3D, a complete set of orthogonal bases can be found greedily using the protocol as described in Alg. 3. Specifically, we use the globally optimal solution of the minisum optimization problem as the first basis , and the constrained optimal solution in a plane orthogonal to as the second basis . Special care must be taken for determining the third basis

, as the only degree of freedom now is its sign due to the orthogonality constraint. The straightforward approach of choosing the direction that gives the smaller objective function value may fail, for example, when the system contains improper rotational symmetry. In that case,

and are interchangeable and both perpendicular to the rotation-reflection axis. As a result, the two candidates of will both align with the rotation-reflection axis and are thus indistinguishable by kernel minisum. However, the projection of the atoms into the two seemingly equivalent coordinate frames are not identical, but rather mirror images of each other. Fortunately, this can be addressed by choosing the direction of the half-space, as created by the plane , that yields the smaller kernel objective function between the bisector of and versus the points lies in that half-space. This rule can also handle general situations with/without symmetry.

1:function GetCanonicalProjection3D(, )
2:      global minimum of Minisum(, ) using multiple runs of Alg. 1 or Alg. 2
3:     if  contains only 1 point then
4:         construct arbitrary
5:         
6:     else
7:          global minimum of Minisum(, ) using multiple runs of Alg. 1 or Alg. 2, subjecting to the constraint that the probe vector and the gradient are all
8:         
9:         if  then
10:              
11:         else
12:              
13:         end if
14:     end if
15:     return , ,
16:end function
Algorithm 3 The procedure for determining a canonical coordinate frame using kernel minisum.

It is difficult to prove global uniqueness of the kernel minisum solution given the non-convex nature of the exponentiated cosine and square angle kernels. In fact, it seems that the only kernel that allows analytical proof of solution uniqueness is , whose solution simply corresponds to the weighted center of mass of the neighbor atoms. Unfortunately, this simple kernel is not robust against reflectional and rotational symmetry. Luckily, the rare cases where two global optimal solutions do coexist can be safely captured by the repeated searching procedure starting from different seeds. Thus, a fingerprint can be extracted using each of the resulting coordidate frame. This may mildly increase the size of the training set, which could even be advantagenous when training data is scarce.

3 Density-Encoded Canonically Aligned Fingerprint

3.1 Density Field and Approximation of Volume Integral

Figure 4: (A) Two 1D density profiles, and , are generated from two different atomistic configurations using atom-centered smoothing kernel functions. The ‘distance’ between them is measured as the norm of their pointwise difference, which corresponds to the orange area in the middle plot. (B) Shown here is a 2D density field using smoothing kernels whose widths depend on the distances of the atoms from the origin. Darker shades indicate higher density.

The local density field around a point is formulated as a superimposition of smoothing kernel functions each centered at a neighbor atom with relative displacement with regard to and within a cutoff distance :

(16)

This density field, as has been pointed out previously [12], may be used as a fingerprint of the local atomistic environment. Here, we assume that the smoothing kernel takes the form of a stationary Gaussian . We also assume that the density scaling function , which ensures the continuity of the density field when atoms enter or exit the cutoff distance, is a bell-shaped function with compact support. Further discussion on both and can be found in Section 3.2 and Section 3.3, respectively.

To achieve rotational invariance, we project the atom coordinates into the canonical coordinate frame as determined by the kernel minisum algorithm, when generating the density field:

(17)

Depending on the specific application, may not necessarily overlap with any of the . Scalar properties can be acquired directly from the target atom, while vector-valued properties, such as force, can be acquired and interpolated in the local orthogonal coordinates as .

We define the distance between two density fields and as a weighted volume integral of their pointwise difference:

(18)
(19)

The weight function provides additional flexibility for emphasizing particular regions of the atomistic neighborhood. It could play an important role for fitting properties with steep gradients, e.g. the repulsive part of the Lennard-Jones potential.

We now introduce an optimal quadrature rule to approximate the integral in Eq. 19 in a computationally tractable manner. A quadrature rule is a numerical recipe in the form , which numerically approximates a definite integral using only discrete evaluations of the integrand. To determine the quadrature nodes and weights, we decompose the volume integral in Eq. 19 into a surface integral over spherical shells and a 1D integral along the radial direction:

(20)

The surface integral can be optimally approximated using the Lebedev quadrature rule [24]:

(21)

where , , and are the weights, positional unit vectors, and number of the Lebedev nodes, respectively. The radial integral fits well into the generalized Laguerre-Gauss quadrature formula with weight function [25]:

(22)

where , , and are the weights, coordinates, and number of the Laguerre nodes, respectively. Combining Eq. 2022, a composite quadrature rule can be generated consisting of several spherical layers of nodes. As shown in Figure 5, the radial coordinates of the quadrature nodes are determined by the Laguerre quadrature nodes, while the angular positions are determined by the Lebedev quadrature nodes, respectively. This composite quadrature formula translates the 3D volume integral into a summation over discrete grid points:

(23)

Using the right hand side of Eq. 23 to replace the integral in Eq. 19, and use the multi-index notation to enumerate over the quadrature nodes located at with weights , we obtain the final discretized formula for computing the distance between the fingerprints:

(24)

For quick reference, we tabulated in Appendix the values for and in the Laguerre quadrature of up to 6 points, and the values for , in the Lebedev quadrature of up to 50 points.

In addition, the quadrature nodes could be radially scaled such that the outer most nodes lie at a radius within some cutoff distance . This allows us to fit a Laguerre quadrature of any order within an arbitrary cutoff distance. The scaled quadrature rule is given by:

(25)
(26)

Since the scaling is simply constant among all nodes, it can be safely omitted in many regression tasks where only the relative distance between the fingerprints are of significance.

Figure 5: (A) Laguerre quadrature nodes with order 1 to 5, normalized by the reciprocal of the largest node onto the unit interval. (B) The , , , and class of Lebedev grid points on a unit sphere. (C) The DECAF molecular fingerprint essentially comprises of a grid of quadrature nodes that optimally samples the density field induced by the neighbor atoms. Shown here is an example of one such composite quadrature grid which combines a 3-node Laguerre quadrature rule with three layers of Lebedev quadrature nodes of 3rd, 5th, and 7th order, respectively.

3.2 Radial Weight Functions

In this section, we examine two radial weight functions that can be used to fine-tune the density field fingerprint: the density scaling function as appears in Eq. 17 and the weight of integral as appears in Eq. 19.

Driven by the interest of reducing computational cost, we would like to use a cutoff distance to select atoms involved in constructing the density field. However, it is important to ensure that atoms will enter and exit the neighborhood smoothly. This naturally requests that the contribution of an atom to be zero outside of the cutoff, and to increase continuously and smoothly when the atom approaching entrance. Correspondingly, should: 1) become unity at the origin; 2) smoothly approach zero at the cutoff distance; and 3) be twice-differentiable to ensure the differentiability of regression models based on the fingerprint. Candidates satisfying the above conditions include, for example, a tent-like kernel

(27)

and a bell-shaped polynomial kernel with compact support

(28)

as detailed in Appendix.

The approximation of the radial integral using a Laguerre quadrature requires that the integrand, i.e. the pointwise difference between the atomistic density fields, decays sufficiently fast beyond the outermost quadrature nodes in order to achieve acceptable convergence. In addition, the steeply repulsive yet flat attractive interatomic short-range interactions prompt that the sensitivity of fingerprint be adjusted correspondingly in order to avoid numerical difficulties in training a regression model. The weight of the integral, , provides a convenient means for achieving the purpose. Different from , should instead satisfy the following conditions: 1) is normalized such that ; 2) decays sufficiently fast, but not necessarily to 0, beyond the outer most quadrature node; and 3) be sufficiently smooth beyond the outermost quadrature node. Candidates for includes the tent-like kernel and the bell-shaped kernel for , albeit with a different normalization factor. The Laplacian kernel

(29)

with a properly sized length scale also appears to be a candidate due to its similarity with part of the weight function of the Laguerre quadrature. Note that the constant kernel

(30)

may also be a choice as long as the density field already decays fast enough due to the density scaling function .

Figure 6: Shown here are examples of the distance matrices between fingerprints sampled from a biatomic system. As manifested by the difference between (A) against (B), a bell-shaped weight of integral helps to emphasize the near field. Meanwhile, the obvious discontinuities in the second row of matrices demonstrate the importance of the density scaling function when the fingerprint algorithm uses only atoms within a finite cutoff distance.

In Figure 6, we demonstrate the effect of the density scaling function and the weight of integral on the distance matrices between fingerprint obtained from a pair of atoms. The comparison between panel A and B shows that a bell-shaped integration weight allows the distance between fingerprints to change more rapidly when the atoms are closer but more slowly when the atoms are farther apart. The visible discontinuity in the second row clearly demonstrates the importance of the damping function when only atoms within a finite cutoff distance are used to compute the fingerprint.

We further examine the impact of the weight functions on the performance of Gaussian process regression (GPR) using the fingerprint of the interatomic force of a minimal system containing two nitrogen atoms. Despite the simplicity of the system, this case is of fundamental importance because of its ubiquity, and because the fast-growing repulsive regime of the Lennard-Jones potential could cause difficulty as a small change in the system configuration can trigger large changes in the regression target function. In Figure 7, we compare the performance among the combination of four weights of the integral and two density scaling functions. The initial training set consists of two samples collected at and

. The regression is then refined using a greedy strategy that consecutively learns the point with the largest posterior variance until the largest uncertainty, defined as twice the posterior standard deviation, is less than 0.1

. The active learning scheme is able to delivery a GPR model, for each and every combination of the weight functions, that closely fits the target function. However, the numbers of refinement iterations and the achieved accuracy do vary. Therefore, it is important to evaluate and choose the weight functions in the context of specific application scenarios.

Figure 7: Gaussian process regression of the force between two nitrogen atoms as a function of interatomic distance using different combinations of radial weight functions. Inset figures are plots of the regression function using distances from the feature space.

3.3 Quadrature Resolution and Density Kernel

Despite the formal convergence of the composite quadrature in DECAF, a cost of distance calculations and kernel evaluations are needed to sample a density field generated by atoms using quadrature nodes. A less prominent cost is associated with the distance calculation, which comes at a cost of floating point operations. Thus, in practice it is often desirable to use as few nodes as possible to capture only information of the density field within a certain band limit[26]. Accordingly, the integral cutoff , the number of quadrature nodes, and the width of the density kernel need to be tuned to obtain an optimal balance between resolution and computational speed.

When designing the composite quadrature rule, we chose the Laguerre quadrature for the radial direction because its nodes are denser near the origin but sparser farther away. This is consistent with our physical intuition that the near field generally has a stronger influence than the far field in an atomistic neighborhood. For example, the Van de Waals potential grows rapidly when atoms are in direct contact, but flattens out of the first coordinate shell. Accordingly, it may be possible for us to use sparser outer-layer grids to reduce the total number of quadrature nodes, while still keeping enough nodes in the inner layers to maintain the sensitivity of the quadrature toward close neighbors. Cooperatively, we can also use non-stationary Gaussian density kernels whose width dependent on the distance from the atom to the origin. In this way, the sparser nodes should still sufficiently sample the smoother far field. Wider kernels at remote atoms also reduce the total difference between the far fields of two fingerprints in a statistical sense. Thus, the contribution of the far field in the integral can be effectively tuned even though the weights on the quadrature nodes remain the same.

In Figure 8, we demonstrate how a variable-resolution quadrature can be combined with a widening smoothing density kernel to simultaneously reduce the computational complexity and preserve the quality of the fingerprint. In column A, a dense grid is used to sample density fields generated by a wide smoothing length. By examining the distance matrices of fingerprints sampled during bond stretching and angular stretching movements, we note that the radial similarity decreases monotonically while the angular similarity changes nearly constantly. In column B, the number of quadrature nodes is kept the same, but the smoothing length is reduced as an attempt to increase fingerprint sensitivity. Better response in the near field of the radial direction is obtained, but the linearity in the far field in the angular direction is compromised. In column C, the fingerprint performs even worse due to the combination of a sparser quadrature grid and a small smoothing length. In column D, the performance recovered because we let the smoothing length parameter of the Gaussian density kernels depend on the distance from the origin to each atom, and simultaneously adjust the quadrature node density according to this pattern.

Figure 8: A comparison of fingerprint distance matrices corresponding to bond stretching and angular stretching movements. (A) Dense grid + large smoothing length: radial similarity decreases monotonically while angular similarity changes nearly constantly. (B) Dense grid + smaller smoothing length: better fingerprint sensitivity in the near field for bond stretching, compromised linearity in the far field for angular stretching. (C) Sparser grid + smaller smoothing length: compromised far-field performance for both bond and angular movements. (D) Variable-resolution grid + radially dependent smoothing length: good resolution and linearity in both near and far fields.

4 Demonstration

4.1 Method

Regression tasks throughout this work are performed using Gaussian process regression (GPR), a nonlinear kernel method that treats training data points as discrete observations from an instantiation of a Gaussian process. Predictions are made using the posterior mean and variance of the joint distribution between the test data and the training data. One particular interesting property about Gaussian process is that the posterior variance may be interpreted as a measure of prediction uncertainty, which can be exploited to design active learning algorithms for sampling expensive functions. The actual computation used our own software implementation which was made publicly available on Zenodo 

[27]. We use the square exponential covariance kernel to compute the covariance, i.e. correlation, between the samples:

where and are DECAF fingerprints, and the distance between norm as computed by Eq. 24 or Eq. 26

. The kernel is stationary, meaning that the covariance depends only on the relative distance between two samples but not their absolute position. The training process searches for the hyperparameters,

i.e. the output variance and the length scale , that maximizes the likelihood of the training data. A detailed tutorial on GPR can be found in Ref. [15]. An illustration on the complete workflow of using the density field fingerprint to perform regression tasks is given in Figure 9.

Figure 9: Shown here is the workflow of regression using the density field fingerprint. The key stages, i.e. kernel minisum optimization, density field generation, and quadrature-based distance computation, are covered in detail in Sec. 2 and Sec. 3. The fingerprints can also be readily used to train artificial neural networks.

4.2 Potential Energy Surface

First, we attempt to fit the potential energy surface of a protonated water dimer system, in a head-to-head configuration, as a function of the oxygen-oxygen distance and the dihedral angle between the two planes each formed by a water molecule. As shown in Figure 10A, the system contains an improperly rotational symmetry, which we wish to capture with the kernel minisum algorithm. A GPR model was seeded with 8 training points corresponding to the combinations of and . Subsequently, an active learning protocol was used to greedily absorb points with the highest uncertainty into the training set. Despite that we restricted all training data to be within the subdomain , as shown by Figure 10B and 10C, we are able to accurately reproduce the target function over the entire parameter space after a few active learning steps.

The DECAF fingerprint used here is constructed with 3 spherical layers within a cutoff distance of 6.0 Å, each consisting of 14, 26, and 38 Lebedev quadrature nodes, respectively. The weight of integral was chosen as , where is the bell-shaped polynomial as defined in Appendix Eq. 35. The density scaling function , where is the vector from the atom to the fingerprint center, is the tent-like kernel as defined in Eq. 27 with . The density kernel that sits on the oxygen atoms assumes the form of a non-stationary Gaussian as discussed in Section 3.3: with being the vector from the atom to the fingerprint center and being the vector from the atom to the quadrature node. The density kernel for the hydrogen atoms has a different weight and width to ensure discriminability: .

Figure 10: Gaussian process regression is carried out to fit the potential energy surface of a protonated water dimer as a function of two internal variables, i.e. the oxygen-oxygen distance and the dihedral angle between the plane determined by the two water molecules. This system contains an improperly rotational symmetry, which can be correctly recognized by the kernel minisum-based algorithm given in Alg. 3. Only a quarter of the domain was used to train the GP, yet the model can accurately predict the energy of the entire parameter space thanks to symmetry detection.

4.3 Geometry Optimization and Vibrational Analysis

Next, we demonstrate the usability of fingerprint for fitting vector-valued quantities by performing geometry optimization and vibrational analysis on a single water molecule. The process involves the simultaneous regression of: 1) energy, a molecular scalar quantity; 2) force, a per-atom vector quantity; and 3) dipole, a molecular vector quantity. Correspondingly, we performed GPR of energy and dipole using fingerprints extracted from the center of mass of the molecule, and GPR of force using fingerprints extracted from each atom. Each component of the vector properties is modeled independently as a scalar Gaussian process. The training set consists of 45 configurations uniformly covering the range and . As shown in Table 3, the GPR model can successful drive calculations of the infrared spectrum of the molecule from randomly perturbed initial structures in arbitrary orientation. The fingerprint configuration is the same as in the previous section.

GPR DFT
Zero-point energy 0.591 0.003 eV 0.583 eV
Static dipole 2.1580 0.0001 D 2.159 D
Residual Force 0.0016 0.0005 eV/ 0.0003 eV/
Mode Frequency (cm^-1) Intensity (D/A)^2 amu^-1
GPR DFT GPR DFT
0 1576.5 1.4 1602.4 1.5726 0.0005 1.5767
1 3819.3 0.9 3817.5 0.2516 0.0005 0.2159
2 3916.7 1.6 3922.6 1.3349 0.0028 1.3401
Table 3: Geometry optimization and vibrational analysis of a single water molecule using GPR and our proposed fingerprint algorithm. 256 independent trials were performed using coordinates of water perturbed from equilibrium by a Gaussian noise followed by a randomly chosen rigid-body rotation.

4.4 Molecular Dyanmics Trajectory

As shown in Figure 11, here we attempt to fit for the forces felt by the atoms in a benzene molecule along the MD Trajectories as obtained from a sibling database of QM7 [28, 29]. The density kernel for the carbon atoms assumes the same functional form with that of the oxygen atoms, but uses a different smoothing length function . The rest of the parameters are inherited from the previous examples. The training configurations were chosen adaptively in an iterative process using the sum of the GPR posterior variance and the prediction error as the acquisition function.

Figure 11: The force exerted on the carbon atoms in a benzene molecule was fit using 100 out of the 1200 configurations in a 100-ns molecular dynamics trajectory at 0.5 ns time step size. Color coding corresponds to carbon atom id, while crosses indicate the atomistic configurations used in the GPR training.

5 Connection to Other Fingerprint Algorithms

In Figure 12, we compare the ability to distinguish atomistic configurations of our fingerprint as well as SOAP and the Coulomb matrix. Our work is inspired by the SOAP descriptor [12], which proposes the use of smoothed densities to represent atomistic neighborhoods. However, instead of converting the density field into the frequency domain using spherical harmonics, we perform density field sampling and comparison directly in the real space. This is enabled thanks to the available of canonical coordinate frame as computed through the kernel minisum optimization. We have mainly used the norm to compute the distance between atomistic neighborhoods. However, our fingerprint exhibits very similar behavior to SOAP when used together with an inner product formula

(31)

as demonstrated in Figure 12A. Thus, our fingerprint could be used in conjunction with a wide variety of covariance functions based on either the Euclidean distance or the inner product similarity.

Figure 12: A comparison between the distance measure used DECAF, SOAP, and the Coulomb matrix. Fingerprint distances shown in the plots are measured against in (A) and an randomly chosen initial state in (B).

At first sight, DECAF is very different from the Coulomb matrix fingerprint and GRAPE, which are both graph-based algorithms [17, 18]. However, instead of trying to capture the overall density field, if we measure the contribution from each individual atom on the quadrature nodes at as a row vector, and stacked up the results to yield the matrix

(32)

Then can be regarded as an incidence matrix [30] between atoms and the quadrature nodes. This is similar to the graph-based abstraction as seen in the Coulomb matrix and the GRAPE kernel. However, in both cases the vertices in the graph represent atoms while the edges represent pairwise interatomic interactions. Here, the density-based incidence matrix adopts the opposite pattern and constructs a graph with the quadrature nodes being vertices and atoms being edges. The adjacency matrix in this case is computed as the inner product :

(33)

The weight on the edges, as represented by the elements of the adjacency matrix , can be interpreted as the total flux as contributed by all paths each bridged by an atom . We have numerically found that the smallest eigenvalues (except for the 0 eigenvalue) of the symmetric normalized Laplacian

(34)

is invariant under rotation up to a certain noise level, even if the quadrature nodes do not rotate with the atoms. Nonetheless, this detour appears to represent a pure theoretical interest rather than any practical value.

Figure 13: A comparison between graph-based molecular fingerprints. The Coulomb Matrix and the GRAPE kernel construct graphs where nodes corresponds to atoms while the weights on the edges are determined by some pairwise inter-atomic interactions. In contrast, in the density-based incidence matrix we construct a graph on a set of quadrature nodes whose connectivity is weighted by a sum of contributions from individual atoms.

6 Conclusion

In this paper, we presented the Density-Encoded Canonically Aligned Fingerprint (DECAF) by exploring the idea of using smoothed density fields to represent and compare atomistic neighborhoods. One of the key enabling technique in DECAF is a kernel minisum algorithm, which allows the unambiguous identification of a canonically aligned coordinate frame that can be used for rotationally invariant projection of the density field as well as any associated vector quantities. We have performed detailed analysis to study the behavior of the fingerprint by changing various parameter, such as resolution, smoothing length, and the choice of weight functions. We demonstrate that the fingerprint algorithm can be used to implement highly accurate regressions of both scalar and vector properties of atomistic systems including energy, force and dipole moment, and could be a useful building block for constructing data-driven next generation force fields to accelerate molecular mechanics calculations with an accuracy comparable to those driven by quantum mechanical theories and calculators.

Acknowledgment

This work was supported by the Department of Energy (DOE) Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). This work was also supported by the Army Research Laboratory under Cooperative Agreement Number W911NF-12-2-0023.

References

  • [1] G. Zhao, J. R. Perilla, E. L. Yufenyuy, X. Meng, B. Chen, J. Ning, J. Ahn, A. M. Gronenborn, K. Schulten, C. Aiken, and Others. Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature, 497(7451):643–646, 2013.
  • [2] K. Lindorff-Larsen, P. Maragakis, S. Piana, and D. E. Shaw. Picosecond to Millisecond Structural Dynamics in Human Ubiquitin. The Journal of Physical Chemistry B, 120(33):8313–8320, 2016.
  • [3] A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. Goddard, and W. M. Skiff. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. Journal of the American Chemical Society, 114(25):10024–10035, 1992.
  • [4] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society, 117(19):5179–5197, 1995.
  • [5] W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. Journal of the American Chemical Society, 118(15):11225–11236, 1996.
  • [6] D. Frenkel and B. Smit. Understanding molecular simulation: from algorithms to applications, volume 1. Academic press, 2001.
  • [7] A. R. Leach. Molecular modelling: principles and applications. Pearson education, 2001.
  • [8] T. Cheng, A. Jaramillo-Botero, W. A. Goddard, and H. Sun. Adaptive accelerated ReaxFF reactive dynamics with validation from simulating hydrogen combustion. Journal of the American Chemical Society, 136(26):9434–9442, 2014.
  • [9] D. Braun, S. Boresch, and O. Steinhauser. Transport and dielectric properties of water and the influence of coarse-graining: Comparing BMW, SPC/E, and TIP3P models. The Journal of Chemical Physics, 140(6):064107, 2014.
  • [10] S. Boonstra, P. R. Onck, and E. van der Giessen. CHARMM TIP3P Water Model Suppresses Peptide Folding by Solvating the Unfolded State. The Journal of Physical Chemistry B, 120(15):3692–3698, 2016.
  • [11] J. Behler. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. The Journal of Chemical Physics, 134(134), 2011.
  • [12] A. P. Bartók, R. Kondor, and G. Csányi. On representing chemical environments. Physical Review B, 87(18):184115, 2013.
  • [13] Z. Li, J. R. Kermode, and A. De Vita. Molecular Dynamics with On-the-Fly Machine Learning of Quantum-Mechanical Forces. Physical Review Letters, 114(9):096405, 2015.
  • [14] A. Khorshidi and A. A. Peterson. Amp: A modular approach to machine learning in atomistic simulations. Computer Physics Communications, 207:310–324, 2016.
  • [15] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. 2006.
  • [16] D. Specht. A general regression neural network. IEEE Transactions on Neural Networks, 2(6):568–576, 1991.
  • [17] M. Rupp, A. Tkatchenko, K.-R. Muller, O. A. von Lilienfeld, K.-R. R. Müller, O. Anatole Von Lilienfeld, and O. A. von Lilienfeld. Fast and accurate modeling of molecular atomization energies with machine learning. Physical Review Letters, 108(5):58301, 2012.
  • [18] G. Ferré, T. Haut, and K. Barros. Learning molecular energies using localized graph kernels. The Journal of Chemical Physics, 146(11):114107, 2017.
  • [19] H. Y. Sun. Learning over Molecules : Representations and Kernels. PhD thesis, Harvard University, 2014.
  • [20] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426–31, 2005.
  • [21] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi. Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons. Physical Review Letters, 104(13):136403, 2010.
  • [22] V. Botu and R. Ramprasad. Learning scheme to predict atomic forces and accelerate materials simulations. Physical Review B, 92(9):094306, 2015.
  • [23] J. Barzilai and J. M. Borwein. Two-Point Step Size Gradient Methods. IMA Journal of Numerical Analysis, 8(1):141–148, 1988.
  • [24] D. Lebedev, VI and Laikov. A quadrature formula for the sphere of the 131st algebraic order of accuracy. Doklady. Mathematics, 59(3):477–481, 1999.
  • [25] P. Rabinowitz and G. Weiss. Tables of Abscissas and Weights for Numerical Evaluation of Integrals of the Form int_0^infty e^{-x} x^n f(x) dx. Mathematical Tables and Other Aids to Computation, 13(68):285–294, 1959.
  • [26] B. Leistedt and J. D. McEwen. Exact Wavelets on the Ball. IEEE Transactions on Signal Processing, 60(12):6257–6269, 2012.
  • [27] Y.-H. Tang. Reference implementation for the algorithms presented in ”An Atomistic Fingerprint Algorithm for Learning Ab Initio Molecular Force Fields”, 2017. DOI: 10.5281/ZENODO.1054550.
  • [28] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko.

    Quantum-chemical insights from deep tensor neural networks.

    Nature Communications, 8:13890, 2017.
  • [29] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, and K.-R. Müller. Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5):e1603015, 2017.
  • [30] J. L. Gross and J. Yellen. Graph theory and its applications. Chapman & Hall/CRC, 2005.
  • [31] M. Liu, G. Liu, and K. Lam. Constructing smoothing functions in smoothed particle hydrodynamics with applications. Journal of Computational and Applied Mathematics, 155(2):263–284, 2003.
  • [32] L. B. Lucy. A numerical approach to the testing of the fission hypothesis. The Astronomical Journal, 82(12), 1977.

Appendix

Polynomial Smoothing Functions with Compact Support

As candidates for the weight of integral and density scaling functions (Section 3.2), a class of compact polynomials that satisfy the criteria [31]:

  1. is compactly supported,

  2. is strictly positive within some cutoff distance ,

  3. decreases monotonically,

  4. is at least twice continuously differentiable

with minimal number of non-zero terms are:

(35)

where is the normalized complementary coordinate within the span of the kernel, and

(36)

is an optional normalization factor to ensure that the integral of the kernel in a 3D ball of radius is unity. The parameters and are free parameters that can be used to adjust the smoothness and width of the kernel, and can take any real numbers satisfying the condition . Note that the kernel is equivalent to the Lucy kernel commonly used in Smoothed Particle Hydrodynamics simulations [32]. The kernel can be evaluated very efficiently using only multiplication and addition when both and are integers.

Figure 14: Visualization of the polynomial kernels as given in Eq. 35 with a unit support radius. The kernels are bell-shaped with a derivative of 0 at the origin. Both the first and second derivatives of the kernels transition smoothly to 0 at its support radius. In contrast, the Gaussian kernel and its derivatives does not decay to zero at any finite distance, while the second derivative of the Cosine kernel as mentioned in previous work [11, 12] is not zero at the cutoff distance.

Table of Quadrature Nodes and Weights

In Table 4, we list the nodes and weights of the Laguerre quadrature rules up to , using notations from Eq. 22. In Table 5, we list the nodes and weights of the Lebedev quadrature rules up to , using notations from Eq. 21. The Laguerre and Lebedev quadrature nodes can be combined using Eq. 23-26 into composite grids for sampling the atomistic density field.

================================================================================
                   n = 0      n = 1      n = 2      n = 3      n = 4      n = 5
--------------------------------------------------------------------------------
 Nr = 2    r_n    2.0000     6.0000
           a_n    1.5000     0.5000
--------------------------------------------------------------------------------
 Nr = 3    r_n    1.5174     4.3116     9.1710
           a_n    1.0375     0.9058     0.0568
--------------------------------------------------------------------------------
 Nr = 4    r_n    1.2268     3.4125     6.9027    12.4580
           a_n    0.7255     1.0634     0.2067     0.0044
--------------------------------------------------------------------------------
 Nr = 5    r_n    1.0311     2.8372     5.6203     9.6829    15.8285
           a_n    0.5209     1.0667     0.3835     0.0286     0.0003
--------------------------------------------------------------------------------
 Nr = 6    r_n    0.8899     2.4331     4.7662     8.0483    12.6004    19.2620
           a_n    0.3844     0.9971     0.5361     0.0795     0.0029     0.0000
================================================================================
Table 4: Laguerre quadrature nodes and weights up to 6 points.
================================================================================
 Na = 6            x_m               y_m               z_m               b_m
 m  = 0,1       +-1.00000           0.00000           0.00000          0.16667
 m  = 2,3         0.00000         +-1.00000           0.00000          0.16667
 m  = 4,5         0.00000           0.00000         +-1.00000          0.16667
--------------------------------------------------------------------------------
 Na = 14           x_m               y_m               z_m               b_m
 m  = 0,1       +-1.00000           0.00000           0.00000          0.06667
 m  = 2,3         0.00000         +-1.00000           0.00000          0.06667
 m  = 4,5         0.00000           0.00000         +-1.00000          0.06667
 m  = 6-13      +-0.57735         +-0.57735         +-0.57735          0.07500
--------------------------------------------------------------------------------
 Na = 26           x_m               y_m               z_m               b_m
 m  = 0,1       +-1.00000           0.00000           0.00000          0.04762
 m  = 2,3         0.00000         +-1.00000           0.00000          0.04762
 m  = 4,5         0.00000           0.00000         +-1.00000          0.04762
 m  = 6-9         0.00000         +-0.70711         +-0.70711          0.03810
 m  = 10-13     +-0.70711           0.00000         +-0.70711          0.03810
 m  = 14-17     +-0.70711         +-0.70711           0.00000          0.03810
 m  = 18-25     +-0.57735         +-0.57735         +-0.57735          0.03214
--------------------------------------------------------------------------------
 Na = 38           x_m               y_m               z_m               b_m
 m  = 0,1       +-1.00000           0.00000           0.00000          0.00952
 m  = 2,3         0.00000         +-1.00000           0.00000          0.00952
 m  = 4,5         0.00000           0.00000         +-1.00000          0.00952
 m  = 6-13      +-0.57735         +-0.57735         +-0.57735          0.03214
 m  = 14-17     +-0.45970         +-0.88807           0.00000          0.02857
 m  = 18-21     +-0.88807         +-0.45970           0.00000          0.02857
 m  = 22-25     +-0.45970           0.00000         +-0.88807          0.02857
 m  = 26-29     +-0.88807           0.00000         +-0.45970          0.02857
 m  = 30-33       0.00000         +-0.45970         +-0.88807          0.02857
 m  = 34-37       0.00000         +-0.88807         +-0.45970          0.02857
--------------------------------------------------------------------------------
 Na = 50           x_m               y_m               z_m               b_m
 m  = 0,1       +-1.00000           0.00000           0.00000          0.01270
 m  = 2,3         0.00000         +-1.00000           0.00000          0.01270
 m  = 4,5         0.00000           0.00000         +-1.00000          0.01270
 m  = 6-9         0.00000         +-0.70711         +-0.70711          0.02257
 m  = 10-13     +-0.70711           0.00000         +-0.70711          0.02257
 m  = 14-17     +-0.70711         +-0.70711           0.00000          0.02257
 m  = 18-25     +-0.57735         +-0.57735         +-0.57735          0.02109
 m  = 26-33     +-0.30151         +-0.30151         +-0.90453          0.02017
 m  = 34-41     +-0.30151         +-0.90453         +-0.30151          0.02017
 m  = 42-49     +-0.90453         +-0.30151         +-0.30151          0.02017
================================================================================
Table 5: Lebedev quadrature nodes and weights up to 50 points.