1 Introduction
Molecular Dynamics (MD) simulations have been widely used for studying atomistic systems, e.g. proteins and catalysts, due to their ability to precisely capture transient events and to predict macroscopic properties from microscopic details [1, 2]. In its most prevalent implementation, the trajectory of an atomistic system is integrated in time according to Newton's laws of motion, using forces calculated as the negative gradient of a Hamiltonian whose functional form and parameters are collectively referred to as a force field [3, 4, 5]. Traditionally, the pairwise and many-body terms that comprise a force field are derived empirically by fitting to quantum mechanical calculations and experimental data.
Three properties directly relate to the applicability of a force field: accuracy, transferability, and complexity [6, 7]. Over the years, a large number of force fields have been developed, each carrying a particular emphasis over these three properties. However, the combinatorial complexity of atomistic systems can easily outpace force field development efforts, whose difficulty explodes following the curse of dimensionality [8]. A deceptively simple system that demonstrates the situation is water, a triatomic molecule with a well-characterized molecular structure. In fact, all common water models, such as SPC/E, TIP3P, and TIP4P, have only succeeded in reproducing a small number of the structural and dynamical properties of water, due to the difficulty of modeling strong intermolecular many-body effects such as hydrogen bonding and polarization [9, 10].

In lieu of a force field, quantum mechanical (QM) calculations can be employed straightforwardly to drive molecular dynamics simulations. This approach achieves significantly better accuracy and transferability by solving for the electronic structure of the system. However, the computational complexity of QM methods is at least cubic in the number of electrons, and consequently the time and length scales accessible to QM-driven molecular dynamics are severely constrained.
Assuming that there is smoothness in the potential energy surface of the atomistic system, one possible strategy to accelerate QM-driven molecular dynamics is to perform QM calculations on only a subset of the time steps, and to interpolate between similar atomic configurations [11, 12, 13, 14]. A schematic overview of the process is given in Figure 1. This strategy is enabled by the recent development of high-dimensional nonlinear statistical learning and regression techniques such as Gaussian process regression [15] and artificial neural networks [16].

This paper focuses on a particular aspect of the machine-learning-driven molecular computation pipeline, i.e. fingerprint algorithms, whose importance arises naturally from the aforementioned regression protocol. A fingerprint is an encoding of an atomistic configuration that can facilitate regression tasks such as similarity comparison across structures consisting of variable numbers of atoms and elements. As has been pointed out previously [12], a good fingerprint should possess the following properties:

It can be encoded as a fixed-length vector so as to facilitate regression (particularly for artificial neural networks).

It is complete, i.e. different atomistic neighborhood configurations lead to different fingerprints and vice versa, and the ‘distance’ between the fingerprints should be proportional to the intrinsic difference between the atomistic neighborhood configurations.

It is continuous with regard to atomistic coordinates, and the change in fingerprint should be approximately proportional to the structural variation as characterized by, for example, some internal coordinates.

It is invariant under permutation, rotation, and translation.

It is computationally feasible and straightforward to implement.
Before we proceed to the details of our work, we will first briefly review several fingerprints that are closely related to our work, i.e. the Smooth Overlap of Atomic Positions (SOAP) kernel [12], the Coulomb matrix [17], and the Graph Approximated Energy (GRAPE) kernel [18].
Smooth Overlap of Atomic Positions (SOAP):
The SOAP kernel is built on the idea of representing atomistic neighborhoods as smoothed density fields using Gaussian kernels, each centered at a neighbor atom. Similarity is measured as the inner product between density fields, while rotational invariance is achieved by integrating over all possible 3D rotations, which can be performed analytically using the power spectrum of the density field. In fact, our fingerprint algorithm is inspired by this idea of treating atoms as smoothed density fields. However, we take a different approach to endow the fingerprint with rotational invariance, and use the Euclidean distance instead of the inner product as a distance metric.
Coulomb Matrix:
The practice of using graphs to represent atomistic neighbor configurations was first implied by the Coulomb matrix, and later further formalized in the GRAPE kernel, where the diffusion distance was proposed as a similarity measure between different local chemical environments [19]. The idea is to construct an undirected, unlabeled graph with atoms serving as the vertices and pairwise interactions weighting the edges. For example, the Coulomb matrix M can be treated as a physically inspired Laplacian matrix [20]

(1)  M = D + A
(2)  D_ij = (1/2) Z_i^2.4 δ_ij
(3)  A_ij = Z_i Z_j / |r_i − r_j| for i ≠ j,  A_ii = 0
where the degree matrix D encodes a polynomial fit of atomic energies to the nuclear charge, while the adjacency matrix A corresponds to the Coulombic interactions between all pairs of atoms. Because the adjacency matrix uses only the relative positions between atoms, the Coulomb matrix is automatically invariant under translation and rotation. However, the matrix itself is not invariant under permutation, as swapping the order of two atoms will exchange the corresponding rows and columns. To address this, the sorted list of eigenvalues of the Coulomb matrix can be used as a feature vector instead, with the Euclidean (L2) norm as a distance metric. In practice, because the number of neighbor atoms may change, the shorter eigenvalue list is zero-padded in a distance computation.
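For concreteness, the sorted-eigenvalue fingerprint and its zero-padded distance can be sketched as follows (a minimal NumPy sketch with our own function names; the matrix entries assume the standard Coulomb-matrix convention of 0.5 Z^2.4 on the diagonal and Z_i Z_j / r_ij off the diagonal):

```python
import numpy as np

def coulomb_fingerprint(Z, R):
    """Sorted eigenvalues of the Coulomb matrix.
    Z: (n,) nuclear charges; R: (n, 3) atomic positions."""
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i]**2.4          # diagonal: atomic-energy fit
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    # sorting makes the fingerprint invariant under atom permutation
    return np.sort(np.linalg.eigvalsh(M))[::-1]

def coulomb_distance(f1, f2):
    """L2 distance; the shorter eigenvalue list is zero-padded."""
    n = max(len(f1), len(f2))
    a = np.pad(f1, (0, n - len(f1)))
    b = np.pad(f2, (0, n - len(f2)))
    return np.linalg.norm(a - b)
```

Permuting the atom order exchanges rows and columns of the matrix but leaves its eigenvalue spectrum, and hence the fingerprint, unchanged.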
Graph Approximated Energy (GRAPE):
The GRAPE kernel evaluates simultaneous random walks on the direct product of the two graphs representing two atomistic neighborhood configurations. Permutational invariance is achieved by choosing a uniform starting and stopping distribution across the nodes of both graphs. However, the distance computation between two graphs is expensive, and each graph additionally incurs a one-time diagonalization cost.
In the sections below, we present our new fingerprint algorithm, namely the Density-Encoded Canonically Aligned Fingerprint (DECAF). The paper is organized as follows: in Section 2, we introduce a robust algorithm that can determine canonical coordinate frames for obtaining symmetry-invariant projections; in Section 3, we present numerical recipes for using smoothed atomistic density fields as a fingerprint of molecular configuration; in Section 4, we demonstrate the capability of the fingerprint via examples involving the regression of atomistic potential energy surfaces; in Section 5, we discuss the connection between our algorithm and previously proposed ones; we conclude with a discussion in Section 6.
2 Localized Canonical Coordinate Frame for Rotationally Invariant Description of Atomistic Neighborhood
2.1 Kernel Minisum Approach
To improve model generalization while minimizing data redundancy, a fingerprint algorithm should be able to recognize atomistic structures that differ only by a rigid-body transformation or a permutation of atoms of the same element, and to extract feature vectors invariant under these transformations. As summarized in Table 1, a variety of strategies have been successfully employed by common fingerprint algorithms to achieve rotational invariance.

However, these approaches do not provide a means for acquiring vector-valued quantities in a rotationally invariant form. One alternative is to acquire and interpolate only the potential energy, a scalar quantity, and then take the derivative of the regression model. This approach, however, requires methods to decompose the total energy, which is a property of the entire system rather than of individual atoms, among the atoms [21].
Another approach, proposed by Li et al. [13, 22], is to project vector quantities onto a potentially overcomplete set of non-orthogonal basis vectors obtained from a weighted sum of the atomic coordinate vectors:

(4)

However, the approach may suffer from robustness issues. For example, all of the basis vectors generated with different weights will point in the same direction if the radial distances of the atoms are all equal. Further, a configuration with 4 atoms at certain symmetric positions leads to

(5)
(6)

Thus, if the weighted sum gets close to zero, the resulting basis vector will always point toward one of two opposite directions, even if the vector quantity of interest may point in other directions.
Fingerprint | Translation invariance | Permutation invariance | Rotation invariance
Coulomb matrix [17] | relative distance | sorting eigenvalues | all-vs-all graph
Behler [11] | relative distance | summation | ignoring angular information
SOAP [12] | relative distance | summation | integrating over all rotations
GRAPE [18] | relative distance | uniform distribution | uniform distribution
Here, we present a robust kernel-PCA-inspired algorithm for the explicit determination of a canonical coordinate frame, within which the projection of the atomistic neighborhood is invariant under rigid-body rotation. Furthermore, the canonical coordinate frame can be directly used to capture vector-valued quantities in a rotationally invariant form. Given n atoms with positions r_i, we first formulate the PCA algorithm as an optimization problem where we seek a unit vector w that maximizes the sum of the squared projections:

(7)  w* = argmax_{|w| = 1} Σ_i (w · r_i)^2
(8)      = argmax_{|w| = 1} Σ_i r_i^2 (w · u_i)^2

where r_i = |r_i| is the distance from the origin to atom i, and u_i is the unit vector pointing toward atom i. The optimization process can only uniquely determine the orientation of a projection vector up to a line, because (w · u_i)^2 = (−w · u_i)^2. As a consequence, further heuristics are needed to identify a specific direction for the PCA vectors.
To overcome this difficulty, we generalize the r_i^2 term into a weight function ω(r_i) of the radial distance, and the (w · u_i)^2 term into a bivariate kernel function K between two vectors. We then seek a unit vector that minimizes the kernel summation:

(9)  w* = argmin_{|w| = 1} Σ_i ω(r_i) K(w, u_i)

In particular, we have found a square angle (SA) kernel and an exponentiated cosine (EC) kernel that perform well in practice:

(10)  K_SA(w, u) = arccos^2(w · u)
(11)  K_EC(w, u) = exp(−w · u)
As shown in Figure 2, both kernels are minimal when their two vector arguments are parallel, and monotonically reach their maximum when the arguments are antiparallel. Intuitively, optimizing the minisum objective function generated by the SA kernel will yield a vector that, loosely speaking, bisects the sector occupied by the atoms. The EC kernel exhibits very similar behavior but leads to a smoother objective function. As shown in Figure 2, this allows for the determination of a projection vector without ambiguity, even if the atom configuration contains perfect symmetry.

A major advantage of the kernel minisum approach over norm-based PCA lies in its 1) robustness in the presence of structural symmetry, and 2) continuity of the resulting principal axes with respect to angular movement of the input data. As shown in Figure 3, kernel minisum is particularly suitable for atomistic systems, where strong symmetries are common and continuity against angular movement is desired. The minisum framework can also be used with other customized kernels to suit the characteristics of specific application scenarios.
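The contrast between PCA's sign ambiguity and the minisum kernels can be checked numerically with a brute-force scan in 2D (a toy sketch; the configuration, the squared-angle and exponentiated-cosine forms of the kernels, and the grid resolution are our own choices):

```python
import numpy as np

# two atoms placed symmetrically about the x-axis: the covariance-based PCA
# objective cannot tell +w from -w, while the minisum kernels single out the
# bisector (+x) uniquely
theta = np.linspace(-np.pi, np.pi, 3601)
w = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # candidate directions
units = np.array([[np.cos(1.0),  np.sin(1.0)],
                  [np.cos(1.0), -np.sin(1.0)]])        # atoms at +-1 rad

cosines = np.clip(w @ units.T, -1.0, 1.0)
f_sa = (np.arccos(cosines)**2).sum(axis=1)   # square-angle minisum objective
f_ec = np.exp(-cosines).sum(axis=1)          # exponentiated-cosine objective
f_pca = -(cosines**2).sum(axis=1)            # PCA: maximize squared projections

print(theta[np.argmin(f_sa)], theta[np.argmin(f_ec)])  # both near 0 (bisector)
```

The PCA objective takes identical values at opposite directions, whereas both minisum objectives have a single minimum at the bisector of the two atoms.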
2.2 Solving the Kernel Minisum Optimization Problems
The optimization problem can be solved very efficiently using a gradient descent algorithm as detailed below.
Square Angle:
The objective function of the minisum problem using the square angle (SA) kernel is

(12)  F_SA(w) = Σ_i ω(r_i) arccos^2(w · u_i)

The gradient of F_SA with respect to w is

(13)  ∇_w F_SA = −Σ_i ω(r_i) [2 arccos(w · u_i) / sqrt(1 − (w · u_i)^2)] u_i

Note that the gradient is singular when w · u_i = ±1. This can be treated numerically by replacing the removable singularities at w · u_i = 1 with the left limit, while truncating the gradient at a finite threshold near the poles at w · u_i = −1.
A local minimum can be searched for iteratively with gradient descent while renormalizing the solution vector after each iteration. Moreover, due to the locally quadratic nature of the objective function, we have found that the Barzilai–Borwein algorithm [23] can significantly accelerate convergence at minimal cost. The algorithm is presented in Alg. 1.
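A minimal sketch of this procedure might look as follows (assuming the squared-angle kernel form arccos²(w·u); the clipping threshold, step clamp, and function names are our own choices, not the paper's Alg. 1 verbatim):

```python
import numpy as np

def sa_gradient(w, units, weights, eps=1e-8):
    # clipping keeps the removable singularity at w.u = 1 finite and
    # truncates the pole at w.u = -1, as described in the text
    c = np.clip(units @ w, -1.0 + eps, 1.0 - eps)
    coef = -2.0 * np.arccos(c) / np.sqrt(1.0 - c**2)
    g = (weights * coef) @ units
    return g - (g @ w) * w            # tangential component on the unit sphere

def minimize_sa(units, weights, w0, n_iter=200, tol=1e-10):
    """Gradient descent with Barzilai-Borwein steps on the unit sphere."""
    w = w0 / np.linalg.norm(w0)
    g = sa_gradient(w, units, weights)
    step = 1e-2
    for _ in range(n_iter):
        w_new = w - step * g
        w_new /= np.linalg.norm(w_new)        # renormalize every iteration
        g_new = sa_gradient(w_new, units, weights)
        dw, dg = w_new - w, g_new - g
        if dg @ dg > 0.0:
            step = min(abs(dw @ dg) / (dg @ dg), 1.0)  # Barzilai-Borwein step
        w, g = w_new, g_new
        if np.sqrt(dw @ dw) < tol:
            break
    return w
```

For a pair of neighbors placed symmetrically about the x-axis, the iteration converges to their bisector regardless of a reasonable initial guess.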
Exponentiated Cosine:
The objective function of the minisum problem using the exponentiated cosine (EC) kernel is

(14)  F_EC(w) = Σ_i ω(r_i) exp(−w · u_i)

The gradient of F_EC with respect to w is

(15)  ∇_w F_EC = −Σ_i ω(r_i) exp(−w · u_i) u_i
The gradient contains no singularity. However, the objective function is not always locally quadratic or convex. This can cause the Barzilai–Borwein algorithm to generate negative step sizes and consequently divert the search toward a maximum. Luckily, this can easily be overcome by always using the absolute value of the step size generated by the Barzilai–Borwein algorithm. Such enforcement prevents the minimization algorithm from going uphill. The complete algorithm is given in Alg. 2.
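The EC variant can be sketched in the same style (assuming the exponentiated-cosine kernel form exp(−w·u); the absolute value on the Barzilai–Borwein step is the safeguard described above, while the step clamp and names are our own):

```python
import numpy as np

def ec_gradient(w, units, weights):
    # gradient of sum_i weights_i * exp(-w . u_i), projected onto the
    # tangent plane of the unit sphere at w; no singularities here
    g = -(weights * np.exp(-(units @ w))) @ units
    return g - (g @ w) * w

def minimize_ec(units, weights, w0, n_iter=200, tol=1e-10):
    w = w0 / np.linalg.norm(w0)
    g = ec_gradient(w, units, weights)
    step = 1e-2
    for _ in range(n_iter):
        w_new = w - step * g
        w_new /= np.linalg.norm(w_new)
        g_new = ec_gradient(w_new, units, weights)
        dw, dg = w_new - w, g_new - g
        if dg @ dg > 0.0:
            # abs() guards against negative (uphill) Barzilai-Borwein steps
            step = min(abs(dw @ dg) / (dg @ dg), 1.0)
        w, g = w_new, g_new
        if np.sqrt(dw @ dw) < tol:
            break
    return w
```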
As shown in Table 2, both Alg. 1 and Alg. 2 converge quickly and consistently across a wide range of representative point configurations commonly found in molecular systems. However, the gradient descent method can only find local optima. Thus, multiple trials should be performed using different initial guesses to ensure that a global minimum can be located.
Kernel | Square Angle | | Exponentiated Cosine |
Configuration | Itrs./Guess | Guesses | Itrs./Guess | Guesses
Single point | 8.8 | 1 | 7.9 | 1
Two points, angle | 7.1 | 1 | 7.7 | 1
Two points, angle | 6.1 | 1 | 6.3 | 1
Two points, angle | 6.0 | 1 | 5.6 | 1
Planar | 6.2 | 1 | 6.3 | 1
Planar | 5.6 | 1 | 6.9 | 1
Tetrahedra | 10.6 | 1 | 7.7 | 1
Octahedra | 11.7 | 1 | 9.4 | 1
Improper | 17.9 | 1 | 23.1 | 1
Improper | 14.7 | 1 | 18.0 | 1
2D random 10 points | 7.2 | 1.1 | 8.9 | 1
3D random 50 points | 16.9 | 1.2 | 14.4 | 1
2.3 Complete Set of Orthogonal Projection Vectors as a Canonical Coordinate Frame
In 3D, a complete set of orthogonal bases can be found greedily using the protocol described in Alg. 3. Specifically, we use the globally optimal solution of the minisum optimization problem as the first basis u_1, and the constrained optimal solution in a plane orthogonal to u_1 as the second basis u_2. Special care must be taken when determining the third basis u_3, as its only remaining degree of freedom is its sign due to the orthogonality constraint. The straightforward approach of choosing the direction that gives the smaller objective function value may fail, for example, when the system contains improper rotational symmetry. In that case, u_1 and u_2 are interchangeable and both perpendicular to the rotation-reflection axis. As a result, the two candidates for u_3 will both align with the rotation-reflection axis and are thus indistinguishable by kernel minisum. However, the projections of the atoms into the two seemingly equivalent coordinate frames are not identical, but rather mirror images of each other. Fortunately, this can be addressed by choosing the direction of the half-space, as created by the plane spanned by u_1 and u_2, that yields the smaller kernel objective between the bisector of u_1 and u_2 and the points lying in that half-space. This rule also handles general situations with or without symmetry.

It is difficult to prove global uniqueness of the kernel minisum solution given the non-convex nature of the exponentiated cosine and square angle kernels. In fact, it seems that the only kernel that allows an analytical proof of solution uniqueness is K(w, u) = −w · u, whose solution simply corresponds to the direction of the weighted center of mass of the neighbor atoms. Unfortunately, this simple kernel is not robust against reflectional and rotational symmetry. Luckily, the rare cases where two globally optimal solutions coexist can be safely captured by the repeated searching procedure starting from different seeds. In such cases, a fingerprint can be extracted using each of the resulting coordinate frames. This may mildly increase the size of the training set, which could even be advantageous when training data is scarce.
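The greedy structure of the frame construction can be sketched as follows (a brute-force stand-in using the exponentiated-cosine kernel instead of Alg. 1/2; `fib_sphere`, the grid resolutions, and all names are our own choices):

```python
import numpy as np

def fib_sphere(n):
    # quasi-uniform directions on the unit sphere (Fibonacci lattice)
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i
    z = 1.0 - 2.0 * (i + 0.5) / n
    rho = np.sqrt(1.0 - z * z)
    return np.stack([rho * np.cos(phi), rho * np.sin(phi), z], axis=1)

def ec_sum(w, units, weights):
    return float(np.sum(weights * np.exp(-(units @ w))))

def canonical_frame(points, weights=None):
    r = np.linalg.norm(points, axis=1)
    units = points / r[:, None]
    weights = np.ones(len(points)) if weights is None else weights
    # u1: global minisum solution (coarse brute force stands in for descent)
    grid = fib_sphere(4000)
    u1 = grid[np.argmin([ec_sum(w, units, weights) for w in grid])]
    # u2: constrained solution in the plane orthogonal to u1
    a = np.cross(u1, [0., 0., 1.] if abs(u1[2]) < 0.9 else [1., 0., 0.])
    a /= np.linalg.norm(a)
    b = np.cross(u1, a)
    ts = np.linspace(0.0, 2.0 * np.pi, 1440, endpoint=False)
    cand = np.cos(ts)[:, None] * a + np.sin(ts)[:, None] * b
    u2 = cand[np.argmin([ec_sum(w, units, weights) for w in cand])]
    # u3: only its sign is free; fix it with the half-space rule against
    # the bisector of u1 and u2
    u3 = np.cross(u1, u2)
    bis = (u1 + u2) / np.linalg.norm(u1 + u2)
    upper = units @ u3 > 0.0
    if ec_sum(bis, units[upper], weights[upper]) > \
       ec_sum(bis, units[~upper], weights[~upper]):
        u3 = -u3
    return np.array([u1, u2, u3])
```

Note that the sign rule may produce a left-handed frame; that is intentional, since it is what distinguishes mirror-image configurations.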
3 Density-Encoded Canonically Aligned Fingerprint
3.1 Density Field and Approximation of Volume Integral
The local density field ρ around a point x0 is formulated as a superposition of smoothing kernel functions, each centered at a neighbor atom with relative displacement r_i with regard to x0 and lying within a cutoff distance r_c:

(16)  ρ(x) = Σ_{|r_i| < r_c} w(|r_i|) κ(x, r_i)

This density field, as has been pointed out previously [12], may be used as a fingerprint of the local atomistic environment. Here, we assume that the smoothing kernel κ takes the form of a stationary Gaussian, κ(x, r_i) ∝ exp(−|x − r_i|^2 / 2σ^2). We also assume that the density scaling function w(r), which ensures the continuity of the density field when atoms enter or exit the cutoff distance, is a bell-shaped function with compact support. Further discussion on w and κ can be found in Section 3.2 and Section 3.3, respectively.

To achieve rotational invariance, we project the atom coordinates into the canonical coordinate frame U, as determined by the kernel minisum algorithm, when generating the density field:

(17)  ρ(x) = Σ_{|r_i| < r_c} w(|r_i|) κ(x, U r_i)
Depending on the specific application, x0 may not necessarily coincide with the position of any atom. Scalar properties can be acquired directly from the target atom, while vector-valued properties, such as force, can be acquired and interpolated in the local orthogonal coordinates as f′ = U f.
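A sketch of the canonically aligned density field follows; `frame` rows are assumed to be the basis vectors produced by the minisum procedure, and the particular damping polynomial is our own stand-in, not the paper's Eq. 27 or 28:

```python
import numpy as np

def damping(r, r_cut):
    # smooth, compact-support damping: 1 at the origin, 0 at the cutoff
    # (an assumed polynomial form used here only for illustration)
    x = np.clip(r / r_cut, 0.0, 1.0)
    return (1.0 - x)**2 * (1.0 + 2.0 * x)

def density(x, neighbors, frame, r_cut=6.0, sigma=0.5):
    """Evaluate the damped, canonically aligned density field at points x.
    x: (m, 3) sample points in the canonical frame; neighbors: (n, 3)
    displacements from the fingerprint center; frame: (3, 3) basis rows."""
    r = np.linalg.norm(neighbors, axis=1)
    keep = r < r_cut                       # atoms beyond the cutoff drop out
    proj = neighbors[keep] @ frame.T       # project into the canonical frame
    w = damping(r[keep], r_cut)
    d2 = ((x[:, None, :] - proj[None, :, :])**2).sum(-1)
    return (w * np.exp(-d2 / (2.0 * sigma**2))).sum(axis=1)
```

Because the atoms are rotated into the canonical frame before sampling, rigidly rotating the neighborhood (and its frame with it) leaves the sampled values unchanged.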
We define the distance between two density fields ρ1 and ρ2 as a weighted volume integral of their pointwise difference:

(18)  D(ρ1, ρ2) = ‖ρ1 − ρ2‖_{2,W}
(19)             = [ ∫ W(|x|) (ρ1(x) − ρ2(x))^2 dx ]^{1/2}

The weight function W provides additional flexibility for emphasizing particular regions of the atomistic neighborhood. It can play an important role when fitting properties with steep gradients, e.g. the repulsive part of the Lennard-Jones potential.
We now introduce an optimized quadrature rule to approximate the integral in Eq. 19 in a computationally tractable manner. A quadrature rule is a numerical recipe of the form ∫ f(x) dx ≈ Σ_p w_p f(x_p), which numerically approximates a definite integral using only discrete evaluations of the integrand. To determine the quadrature nodes and weights, we decompose the volume integral in Eq. 19 into a surface integral over spherical shells and a 1D integral along the radial direction:

(20)  ∫ W(|x|) f(x) dx = ∫_0^∞ W(r) r^2 [ ∮_{S^2} f(r n) dn ] dr

The surface integral can be optimally approximated using the Lebedev quadrature rule [24]:

(21)  ∮_{S^2} f(r n) dn ≈ 4π Σ_{j=1}^{N_Leb} λ_j f(r n_j)

where λ_j, n_j, and N_Leb are the weights, positional unit vectors, and number of the Lebedev nodes, respectively. The radial integral fits well into the generalized Laguerre–Gauss quadrature formula with weight function r^2 e^{−r} [25]:

(22)  ∫_0^∞ g(r) r^2 e^{−r} dr ≈ Σ_{k=1}^{N_Lag} μ_k g(r_k)

where μ_k, r_k, and N_Lag are the weights, coordinates, and number of the Laguerre nodes, respectively. Combining Eqs. 20–22, a composite quadrature rule can be generated consisting of several spherical layers of nodes. As shown in Figure 5, the radial coordinates of the quadrature nodes are determined by the Laguerre quadrature nodes, while the angular positions are determined by the Lebedev quadrature nodes. This composite quadrature formula translates the 3D volume integral into a summation over discrete grid points:

(23)  ∫ W(|x|) f(x) dx ≈ 4π Σ_k Σ_j μ_k λ_j e^{r_k} W(r_k) f(r_k n_j)

Using the right-hand side of Eq. 23 to replace the integral in Eq. 19, and using the multi-index p = (k, j) to enumerate over the quadrature nodes located at x_p = r_k n_j with weights w_p = 4π μ_k λ_j e^{r_k} W(r_k), we obtain the final discretized formula for computing the distance between fingerprints:

(24)  D(ρ1, ρ2) ≈ [ Σ_p w_p (ρ1(x_p) − ρ2(x_p))^2 ]^{1/2}
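For concreteness, the composite rule can be assembled from SciPy's generalized Gauss–Laguerre nodes and the lowest-order (6-point, octahedron-vertex) Lebedev rule; a real application would use the larger Lebedev grids, and the weights below implicitly correspond to choosing an integration weight proportional to e^{−r}:

```python
import numpy as np
from scipy.special import roots_genlaguerre

def composite_quadrature(n_radial=4):
    # radial: generalized Gauss-Laguerre for the weight r^2 e^{-r};
    # angular: 6-point Lebedev rule (octahedron vertices, weights 1/6)
    rk, mu = roots_genlaguerre(n_radial, 2)
    nhat = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                     [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    lam = np.full(6, 1.0 / 6.0)
    nodes = (rk[:, None, None] * nhat[None, :, :]).reshape(-1, 3)
    weights = (4.0 * np.pi * mu[:, None] * lam[None, :]).ravel()
    return nodes, weights

def fingerprint_distance(rho1, rho2, nodes, weights):
    # discretized weighted L2 distance between two density fields,
    # each passed as a callable on an (N, 3) array of points
    diff = rho1(nodes) - rho2(nodes)
    return np.sqrt(np.sum(weights * diff**2))
```

With this construction, summing the weights reproduces the volume integral of e^{−|x|} (equal to 8π) to machine precision, a quick consistency check on the node and weight bookkeeping.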
For quick reference, we tabulate in the Appendix the nodes and weights of the Laguerre quadrature of up to 6 points, and of the Lebedev quadrature of up to 50 points.
In addition, the quadrature nodes can be radially scaled such that the outermost nodes lie at a prescribed radius within some cutoff distance r_c. This allows us to fit a Laguerre quadrature of any order within an arbitrary cutoff distance. Using a constant scaling factor s, the scaled quadrature rule is given by:

(25)  x̃_p = s r_k n_j
(26)  D(ρ1, ρ2) ≈ [ Σ_p w_p (ρ1(x̃_p) − ρ2(x̃_p))^2 ]^{1/2}

Since the scaling factor is constant across all nodes, it can be safely omitted in many regression tasks where only the relative distances between fingerprints are of significance.
3.2 Radial Weight Functions
In this section, we examine two radial weight functions that can be used to fine-tune the density field fingerprint: the density scaling function w(r), which appears in Eq. 17, and the weight of the integral W(r), which appears in Eq. 19.

Driven by the interest of reducing computational cost, we would like to use a cutoff distance to select the atoms involved in constructing the density field. However, it is important to ensure that atoms enter and exit the neighborhood smoothly. This naturally requires that the contribution of an atom be zero outside of the cutoff, and that it increase continuously and smoothly as the atom approaches from the cutoff boundary. Correspondingly, w(r) should: 1) equal unity at the origin; 2) smoothly approach zero at the cutoff distance; and 3) be twice-differentiable to ensure the differentiability of regression models based on the fingerprint. Candidates satisfying the above conditions include, for example, a tent-like kernel
(27) 
and a bell-shaped polynomial kernel with compact support
(28) 
as detailed in the Appendix.
The approximation of the radial integral using a Laguerre quadrature requires that the integrand, i.e. the pointwise difference between the atomistic density fields, decays sufficiently fast beyond the outermost quadrature nodes in order to achieve acceptable convergence. In addition, the steeply repulsive yet flat attractive short-range interatomic interactions demand that the sensitivity of the fingerprint be adjusted correspondingly, to avoid numerical difficulties in training a regression model. The weight of the integral, W(r), provides a convenient means of achieving this purpose. In contrast to w(r), W(r) should instead satisfy the following conditions: 1) be normalized so that its radial integral is unity; 2) decay sufficiently fast, though not necessarily to 0, beyond the outermost quadrature node; and 3) be sufficiently smooth beyond the outermost quadrature node. Candidates for W(r) include the tent-like kernel and the bell-shaped kernel used for w(r), albeit with a different normalization factor. The Laplacian kernel

(29)  W(r) ∝ exp(−r / ℓ)

with a properly sized length scale ℓ also appears to be a candidate, due to its similarity with part of the weight function of the Laguerre quadrature. Note that the constant kernel

(30)  W(r) = const

may also be a choice as long as the density field already decays fast enough due to the density scaling function w(r).
In Figure 6, we demonstrate the effect of the density scaling function and the weight of the integral on the distance matrices between fingerprints obtained from a pair of atoms. The comparison between panels A and B shows that a bell-shaped integration weight allows the distance between fingerprints to change more rapidly when the atoms are close, but more slowly when the atoms are farther apart. The visible discontinuity in the second row clearly demonstrates the importance of the damping function when only atoms within a finite cutoff distance are used to compute the fingerprint.
We further examine the impact of the weight functions on the performance of Gaussian process regression (GPR) of the interatomic force of a minimal system containing two nitrogen atoms, using the fingerprint as the regression input. Despite the simplicity of the system, this case is of fundamental importance because of its ubiquity, and because the fast-growing repulsive regime of the Lennard-Jones potential can cause difficulty: a small change in the system configuration can trigger large changes in the regression target function. In Figure 7, we compare the performance among the combinations of four weights of the integral and two density scaling functions. The initial training set consists of two samples collected at two separation distances. The regression is then refined using a greedy strategy that consecutively learns the point with the largest posterior variance, until the largest uncertainty, defined as twice the posterior standard deviation, is less than 0.1. The active learning scheme is able to deliver a GPR model, for each and every combination of the weight functions, that closely fits the target function. However, the number of refinement iterations and the achieved accuracy do vary. Therefore, it is important to evaluate and choose the weight functions in the context of specific application scenarios.

3.3 Quadrature Resolution and Density Kernel
Despite the formal convergence of the composite quadrature in DECAF, sampling a density field generated by n atoms on m quadrature nodes requires on the order of mn kernel evaluations, while the subsequent distance calculation between two fingerprints comes at a smaller cost proportional to the number of nodes. Thus, in practice it is often desirable to use as few nodes as possible, capturing only the information of the density field within a certain band limit [26]. Accordingly, the integral cutoff, the number of quadrature nodes, and the width of the density kernel need to be tuned to obtain an optimal balance between resolution and computational speed.
When designing the composite quadrature rule, we chose the Laguerre quadrature for the radial direction because its nodes are denser near the origin and sparser farther away. This is consistent with the physical intuition that the near field generally has a stronger influence than the far field in an atomistic neighborhood. For example, the van der Waals potential grows rapidly when atoms are in direct contact, but flattens out beyond the first coordination shell. Accordingly, it may be possible to use sparser outer-layer grids to reduce the total number of quadrature nodes, while still keeping enough nodes in the inner layers to maintain the sensitivity of the quadrature toward close neighbors. In concert, we can also use non-stationary Gaussian density kernels whose width depends on the distance from the atom to the origin. In this way, the sparser nodes can still sufficiently sample the smoother far field. Wider kernels at remote atoms also reduce the total difference between the far fields of two fingerprints in a statistical sense. Thus, the contribution of the far field to the integral can be effectively tuned even though the weights on the quadrature nodes remain the same.
In Figure 8, we demonstrate how a variable-resolution quadrature can be combined with a widening smoothing density kernel to simultaneously reduce the computational complexity and preserve the quality of the fingerprint. In column A, a dense grid is used to sample density fields generated with a wide smoothing length. By examining the distance matrices of fingerprints sampled during bond stretching and angular stretching movements, we note that the radial similarity decreases monotonically while the angular similarity changes at a nearly constant rate. In column B, the number of quadrature nodes is kept the same, but the smoothing length is reduced in an attempt to increase fingerprint sensitivity. Better response in the near field of the radial direction is obtained, but the linearity in the far field of the angular direction is compromised. In column C, the fingerprint performs even worse due to the combination of a sparser quadrature grid and a small smoothing length. In column D, the performance is recovered by letting the smoothing length of the Gaussian density kernels depend on the distance from the origin to each atom, while simultaneously adjusting the quadrature node density according to this pattern.
4 Demonstration
4.1 Method
Regression tasks throughout this work are performed using Gaussian process regression (GPR), a nonlinear kernel method that treats training data points as discrete observations from an instantiation of a Gaussian process. Predictions are made using the posterior mean and variance of the joint distribution between the test data and the training data. One particularly interesting property of Gaussian processes is that the posterior variance may be interpreted as a measure of prediction uncertainty, which can be exploited to design active learning algorithms for sampling expensive functions. The actual computation used our own software implementation, which has been made publicly available on Zenodo [27]. We use the square exponential covariance kernel to compute the covariance, i.e. correlation, between samples:

k(F1, F2) = σ_f^2 exp(−D^2(F1, F2) / 2ℓ^2)

where F1 and F2 are DECAF fingerprints and D is the distance as computed by Eq. 24 or Eq. 26. The kernel is stationary, meaning that the covariance depends only on the relative distance between two samples and not on their absolute positions. The training process searches for the hyperparameters, i.e. the output variance σ_f^2 and the length scale ℓ, that maximize the likelihood of the training data. A detailed tutorial on GPR can be found in Ref. [15]. An illustration of the complete workflow of using the density field fingerprint to perform regression tasks is given in Figure 9.

4.2 Potential Energy Surface
First, we attempt to fit the potential energy surface of a protonated water dimer in a head-to-head configuration, as a function of the oxygen–oxygen distance and the dihedral angle between the two planes each formed by a water molecule. As shown in Figure 10A, the system contains an improper rotational symmetry, which we wish to capture with the kernel minisum algorithm. A GPR model was seeded with 8 training points corresponding to combinations of the two degrees of freedom. Subsequently, an active learning protocol was used to greedily absorb the points with the highest uncertainty into the training set. Even though we restricted all training data to a subdomain of the parameter space, as shown by Figures 10B and 10C, we are able to accurately reproduce the target function over the entire parameter space after a few active learning steps.
The DECAF fingerprint used here is constructed with 3 spherical layers within a cutoff distance of 6.0 Å, consisting of 14, 26, and 38 Lebedev quadrature nodes, respectively. The weight of the integral was chosen to be the bell-shaped polynomial defined in Appendix Eq. 35. The density scaling function is the tent-like kernel defined in Eq. 27. The density kernels that sit on the oxygen atoms assume the form of a non-stationary Gaussian, as discussed in Section 3.3, whose width depends on the distance from the atom to the fingerprint center. The density kernel for the hydrogen atoms has a different weight and width to ensure discriminability between the elements.
4.3 Geometry Optimization and Vibrational Analysis
Next, we demonstrate the usability of the fingerprint for fitting vector-valued quantities by performing geometry optimization and vibrational analysis on a single water molecule. The process involves the simultaneous regression of: 1) energy, a molecular scalar quantity; 2) force, a per-atom vector quantity; and 3) dipole, a molecular vector quantity. Correspondingly, we performed GPR of energy and dipole using fingerprints extracted from the center of mass of the molecule, and GPR of force using fingerprints extracted from each atom. Each component of the vector properties is modeled independently as a scalar Gaussian process. The training set consists of 45 configurations uniformly covering a range of bond lengths and bond angles. As shown in Table 3, the GPR model can successfully drive calculations of the infrared spectrum of the molecule from randomly perturbed initial structures in arbitrary orientations. The fingerprint configuration is the same as in the previous section.
Quantity | GPR | DFT
Zero-point energy | 0.591 ± 0.003 eV | 0.583 eV
Static dipole | 2.1580 ± 0.0001 D | 2.159 D
Residual force | 0.0016 ± 0.0005 eV/Å | 0.0003 eV/Å

Mode | Frequency (cm⁻¹), GPR | Frequency (cm⁻¹), DFT | Intensity ((D/Å)² amu⁻¹), GPR | Intensity ((D/Å)² amu⁻¹), DFT
0 | 1576.5 ± 1.4 | 1602.4 | 1.5726 ± 0.0005 | 1.5767
1 | 3819.3 ± 0.9 | 3817.5 | 0.2516 ± 0.0005 | 0.2159
2 | 3916.7 ± 1.6 | 3922.6 | 1.3349 ± 0.0028 | 1.3401
4.4 Molecular Dynamics Trajectory
As shown in Figure 11, here we attempt to fit the forces felt by the atoms in a benzene molecule along MD trajectories obtained from a sibling database of QM7 [28, 29]. The density kernel for the carbon atoms assumes the same functional form as that of the oxygen atoms, but uses a different smoothing length function. The rest of the parameters are inherited from the previous examples. The training configurations were chosen adaptively in an iterative process, using the sum of the GPR posterior variance and the prediction error as the acquisition function.
5 Connection to Other Fingerprint Algorithms
In Figure 12, we compare the ability of our fingerprint, as well as SOAP and the Coulomb matrix, to distinguish atomistic configurations. Our work is inspired by the SOAP descriptor [12], which proposes the use of smoothed densities to represent atomistic neighborhoods. However, instead of converting the density field into the frequency domain using spherical harmonics, we sample and compare the density field directly in real space. This is enabled by the availability of a canonical coordinate frame as computed through the kernel minisum optimization. We have mainly used the Euclidean norm to compute the distance between atomistic neighborhoods. However, our fingerprint exhibits very similar behavior to SOAP when used together with an inner product formula
(31) 
as demonstrated in Figure 12A. Thus, our fingerprint could be used in conjunction with a wide variety of covariance functions based on either the Euclidean distance or the inner product similarity.
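The interchangeability of distance-based and inner-product-based covariances can be illustrated on plain vectors. This is a sketch with hypothetical data: the fingerprint vectors below are random stand-ins, and the normalized dot product merely plays the role of a SOAP-style similarity.

```python
import numpy as np

def rbf_kernel(f1, f2, ell=1.0):
    # Covariance from the Euclidean distance between fingerprint vectors.
    return np.exp(-0.5 * np.sum((f1 - f2) ** 2) / ell**2)

def dot_kernel(f1, f2):
    # Normalized inner-product similarity, analogous in spirit to SOAP.
    return np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2))

rng = np.random.default_rng(3)
f = rng.uniform(0.5, 1.5, size=32)   # a reference "fingerprint"
g = f + 0.05 * rng.normal(size=32)   # a slightly perturbed neighborhood
h = rng.uniform(0.5, 1.5, size=32)   # an unrelated neighborhood
```

Both kernels rank the perturbed copy as more similar to the reference than the unrelated vector, which is why either can back a GPR covariance function.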
At first sight, DECAF is very different from the Coulomb matrix fingerprint and GRAPE, which are both graph-based algorithms [17, 18]. However, instead of trying to capture the overall density field, we can measure the contribution of each individual atom to the quadrature nodes as a row vector, and stack up the results to yield the matrix
(32) 
Then the resulting matrix can be regarded as an incidence matrix [30] between the atoms and the quadrature nodes. This is similar to the graph-based abstraction seen in the Coulomb matrix and the GRAPE kernel. However, in both of those cases the vertices in the graph represent atoms while the edges represent pairwise interatomic interactions. Here, the density-based incidence matrix adopts the opposite pattern and constructs a graph with the quadrature nodes as vertices and the atoms as edges. The adjacency matrix in this case is computed as the inner product:
(33) 
The weight on each edge, as represented by the elements of the adjacency matrix, can be interpreted as the total flux contributed by all paths, each bridged by an atom. We have numerically found that the smallest eigenvalues (except for the 0 eigenvalue) of the symmetric normalized Laplacian
(34) 
are invariant under rotation up to a certain noise level, even if the quadrature nodes do not rotate with the atoms. Nonetheless, this detour appears to be of purely theoretical interest rather than practical value.
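The incidence-matrix construction above can be sketched numerically in the style of Eqs. 32-34. This is an illustrative sketch, not the paper's implementation: a Gaussian stands in for the density kernel, and the atom and node placements are random stand-ins.

```python
import numpy as np

def incidence(atom_xyz, nodes, width=1.0):
    # Row i: contribution of atom i at every quadrature node (Eq. 32 style).
    d = np.linalg.norm(atom_xyz[:, None, :] - nodes[None, :, :], axis=-1)
    return np.exp(-0.5 * (d / width) ** 2)

def laplacian_spectrum(atom_xyz, nodes, width=1.0):
    M = incidence(atom_xyz, nodes, width)
    A = M.T @ M                   # node-node adjacency (Eq. 33 style)
    deg = A.sum(axis=1)
    Dih = np.diag(1.0 / np.sqrt(deg))
    # Symmetric normalized Laplacian (Eq. 34 style).
    L = np.eye(len(nodes)) - Dih @ A @ Dih
    return np.sort(np.linalg.eigvalsh(L))

rng = np.random.default_rng(4)
atoms = rng.uniform(-1, 1, size=(6, 3))
u = rng.normal(size=(26, 3))
nodes = 1.5 * u / np.linalg.norm(u, axis=1, keepdims=True)
ev = laplacian_spectrum(atoms, nodes)
```

By construction the spectrum lies in [0, 2] with a zero eigenvalue for the eigenvector proportional to the square root of the degrees, matching the standard normalized-Laplacian theory cited in the text.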
6 Conclusion
In this paper, we presented the Density-Encoded Canonically Aligned Fingerprint (DECAF), which explores the idea of using smoothed density fields to represent and compare atomistic neighborhoods. One of the key enabling techniques in DECAF is a kernel minisum algorithm, which allows the unambiguous identification of a canonically aligned coordinate frame that can be used for rotationally invariant projection of the density field as well as any associated vector quantities. We performed a detailed analysis of the behavior of the fingerprint under various parameter choices, such as resolution, smoothing length, and the choice of weight functions. We demonstrated that the fingerprint can be used to implement highly accurate regressions of both scalar and vector properties of atomistic systems, including energy, force, and dipole moment. It could thus serve as a useful building block for constructing data-driven, next-generation force fields that accelerate molecular mechanics calculations while retaining an accuracy comparable to quantum mechanical theories and calculators.
Acknowledgment
This work was supported by the Department of Energy (DOE) Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). This work was also supported by the Army Research Laboratory under Cooperative Agreement Number W911NF-12-2-0023.
References
 [1] G. Zhao, J. R. Perilla, E. L. Yufenyuy, X. Meng, B. Chen, J. Ning, J. Ahn, A. M. Gronenborn, K. Schulten, C. Aiken, and others. Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature, 497(7451):643–646, 2013.
 [2] K. Lindorff-Larsen, P. Maragakis, S. Piana, and D. E. Shaw. Picosecond to Millisecond Structural Dynamics in Human Ubiquitin. The Journal of Physical Chemistry B, 120(33):8313–8320, 2016.
 [3] A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. Goddard, and W. M. Skiff. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. Journal of the American Chemical Society, 114(25):10024–10035, 1992.
 [4] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society, 117(19):5179–5197, 1995.
 [5] W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. Journal of the American Chemical Society, 118(45):11225–11236, 1996.
 [6] D. Frenkel and B. Smit. Understanding molecular simulation: from algorithms to applications, volume 1. Academic press, 2001.
 [7] A. R. Leach. Molecular modelling: principles and applications. Pearson education, 2001.
 [8] T. Cheng, A. Jaramillo-Botero, W. A. Goddard, and H. Sun. Adaptive accelerated ReaxFF reactive dynamics with validation from simulating hydrogen combustion. Journal of the American Chemical Society, 136(26):9434–9442, 2014.
 [9] D. Braun, S. Boresch, and O. Steinhauser. Transport and dielectric properties of water and the influence of coarse-graining: Comparing BMW, SPC/E, and TIP3P models. The Journal of Chemical Physics, 140(6):064107, 2014.
 [10] S. Boonstra, P. R. Onck, and E. van der Giessen. CHARMM TIP3P Water Model Suppresses Peptide Folding by Solvating the Unfolded State. The Journal of Physical Chemistry B, 120(15):3692–3698, 2016.
 [11] J. Behler. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. The Journal of Chemical Physics, 134(7):074106, 2011.
 [12] A. P. Bartók, R. Kondor, and G. Csányi. On representing chemical environments. Physical Review B, 87(18):184115, 2013.
 [13] Z. Li, J. R. Kermode, and A. De Vita. Molecular Dynamics with On-the-Fly Machine Learning of Quantum-Mechanical Forces. Physical Review Letters, 114(9):096405, 2015.
 [14] A. Khorshidi and A. A. Peterson. Amp: A modular approach to machine learning in atomistic simulations. Computer Physics Communications, 207:310–324, 2016.
 [15] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
 [16] D. Specht. A general regression neural network. IEEE Transactions on Neural Networks, 2(6):568–576, 1991.
 [17] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld. Fast and accurate modeling of molecular atomization energies with machine learning. Physical Review Letters, 108(5):058301, 2012.
 [18] G. Ferré, T. Haut, and K. Barros. Learning molecular energies using localized graph kernels. The Journal of Chemical Physics, 146(11):114107, 2017.
 [19] H. Y. Sun. Learning over Molecules : Representations and Kernels. PhD thesis, Harvard University, 2014.
 [20] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426–31, 2005.
 [21] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi. Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons. Physical Review Letters, 104(13):136403, 2010.
 [22] V. Botu and R. Ramprasad. Learning scheme to predict atomic forces and accelerate materials simulations. Physical Review B, 92(9):094306, 2015.
 [23] J. Barzilai and J. M. Borwein. Two-Point Step Size Gradient Methods. IMA Journal of Numerical Analysis, 8(1):141–148, 1988.
 [24] V. I. Lebedev and D. N. Laikov. A quadrature formula for the sphere of the 131st algebraic order of accuracy. Doklady Mathematics, 59(3):477–481, 1999.
 [25] P. Rabinowitz and G. Weiss. Tables of Abscissas and Weights for Numerical Evaluation of Integrals of the Form int_0^infty e^{-x} x^n f(x) dx. Mathematical Tables and Other Aids to Computation, 13(68):285–294, 1959.
 [26] B. Leistedt and J. D. McEwen. Exact Wavelets on the Ball. IEEE Transactions on Signal Processing, 60(12):6257–6269, 2012.
 [27] Y.-H. Tang. Reference implementation for the algorithms presented in "An Atomistic Fingerprint Algorithm for Learning Ab Initio Molecular Force Fields", 2017. DOI: 10.5281/zenodo.1054550.

 [28] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8:13890, 2017.
 [29] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, and K.-R. Müller. Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5):e1603015, 2017.
 [30] J. L. Gross and J. Yellen. Graph theory and its applications. Chapman & Hall/CRC, 2005.
 [31] M. Liu, G. Liu, and K. Lam. Constructing smoothing functions in smoothed particle hydrodynamics with applications. Journal of Computational and Applied Mathematics, 155(2):263–284, 2003.
 [32] L. B. Lucy. A numerical approach to the testing of the fission hypothesis. The Astronomical Journal, 82:1013–1024, 1977.
Appendix
Polynomial Smoothing Functions with Compact Support
As candidates for the weight of integral and density scaling functions (Section 3.2), we use a class of compact polynomials that satisfy the following criteria [31]:

the kernel is compactly supported,

it is strictly positive within some cutoff distance,

it decreases monotonically,

it is at least twice continuously differentiable,

and that contain a minimal number of nonzero terms:
(35) 
where the argument is the normalized complementary coordinate within the span of the kernel, and
(36) 
is an optional normalization factor that ensures that the integral of the kernel over a 3D ball of cutoff radius is unity. The two free parameters can be used to adjust the smoothness and width of the kernel, and can take any real values satisfying the stated condition. Note that the kernel is equivalent to the Lucy kernel commonly used in Smoothed Particle Hydrodynamics simulations [32]. The kernel can be evaluated very efficiently, using only multiplication and addition, when both parameters are integers.
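As a concrete member of this family, the Lucy kernel [32] mentioned above can be sketched and its defining properties checked numerically. The normalization constant is omitted here; the cutoff radius and function name are illustrative.

```python
import numpy as np

def lucy(r, rc=1.0):
    """Lucy kernel: a compactly supported, monotonically decreasing
    polynomial that is twice continuously differentiable at the cutoff.

    Unnormalized; multiply by a constant for a unit integral over a
    3D ball of radius rc.
    """
    s = np.clip(np.asarray(r, dtype=float) / rc, 0.0, None)
    w = (1.0 + 3.0 * s) * (1.0 - s) ** 3
    return np.where(s < 1.0, w, 0.0)

r = np.linspace(0.0, 1.2, 200)
w = lucy(r)
```

The derivative is -12 s (1 - s)^2, which is nonpositive on [0, 1] and vanishes together with the second derivative at the cutoff, so the kernel joins the zero extension smoothly.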
Table of Quadrature Nodes and Weights
In Table 4, we list the nodes and weights of the Laguerre quadrature rules, using the notation of Eq. 22. In Table 5, we list the nodes and weights of the Lebedev quadrature rules, using the notation of Eq. 21. The Laguerre and Lebedev quadrature nodes can be combined using Eqs. 23–26 into composite grids for sampling the atomistic density field.
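The combination of a radial and an angular rule into a composite grid can be sketched as an outer product of nodes and weights. This is an illustrative sketch: Gauss-Laguerre nodes come from NumPy, while a simple 6-point octahedral rule stands in for an actual Lebedev rule (whose nodes and weights differ).

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

# Radial rule: n-point Gauss-Laguerre, exact for integrals of
# e^{-r} p(r) over [0, inf) with p a polynomial of degree <= 2n-1.
r_nodes, r_weights = laggauss(4)

# Angular rule: a 6-point octahedral rule (a stand-in for a Lebedev
# rule), exact for spherical averages of polynomials up to degree 3.
ang_nodes = np.array([[1, 0, 0], [-1, 0, 0],
                      [0, 1, 0], [0, -1, 0],
                      [0, 0, 1], [0, 0, -1]], dtype=float)
ang_weights = np.full(6, 1.0 / 6.0)

# Composite grid: every radial shell carries a copy of the angular
# nodes; composite weights are the products of the factor weights.
grid = (r_nodes[:, None, None] * ang_nodes[None, :, :]).reshape(-1, 3)
grid_w = (r_weights[:, None] * ang_weights[None, :]).ravel()
```

As a sanity check, applying the composite rule to f(x) = |x|^2 reproduces the exact value of the radial integral of e^{-r} r^2, since the angular average of r^2 is trivial.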