Automatic differentiation is a collection of techniques used to evaluate, up to machine precision, the derivative of a function specified by a computer program. It allows software developers to focus solely on designing the best model for a given problem, without having to worry about implementing any derivatives of the model with respect to its various mathematical parameters. It has already had a transformative effect in machine learning, enabling the development of many new techniques over the past decade, such as batch normalizationioffe2015batchnorm, attention layers vaswani2017attention
, and unique neural network architecturesschutt2018schnet; ronneberger2015unet.
Automatic differentiation is still relatively new in the context of computational sciences, but is already showing promise across a diverse set of applications including tensor networksliao2019differentiable-tensor-network, computational fluid dynamics schenck2018spnets-diff-cfd, and molecular dynamics simulations schoenholz2020jaxmd. Automatic differentiation is also growing increasingly popular in quantum chemistry, where it has been used to optimize molecular basis sets tamayo2018automatic-diffiqult, to calculate higher derivatives of various exchange-correlation (xc) functionals ekstrom2010arbitrary-xc of density functional theory (DFT) hohenberg1964inhomogeneous; kohn1965self-ks, and to determine arbitrary-order nuclear coordinate derivatives of electronic energies abbott2021arbitrary-geom-qc.
Automatic differentiation is also an essential stepping stone to enable direct integration of quantum chemistry methods with machine learning models and their training. In this context, a differentiable implementation of DFT was recently used to learn the xc functional li2021kohn-sham-regularizer from accurate reference calculations within the density matrix renormalization group approach, or from a mixture of computational and experimental data kasim2021learning-xc
, showing a promising new approach to developing transferable and robust xc functionals via deep learning.
Although differentiation in quantum chemistry can be done via finite-difference schemes, calculating the gradient of a parameter with respect to some other input parameter can be time consuming as one has to run the simulation as many times as the number of input parameters. Moreover, finite-difference methods are prone to numerical instabilities and are very sensitive to the chosen step-size. Automatic differentiation addresses these challenges by calculating the analytical gradient via the chain rule, eliminating both the need to run the simulation many times, and the need for step-size tuning.
To address the growing need for automatic differentiation in quantum chemistry, we introduce Differentiable Quantum Chemistry (DQC), a DFT and Hartree–Fock (HF) hartree1928wave
simulation code. DQC is written in Python using PyTorchneurips2019-pytorch and xitorch kasim2020xitorch. While PyTorch provides gradient calculations for elementary operations such as matrix multiplication and explicit eigendecomposition, xitorch provides gradient calculations for functional operations such as root finding, optimization, and implicit eigendecomposition. The use of PyTorch and xitorch in DQC enables various applications in quantum chemistry that would otherwise be far more difficult to pursue. For example, the work by Kasim and Vinko kasim2021learning-xc on learning the xc functional directly from a set of heterogeneous experimental data and calculated density profiles within the constraints of Kohn–Sham DFT already mentioned above was enabled by the use of DQC.
We begin this paper by outlining the basic theory behind DQC in section II. The implementation is described in section III. Applications that exemplify the present approach are given in section IV. We discuss the challenges in the use of automatic differentiation techniques in computational science in section V, and summarize our work in section VI.
Quantum chemical calculations of the electronic structure typically require the evaluation of abstract functionals, such as root finding for self-consistent field (SCF) iterations, and minimization for geometry optimizations and the direct minimization approach to the SCF problem. We discuss the handling of these functionals in the context of DQC in this section.
ii.1 SCF iterations
DQC is based on the use of a Gaussian basis set within the linear combination of atomic orbitals approach (LCAO). For simplicity, we will only present the theory for the spin-restricted formalism, as the spin-unrestricted (and restricted open-shell) formalisms are analogous. The SCF iterations proceed in the LCAO approach by solving Roothaan’s equation roothaan1951new; lehtola2020overview
where is the Fock matrix which is a function of the density matrix , is the orbital matrix, is the overlap matrix, and
is a diagonal matrix containing the orbital energies. The generalized eigenvalue equation in (LABEL:roothaan) can be reduced to the normal form by orthogonalizing the overlap matrix lowdin1956quantum
, and operating in the linearly independent basis spanned by the eigenvectors ofwith large enough eigenvalues; any ill-conditioning in the overlap matrix can be removed by choosing a well-defined sub basis with the help of a pivoted Cholesky decomposition lehtola2019curing. The density matrix , required to construct the Fock matrix , can be obtained by
with a diagonal matrix containing the occupation numbers of the orbitals. As discussed in ref. lehtola2020overview, the SCF procedure requires repeatedly solving (LABEL:roothaan,_dm) until self-consistency is achieved.
The Roothaan iteration in (LABEL:roothaan
) can be written mathematically as the process of finding a vectorsuch that
where is a function that takes the vector and other parameters , and returns the expected value of vector . In DQC, the parameter is represented by the Fock matrix , so the function is the procedure that solves equation (1), calculating the density matrix from equation (2) and constructing back the Fock matrix from the density matrix. The parameters represent the other variables involved in the calculation, such as the overlap matrix and the occupation number matrix . The algorithm and gradient calculation for (LABEL:roothaan,_self-consistent) are already available in PyTorch and xitorch.
ii.2 Direct minimization
An alternative approach to solving the self-consistent field equations is to directly find the orbitals that minimize the total energy by the use of optimization algorithms head1988optimization. The energy can be calculated from the density matrix which can be obtained in turn from the orbital coefficients using (LABEL:dm). This relation makes the energy a function of the orbital matrix, . However, as the orbitals must remain orthonormal, , we introduce a new variable , defined in terms of its relation with as
uq-relation-1 is the QR decomposition of the matrix
into an orthogonal matrixand an upper triangular matrix . uq-relation-2 involves the inverse square root of the overlap matrix , which can be computed using the eigendecomposition of . The energy can then be parameterized by , reducing the direct minimization problem to an unbounded optimization problem:
The gradient of the energy with respect to , i.e. , is required for an efficient solution. It is automatically computed by PyTorch and xitorch.
Once the optimum in equation (6) is found, can still be differentiated with respect to any other variables, such as the nuclear coordinates or the occupation number matrix . This is made possible by the gradient of the optimization functional provided by xitorch.
We note that our approach of using QR decomposition is slightly different from common direct minimization techniques that use the matrix exponential of a skew-Hermitian matrix such as the geometric direct minimization algorithm of ref.van2002geometric-variational.
|CHN (density fit PW92/cc-pVTZ)|
|CHO (density fit PW92/cc-pVDZ)|
DQC implements restricted and unrestricted HF and Kohn–Sham DFT calculations without periodic boundary conditions. The energy can be minimized using either SCF iterations with Broyden’s good method broyden1965class for Fock matrix updates, or with direct minimization using gradient descent with momentum pearlmutter1991gd-momentum; more elaborate SCF convergence accelerators will be implemented at a later stage of the project. All parameters are differentiable with respect to any other parameter present in the calculation, including nuclear coordinates, atomic numbers, electron occupation numbers, as well as the basis set exponents and contraction coefficients. The differentiability of these parameters allows for the exploration of new applications with DQC, a few of which are presented in what follows.
Although execution speed is not a top priority for DQC, the program shows good overall performance. speed compares the running times of DQC with PySCF sun2020recent-pyscf, an established quantum chemistry code. For small systems, we find that DQC is as efficient as PySCF. However, for moderate-sized molecules (e.g. CHN), DQC is about 4–6 times slower than PySCF. This slowdown can be attributed to DQC currently not taking advantage of the symmetry of the 2-electron integral tensor, which can reduce the tensor size by roughly a factor of 8 when the basis functions are real-valued. In contrast, DQC is only about 50% slower than PySCF for systems with density fitting. The difference is mainly caused by the higher number of SCF iterations required in DQC over PySCF, due to the use of different SCF convergence acceleration schemes in the two codes.
iv.1 Direct minimization
The automatic availability of gradients makes it easy to implement the direct minimization method in DQC. Direct minimization is known to be more robust than SCF iterations in finding a converged solution lehtola2020overview, and is particularly useful in difficult cases such as for molecules near their dissociation limits. We illustrate this in variational-vs-scf, where we show the PW92/cc-pVDZ perdew1992accurate-pw92; dunning1989gaussian-ccbasis total energy of H as a function of the internuclear distance. The SCF method does not converge for internuclear distances greater than around 17 Bohr. In contrast, the direct minimization method continues to converge even for substantially larger distances in excess of 40 Bohr.
iv.2 Checking SCF stability
One difficulty in SCF calculations is that the calculation can converge onto a saddle point, which corresponds to an excited state lehtola2020overview. If the calculation has converged onto a local minimum, the Hessian of the energy with respect to orbital rotations must be positive semidefinite. Saddle point solutions, instead, correspond to one or more negative eigenvalues of the orbital Hessian. To maintain the orthonormality of the orbitals, the Hessian is calculated based on the variable. Conveniently, we do not need to derive the expression for the Hessian matrix as it is automatically obtained by PyTorch.
Moreover, only the lowest eigenvalue needs to be calculated for the stability check, so the full Hessian matrix does not need to be constructed. We obtain the lowest eigenvalue using Davidson’s iterative algorithm davidson1975iterative, as implemented in xitorch. Note that the gradients of the lowest energy eigenvalue with respect to all other parameters in the calculation are automatically available, which may prove useful in further future applications.
We show some examples of SCF stability checks for HF/pc-1 jensen2001 calculations on diatomic molecules in scf-stability. As can be seen from the data, the lowest eigenvalues of the orbital Hessian are very close to zero for the ground state, while the excited states yield large negative eigenvalues. This illustrates that it is straightforward to check whether or not a state corresponds to a true minimum with DQC.
iv.3 Molecular properties
A key advantage of writing quantum chemistry software with automatic differentiation is that calculations of molecular properties can be implemented efficiently: all we need to know is how a specific property relates to some derivative expression. For example, vibrational modes and frequencies of a molecule can be written as
where is an matrix containing the nuclear coordinates of the atoms, is the eigendecomposition, is one of the vibrational modes, and is its frequency. The expression shown in (LABEL:vibr) is all that is needed to calculate the vibrational characteristics of the molecule; the explicit form of the derivative expression is not required.
|IR intensities (km/mol)||80.69||80.70|
|Raman activities (A/amu)||4.79||4.79|
Another useful example is the calculation of the electric dipole and quadrupole moments of a molecule. The electric dipole moment is given by
while the electric quadrupole tensor is given by
In both expressions, is the total energy of the molecule, is the electric field, is the atomic number of -th atom, and are its coordinates.
Having these vibrational and electric multipole properties readily available makes obtaining the Raman vibrational spectrum straightforward. For example, the intensity of the infrared vibrational transition for a normal mode at a frequency is given by
where is the electric dipole moment and is the matrix of atomic coordinates.
Similarly, the Raman activity at the same frequency and normal mode is proportional to p2007analytic-raman
where is the Kronecker delta.
The HF/cc-pVDZ perturbation properties of HO, found using the expressions above, are displayed in Table 3. The values are in excellent agreement with data from the the computational chemistry comparison and benchmark database (CCCBDB) NIST_CCCBDB, even though DQC does not have any explicit code to calculate the gradients required for these properties.
iv.4 Basis set optimization
The differentiability of DQC with respect to the basis set parameters enables the optimization of system-specific basis sets. Here we show this capability by optimizing a basis set for hydrocarbons within Kohn–Sham DFT using the PW92 functional. We reoptimized the cc-pVDZ basis dunning1989gaussian-ccbasis for a training set of molecules consisting of CH (methylidyne), CH (methyl radical), CH (methane), CH (acetylene), and CH (ethylene). The accuracy of the reoptimized basis was then tested on a set of hydrocarbons that were not included in the training set. The results are shown in basis-opt. The reoptimization of the cc-pVDZ basis set leads to a marked decrease of the total energies of all the molecules in the test set, as the cc-pVXZ basis set series is designed to capture correlation energies instead of polarization energies dunning1989gaussian-ccbasis, and as hydrocarbons are chemically similar.
iv.5 Alchemical perturbation
One of the differentiable quantities in DQC is the atomic number, allowing us to perform alchemical perturbation studies to predict the properties of molecules without actually needing to simulate them balawender2019exploring-alchemical; von2020alchemical-dft
. As a simple example, we will show here that some properties of diatomic molecules CO (atomic numbers 6 and 8) and BF (atomic numbers 5 and 9) can be estimated directly from the properties of N(atomic number 7) and its alchemical perturbations. To do this, we parametrize the atomic number of the atoms in the diatomic molecule by a continuous variable , so that the atoms have atomic numbers and . The molecules N, CO, and BF thus correspond to , , and , respectively.
The properties we aim to predict are the bond length and the energy at the equilibrium position, which can be mathematically expressed as
Performing HF/pc-1 calculations, we evaluate the equilibrium distance and the energy at the equilibrium position in two ways. The first way is to optimize the geometry for various fixed values of , and calculate the above properties directly. The calculations were performed separately, but with the same basis (pc-1 nitrogen basis on all atoms) for different molecules, in analogy to the procedure used in refs balawender2019exploring-alchemical and von2020alchemical-dft. The second way is to estimate those properties using a Taylor series expansion to second order in :
As and are calculated at equilibrium, calculating the perturbation terms requires propagating the gradient through the optimization process. However, automatic differentiation makes the propagation trivial, as it is automatically handled by the optimization routine in xitorch.
The results obtained via these two approaches are compared in alchemy. As we can see from these results, the properties of CO and BF can be estimated accurately from alchemical perturbations of N. The estimated equilibrium distances for CO and BF differ by and 0.024 Bohr, respectively, while the estimated equilibrium energies deviate by 33 and 587 m, respectively. This shows that the equilibrium position of new molecules can be estimated well with the alchemical gradient calculated by DQC, without actually having to calculate those molecules.
Implementing quantum chemistry with automatic differentiation libraries is a promising way to accelerate simulation workflows and to enable novel applications. However, the implementation comes with several challenges. An overarching challenge is that the automatic differentiation library used here, PyTorch, is primarily designed for deep learning rather than for scientific computing. As deep learning only focuses on low-order derivatives, accessing high-order gradients that are commonly required for scientific applications can be difficult.
Detecting numerical instabilities in high-order gradient calculations can also be demanding. Instabilities that produce NaN (not-a-number) in PyTorch are relatively straightforward to manage with its debugging feature since version 1.8, but other instabilities that do not produce a NaN can be challenging to detect.
Another challenge is debugging and profiling higher-order gradient calculations. As the gradient is generated automatically, it is hard to find the slowest part of the code, or the part that requires the most memory, because this information is not readily provided by the available profiling tools.
Besides higher-order gradient calculations, using automatic differentiation libraries also poses unique challenges in terms of code optimization. For example, quantum chemistry codes usually save the two-electron integrals on disk due to their large size and process them only in blocks small enough to fit easily into memory. This scheme can only be used in DQC if the gradients with respect to the nuclear positions and the basis are not required. To the authors’ best knowledge, there is currently no obvious structure in PyTorch to allow gradient-propagating operations to work with large tensors saved on disk.
Implementing quantum chemical calculations using automatic differentiation liberates us from needing to derive analytical gradient expressions. With gradients automatically generated by the program, software developers can focus on designing better, more detailed computational models, and on applying them to problems at hand. We have shown how automatic differentiation allows us to easily explore various applications in quantum chemistry, and are confident that further exploration of this approach will unveil new applications that have not been considered to date.
The Differentiable Quantum Chemistry (DQC) code can be found at https://github.com/diffqc/dqc/. The repository that contains the applications presented in this paper is located at https://github.com/diffqc/dqc-apps/.