Machine Learning a Molecular Hamiltonian for Predicting Electron Dynamics

07/19/2020 ∙ by Harish S. Bhat, et al. ∙ University of California, Merced

We develop a mathematical method to learn a molecular Hamiltonian from matrix-valued time series of the electron density. As we demonstrate for each of three small molecules, the resulting Hamiltonians can be used for electron density evolution, producing highly accurate results even when propagating 1000 time steps beyond the training data. As a more rigorous test, we use the learned Hamiltonians to simulate electron dynamics in the presence of an applied electric field, extrapolating to a problem that is beyond the field-free training data. The resulting electron dynamics predicted by our learned Hamiltonian are in close quantitative agreement with the ground truth. Our method relies on combining a reduced-dimensional, linear statistical model of the Hamiltonian with a time-discretization of the quantum Liouville equation. Ultimately, our model can be trained without recourse to numerous, CPU-intensive optimization steps. For all three molecules and both field-free and field-on problems, we quantify training and propagation errors, highlighting areas for future development.




1 Introduction

An intriguing new application of machine learning (ML) is to predict the dynamical properties of a molecular system Häse et al. (2016); Gastegger et al. (2017); Chen et al. (2018), which is essential for understanding phenomena such as charge transfer and response to an applied laser field. Here we develop a method to learn a molecular Hamiltonian from a single trajectory. This learned Hamiltonian can then be used to evolve the electron density forward in time beyond the training interval, and also to predict the electronic response to an applied field. The latter shows that the learned Hamiltonian can be used to solve problems with substantially different dynamics than the problem on which the model was trained. To our knowledge, despite the surge of interest in applying ML to molecular simulation Snyder et al. (2012); Montavon et al. (2013); Ramakrishnan et al. (2015); Bartók et al. (2017); Grisafi et al. (2018); Nebgen et al. (2018); Paruzzo et al. (2018); Pronobis et al. (2018); Sifain et al. (2018); Rodríguez and Kramer (2019); Christensen et al. (2019); Ghosh et al. (2019); Wilkins et al. (2019); Ye et al. (2019); Chandrasekaran et al. (2019); Schleder et al. (2019); Jørgensen et al. (2019); Smith et al. (2019); Ceriotti (2019); Lu et al. (2020), there are no other procedures in the literature to estimate molecular Hamiltonians from density matrix time series.

Our work shares certain goals with other efforts to learn Hamiltonians, or the energy functions and functionals that are ingredients in Hamiltonians. In this space, we primarily see efforts to learn classical Hamiltonians from time series Bhat (2020); Bertalan et al. (2019); Zhong et al. (2020), as well as efforts to learn quantum Hamiltonians for time-independent problems Behler and Parrinello (2007); Behler (2016); Li et al. (2018); Fujita et al. (2018); Innocenti et al. (2020). Recently, a neural network method has been developed to learn the exchange-correlation functional in TDDFT (time-dependent density functional theory) Suzuki et al. (2020); solutions of the corresponding TDSE (time-dependent Schrödinger equation) are used to train the networks. While neural networks can attain superior predictive power on many tasks, they can also be difficult to interpret and error-prone when extrapolating beyond the training set, particularly for regression tasks of interest in the physical sciences.

Our overarching goal is to develop methods that take density matrix time series and produce Hamiltonians and/or energy functionals that are highly accurate and interpretable for a set of problems that includes, but also goes beyond, the training data. For time-dependent physical systems, learning Hamiltonians allows us to incorporate known time-evolution methods into a predictive model. To put it simply, if we are trying to predict the motion of an unknown spring, it may be far more accurate and direct to combine Newton's second law with a machine learning model for the spring force, rather than to feed everything into a giant neural network. In this paper, we consider small molecular systems modeled with a small basis set in order to focus on methodological development and careful analysis of errors. The present work forms a foundation on which we can build towards studying systems and theories (such as TDDFT) in which the underlying functionals have yet to be completely determined.

The TDSE (or Dirac equation) governs the time evolution of a quantum electronic system,

i ∂Ψ(t)/∂t = Ĥ(t) Ψ(t), (1)

where Ψ(t) is the time-dependent many-body electronic wave function and Ĥ(t) is the electronic Hamiltonian. All equations use atomic units, with ℏ = 1. An external perturbation, such as an applied electric field, within the Hamiltonian will give rise to the time-evolution of the wave function that dictates all properties of a quantum electronic system.

This many-body equation can only be solved for the simplest systems, so molecular electron dynamics simulations generally use a simplified, mean-field approach based on an anti-symmetrized product of single-particle orbitals or the electron density, namely TDHF (time-dependent Hartree-Fock) or TDDFT Micha and Runge (1994); Yabana and Bertsch (1996, 1997, 1999); Bertsch et al. (2000); Dreuw and Head-Gordon (2005); Marques et al. (2012); Maitra (2016). At a theoretical level, TDHF and TDDFT differ significantly, but the main practical difference is the treatment of the electron-electron interaction term in the Hamiltonian. For both TDHF and TDDFT, this term depends on the time-dependent orbitals or time-dependent density, and hence becomes a time-dependent operator. For Hartree-Fock, the electron-electron interaction term contains Coulomb and exchange operators that describe average electron-electron interactions within the single-particle picture. For DFT, the electron-electron interaction term contains the same average Coulombic electron-electron interaction (often called the Hartree term in the physics community) and an exchange-correlation term. In this study, we use the Hartree-Fock mean-field approach.

The molecules studied are closed-shell systems—all electrons in the system are spin-paired. Each pair of electrons with opposite spins is described by the same spatial function. This is referred to as the restricted formalism. For closed-shell systems within such a formalism, the need to solve for N spatial orbitals occupied by N electrons reduces to solving for N/2 spatial orbitals, each of these doubly occupied to give a total of N electrons Szabo and Ostlund (1996).

Given this choice, our mean-field Hartree-Fock Hamiltonian is then

Ĥ(t) = Σ_{i=1}^{N/2} [ −(1/2)∇_i² − Σ_A Z_A/|r_i − R_A| + V_ext(r_i, t) ] + Σ_{i=1}^{N/2} Σ_{j=1}^{N/2} ∫ dr′ ψ_j*(r′) (2 − P̂_{12}) (1/|r_i − r′|) ψ_j(r′), (2)

where the first group includes one-electron terms summed over half the number of electrons, N/2 (which is the total number of spatially unique electrons): the electron kinetic energy −(1/2)∇_i², the electron-nuclear attraction to all nuclei A with nuclear charge Z_A, and the external potential V_ext. In this work, the external perturbation is an electric field E(t) treated classically within the dipole approximation, V_ext(r, t) = −E(t)·r. The first term in the second group is the electron-electron Coulomb repulsion. The operator P̂_{12}, used in the last term, denotes the operation of permutation between the electrons represented by coordinate variables r_i and r′. This term is known as the exchange operator and arises from the antisymmetry of the electronic wave function. The second group collectively represents the electron-electron interaction operator. This operator depends on the instantaneous charge distribution of all other electrons, resulting in an implicit time-dependence of the electronic Hamiltonian (in addition to the explicit time-dependence due to V_ext).

We next define a (reduced one-body) density operator, ρ̂(t), that allows us to represent the total density of electrons Dirac (1930):

ρ̂(t) = Σ_k n_k |ψ_k(t)⟩⟨ψ_k(t)|, (3)

where n_k is the occupation of orbital ψ_k: in a restricted, closed-shell system, n_k = 2 (if ψ_k is occupied) or 0 (if ψ_k is unoccupied). The corresponding density matrix (P) is represented in the basis of the orbitals ψ_k as:

P_{kl}(t) = ⟨ψ_k(t)| ρ̂(t) |ψ_l(t)⟩. (4)
Let [A, B] = AB − BA denote the commutator of the operators A and B. Then, the Liouville equation generalizes the time-dependent Schrödinger equation to

i ∂ρ̂(t)/∂t = [Ĥ(t), ρ̂(t)]. (5)

The time-dependent molecular orbitals are often created from a linear combination of basis functions χ_μ, as ψ_k(t) = Σ_μ c_{μk}(t) χ_μ, where c_{μk}(t) are the time-dependent coefficients. The elements of the density matrix are given in this basis by

P_{μν}(t) = Σ_k n_k c_{μk}(t) c*_{νk}(t). (6)
We transform P to an orthonormal basis, yielding P′ (see Appendix). We can then express the TDHF or TDDFT equation as

i ∂P′(t)/∂t = [H′(t), P′(t)], (7)

where H′ is the Hamiltonian matrix (or the Fock matrix in the case of TDHF) that results from integrating (2) in the orthonormal basis. In this work, primed notations (e.g., H′) are used for matrices in the orthonormal basis and unprimed notations (e.g., H) for matrices in the atomic orbital (AO) basis. Equation (7) is used in atomic, molecular, and materials calculations to simulate the dynamic electronic response to a perturbation, including predicting charge transfer and spectroscopic properties Li et al. (2005); Isborn et al. (2007); Eshuis et al. (2008); Lopata and Govind (2011); Provorse and Isborn (2016); Zhu and Herbert (2018); Nascimento and DePrince III (2016).

Note that we write H′(t) to encapsulate two types of dependence on time t. First, H′ can of course depend explicitly on time—we will see this below when we consider V_ext, an external, time-dependent potential. Second, even if H′ does not depend explicitly on time, it is in general a function of the density P′. In summary, H′(t) is shorthand for H′(P′(t), t). This implies that (7) is in fact a nonlinear system.
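In code, the right-hand side of (7) is a single commutator; the minimal sketch below (NumPy, with hypothetical variable names) makes the nonlinearity explicit by letting the Hamiltonian be an arbitrary function of the density:

```python
import numpy as np

def liouville_rhs(P, hamiltonian):
    """Right-hand side of the Liouville equation (7): since
    i dP'/dt = [H'(P'), P'], we have dP'/dt = -i (H'P' - P'H')."""
    H = hamiltonian(P)              # H' may itself depend on the density P'
    return -1j * (H @ P - P @ H)

# Toy check: a density that commutes with H' is a stationary state.
H0 = np.diag([1.0, 2.0]).astype(complex)
P0 = np.diag([2.0, 0.0]).astype(complex)   # doubly occupied lowest orbital
dP = liouville_rhs(P0, lambda P: H0)
```

Because `hamiltonian` is called on the current density at every evaluation, the same function serves for both a fixed matrix and a density-dependent Fock matrix.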

In this paper, our main contribution is a mathematical method to determine the molecular field-free Hamiltonian from time series observations of density matrices P′. By (i) using a linear model for H′, (ii) formulating a quadratic loss function that stems from discretizing (7), and (iii) eliminating unnecessary degrees of freedom, we reduce the problem of training our model to a least-squares problem. We demonstrate this method using training data consisting of density matrices for three small molecules. Among other tests, we use the machine-learned Hamiltonian to propagate, i.e., to solve (7) forward in time. We find that using the ML Hamiltonian instead of the exact Hamiltonian results in a small, acceptable level of propagation error, even on a time interval that goes beyond the training data. We then add a time-dependent external potential V_ext to our machine-learned, field-free Hamiltonian; we propagate forward in time using this augmented Hamiltonian. For each of the three molecules we consider, the resulting solutions are in close quantitative agreement with simulations that use the exact Hamiltonian. In short, our machine-learned Hamiltonian extrapolates well to a dynamical setting that differs from that of the training data.

2 Molecules and Exact Hamiltonian

The systems studied are three diatomic molecules: H₂, HeH⁺, and LiH. The atoms in each of these diatomic systems are placed along the z-axis, equidistant from the origin. The interatomic separations for H₂, HeH⁺, and LiH are 0.74 Å, 0.772 Å, and 1.53 Å, respectively. These simple molecular systems increase in complexity, going from a symmetric two-electron homonuclear diatomic, to a two-electron heteronuclear diatomic, to a four-electron heteronuclear diatomic. The basis set used for these calculations is STO-3G, a minimal basis set made of s- and p-type atomic orbitals. For H₂ and HeH⁺, this results in two basis functions (a 2×2 matrix for P′ and H′), and for LiH this results in six basis functions (a 6×6 matrix for P′ and H′, although some elements of the matrix are zeroes due to the linear symmetry of the molecule, as discussed later).

For each molecule, the electronic structure code provides the components of the exact Hamiltonian matrix H, expressed in the same AO basis set as the density matrices. Specifically, we obtain real, symmetric, constant-in-time matrices for the kinetic energy and electron-nuclear potential energy. We also obtain a 4-index tensor of evaluated two-electron integrals, which we use together with the time-dependent density matrices to compute the electron-electron potential energy term. These ingredients allow us to compute, for each molecule, the exact Hamiltonian. Electron density propagation with this exact Hamiltonian is compared to our machine-learned model Hamiltonian.
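As a concrete sketch of this assembly (variable names are ours, and we assume one standard convention: a chemists'-notation integral tensor with eri[p,q,r,s] = (pq|rs) and an alpha-spin density Pa, so that the total closed-shell density is 2·Pa), the exact Hamiltonian can be contracted with numpy.einsum:

```python
import numpy as np

def exact_fock(h_core, eri, Pa):
    """Restricted closed-shell Fock (Hamiltonian) matrix in the AO basis.
    h_core: one-electron (kinetic + electron-nuclear) matrix, constant in time.
    eri:    4-index two-electron integral tensor, eri[p,q,r,s] = (pq|rs).
    Pa:     alpha-spin density matrix (time-dependent).
    One standard restricted convention: F = h_core + 2 J(Pa) - K(Pa)."""
    J = np.einsum('pqrs,rs->pq', eri, Pa)   # Coulomb term
    K = np.einsum('prqs,rs->pq', eri, Pa)   # exchange term
    return h_core + 2.0 * J - K
```

Since only the J and K contractions involve the density, the constant matrices can be precomputed once per molecule and reused at every time step.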

3 Electron Density Matrix Data

There are two steps involved in generating the training and test sets:

  1. Generating an initial condition (the initial density matrix);

  2. Generating a trajectory using the initial condition and the differential equation (7) for propagation.

For the first step, the HF stationary state solution is determined self-consistently. The density matrix corresponding to the alpha-spin part of the solution, represented in the AO basis, is used as the initial condition. The second step involves propagating the initial density matrix using the TDHF equation.

Each of these steps was performed with the Gaussian electronic structure program Frisch et al. (2018), using a locally modified development version.

3.1 Initial Conditions

The initial density matrices have been calculated for field-free and static-field conditions. For the field-free calculations, the external-field term V_ext is set to 0. For the static field, the field strength is 0.05 a.u. (atomic units). Applying a static field creates an initial electron density that is not a stationary state of the field-free Hamiltonian and is often referred to as a delta-kick perturbation.

3.2 Trajectory Data

The density matrix from the initial condition calculation is used as the starting point for generating the real-time TDHF electron dynamics trajectory, i.e., the time series P′(t).

For the field-free trajectories, V_ext is zero during propagation and the density matrix with the delta-kick perturbation is used as the initial condition. These trajectories serve as the training data for the ML Hamiltonian.

For the field-on trajectories, the field-free initial density matrix is used and the applied field takes the following form during propagation:

E_z(t) = E_max sin(ωt) for 0 ≤ t ≤ 2π/ω, and E_z(t) = 0 otherwise, (8)

where the time t, the field intensity E_max along axis z, and the field frequency ω are expressed in a.u. The sinusoidal field is switched on for one full cycle (around 3.55 fs) starting at t = 0. These field-on trajectories test the ML Hamiltonian in a regime quite outside the field-free training regime.
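The one-cycle pulse, as we have reconstructed it in (8), can be sketched as a small gating function (argument names e_max and omega are ours):

```python
import numpy as np

def applied_field(t, e_max, omega):
    """One-cycle sinusoidal field along z, cf. eq. (8):
    E_z(t) = e_max * sin(omega * t) for 0 <= t < 2*pi/omega, else 0."""
    period = 2.0 * np.pi / omega
    return e_max * np.sin(omega * t) if 0.0 <= t < period else 0.0
```

The abrupt switch-off after one full period is what makes the field-on dynamics differ sharply from the field-free training regime.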

Using a propagation step-size of 0.002 fs, the total length of each trajectory is 20000 time-steps (thus, each trajectory is 40 fs long). The real-time TDHF implementation in Gaussian uses as its propagation scheme the modified midpoint unitary transformation (MMUT) algorithm Li et al. (2005).

4 Learning the Molecular Hamiltonian

For a particular molecule, suppose we are given a time series of density matrices P′_k sampled on an equispaced temporal grid, t_k = kΔt. We assume that P′(t), the continuous-time trajectory corresponding to our time series, satisfies (7). Our goal is to learn the Hamiltonian H′. Assume that the Hamiltonian contains no explicit time-dependence—this can be ensured by generating training data with no external potentials (e.g., no external forcing). Then H′ is a Hermitian matrix of functions of P′, the density matrix. Our strategy therefore consists of three steps: (i) develop a model of H′ with a finite-dimensional set of parameters β, (ii) derive from (7) a statistical model, and (iii) use the model with available data to estimate β.

Note that in order to obtain P′ from P, we transform from the AO basis to its canonical orthogonalization Szabo and Ostlund (1996). We leave the details of this transformation to the Appendix.
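Although those details are deferred, canonical orthogonalization itself is compact; the sketch below uses one common convention (not necessarily the exact transformation in the Appendix) to build X from the AO overlap matrix S so that Xᵀ S X = I:

```python
import numpy as np

def canonical_orthogonalizer(S, tol=1e-8):
    """Canonical orthogonalization: diagonalize the AO overlap S = U diag(s) U^T
    and form X = U s^{-1/2}, dropping near-singular directions, so X^T S X = I."""
    s, U = np.linalg.eigh(S)
    keep = s > tol                  # discard (near-)linearly-dependent directions
    return U[:, keep] / np.sqrt(s[keep])

# With X in hand, a Hamiltonian-like (covariant) matrix transforms as
# H' = X^T H X, while a density-like (contravariant) matrix transforms as
# P' = (X^T S) P (S X). Again, this is one common convention, offered as
# an illustration rather than the paper's exact Appendix procedure.
```

For a minimal STO-3G basis the overlap is well conditioned, so no directions are dropped and X is square and invertible.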

Let us split H′ into its real and imaginary parts: H′ = Re H′ + i Im H′. By Hermitian symmetry, H′ is determined completely by the upper-triangular component of Re H′ (including the diagonal) and by the upper-triangular component of Im H′ (not including the diagonal). If H′ has size N × N, there are N(N+1)/2 elements of Re H′ and N(N−1)/2 elements of Im H′ that we must model. Hence there are a total of N² real degrees of freedom, which we can represent as an N² × 1 vector h. Note that we can apply this same real and imaginary splitting to P′; since it is also Hermitian, it can also be determined completely by a real vector p of dimension N². Then we formulate the following linear model for h—in what follows, we use hats to denote either statistical models or their parameters:

ĥ(p) = β̂₀ + β̂₁ p. (9)

Here β̂₀ has size N² × 1, while β̂₁ has maximal size N² × N². For the smaller molecules in our study (H₂ and HeH⁺), where the STO-3G basis set leads to a dimension of N = 2, we use (9) with no modifications.
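The bookkeeping between an N × N Hermitian matrix and its N²-dimensional real vector can be sketched as follows (function names are ours, not the paper's):

```python
import numpy as np

def herm_to_vec(M):
    """Flatten an N x N Hermitian matrix into a length-N^2 real vector:
    upper triangle of the real part (with diagonal), then strict upper
    triangle of the imaginary part."""
    iu_d = np.triu_indices(M.shape[0])          # includes the diagonal
    iu = np.triu_indices(M.shape[0], 1)         # excludes the diagonal
    return np.concatenate([M.real[iu_d], M.imag[iu]])

def vec_to_herm(v, N):
    """Inverse map: rebuild the Hermitian matrix from its real vector."""
    M = np.zeros((N, N), dtype=complex)
    n_re = N * (N + 1) // 2
    M[np.triu_indices(N)] = v[:n_re]
    M[np.triu_indices(N, 1)] += 1j * v[n_re:]
    M += M.conj().T                              # fill the lower triangle
    M[np.diag_indices(N)] /= 2.0                 # diagonal was doubled above
    return M
```

The two maps are exact inverses, so the model can be trained entirely in the real N²-dimensional representation and converted back to a Hermitian matrix only when a Hamiltonian is needed for propagation.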

For LiH, a larger molecule, to handle entries of P′ that are identically zero, and also to dramatically reduce the computational effort required for training, we modify the basic model (9). We can understand these modifications very succinctly by saying that we reduce both the number of columns of β̂₁ and the number of rows of all vectors in (9), namely ĥ, β̂₀, and p. In more detail, we first form a set S consisting of indices of, separately, the real and imaginary parts of the training data matrices that are not identically zero. For both p and also for our model Hamiltonian ĥ, we restrict attention to upper-triangular matrix indices that are in S. To illustrate this concretely, LiH in the STO-3G basis has dimension N = 6, and so, at each instant of time, both P′ and H′ are determined by N² = 36 entries, of which 21 correspond to real parts and 15 correspond to imaginary parts. Of these, only a subset of the real and imaginary parts are not identically zero. In this way, we reduce the overall dimension of (9) from 36 down to the number of such non-zero entries. We defer the algebraic details of this procedure to the Appendix.

Note that (9) is by no means the only possible model. We have explored higher-order polynomial models that, while remaining linear in the parameters β̂, allow ĥ to depend nonlinearly on p. We have also explored models in which ĥ is allowed to depend explicitly on time t, including through Fourier terms such as sin(ωt) and cos(ωt). None of these choices led to any improvement in validation or test error, so we focus on the linear model (9).

Now that we have (9), we turn our attention to (7). Then we use a centered-difference approximation to derive from (7) the statistical model

i (P′_{k+1} − P′_{k−1}) / (2Δt) = [Ĥ′(P′_k), P′_k] + ε_k, (10)

with ε_k denoting error, and with Ĥ′(P′_k) denoting the Hermitian matrix corresponding to the modeled vector ĥ(p_k). With the Frobenius norm ‖·‖_F, we form the sum of squared errors loss function

L(β̂) = Σ_k ‖ i (P′_{k+1} − P′_{k−1}) / (2Δt) − [Ĥ′(P′_k), P′_k] ‖²_F. (11)
4.1 Reduction to Least Squares

The dependence of L on β̂ is entirely through Ĥ′. We estimate β̂ by solving the optimization problem β̂* = argmin L(β̂). Because (9) is linear in the parameters β̂, we observe that (11) must be quadratic in β̂. So, there exist constants A (matrix), v (vector), and c (scalar) such that

L(β̂) = (1/2) β̂ᵀ A β̂ + vᵀ β̂ + c. (12)

Here we can identify v as the gradient of L with respect to β̂ evaluated at β̂ = 0, and A as the Hessian of L with respect to β̂. When A is full rank, we have an exact minimizer β̂* = −A⁻¹v. As A is typically rank deficient, we replace A⁻¹ with the Moore-Penrose pseudoinverse A⁺:

β̂* = −A⁺ v. (13)

When AA⁺v = v, the loss achieves its global minimum at β̂*. For each of our molecules, we find that ‖AA⁺v − v‖ is small but non-zero. Still, we find empirically that (13) yields a nearly zero-norm gradient of L, as good as what can be achieved via other numerical optimization methods.

Equation (13) constitutes the end of the training procedure. In practice, we use a method in NumPy, linalg.lstsq, to compute (13), and so we avoid the full computation of A⁺. Note that gradient-based optimization can also be used to minimize the loss (11), as we have verified. However, such a procedure requires millions of small steps, resulting in a training time (for LiH) that is orders of magnitude larger than the time required to compute (13).

4.2 Error Metrics

Inserting (13) into (12) and using properties of the pseudoinverse, together with the symmetry of A, we obtain the training error

ε_train = L(β̂*) = c − (1/2) vᵀ A⁺ v,

the value of the loss function at the optimal set of parameters. The training error measures a local-in-time error, essentially equivalent to starting at the training data point P′_k, propagating one step forward in time with our learned Hamiltonian (9), and comparing with the very next training data point P′_{k+1}. Aggregating these one-step errors—squaring and summing their magnitudes—yields the training error ε_train.

We contrast the training error with the propagation error. Once we have solved for the optimal parameter values β̂*, the model Hamiltonian (9) is completely determined. Using this estimated Hamiltonian with the initial condition P′₀ from our training time series, we solve (7) forward in time using a Runge-Kutta scheme, generating our statistical estimates P̂′(t_k) of P′_k from t₀ up to t_{2T}, twice the length of the training data. For the Runge-Kutta integration, we set stringent absolute and relative tolerances. We then define the propagation error to be

ε^{ML} = (1/(2T)) Σ_{k=1}^{2T} ‖ P′_k − P̂′(t_k) ‖_F. (14)

In contrast to the training error, (14) measures the divergence between two trajectories—P′ (training) and P̂′ (propagation of the ML Hamiltonian)—over many time steps. Both trajectories have exactly the same initial condition, and hence k = 0 is excluded from the sum. For (14), the two trajectories are computed using different numerical schemes (modified midpoint for the training data and Runge-Kutta for the ML Hamiltonian propagation) and different Hamiltonians. To control for scheme-related error, we compute

ε^{RK} = (1/(2T)) Σ_{k=1}^{2T} ‖ P′_k − P̄′(t_k) ‖_F, (15)

where P̄′ is the result of propagating P′₀ forward in time using the same Runge-Kutta scheme with the exact Hamiltonian H′. This exact Hamiltonian is built by (i) extracting the Hamiltonian H in the AO basis from the electronic structure output and then (ii) transforming to H′ using the procedure described in the Appendix. In (15), the two trajectories being compared have the same Hamiltonian and differ only in the numerical schemes used to generate them. As a final error metric, we compute

ε^{ΔH} = (1/(2T)) Σ_{k=1}^{2T} ‖ P̂′(t_k) − P̄′(t_k) ‖_F. (16)

The two trajectories compared here are computed using the same Runge-Kutta scheme, but with different Hamiltonians. By the triangle inequality, we have ε^{ML} ≤ ε^{RK} + ε^{ΔH}. We may conceptualize this as breaking the total error ε^{ML} down into the error due to different schemes (ε^{RK}) and the error due to different Hamiltonians (ε^{ΔH}).
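The propagation test and the error metrics above can be sketched together (our own names; a fixed-step classical Runge-Kutta stands in for the adaptive scheme with tight tolerances described in the text, and the metric reflects one plausible reading of (14)-(16) as a time-averaged Frobenius distance):

```python
import numpy as np

def propagate(P0, hamiltonian, dt, n_steps):
    """Solve i dP'/dt = [H'(P'), P'] with classical 4th-order Runge-Kutta."""
    def f(P):
        H = hamiltonian(P)
        return -1j * (H @ P - P @ H)
    traj = [P0]
    P = P0
    for _ in range(n_steps):
        k1 = f(P)
        k2 = f(P + 0.5 * dt * k1)
        k3 = f(P + 0.5 * dt * k2)
        k4 = f(P + dt * k3)
        P = P + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj.append(P)
    return np.array(traj)

def propagation_error(traj_ref, traj_est):
    """Time-averaged Frobenius distance between two trajectories, excluding
    the shared initial condition at index 0 (cf. (14)-(16))."""
    diffs = traj_ref[1:] - traj_est[1:]
    return np.sqrt(np.sum(np.abs(diffs) ** 2, axis=(1, 2))).mean()
```

Because each Runge-Kutta stage is a commutator with the Hamiltonian, the trace of the density (the electron count) is preserved to machine precision even by the discrete update.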

5 Results

Table 1: After training, we report the training loss and the norm of its gradient, along with three forms of propagation error. All results are for the field-free problem. Note that the training error is a sum of squared errors; for each molecule, if we divide by the training data length T, we obtain mean-squared training errors small enough to indicate approximately 4 decimal places of accuracy. The propagation errors show a roughly even breakdown into error due to different schemes versus error due to different Hamiltonians.
Figure 1: H₂ (left) and HeH⁺ (right) propagation with no field. For both molecules, we have plotted all unique real and imaginary parts of the time-dependent density matrices: actual training data (black), exact Hamiltonian propagation (blue), and ML Hamiltonian propagation (red). Note the close agreement of all three curves, on a time interval that is twice the length used for training.
Figure 2: LiH propagation with no field. We have plotted all unique real and imaginary parts of the time-dependent density matrices: actual training data (black), exact Hamiltonian propagation (blue), and ML Hamiltonian propagation (red). For density matrix elements with small variance, we discern slight disagreement, especially at large times. For large-variance density matrix elements, the curves are in close agreement.

Figure 3: Time-dependent propagation errors in which we compare the training data against either P̂′, the result of propagating the ML Hamiltonian, or P̄′, the result of propagating the exact Hamiltonian. All calculations on the left (respectively, right) are for the field-free (respectively, field-on) problem. For each molecule, the error incurred by propagating with the ML Hamiltonian is within a constant factor of the error incurred by propagating with the exact Hamiltonian. At the final time, all errors remain small, except for the field-on calculations with LiH. The average values of these curves over all time correspond precisely to the errors defined in (14) and (15)—see Table 1 for further details.

5.1 Training and Propagation Tests

We apply the procedure described in Section 4 to training time series of length T for each of the three molecules H₂, HeH⁺, and LiH. See Section 3 for details on the generation of training data. The only additional preprocessing step here was to omit the first two time steps for each molecule, and to take the subsequent T time steps as training data. This was carried out purely to avoid the large numerical time-derivatives associated with the delta-kick perturbation at t = 0; these time-derivatives form a critical part of our loss function (11). We emphasize that these training trajectories were generated with no external potential/field, using the delta-kick initial conditions described in Section 3.1.

We report the value of the loss and the norm of its gradient, after training, in the first two rows of Table 1. For each molecule, the training loss is small, corresponding to an accuracy of roughly 4 decimal places. In order to visualize this accuracy, see Figures 1 and 2. For each molecule, we have plotted each of the non-zero real and imaginary components (note the axis labels) that fully determine the Hermitian density matrices at each time step t_k. In fact, in each panel, there are three curves: in black, we have plotted the actual training data produced by the electronic structure code; in blue, we have plotted P̄′, the result of propagating the exact Hamiltonian; and in red, we have plotted P̂′, the result of propagating the ML Hamiltonian.

For H₂ and HeH⁺ (Fig. 1), the curves agree to a degree where they can hardly be distinguished. Though these curves may appear to be simple sinusoids, we assure the reader that they are nonlinear oscillations, i.e., periodic solutions of the nonlinear system (7). For LiH (Fig. 2), we can discern some divergence between the result of ML Hamiltonian propagation (red) and the other two curves, but only for those density matrix elements with relatively small variance. The sum-of-squares loss function (11) is biased in favor of fitting large-variance components; to avoid this, one could modify (11) to include weights that are inversely proportional to density element variances. The errors in Figure 2 consist primarily of oscillations about the black curve; the magnitudes of these oscillations are small and do not increase dramatically over time. Still, we should give the machine-learned Hamiltonian credit for performing well when we use it to propagate for 2T steps, twice the length of the training data used. This hints at being able to use the machine-learned Hamiltonian to extrapolate beyond the dynamical system used for training.

To understand more deeply the different sources of error, we refer to the final three rows of Table 1 together with the left panel of Figure 3. We think of (14) as the overall error between the training data and our predicted trajectory, broken down into the two components (15) and (16) as explained above. If our goal is to track the training data, we incur errors of the same order of magnitude when we use either the ML Hamiltonian or the exact Hamiltonian. Consistent with Fig. 2, we find the largest gap between exact and ML Hamiltonian propagation for LiH.

Table 2: For the field-on problem, we report three forms of propagation error corresponding to field-on versions of (14), (15), and (16). Here the first error measures the difference between (i) propagation of the ML Hamiltonian plus V_ext and (ii) the output of an electronic structure code for the field-on problem; the second measures the difference between (ii) and (iii) propagation of the exact Hamiltonian plus V_ext. Finally, the third measures the difference between (i) and (iii). Overall, we find that the errors are lower than in Table 1, indicating that the ML Hamiltonian succeeds in solving the field-on problem.
Figure 4: H₂ (left) and HeH⁺ (right) propagation with field. The top panel of each plot gives the applied electric field (8). In subsequent panels, for both molecules, we plot all unique real and imaginary parts of the time-dependent density matrices: ground truth data (black), exact Hamiltonian propagation (blue), and ML Hamiltonian propagation (red). By ML Hamiltonian, we mean the Hamiltonian trained on the field-free data plus V_ext given by (8). Note the close agreement of all three curves, on a time interval that is twice the length used for training. This is a true test of whether the learned Hamiltonian can extrapolate to problem settings beyond the one used for training.
Figure 5: LiH propagation with field. We plot all unique real and imaginary parts of the time-dependent density matrices: ground truth data (black), exact Hamiltonian propagation (blue), and ML Hamiltonian propagation (red). By ML Hamiltonian, we mean the Hamiltonian trained on the field-free LiH data plus V_ext given by (8). Note the close agreement of all curves, on a time interval that is twice the length used for training. This is a true test of whether the learned Hamiltonian can extrapolate to problem settings beyond the one used for training. We omit a plot of the electric field here—see the top panels of Figure 4.
Figure 6: Time-dependent propagation errors in which we compare P̂′, the result of propagating the ML Hamiltonian, with P̄′, the result of propagating the exact Hamiltonian. All results were computed using the same Runge-Kutta scheme, isolating the error due to the different Hamiltonians. We include both field-free and field-on calculations. Note that all results are plotted on a log scale. The results show that when we propagate both the ML and exact Hamiltonians using the same scheme, the errors between the two resulting trajectories remain small even as we take hundreds of time steps. The average values of these curves over all time correspond precisely to the error defined in (16)—see Table 1 for further details.

5.2 Electric Field Tests

After learning a field-free Hamiltonian for each of the three molecules, we compared the values of the exact and ML Hamiltonians along the training trajectories. We found that the ML Hamiltonian does not equal the exact Hamiltonian. This led us to question whether the ML Hamiltonian could solve a problem well outside the training regime. We therefore augmented the ML Hamiltonian with an applied electric field, i.e., the time-dependent external potential V_ext given in (8). Using the same Runge-Kutta scheme and tolerances described earlier, we propagated for 2T steps. We compared these results with test data produced by an electronic structure code, and also with the results of propagating the exact Hamiltonian, augmented with V_ext, via our Runge-Kutta method.

For a first view of the field-on results, see Table 2 and Figures 4 and 5. In particular, the top panels of Figure 4 show the applied electric field; note that it is switched off abruptly after one period. We can immediately discern that the applied field substantially alters the electron density from the field-off case. Still, in each panel, we see excellent agreement between all three curves: the ground truth solution produced by an electronic structure code (black), the result of propagating the exact Hamiltonian plus V_ext (blue), and the result of propagating the ML Hamiltonian plus V_ext (red). Table 2, in which we find errors that are roughly an order of magnitude lower than those in Table 1, confirms that all computed densities are in close quantitative agreement. To repeat, all field-on results are for a time interval that is twice the length used for training, and training was conducted using field-off data only. Overall, we take these results to indicate that the ML Hamiltonian can indeed extrapolate to problem settings beyond the one used for training.

For a deeper understanding of the field-on results, we focus on the right panel of Figure 3 and on Figure 6. In the right panel of Figure 3, we compare (i) the result of propagating the ML Hamiltonian plus V_ext against (ii) the ground truth solution, the output of the electronic structure code for the field-on problem. We also compare (ii) with (iii) the result of propagating the exact Hamiltonian plus V_ext. The plots indicate that, for all three molecules and especially for LiH, the error between (i) and (ii) is almost identical to that between (ii) and (iii). This indicates that the bulk of the error is due to our use of a Runge-Kutta scheme instead of the MMUT scheme used in the electronic structure code. To confirm this, we consult Figure 6, in which we compare (i) and (iii) directly. All solutions here are computed using the same Runge-Kutta scheme. For each molecule, we see that the errors for the field-on problems are consistently smaller than those for the field-off problems. We conclude from these results that the ML Hamiltonian can be used to compute the electronic response to an applied electric field.

A short derivation shows that it is not automatic to expect the augmented ML Hamiltonian to propagate correctly. Let us work in continuous time, to eliminate error due to discrete-time propagation. In this idealized setting, we start with the statement that both of our field-free Hamiltonians, $H$ (exact) and $\widehat{H}$ (ML), satisfy the quantum Liouville equation along the field-free trajectory $P(t)$:

$$ i \frac{dP}{dt} = [H(P), P], \qquad i \frac{dP}{dt} = [\widehat{H}(P), P]. $$

Subtracting these equations, and defining the error $\Delta(P) = H(P) - \widehat{H}(P)$, we obtain

$$ [\Delta(P), P] = 0. \tag{17} $$

Now we augment both Hamiltonians with an external field $V(t)$. Let $P'(t)$ denote the true density for the problem with the external field. It must satisfy

$$ i \frac{dP'}{dt} = [H(P') + V(t), P']. $$

Via $H = \widehat{H} + \Delta$, we obtain

$$ i \frac{dP'}{dt} = [\widehat{H}(P') + V(t), P'] + \underbrace{[\Delta(P'), P']}_{\ast}. $$

As (17) holds along the field-free trajectory $P(t)$ and does not in general imply that the starred term vanishes at the field-on densities $P'(t)$, we cannot conclude that the true density satisfies the Liouville equation with the augmented ML Hamiltonian $\widehat{H} + V$. Based on the above derivation, if we solve the Liouville equation using the augmented ML Hamiltonian, we expect to obtain a time-dependent density that differs from $P'$. As we are nevertheless able to use the ML Hamiltonian successfully on the problem with an applied electric field, we hypothesize that the error $\Delta$ is structured in a way that enables extrapolation to new problems. We plan to test this hypothesis in future work.
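The role of the starred term can be made concrete with a small numerical example. Here the error term $\Delta$ is a hypothetical one, chosen by hand rather than derived from any of the molecules studied here: it commutes with a training density $P_0$, so the field-free identity (17) holds there, yet it fails to commute with a rotated density such as one reached under an applied field.

```python
import numpy as np

def commutator(A, B):
    return A @ B - B @ A

# Training density: a rank-one projector
P0 = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)

# Hypothetical error term: suppose the exact and ML Hamiltonians differ
# by Delta = P0 itself. Then [Delta, P0] = 0, i.e., (17) holds, and the ML
# Hamiltonian reproduces the field-free dynamics of P0 exactly.
Delta = P0
assert np.allclose(commutator(Delta, P0), 0)

# A different density (e.g., one reached under an applied field) need not
# commute with Delta, so the starred term no longer vanishes.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)
P1 = R @ P0 @ R.conj().T
starred = commutator(Delta, P1)
assert not np.allclose(starred, 0)
```

In this toy setting the field-free agreement thus places no constraint on the starred term off the training trajectory, which is exactly why the observed field-on accuracy is a nontrivial result.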

5.3 Reproducibility

All code required to reproduce all training and test results (including plots) is available on GitHub. Training data is available from the authors upon request. To anonymize the draft, we have omitted the specific URL and email addresses.

6 Discussion

Our current work demonstrates that, from a single time series of time-dependent density matrices, we can effectively learn a Hamiltonian. This ML Hamiltonian can be used for propagation in both the field-off and field-on settings. Importantly, after training on a single field-free trajectory, our ML Hamiltonian has the potential to predict an accurate electronic response to a large variety of field pulse perturbations, opening the door to laser-field-controlled chemistry. The present work leads to two main areas of future work. The first concerns technical improvements to the procedure itself, including (i) replacing (11) with a weighted loss function, to account for density elements that oscillate on different vertical scales, (ii) propagating our ML Hamiltonian using the MMUT scheme, thus eliminating the error attributable to the mismatch between time integrators, and (iii) further exploring reductions in the number of degrees of freedom in the ML Hamiltonian. The second area concerns improving our overall understanding of the procedure, and applying it to systems of greater chemical and physical interest. Here, further work is needed to understand the difference between the exact and ML Hamiltonians, whether this difference can be decreased by training on multiple trajectories, and how far outside the training regime we can push the ML Hamiltonian. We can also seek to learn the Hamiltonian operator rather than its matrix representation. In this way, we can push this procedure beyond known physics (as explored here) to systems where the underlying potential energy terms are not known with sufficient accuracy or precision.

This work was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award Number DE-SC0020203. We acknowledge computational time on the MERCED cluster (funded by NSF ACI-1429783), and on the Nautilus cluster, which is supported by the Pacific Research Platform (NSF ACI-1541349), CHASE-CI (NSF CNS-1730158), and Towards a National Research Platform (NSF OAC-1826967). Additional funding for Nautilus has been supplied by the University of California Office of the President.


Canonical Orthogonalization.

Let $S$ be the overlap matrix with $S_{ij} = \langle \phi_i | \phi_j \rangle$ for atomic orbital (AO) basis functions $\phi_i$. Because it is real and symmetric, we have $S = U s U^T$, where $s$ is diagonal and real, and $U$ is a real orthogonal matrix. Now we form

$$ X = U s^{-1/2}. $$

Then, we go from the AO-basis density matrix $P$ to its representation $\widetilde{P}$ in the orthogonalized basis via

$$ \widetilde{P} = X^{-1} P X^{-T}. $$

If $H$ is the Hamiltonian in the AO basis, the Hamiltonian in the orthogonalized basis is

$$ \widetilde{H} = X^T H X. $$
Dimensionality Reduction.

For LiH, certain elements of the density matrix $\widetilde{P}(t)$ are identically zero for all $t$. We thus define a reduced state vector $\mathbf{p}(t) \in \mathbb{R}^{a}$ that consists of the non-zero upper-triangular degrees of freedom, i.e., the elements that are necessary to reconstruct all of $\widetilde{P}(t)$. Out of these $a$ elements, we take the first $a_{\mathrm{re}}$ to correspond to elements of $\operatorname{Re} \widetilde{P}$ and the remaining $a - a_{\mathrm{re}}$ to correspond to elements of $\operatorname{Im} \widetilde{P}$. Define $\mathbf{p}$ by

$$ \mathbf{p}(t) = (p_1(t), \ldots, p_a(t)). \tag{18} $$

We form a mapping $\sigma$, whose purpose is to map one-dimensional indices of the reduced vector $\mathbf{p}$ to ordered-pair indices $(i, j)$ of the full matrix $\widetilde{P}$. We define $\sigma$ implicitly through (18) and the following:

$$ p_m = \operatorname{Re} \widetilde{P}_{\sigma(m)} \;\; (1 \le m \le a_{\mathrm{re}}), \qquad p_m = \operatorname{Im} \widetilde{P}_{\sigma(m)} \;\; (a_{\mathrm{re}} < m \le a). $$

Looping over the entries $m = 1, \ldots, a$, this equation lets us go back and forth between the full complex matrix $\widetilde{P}$ and the reduced real state vector $\mathbf{p}$.
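The back-and-forth between the full complex Hermitian matrix and the reduced real state vector can be sketched as follows. The index sets here are hypothetical examples chosen for a $2 \times 2$ matrix; for LiH the actual sets single out the non-zero upper-triangular entries.

```python
import numpy as np

def to_reduced(P, idx_re, idx_im):
    # Stack selected real parts, then selected imaginary parts,
    # into one real state vector.
    re = [P[i, j].real for (i, j) in idx_re]
    im = [P[i, j].imag for (i, j) in idx_im]
    return np.array(re + im)

def from_reduced(p, idx_re, idx_im, n):
    # Rebuild the full Hermitian matrix from the reduced vector.
    P = np.zeros((n, n), dtype=complex)
    a_re = len(idx_re)
    for m, (i, j) in enumerate(idx_re):
        P[i, j] += p[m]
    for m, (i, j) in enumerate(idx_im):
        P[i, j] += 1j * p[a_re + m]
    # Fill the lower triangle by Hermitian symmetry
    return np.triu(P) + np.triu(P, 1).conj().T

# Hypothetical index sets: real parts of the upper triangle (with diagonal),
# imaginary part of the single strictly-upper entry.
idx_re = [(0, 0), (0, 1), (1, 1)]
idx_im = [(0, 1)]
P = np.array([[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]])
p = to_reduced(P, idx_re, idx_im)

assert p.shape == (4,)
assert np.allclose(from_reduced(p, idx_re, idx_im, 2), P)
```

Hermiticity makes the diagonal real and the lower triangle redundant, which is why the real vector of upper-triangular real and imaginary parts carries all of the information.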

Importantly, we now follow precisely the same procedure, with the same mapping $\sigma$ and the same index set, to form a reduced Hamiltonian vector $\mathbf{h}(t) \in \mathbb{R}^a$. We then formulate a reduced-dimensionality version of (9):

$$ \mathbf{h}(t) = \boldsymbol{\beta}_0 + \beta \, \mathbf{p}(t). $$

The matrix $\beta$ is now of dimension $a \times a$; all other objects in this equation are vectors of dimension $a$. The training procedure then holds without further modifications.
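To illustrate only the linear-statistical-model structure (the actual training procedure also involves the time-discretized Liouville equation), the following sketch fits a hypothetical affine map from reduced density vectors to reduced Hamiltonian vectors by linear least squares, using synthetic noiseless data in place of real trajectories.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data: reduced density vectors p_k and Hamiltonian
# vectors h_k related by a ground-truth affine map (stand-in for real data).
a = 6            # reduced dimension (hypothetical)
K = 200          # number of training snapshots
beta0_true = rng.normal(size=a)
beta1_true = rng.normal(size=(a, a))
P_data = rng.normal(size=(K, a))
H_data = P_data @ beta1_true.T + beta0_true

# Fit h = beta0 + beta1 p by least squares on the stacked linear system
A = np.hstack([np.ones((K, 1)), P_data])            # rows [1, p_k^T]
coef, *_ = np.linalg.lstsq(A, H_data, rcond=None)   # shape (a + 1, a)
beta0_fit, beta1_fit = coef[0], coef[1:].T

# Noiseless, well-conditioned data: the affine map is recovered exactly
assert np.allclose(beta0_fit, beta0_true)
assert np.allclose(beta1_fit, beta1_true)
```

Because the model is linear in its coefficients, the fit reduces to a single least-squares solve, which is consistent with the claim in the abstract that training avoids numerous CPU-intensive optimization steps.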


  • A. P. Bartók, S. De, C. Poelking, N. Bernstein, J. R. Kermode, G. Csányi, and M. Ceriotti (2017) Machine learning unifies the modeling of materials and molecules. Science Advances 3 (12). External Links: Document, Cited by: §1.
  • J. Behler and M. Parrinello (2007) Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98 (14), pp. 146401. External Links: ISBN 0031-9007, ISSN 00319007 Cited by: §1.
  • J. Behler (2016) Perspective: machine learning potentials for atomistic simulations. The Journal of Chemical Physics 145 (17), pp. 170901. External Links: Document Cited by: §1.
  • T. Bertalan, F. Dietrich, I. Mezić, and I. G. Kevrekidis (2019) On learning hamiltonian systems from data. Chaos: An Interdisciplinary Journal of Nonlinear Science 29 (12), pp. 121107. External Links: Document, Link Cited by: §1.
  • G. F. Bertsch, J.-I. Iwata, A. Rubio, and K. Yabana (2000) Real-space, real-time method for the dielectric function. Phys. Rev. B 62 (12), pp. 7998–8002. External Links: Document Cited by: §1.
  • H. S. Bhat (2020) Learning and interpreting potentials for classical Hamiltonian systems. In Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019, P. Cellier and K. Driessens (Eds.), Note: Communications in Computer and Information Science 1167. Cited by: §1.
  • M. Ceriotti (2019) Unsupervised machine learning in atomistic simulations, between predictions and understanding. Journal of Chemical Physics 150 (15). External Links: Document, 1902.05158, ISSN 00219606 Cited by: §1.
  • A. Chandrasekaran, D. Kamal, R. Batra, C. Kim, L. Chen, and R. Ramprasad (2019) Solving the electronic structure problem with machine learning. npj Computational Materials 5 (1). External Links: Document, Link Cited by: §1.
  • W. K. Chen, X. Y. Liu, W. H. Fang, P. O. Dral, and G. Cui (2018) Deep Learning for Nonadiabatic Excited-State Dynamics. J. Phys. Chem. Lett. 9 (23), pp. 6702–6708. External Links: ISSN 19487185 Cited by: §1.
  • A. S. Christensen, F. A. Faber, and O. A. Von Lilienfeld (2019) Operators in quantum machine learning: Response properties in chemical space. J. Chem. Phys 150 (6), pp. 064105. External Links: ISSN 00219606 Cited by: §1.
  • P. A. M. Dirac (1930) Note on Exchange Phenomena in the Thomas Atom. Mathematical Proceedings of the Cambridge Philosophical Society 26 (3), pp. 376–385. External Links: Document Cited by: §1.
  • A. Dreuw and M. Head-Gordon (2005) Single-reference ab initio methods for the calculation of excited states of large molecules. Chem. Rev. 105 (11), pp. 4009–4037. Note: PMID: 16277369 External Links: Document, Link Cited by: §1.
  • H. Eshuis, G. G. Balint-Kurti, and F. R. Manby (2008) Dynamics of molecules in strong oscillating electric fields using time-dependent Hartree–Fock theory. J. Chem. Phys. 128 (11), pp. 114113. Cited by: §1.
  • M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goings, B. Peng, A. Petrone, T. Henderson, D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. Rega, G. Zheng, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, K. Throssell, J. A. Montgomery, J. E. Peralta, F. Ogliaro, M. J. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Kudin, V. N. Staroverov, T. A. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. P. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Adamo, R. Cammi, J. W. Ochterski, R. L. Martin, K. Morokuma, O. Farkas, J. B. Foresman, and D. J. Fox (2018) Gaussian Development Version Revision I.14+. Note: Gaussian Inc. Wallingford CT Cited by: §3.
  • H. Fujita, Y. O. Nakagawa, S. Sugiura, and M. Oshikawa (2018) Construction of Hamiltonians by supervised learning of energy and entanglement spectra. Phys. Rev. B 97, pp. 075114. External Links: Document, Link Cited by: §1.
  • M. Gastegger, J. Behler, and P. Marquetand (2017) Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci. 8 (10), pp. 6924–6935. External Links: ISSN 20416539 Cited by: §1.
  • K. Ghosh, A. Stuke, M. Todorović, P. B. Jørgensen, M. N. Schmidt, A. Vehtari, and P. Rinke (2019) Deep Learning Spectroscopy: Neural Networks for Molecular Excitation Spectra. Adv. Sci. 6 (9), pp. 1801367–1801374. External Links: ISSN 21983844 Cited by: §1.
  • A. Grisafi, D. M. Wilkins, G. Csányi, and M. Ceriotti (2018) Symmetry-Adapted Machine Learning for Tensorial Properties of Atomistic Systems. Phys. Rev. Lett. 120 (3), pp. 36002. External Links: ISSN 10797114 Cited by: §1.
  • F. Häse, S. Valleau, E. Pyzer-Knapp, and A. Aspuru-Guzik (2016) Machine learning exciton dynamics. Chem. Sci. 7 (8), pp. 5139–5147. External Links: ISSN 20416539 Cited by: §1.
  • L. Innocenti, L. Banchi, A. Ferraro, S. Bose, and M. Paternostro (2020) Supervised learning of time-independent hamiltonians for gate design. New Journal of Physics 22 (6), pp. 065001. External Links: Document Cited by: §1.
  • C. M. Isborn, X. Li, and J. C. Tully (2007) TDDFT Ehrenfest dynamics: collisions between atomic oxygen and graphite clusters. J. Chem. Phys. 126, pp. 134307. Cited by: §1.
  • M. S. Jørgensen, H. L. Mortensen, S. A. Meldgaard, E. L. Kolsbjerg, T. L. Jacobsen, K. H. Sørensen, and B. Hammer (2019) Atomistic structure learning. Journal of Chemical Physics. External Links: Document, 1902.10501, ISSN 00219606 Cited by: §1.
  • H. Li, C. Collins, M. Tanha, G. J. Gordon, and D. J. Yaron (2018) A density functional tight binding layer for deep learning of chemical hamiltonians. Journal of Chemical Theory and Computation 14 (11), pp. 5764–5776. Note: PMID: 30351008 External Links: Document, Link Cited by: §1.
  • X. Li, S. M. Smith, A. N. Markevitch, D. A. Romanov, R. J. Levis, and H. B. Schlegel (2005) A time-dependent Hartree-Fock approach for studying the electronic optical response of molecules in intense fields. Physical Chemistry Chemical Physics 7 (2), pp. 233–239. External Links: Document, ISSN 14639076 Cited by: §1, §3.2.
  • K. Lopata and N. Govind (2011) Modeling fast electron dynamics with real-time time-dependent density functional theory: Application to small molecules and chromophores. J. Chem. Theory Comput. 7 (5), pp. 1344–1355. External Links: Document, ISBN 1549-9618 Cited by: §1.
  • C. Lu, Q. Liu, Q. Sun, C. Hsieh, S. Zhang, L. Shi, and C. Lee (2020) Deep Learning for Optoelectronic Properties of Organic Semiconductors. J. Phys. Chem. C 124, pp. 7048–7060. External Links: ISSN 1932-7447 Cited by: §1.
  • N. T. Maitra (2016) Perspective: Fundamental aspects of time-dependent density functional theory. J. Chem. Phys. 144 (22), pp. 220901. External Links: Document Cited by: §1.
  • M. A. L. Marques, N. T. Maitra, F. M. S. Nogueira, E. K. U. Gross, and A. Rubio (2012) Fundamentals of time-dependent density functional theory. Springer-Verlag. Cited by: §1.
  • D. A. Micha and K. Runge (1994) Time-dependent many-electron approach to slow ion-atom collisions: the coupling of electronic and nuclear motions. Phys. Rev. A 50, pp. 322–336. External Links: Document, Link Cited by: §1.
  • G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K. R. Müller, and O. Anatole Von Lilienfeld (2013) Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 15, pp. 095003. External Links: ISSN 13672630 Cited by: §1.
  • D. R. Nascimento and A. E. DePrince III (2016) Linear absorption spectra from explicitly time-dependent equation-of-motion coupled-cluster theory. J. Chem. Theory Comput. 12 (12), pp. 5834–5840. External Links: Document Cited by: §1.
  • B. Nebgen, N. Lubbers, J. S. Smith, A. E. Sifain, A. Lokhov, O. Isayev, A. E. Roitberg, K. Barros, and S. Tretiak (2018) Transferable Dynamic Molecular Charge Assignment Using Deep Neural Networks. J. Chem. Theory Comput. 14 (9), pp. 4687–4698. External Links: ISSN 15499626 Cited by: §1.
  • F. M. Paruzzo, A. Hofstetter, F. Musil, S. De, M. Ceriotti, and L. Emsley (2018) Chemical shifts in molecular solids by machine learning. Nat. Commun. 9, pp. 4501. External Links: ISSN 20411723 Cited by: §1.
  • W. Pronobis, K. T. Schütt, A. Tkatchenko, and K. R. Müller (2018) Capturing intensive and extensive DFT/TDDFT molecular properties with machine learning. Eur. Phys. J. B 91 (8), pp. 178–184. External Links: ISSN 14346036 Cited by: §1.
  • M. R. Provorse and C. M. Isborn (2016) Electron dynamics with real-time time-dependent density functional theory. Int. J. Quantum Chem. 116 (10), pp. 739–749. External Links: Document Cited by: §1.
  • R. Ramakrishnan, M. Hartmann, E. Tapavicza, and O. A. von Lilienfeld (2015) Electronic spectra from TDDFT and machine learning in chemical space. J. Chem. Phys. 143, pp. 084111. Cited by: §1.
  • M. Rodríguez and T. Kramer (2019) Machine learning of two-dimensional spectroscopic data. Chem. Phys. 520 (December 2018), pp. 52–60. External Links: ISSN 03010104 Cited by: §1.
  • G. R. Schleder, A. C. M. Padilha, C. M. Acosta, M. Costa, and A. Fazzio (2019) From DFT to machine learning: recent approaches to materials science–a review. Journal of Physics: Materials 2 (3), pp. 032001. External Links: Document, Link Cited by: §1.
  • A. E. Sifain, N. Lubbers, B. T. Nebgen, J. S. Smith, A. Y. Lokhov, O. Isayev, A. E. Roitberg, K. Barros, and S. Tretiak (2018) Discovering a Transferable Charge Assignment Model Using Machine Learning. J. Phys. Chem. Lett 9 (16), pp. 4495–4501. External Links: ISSN 19487185 Cited by: §1.
  • J. S. Smith, B. T. Nebgen, R. Zubatyuk, N. Lubbers, C. Devereux, K. Barros, S. Tretiak, O. Isayev, and A. E. Roitberg (2019) Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nature Communications. External Links: Document, ISSN 20411723 Cited by: §1.
  • J. C. Snyder, M. Rupp, K. Hansen, K. R. Müller, and K. Burke (2012) Finding density functionals with machine learning. Physical Review Letters 108 (25), pp. 1–5. External Links: Document, 1112.5441, ISSN 00319007 Cited by: §1.
  • Y. Suzuki, R. Nagai, and J. Haruyama (2020) Machine learning exchange-correlation potential in time-dependent density-functional theory. Phys. Rev. A 101, pp. 050501. External Links: Document Cited by: §1.
  • A. Szabo and N. S. Ostlund (1996) Modern quantum chemistry: introduction to advanced electronic structure theory. First edition, Dover Publications, Inc., Mineola. Note: This is the revised first edition, originally published in 1989 by McGraw-Hill Publishing Company, New York, with an additional section written by M. C. Zerner. First edition originally published in 1982. Cited by: §1, §4.
  • D. M. Wilkins, A. Grisafi, Y. Yang, K. U. Lao, R. A. DiStasio, and M. Ceriotti (2019) Accurate molecular polarizabilities with coupled cluster theory and machine learning. Proc. Natl. Acad. Sci. U.S.A. 116 (9), pp. 3401–3406. External Links: ISSN 10916490 Cited by: §1.
  • K. Yabana and G. F. Bertsch (1996) Time-dependent local-density approximation in real time. Phys. Rev. B 54 (7), pp. 4484–4487. External Links: Document Cited by: §1.
  • K. Yabana and G. F. Bertsch (1999) Time-dependent local-density approximation in real time: application to conjugated molecules. Int. J. Quantum Chem. 75 (1), pp. 55–66. External Links: Document, ISBN 0020-7608 Cited by: §1.
  • K. Yabana and G. F. Bertsch (1997) Optical response of small carbon clusters. Z. Phys. D 42 (3), pp. 219–225. External Links: Document Cited by: §1.
  • S. Ye, W. Hu, X. Li, J. Zhang, K. Zhong, G. Zhang, Y. Luo, S. Mukamel, and J. Jiang (2019) A neural network protocol for electronic excitations of N-methylacetamide. Proc. Natl. Acad. Sci. U.S.A. 116 (24), pp. 11612–11617. External Links: ISSN 10916490 Cited by: §1.
  • Y. D. Zhong, B. Dey, and A. Chakraborty (2020) Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control. In Proceedings of ICLR 2020, External Links: Link Cited by: §1.
  • Y. Zhu and J. M. Herbert (2018) Self-consistent predictor/corrector algorithms for stable and efficient integration of the time-dependent Kohn-Sham equation. J. Chem. Phys. 148 (4), pp. 044117. Cited by: §1.