
Practical error bounds for properties in plane-wave electronic structure calculations

We propose accurate computable error bounds for quantities of interest in electronic structure calculations, in particular ground-state density matrices and energies, and interatomic forces. These bounds are based on an estimation of the error in terms of the residual of the solved equations, which is then efficiently approximated with computable terms. After providing coarse bounds based on an analysis of the inverse Jacobian, we improve on these bounds by solving a linear problem in a small dimension that involves a Schur complement. We numerically show how accurate these bounds are on a few representative materials, namely silicon, gallium arsenide and titanium dioxide.


1. Introduction

This article focuses on providing practical error estimates for numerical approximations of electronic structure calculations. These simulations are key in many domains, as they allow for the simulation of systems at the atomic and molecular scale. They are particularly useful in chemistry, materials science, condensed matter physics, and molecular biology. Among the many electronic structure models available, Kohn–Sham (KS) Density Functional Theory (DFT) [19] with semilocal density functionals is one of the most used in practice, as it offers a good compromise between accuracy and computational cost. We will focus on this model in this article. Note that the mathematical formulation of this problem is similar to that of the Hartree–Fock [13] or Gross–Pitaevskii equations [26], so that what we present in the context of DFT can be easily extended to those settings. We will focus on plane-wave discretizations within the pseudopotential approximation, which are most suited for the study of crystals; some (but not all) of our methodology can be applied in other contexts as well.

In this field, the first and most crucial problem is the determination of the electronic ground state of the system under consideration. Mathematically speaking, this is a constrained minimization problem. Writing its first-order optimality conditions leads to an eigenvalue problem that is nonlinear in the eigenvectors. The unknown is a subspace of dimension $N$, the number of electrons in the system; this subspace can be conveniently described using either the orthogonal projector on it (density matrix formalism) or an orthonormal basis of it (orbital formalism). This problem is well known in the literature, and the interested reader is referred to [7] and references therein for more information on how it is solved in practice.

Solving this problem numerically requires a number of approximations, so that only an approximation of the exact solution can be computed. Being able to estimate the error between the exact and the approximate solutions is crucial, as this information can be used to reduce the high computational cost of such methods by an optimization of the approximation parameters, and maybe more importantly, to add error bars to quantities of interest (QoI) calculated from the approximate solution. In our context, such QoI are typically the ground-state energy of the system and the forces on the atoms in the system, but may also include any information computable from the Kohn–Sham ground state.

While such error bounds were developed some time ago for boundary value problems, e.g. in the context of finite element discretizations, in particular using a posteriori error estimation [29], the development of bounds in the context of electronic structure is quite recent, and still incomplete. Computable and guaranteed error bounds for linear eigenvalue problems have been proposed in the last decade [2, 3, 4, 8, 20]; we refer to [23, Chapter 10] for a recent monograph on the subject. Specifically for electronic structure calculations, some of us proposed guaranteed error bounds for linear eigenvalue equations [15]. For fully self-consistent (nonlinear) eigenvalue equations, an error bound was proposed for a simplified 1D Gross–Pitaevskii equation in [10]; the computational cost of evaluating that bound is however quite high. So far, no error bound has been proposed for the estimation of the error on general QoI in electronic structure calculations, in particular for the interatomic forces. This is what we set out to achieve in this contribution.

In this article, we use a general approach based on a linearization of the Kohn–Sham equations. It is instructive to compare our approach to the one used in a generic context. Assume we want to find $x_*$ such that $F(x_*) = 0$, for some nonlinear function $F$ (the residual). Near a solution $x_*$, we have $F(x) \approx F'(x_*)(x - x_*)$, and therefore, if $F'(x_*)$ is invertible, we have the error-residual relationship

$$x - x_* \approx F'(x_*)^{-1} F(x). \tag{1}$$

This is the same approximation that leads to the Newton algorithm. Assume now that we want to compute a QoI $A(x_*)$, where $A$ is a function; then we have the approximate equality with computable right-hand side:

$$A(x) - A(x_*) \approx A'(x_*)(x - x_*) \approx A'(x_*)\, F'(x_*)^{-1} F(x).$$

From here, we obtain the simple first estimate

$$\|A(x) - A(x_*)\| \lesssim \|A'(x_*)\| \, \|F'(x_*)^{-1}\| \, \|F(x)\|,$$

where $\|\cdot\|$ is any chosen norm, and $\|A'(x_*)\|$ and $\|F'(x_*)^{-1}\|$ the induced operator norms. This approximate bound can be turned into a rigorous one using information on the second derivatives of $F$; see for instance [28].
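To make this mechanism concrete, the estimate can be checked on a one-dimensional toy problem; the functions and numbers below are purely illustrative, not part of the paper's framework:

```python
import math

# Toy problem: find x* with F(x*) = 0; here F(x) = x**3 - 2, so x* = 2**(1/3).
# All names (F, A, x_star) are illustrative, not the paper's notation.
def F(x):  return x**3 - 2.0
def dF(x): return 3.0 * x**2

# A quantity of interest A and its derivative.
def A(x):  return math.sin(x)
def dA(x): return math.cos(x)

x_star = 2.0 ** (1.0 / 3.0)   # exact solution
x = x_star + 1e-3             # approximate solution with a small error

# Error-residual relationship (1): x - x* ≈ dF(x*)^{-1} F(x).
err_est = F(x) / dF(x_star)
assert abs(err_est - (x - x_star)) < 1e-5

# First-order estimate: |A(x) - A(x*)| ≲ |dA(x*)| |dF(x*)^{-1}| |F(x)|.
qoi_err = abs(A(x) - A(x_star))
qoi_bound = abs(dA(x_star)) * abs(1.0 / dF(x_star)) * abs(F(x))
assert qoi_err <= qoi_bound * 1.001
```

In one dimension the operator norms reduce to absolute values, so the bound is nearly tight; the difficulties discussed below arise precisely because this is no longer true in high dimension.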

To extend this approach to Kohn–Sham models, we encounter several difficulties:

  • The structure of our problem is not easily formulated as above because of the presence of constraints and degeneracies. We solve this using the geometrical framework of [7] to identify the appropriate analog of the Jacobian $F'(x_*)$.

  • The computation of the Jacobian and its inverse is prohibitively costly. We use iterative strategies to keep this cost manageable.

  • Choosing the right norm is not obvious in this context. For problems involving partial differential equations, where the residual involves partial derivatives, it is natural to consider Sobolev-type norms, with the aim of making the Jacobian a bounded operator between the relevant function spaces. We explore different choices and their impact on the error bounds.

  • The operator norm inequalities used above are very often largely suboptimal, even with appropriate norms. We quantify this on representative examples.

  • The structure of the residual plays an important role. For instance, when the approximate solution results from a Galerkin approximation to a partial differential equation, the residual is orthogonal to the approximation space. In the context of plane-wave discretizations, this means the residual only contains high-frequency Fourier components. We demonstrate how this impacts the quality of the above bounds when the QoI represents the interatomic forces, in which case the corresponding derivative mostly acts on low-frequency components.

Throughout the paper, we will provide numerical tests to illustrate our results. All these tests are performed with the DFTK software [16], a recent Julia package solving the Kohn–Sham equations in the pseudopotential approximation using a plane-wave basis, thus particularly well suited for periodic systems such as crystals [21]. We are mostly interested in three QoI: the ground-state energy, the ground-state density, and the interatomic forces, the latter being computed using the Hellmann–Feynman theorem. We will demonstrate the main points with simulations on a simple system (bulk silicon), then present results for more complicated systems.

We will be interested in this paper only in quantifying the discretization error. However, the general framework we develop can also be used to treat other types of error (such as those resulting from the iterative solution of the underlying minimization problem). We only consider insulating systems at zero temperature, and do not consider the error due to finite Brillouin zone sampling [6]; extending the formalism to finite temperature and quantifying the sampling error, particularly for metals, is an interesting topic for future work.

The outline of this article is as follows. In Section 2, we present the mathematical framework related to the solution of the electronic structure minimization problem, describing in particular objects related to the tangent space of the constraint manifold. In Section 3, we present the Kohn–Sham model and the numerical framework in which our tests will be performed. In Section 4, we propose a first crude bound of the error between the exact and the numerically computed solution based on a linearization argument as well as an operator norm inequality, both for the error on the ground-state density matrix and on the forces. In Section 5, we refine this bound by splitting between low and high frequencies, and using a Schur complement to refine the error bound on the low frequencies. Finally, in Section 6, we provide numerical simulations on more involved materials systems, namely on a gallium arsenide system (GaAs) and a titanium dioxide system (TiO2), showing that the proposed bounds work similarly well in those cases.

2. Mathematical framework

In this section, we present the models targeted by our study, as well as the elementary tools of differential geometry used to derive and compute the error bounds on the QoI.

2.1. General framework

The work we present here is valid for a large class of mean-field models including different instances of Kohn–Sham models, the Hartree–Fock model, and the time-independent Gross–Pitaevskii model and its various extensions. To study them in a unified way, we use a mathematical framework similar to the one in [7]. To keep the presentation simple, we will work in finite dimension and consider that the solutions of the problem can be very accurately approximated in a given finite-dimensional space of (high) dimension $N_b$, which we identify with $\mathbb{C}^{N_b}$. We denote by $\langle \cdot, \cdot \rangle$ the inner product of $\mathbb{C}^{N_b}$, seen as a vector space over $\mathbb{R}$. We equip the $\mathbb{R}$-vector space of Hermitian matrices of size $N_b$ with the Frobenius inner product. Note that although it is important in applications to allow for complex orbitals and density matrices, the space of Hermitian matrices is not a vector space over $\mathbb{C}$, and therefore we will always consider vector spaces over $\mathbb{R}$.

The density-matrix formulation of the mean-field model in this reference approximation space reads


is the manifold of rank-$N$ orthogonal projectors (density matrices), and $E$ is a nonlinear energy functional. The parameter $N$ is a fixed integer depending on the physical model, not on its discretization in a finite-dimensional space. For mean-field electronic structure models, $N$ is the number of electrons or electron pairs in the system (hence the notation); in the standard Gross–Pitaevskii model, $N = 1$. The energy functional is of the form

where the first term involves the linear part of the mean-field Hamiltonian, and the second is a nonlinear contribution depending on the considered model (see Section 3 for the expressions in the Kohn–Sham model). For simplicity of presentation we will ignore spin in the formalism, but we will include it in the numerical simulations. The constraint set is diffeomorphic to the Grassmann manifold of $N$-dimensional complex vector subspaces of the reference space.

Problem (3) always has a minimizer since it consists in minimizing a continuous function on a compact set. This minimizer may or may not be unique, depending on the model and/or the physical system under study. We will not elaborate here on this uniqueness issue, and assume for the sake of simplicity that (3) has a unique minimizer, which we denote by $P_*$.

Besides the ground-state energy, various other physical quantities of interest (QoI) can be computed from the ground-state density matrix, for instance the ground-state density and the interatomic forces. We denote such a QoI by $A$.

We consider the case when the dimension of the reference space is too large for problem (3) to be solved completely in that space. To solve problem (3), we therefore consider a finite-dimensional subspace of smaller dimension and solve instead the variational approximation of (3) in this subspace, namely


Our goal is then to estimate the errors on the QoI, where the approximate state is typically the minimizer of (4), given the variational space, and the norm is specific to the QoI. The latter can be a scalar, e.g. the ground-state energy, or a finite- or infinite-dimensional vector, e.g. the interatomic forces or the ground-state density.

2.2. First-order geometry

The manifold is a smooth manifold. Its tangent space at is given by

where is the orthogonal projection on for the canonical inner product of . The set is the set of Hermitian matrices that are off-diagonal in the block decomposition induced by and ; more explicitly, if for some unitary , then

The orthogonal projection on for the Frobenius inner product is given by


where is the commutator of and . Linear operators acting on spaces of matrices are sometimes referred to as super-operators in the physics literature. Throughout this paper, super-operators will be written in bold fonts.
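As a sketch of this projection, the off-diagonal block structure of the tangent space can be checked numerically; the sizes and the helper name `tangent_project` below are illustrative assumptions, not part of the paper or DFTK:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 6, 2  # illustrative sizes: reference dimension n, rank N

# A rank-N orthogonal projector P = Phi Phi^* built from random orbitals.
Phi, _ = np.linalg.qr(rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N)))
P = Phi @ Phi.conj().T
Q = np.eye(n) - P  # complementary projector

def tangent_project(A):
    """Frobenius-orthogonal projection onto the tangent space at P:
    keep only the off-diagonal blocks, P A Q + Q A P."""
    return P @ A @ Q + Q @ A @ P

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = 0.5 * (A + A.conj().T)  # a Hermitian test matrix
X = tangent_project(A)

assert np.allclose(X, X.conj().T)          # X is Hermitian
assert np.allclose(P @ X @ P, 0)           # no occupied-occupied block
assert np.allclose(Q @ X @ Q, 0)           # no virtual-virtual block
assert np.allclose(tangent_project(X), X)  # projection is idempotent
```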

The mean-field Hamiltonian is the gradient of the energy at a given point (again for the Frobenius inner product):

To simplify the notation, we set


The first-order optimality condition for (3) is that is orthogonal to the tangent space , which can be written, using (5) and (6), as . This corresponds to the residual

being zero at . The residual function can be seen as a nonlinear map from to itself, and its restriction to as a vector field on since for all , .

2.3. Second-order geometry

We introduce the super-operators and , defined at and acting on . These operators were already introduced in [7, Section 2.2], but we recall here their definitions for completeness. To simplify the notation, we will set , .

The super-operator is the Hessian of the energy projected onto the tangent space to at :


By construction, is an invariant subspace of . Note that for linear eigenvalue problems, i.e. when .

The super-operator is defined by


The tangent space is also an invariant subspace of .

It is shown in [7] that the energy of a density matrix with is . The restriction of the operator to the invariant subspace can therefore be identified with the second-order derivative of on the manifold . Since is a minimum, it follows that

Note that in general, the second-order derivative of a function defined on a smooth manifold is not an intrinsic object; it depends not only on the tangent structure of the manifold, but also on the chosen affine connection. However, at a critical point of the function on the manifold, the contributions to the second derivative due to the connection vanish and the second-order derivative becomes intrinsic.

For our purposes, it will be convenient to define this second-order derivative also outside of . Relying on (7)-(8), we define for any the super-operators and through


Both and admit as an invariant subspace and their restrictions to are Hermitian for the Frobenius inner product. The map is smooth and the restriction of to provides a computable approximation of the second-order derivative of in a neighborhood of (whatever the choice of the affine connection).

2.4. Density matrix and orbitals

The framework we have outlined above is particularly convenient for stating the second-order conditions, but much too expensive computationally, as it requires the storage and manipulation of large (low-rank) matrices. In practice, it is more effective to work directly with orbitals, i.e. to write for any


where is a collection of orbitals satisfying and , and where we used Dirac’s bra-ket notation: for , and . Problem (3) can be reformulated as

Note that the orbitals are only defined up to a unitary transform: if

is a unitary matrix, then

and give rise to the same density matrix. This means that the minimizers of this minimization problem are never isolated, which creates technical difficulties that are not present in the density matrix formalism.

Let us fix a with , and consider an element of the tangent plane . By completing to an orthonormal basis and writing out in this basis, it is easy to see that the constraints , , imply that can be put in the form


where is a set of orbital variations satisfying . Furthermore, under this condition, is unique, so that (11) establishes a bijection between and . We will therefore treat equivalently elements of the tangent space either in the density matrix representation or the orbital representation , writing


This orbital representation of by is more economical computationally, only requiring the storage and manipulation of orbitals satisfying . Similarly, the manipulation of objects in the tangent plane is more efficiently done through their orbital variations satisfying .

All operations on density matrices or their variations can indeed be carried out in this orbital representation. For instance, the computation of the energy can be performed efficiently in practice, as explained in Section 3, and the residual at also has a nice representation in terms of orbitals:

which is easily recognized as similar to the residual of a linear eigenvalue problem.
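This orbital-space residual can be sketched numerically for a fixed (linear) Hamiltonian; all names and sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 8, 3  # illustrative sizes

# A fixed Hermitian "Hamiltonian" (linear model, for illustration only).
H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = 0.5 * (H + H.conj().T)

def residual(Phi):
    """Orbital residual H Phi - Phi (Phi^* H Phi), the analog of the
    residual of a linear eigenvalue problem."""
    Lam = Phi.conj().T @ H @ Phi
    return H @ Phi - Phi @ Lam

# For any orthonormal Phi, the residual is orthogonal to the span of Phi.
Phi, _ = np.linalg.qr(rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N)))
assert np.allclose(Phi.conj().T @ residual(Phi), 0)

# When Phi spans an invariant subspace of H, the residual vanishes.
_, V = np.linalg.eigh(H)
assert np.allclose(residual(V[:, :N]), 0)
```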

Likewise, operators on can be identified in this fashion. For instance,


The computation of can be performed similarly. All the numerical results in this article are performed using the orbital formalism.

Remark 1.

Note that the condition that is not necessary for to define an element of . However, without this gauge condition, is not unique. This is simply a manifestation at the infinitesimal level of the noninjectivity of the map between and . Because of this, the derivative is not injective between the tangent spaces and . In more concrete terms, in the example case where are the first basis vectors, any element is of the form which can be written in the form (11) with . However, such a can also be written in the form (11) with for any anti-hermitian matrix . The gauge condition forces to be zero, making unique. In more formal terms, the map induces a principal bundle structure on the base space (the Grassmann manifold) with total space (the Stiefel manifold) and characteristic fiber . This naturally splits the tangent space into the vertical space , and a complementary horizontal space, which we take to be the orthogonal complement, .

The orbital formalism can be used to give a more concrete interpretation of the first-order optimality condition . Indeed, this condition can be rewritten as

from which it follows that and can be jointly diagonalized in an orthonormal basis:


In many applications, the orbitals spanning the range of the ground-state density matrix (see (14)) are those corresponding to the lowest eigenvalues of the Hamiltonian. This is called the Aufbau principle in physics and chemistry. This principle is always satisfied in the (unrestricted) Hartree–Fock setting, and most of the time in the Kohn–Sham setting. Under the Aufbau principle, we can assume that the eigenvalues are ranked in nondecreasing order. The first $N$ orbitals are called occupied, and the remaining ones are called virtual (it is customary to label the occupied orbitals by indices $i$, and virtual orbitals by indices $a$). The operator

can be written explicitly using the tensor basis

. We have indeed

and it follows that the lowest eigenvalue of the restriction of $\boldsymbol{\Omega}$ to the tangent space is $\varepsilon_{N+1} - \varepsilon_N$. The operator is therefore positive on the tangent space, and coercive if there is an energy gap between the $N$th and $(N+1)$st eigenvalues of the mean-field Hamiltonian (see e.g. [7]).
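A minimal numerical check of this spectral structure, with illustrative eigenvalues:

```python
import numpy as np

# Illustrative eigenvalues of a mean-field Hamiltonian, in nondecreasing
# order, with N occupied orbitals.
eps = np.array([-1.0, -0.5, 0.3, 0.9, 1.4])
N = 2

# In the tensor basis of occupied/virtual pairs, the super-operator acts
# diagonally with entries eps_a - eps_i (i occupied, a virtual).
gaps = np.array([eps[a] - eps[i] for i in range(N) for a in range(N, len(eps))])

# Its lowest eigenvalue is the gap between the N-th and (N+1)-st levels,
# so the operator is positive precisely when there is an energy gap.
assert np.isclose(gaps.min(), eps[N] - eps[N - 1])
assert (gaps > 0).all()
```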

2.5. Metrics on the tangent space

The isomorphism between and the set of orbital variations with is unitary under the Frobenius inner product up to a factor of : .

In practice, it is often advantageous to work using different inner products. This is in particular the case for partial differential equations involving unbounded operators, where using Sobolev-type metrics better respects the natural analytic structure of the problem and therefore allows for better bounds, compare e.g. the results of (5.34) and (5.35) on Figure 4 in [4]. To that end, consider a metric on given by

Here is a coercive Hermitian operator on representing the metric; for instance, taking to be a discretization of the operator we recover the classical Sobolev norm. A basic problem is that the projection on the orthogonal of does not necessarily commute with . As a result, there are various nonequivalent ways to lift this metric to one on the tangent space . We select here the computationally simplest. The operator


is positive definite on the subspace of , and induces a metric on that space. The point of this formulation is to make it easy to compute . Note that, since and do not commute, . However, is well-conditioned, so that computing the action of on a vector can be performed efficiently by an iterative algorithm involving repeated applications of the operators and . The same holds for . Furthermore, practical numerical results are typically not very sensitive to these issues, so that other (nonequivalent) reasonable alternatives to (15) yield similar results.

The metric on immediately induces a metric on given by, in the orbital representation associated with ,

for . This defines an operator on through the relationship when . Similarly to , we can compute powers and inverses of easily.

This formalism has the disadvantage that the same metric is used for every orbital variation. In practice this may not be sensible, as different orbitals can correspond to different energy ranges. We therefore slightly modify the above formalism by applying a different metric to each individual orbital variation, following standard practice in preconditioners for plane-wave density functional theory [24]. Introducing a family of coercive Hermitian operators on , we set


3. The periodic Kohn–Sham problem

3.1. The continuous problem

We consider a -periodic system, being a Bravais lattice with unit cell and reciprocal lattice (the set of vectors such that for all ). For the sake of simplicity, we present here the formalism for the (artificial) Kohn–Sham model for a finite system of electrons on the unit cell with periodic boundary conditions. This is distinct from the more physical periodic Kohn–Sham problem for an infinite crystal with electrons per unit cell, which is usually treated using the Bloch theorem. Practical computations are performed for the latter model using Monkhorst–Pack Brillouin zone sampling [22] (see also [6] for a mathematical analysis of this method). The mathematical framework is very similar, with additional sums over $k$-points.

At the continuous level, a Kohn–Sham state is described by a density matrix , a rank- orthogonal projector acting on the space of square integrable periodic functions. Ignoring constant terms modeling interactions between ions (i.e. atomic nuclei and frozen core electrons), the Kohn–Sham energy of is given by (the superscript Hxc stands for Hartree-exchange-correlation), with

In the above expressions, is the density associated with the trace-class operator (formally where is the integral kernel of ), a given exchange-correlation energy, and the Hartree potential, defined as the unique periodic solution with zero mean of the Poisson equation . Here, we ignore spin for simplicity; in practice however, spin is taken into account by multiplying the density by a factor of 2 (for spin-unpolarized systems) and modifying accordingly. In the pseudopotential approximation that we use in our numerical results, is a local potential given by


and a nonlocal potential in Kleinman–Bylander [17] form given by


where is the number of atoms in , the ’s are the positions of the atoms inside the unit cell , is a local radial potential, denotes the number of projectors for atom , and the are given smooth functions. We use in particular the Goedecker–Teter–Hutter (GTH) pseudopotentials [12, 14] whose functional forms for the and are analytic ( is a radial Gaussian function multiplied by a radial polynomial, and is a radial Gaussian function multiplied by a solid spherical harmonic).

The Kohn–Sham Hamiltonian associated to a density matrix is given by

where and are interpreted as local (multiplication) operators. Similarly, we have

3.2. Discretization

For each vector of the reciprocal lattice , we denote by the Fourier mode with wave-vector :

where is the Lebesgue measure of the unit cell . The family is an orthonormal basis of , the space of locally square integrable -periodic functions (and an orthogonal basis of the -periodic Sobolev space , endowed with its usual inner product, for any ). In the so-called plane-wave discretization methods, the Kohn–Sham model is discretized using the finite-dimensional approximation spaces

where is a given energy cut-off chosen by the user.
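As a rough sketch of how the basis size depends on the cut-off, the plane waves satisfying the kinetic-energy criterion can be counted directly; the simple cubic lattice and all parameter values below are illustrative assumptions:

```python
import itertools, math

# Counting plane waves |G|^2 / 2 <= Ecut for a simple cubic lattice of
# side a (illustrative parameters, Hartree atomic units).
a = 10.0      # lattice constant (Bohr)
Ecut = 5.0    # energy cut-off (Ha)

b = 2.0 * math.pi / a              # reciprocal lattice spacing
radius = math.sqrt(2.0 * Ecut) / b  # cut-off radius in units of b
mmax = int(radius) + 1

n_pw = sum(
    1
    for m in itertools.product(range(-mmax, mmax + 1), repeat=3)
    if m[0]**2 + m[1]**2 + m[2]**2 <= radius**2
)

# The basis size grows like Ecut^(3/2) (volume of the cut-off ball).
estimate = 4.0 / 3.0 * math.pi * radius**3
assert 0.5 * estimate < n_pw < 2.0 * estimate
```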

The connection with the formalism introduced in Section 2 is the following:

  • we choose a large reference energy cut-off and set

  • we identify with by labelling the reciprocal lattice vectors from to in such a way that for all , ;

  • the set of rank- orthogonal projectors on such that can then be identified with the manifold defined in (3) through the mapping

  • the noninteracting Hamiltonian matrix has entries

    and the nonlinear component of the energy is any -extension of the function defined on by

The entries of the core Hamiltonian matrix can be computed explicitly:

where the above inner products can be computed exactly through the Fourier transforms of the

and (known exactly for GTH pseudopotentials). Note also that the density can be expanded on a finite number of Fourier modes and can therefore be easily stored in memory. Since the Poisson equation is trivially solvable in the plane-wave basis, this enables the exact computation of the Hartree energy. The exchange-correlation energy however cannot be computed explicitly, and is approximated using numerical quadrature. In all the numerical results, we select the parameters of this numerical quadrature so that it does not significantly affect the results, see Remark 5.
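The triviality of the Poisson solve in the plane-wave basis can be sketched as follows, assuming the convention $-\Delta V = 4\pi\rho$ with the zero-mean normalization; grid sizes and the test density are illustrative:

```python
import numpy as np

# Periodic Poisson solve -Delta V = 4*pi*rho on a cubic cell of side L,
# done exactly in the plane-wave basis (illustrative sizes and convention).
L, n = 10.0, 32
x = np.arange(n) * (L / n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")

# A smooth zero-mean test density, so the G = 0 mode is absent.
rho = np.cos(2 * np.pi * X / L) * np.cos(2 * np.pi * Y / L)

G = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # G-vector components
GX, GY, GZ = np.meshgrid(G, G, G, indexing="ij")
G2 = GX**2 + GY**2 + GZ**2

rho_G = np.fft.fftn(rho)
V_G = np.zeros_like(rho_G)
mask = G2 > 0
V_G[mask] = 4 * np.pi * rho_G[mask] / G2[mask]  # zero-mean convention: V_0 = 0
V = np.fft.ifftn(V_G).real

# For this density, -Delta rho = 2 k^2 rho with k = 2*pi/L, so V = 4*pi*rho/(2 k^2).
k = 2 * np.pi / L
assert np.allclose(V, 4 * np.pi * rho / (2 * k**2))
```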

3.3. Forces

The total ground-state energy depends on the atomic positions both explicitly (ion-ion interaction energy and ion-electron interaction potentials and ) and through the fact that the solution depends on :

The force acting on atom is defined as . By the Hellmann–Feynman theorem, the term involving the derivative of with respect to vanishes [21], and the final result is


This involves the partial derivatives of the matrix elements of with respect to the atomic positions, which can be computed analytically from (17) and (18).

3.4. Numerical setup

For all the computations and examples on silicon, we use the DFTK software [16] within the LDA approximation, with the Teter 93 exchange-correlation functional [12] and a $k$-point grid, and a reference solution computed with Ha, to which we compare results obtained with smaller cut-offs. We use the usual periodic lattice for the FCC phase of silicon, with lattice constant Bohrs, close to the equilibrium configuration. Note that the discretization grid of the Brillouin zone is not fine enough to have fully converged results, but is still sufficient to illustrate our points.

The two atoms of silicon inside a cell are placed at first at their equilibrium positions with fractional coordinates and , and then the second one is slightly displaced by to get nonzero interatomic forces.

The discretized Kohn–Sham equations are solved by a standard SCF procedure. The main computational bottleneck is the partial diagonalization of the mean-field Hamiltonian at each SCF step. This is done using an iterative eigenvalue solver, which only requires applying the mean-field Hamiltonian matrix to a set of trial orbitals, together with simple operations on vectors. In a plane-wave basis set of size , the former operation can be done efficiently through the use of the fast Fourier transform for a total cost of . We refer to [21] for more details. The application of the super-operators and to a set of orbital variations (see (13)) involves additional linear algebra operations, for an additional cost of .
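The FFT-based Hamiltonian application can be sketched in a 1D periodic toy model; the sizes, potential, and orbital below are illustrative assumptions, not DFTK code:

```python
import numpy as np

# Applying H = -(1/2) Delta + V to an orbital in a 1D periodic toy model:
# the kinetic term is diagonal in Fourier space, the local potential in real
# space, so one application costs two FFTs.
L, n = 10.0, 64
x = np.arange(n) * (L / n)
G = 2 * np.pi * np.fft.fftfreq(n, d=L / n)

V = np.cos(2 * np.pi * x / L)        # a smooth local potential
phi = np.exp(2j * np.pi * x / L)     # a trial orbital (single plane wave)

def apply_H(phi):
    kinetic = np.fft.ifft(0.5 * G**2 * np.fft.fft(phi))
    return kinetic + V * phi

# For this plane wave, the kinetic part is exactly (k^2 / 2) * phi.
k = 2 * np.pi / L
out = apply_H(phi)
assert np.allclose(out - V * phi, 0.5 * k**2 * phi)
```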

4. A first error bound using linearization

Now that the mathematical and numerical frameworks are laid down, we turn to the estimation of the error between the reference solution computed with a large energy cut-off and approximations thereof. We first start by deriving a linearization estimate and illustrating numerically its applicability. We then propose a very coarse bound on the error on the density matrix and the forces, based on the (expensive) evaluation of an operator norm. We will show in the next section how to improve this bound.

4.1. Linearization in the asymptotic regime

We assume that is a nondegenerate local minimizer of in the sense that there exists such that on the tangent space . This implies in particular that is invertible on the invariant subspace .

Recall that for any trial density matrix , the residual of the problem is

so that defines a smooth vector field on (a section of the tangent bundle ) which vanishes at . For in the vicinity of , we have


It follows from the definitions (9) of and , the optimality condition , and the above expansions that for all close enough to ,

By continuity, on the tangent space for close enough to , so that the restriction of the super-operator to the invariant subspace is self-adjoint and invertible. Using again (20) and the fact that , we obtain


This error-residual equation is the analog in our case of the linearization (1), which identifies the super-operator as the fundamental object in our study.

Based on this expansion, we can formulate the Newton algorithm to solve the equation :

where the update is computed by solving the linearized equation, and a suitable retraction on is used to map the update back to the manifold. A possible retraction is given in [7]. This Newton algorithm is expensive in practice, as it requires the iterative solution of a linear system; the cost of a Newton step is comparable to that of a full self-consistent field cycle. It is however a useful theoretical tool, and a starting point for further analysis and approximations.

To check the validity of the linearization (21), we focus on three quantities of interest: the ground-state energy, the ground-state density, and the interatomic forces acting on the two atoms in the unit cell. The reference values of these QoIs are those obtained with the very large energy cut-off Ha, defining a “fine grid” in real space via the discrete Fourier transform. For each smaller cut-off, defining a “coarse grid” in real space, we compute two approximations of the three QoIs:

  1. , and denote the approximations obtained from the variational solution of the Kohn–Sham problem on the coarse grid;

  2. , and denote the ones computed from the Kohn–Sham state obtained by one Newton step on the fine grid, starting from the variational solution of the Kohn–Sham problem on the coarse grid.

The errors between these approximations and the reference values are plotted in Figure 1 as functions of . The errors on the ground-state density are measured with the metric, while the errors on the forces are measured with the Euclidean metric on .

For the simple case of a silicon crystal at the LDA level of theory, the linearization works very well, even for very small values of the energy cut-off. Indeed, the Kohn–Sham ground state obtained by variational approximation on a coarse grid is significantly improved by one Newton step: the errors on the QoIs obtained with the latter are orders of magnitude smaller than those obtained with the former. This means that the result of one Newton step applied to the variational solution on the coarse grid is a much better approximation of the reference solution (for the metrics adapted to the chosen three QoIs).

Figure 1. Errors for the ground-state energy, ground-state density and interatomic forces for Si as a function of , for both the variational solution of the Kohn–Sham problem on the coarse grid defined by (solid line) and the post-processed solution obtained with one Newton step on the fine grid (dashed line). This shows that the linearization approximation is excellent, even for energy cut-offs as low as Ha.

4.2. A simple error bound based on operator norms

From (21) one can extract an error bound: