1. Introduction
This article focuses on providing practical error estimates for numerical approximations of electronic structure calculations. These simulations are key in many domains, as they allow for the simulation of systems at the atomic and molecular scale. They are particularly useful in the fields of chemistry, materials science, condensed matter physics, or molecular biology. Over the many electronic structure models available, Kohn–Sham (KS) Density Functional Theory (DFT) [19] with semilocal density functionals is one of the most used in practice, as it offers a good compromise between accuracy and computational cost. We will focus on this model in this article. Note that the mathematical formulation of this problem is similar to that of the Hartree–Fock [13] or Gross–Pitaevskii equations [26], so that what we present in the context of DFT can be easily extended to such contexts. We will focus on planewave discretizations within the pseudopotential approximation, which are most suited for the study of crystals; some (but not all) of our methodology can be applied in other contexts as well.
In this field, the first and most crucial problem is the determination of the electronic ground state of the system under consideration. Mathematically speaking, this problem is a constrained minimization problem. Writing the firstorder optimality conditions of this problem leads to an eigenvalue problem that is nonlinear in the eigenvectors. The unknown is a subspace of dimension
, the number of electrons in the system; this subspace can be conveniently described using either the orthogonal projector on it (density matrix formalism) or an orthonormal basis of it (orbital formalism). This problem is wellknown in the literature and the interested reader is referred to [7] and references therein for more information on how it is solved in practice.Solving this problem numerically requires a number of approximations, so that only an approximation of the exact solution can be computed. Being able to estimate the error between the exact and the approximate solutions is crucial, as this information can be used to reduce the high computational cost of such methods by an optimization of the approximation parameters, and maybe more importantly, to add error bars to quantities of interest (QoI) calculated from the approximate solution. In our context, such QoI are typically the groundstate energy of the system and the forces on the atoms in the system, but may also include any information computable from the Kohn–Sham ground state.
While such error bounds have been developed already some time ago for boundary value problems, e.g. in the context of finite element discretization, using in particular a posteriori error estimation [29], the development of bounds in the context of electronic structure is quite recent, and still incomplete. Computable and guaranteed error bounds for linear eigenvalue problems have been proposed in the last decade [2, 3, 4, 8, 20]; we refer to [23, Chapter 10] for a recent monograph on the subject. Specifically for electronic structure calculations, some of us proposed guaranteed error bounds for linear eigenvalue equations [15]. For fully selfconsistent (nonlinear) eigenvalue equations, an error bound was proposed for a simplified 1D Gross–Pitaevskii equation in [10]; however the computational cost of evaluating this error bound in this contribution is quite high. So far, no error bound has been proposed for the error estimation of general QoI in electronic structure calculations, in particular for the interatomic forces. This is what we are trying to achieve in this contribution.
In this article, we use a general approach based on a linearization of the Kohn–Sham equations. It is instructive to compare our approach to those used in a general context. Assume we want to find such that , for some nonlinear function (the residual). Near a solution , we have , and therefore, if is invertible, we have the errorresidual relationship
(1) 
This is the same approximation that leads to the Newton algorithm. Assume now that we want to compute a QoI , where is a function; then we have the approximate equality with computable righthand side:
(2) 
From here, we obtain the simple first estimate:
where is any chosen norm on , and the induced operator norms on (note that and ). This approximate bound can be turned into a rigorous one using information on the second derivatives of ; see for instance [28].
To extend this approach to Kohn–Sham models, we encounter several difficulties:

The structure of our problem is not easily formulated as above because of the presence of constraints and degeneracies. We solve this using the geometrical framework of [7] to identify the appropriate analog to the Jacobian .

The computation of the Jacobian and its inverse is prohibitively costly. We use iterative strategies to keep this cost manageable.

Choosing the right norm is not obvious in this context. For problems involving partial differential equations, where
includes partial derivatives, it is natural to consider Sobolevtype norms, with the aim of making a bounded operator between the relevant function spaces. We explore different choices and their impacts on the error bounds. 
The operator norm inequalities
are very often largely suboptimal, even with appropriate norms. We quantify this on representative examples.

The structure of the residual plays an important role. For instance, when results from a Galerkin approximation to a partial differential equation, is orthogonal to the approximation space. In the context of planewave discretizations, this means the residual only contains highfrequency Fourier components. We demonstrate how this impacts the quality of the above bounds when represents the interatomic forces, in which case mostly acts on lowfrequency components.
Throughout the paper, we will provide numerical tests to illustrate our results. All these tests are performed with the DFTK software [16], a recent Julia package solving the Kohn–Sham equations in the pseudopotential approximation using a planewave basis, thus particularly well suited for periodic systems such as crystals [21]. We are mostly interested in three QoI: the groundstate energy, the groundstate density, and the interatomic forces, the latter being computed using the Hellmann–Feynman theorem. We will demonstrate the main points with simulations on a simple system (bulk silicon), then present results for more complicated systems.
We will be interested in this paper only in quantifying the discretization error. However, the general framework we develop can be used also to treat other types of error (such as the ones resulting from an iterative solving of the underlying minimization problem). We only consider insulating systems at zero temperature, and do not consider the error due to finite Brillouin zone sampling [6]; extending the formalism to finite temperature and quantifying the sampling error, particularly for metals, is an interesting topic for future work.
The outline of this article is as follows. In Section 2, we present the mathematical framework related to the solution of the electronic structure minimization problem, describing in particular objects related to the tangent space of the constraint manifold. In Section 3, we present the Kohn–Sham model and the numerical framework in which our tests will be performed. In Section 4, we propose a first crude bound of the error between the exact and the numerically computed solution based on a linearization argument as well as an operator norm inequality, both for the error on the groundstate density matrix and on the forces. In Section 5, we refine this bound by splitting between low and high frequencies, and using a Schur complement to refine the error bound on the low frequencies. Finally, in Section 6, we provide numerical simulations on more involved materials systems, namely on a gallium arsenide system (GaAs) and a titanium dioxide system (TiO2), showing that the proposed bounds work similarly well in those cases.
2. Mathematical framework
In this section, we present the models targeted by our study, as well as the elementary tools of differential geometry used to derive and compute the error bounds on the QoI.
2.1. General framework
The work we present here is valid for a large class of meanfield models including different instances of Kohn–Sham models, the Hartree–Fock model, and the timeindependent Gross–Pitaevskii model and its various extensions. To study them in a unified way, we use a mathematical framework similar to the one in [7]. To keep the presentation simple, we will work in finite dimension and consider that the solutions of the problem can be very accurately approximated in a given finitedimensional space of (high) dimension , which we identify with . We denote by
the inner product of
, seen as a vector space over
. We equip the vector space of square Hermitian matriceswith the Frobenius inner product . Note that although it is important in applications to allow for complex orbitals and density matrices, the space of Hermitian matrices is not a vector space over , and therefore we will always consider vector spaces over .
The densitymatrix formulation of the meanfield model in this reference approximation space reads
(3) 
is the manifold of rank orthogonal projectors (density matrices), and is a nonlinear energy functional. The parameter (with ) is a fixed integer depending on the physical model, and not of its discretization in a finitedimensional space. For meanfield electronic structure models, is the number of electrons or electron pairs in the system (hence the notation ); in the standard Gross–Pitaevskii model, . The energy functional is of the form
where is the linear part of the meanfield Hamiltonian, and a nonlinear contribution depending on the considered model (see Section 3 for the expressions in the Kohn–Sham model). For simplicity of the presentation we will ignore spin in the formalism, but we will include it in the numerical simulations. The set is diffeomorphic to the Grassmann manifold of dimensional complex vector subspaces of .
Problem (3) always has a minimizer since it consists in minimizing a continuous functions on a compact set. This minimizer may be unique, or not, depending on the model and/or the physical system under study. We will not elaborate here on this uniqueness issue, and assume for the sake of simplicity that (3) has a unique minimizer, which we denote by .
Besides the groundstate energy , we can compute from various other physical quantities of interest (QoI), for instance, the groundstate energy and density, and the interatomic forces. We denote such a QoI by .
We consider the case when is too large for problem (3) to be solved completely in the reference approximation space. To solve problem (3), we therefore consider a finitedimensional subspace of of dimension and solve instead the variational approximation of (3) in , namely
(4) 
Our goal is then to estimate the errors on the QoI , where is typically the minimizer of (4), given the variational space , and the norm is specific to the QoI. The latter can be a scalar, e.g. the groundstate energy, or a finite or infinite dimensional vector, e.g. the interactomic forces, or the groundstate density.
2.2. Firstorder geometry
The manifold is a smooth manifold. Its tangent space at is given by
where is the orthogonal projection on for the canonical inner product of . The set is the set of Hermitian matrices that are offdiagonal in the block decomposition induced by and ; more explicitly, if for some unitary , then
The orthogonal projection on for the Frobenius inner product is given by
(5) 
where is the commutator of and . Linear operators acting on spaces of matrices are sometimes referred to as superoperators in the physics literature. Throughout this paper, superoperators will be written in bold fonts.
The meanfield Hamiltonian is the gradient of the energy at a given point (again for the Frobenius inner product):
To simplify the notation, we set
(6) 
The firstorder optimality condition for (3) is that is orthogonal to the tangent space , which can be written, using (5) and (6), as . This corresponds to the residual
being zero at . The residual function can be seen as a nonlinear map from to itself, and its restriction to as a vector field on since for all , .
2.3. Secondorder geometry
We introduce the superoperators and , defined at and acting on . These operators were already introduced in [7, Section 2.2], but we recall here their definitions for completeness. To simplify the notation, we will set , .
The superoperator is the Hessian of the energy projected onto the tangent space to at :
(7) 
By construction, is an invariant subspace of . Note that for linear eigenvalue problems, i.e. when .
The superoperator is defined by
(8) 
The tangent space is also an invariant subspace of .
It is shown in [7] that the energy of a density matrix with is . The restriction of the operator to the invariant subspace can therefore be identified with the secondorder derivative of on the manifold . Since is a minimum, it follows that
Note that in general, the secondorder derivative of a function defined on a smooth manifold is not an intrinsic object; it depends not only on the tangent structure of the manifold, but also on the chosen affine connection. However, at the critical point of on the manifold, the contributions to the second derivative due the connection vanish and the secondorder derivative becomes intrinsic.
For our purposes, it will be convenient to define this secondorder derivative also outside of . Relying on (7)(8), we define for any the superoperators and through
(9) 
Both and admit as an invariant subspace and their restrictions to are Hermitian for the Frobenius inner product. The map is smooth and the restriction of to provides a computable approximation of the secondorder derivative of in a neighborhood of (whatever the choice of the affine connection).
2.4. Density matrix and orbitals
The framework we have outlined above is particularly convenient for stating the secondorder conditions, but much too expensive computationally as it requires the storage and manipulation of (lowrank) large matrices. In practice, it is more effective to work directly with orbitals, i.e. write for any
(10) 
where is a collection of orbitals satisfying and , and where we used Dirac’s braket notation: for , and . Problem (3) can be reformulated as
Note that the orbitals are only defined up to a unitary transform: if
is a unitary matrix, then
and give rise to the same density matrix. This means that the minimizers of this minimization problem are never isolated, which creates technical difficulties that are not present in the density matrix formalism.Let us fix a with , and consider an element of the tangent plane . By completing to an orthogonal basis and writing out in this basis, it is easy to see that the constraints , , imply that can be put in the form
(11) 
where is a set of orbital variations satisfying . Furthermore, under this condition, is unique, so that (11) establishes a bijection between and . We will therefore treat equivalently elements of the tangent space either in the density matrix representation or the orbital representation , writing
(12) 
This orbital representation of by is more economical computationally, only requiring the storage and manipulation of orbitals satisfying . Similarly, the manipulation of objects in the tangent plane is more efficiently done through their orbital variations satisfying .
All operations on density matrices or their variations can indeed be carried out in this orbital representation. For instance, the computation of the energy can be performed efficiently in practice, as explained in Section 3, and the residual at also has a nice representation in terms of orbitals:
which is easily recognized as similar to the residual of a linear eigenvalue problem.
Likewise, operators on can be identified in this fashion. For instance,
(13) 
The computation of can be performed similarly. All the numerical results in this article are performed using the orbital formalism.
Remark 1.
Note that the condition that is not necessary for to define an element of . However, without this gauge condition, is not unique. This is simply a manifestation at the infinitesimal level of the noninjectivity of the map between and . Because of this, the derivative is not injective between the tangent spaces and . In more concrete terms, in the example case where are the first basis vectors, any element is of the form which can be written in the form (11) with . However, such a can also be written in the form (11) with for any antihermitian matrix . The gauge condition forces to be zero, making unique. In more formal terms, the map induces a principal bundle structure on the base space (the Grassmann manifold) with total space (the Stiefel manifold) and characteristic fiber . This naturally splits the tangent space into the vertical space , and a complementary horizontal space, which we take to be the orthogonal complement, .
The orbital formalism can be used to give a more concrete interpretation of the firstorder optimality condition . Indeed, this condition can be rewritten as
from which it follows that and can be jointly diagonalized in an orthonormal basis:
(14) 
In many applications, the orbitals spanning the range of (see (14)) are those corresponding to the lowest eigenvalues of . This is called the Aufbau principle in physics and chemistry. This principle is always satisfied in the (unrestricted) Hartree–Fock setting, and most of the times in the Kohn–Sham setting. Under the Aufbau principle, we can assume that the ’s are ranked in nondecreasing order. The orbitals , , are called occupied, and the orbitals , , are called virtual (it is customary to label the occupied orbitals by indices , and virtual orbitals by indices ). The operator
can be written explicitly using the tensor basis
. We have indeedand it follows that the lowest eigenvalue of the restriction of to is . The operator is therefore positive on , and coercive if there is an energy gap between the and eigenvalues of (see e.g. [7]).
2.5. Metrics on the tangent space
The isomorphism between and the set of orbital variations with is unitary under the Frobenius inner product up to a factor of : .
In practice, it is often advantageous to work using different inner products. This is in particular the case for partial differential equations involving unbounded operators, where using Sobolevtype metrics better respects the natural analytic structure of the problem and therefore allows for better bounds, compare e.g. the results of (5.34) and (5.35) on Figure 4 in [4]. To that end, consider a metric on given by
Here is a coercive Hermitian operator on representing the metric; for instance, taking to be a discretization of the operator we recover the classical Sobolev norm. A basic problem is that the projection on the orthogonal of does not necessarily commute with . As a result, there are various nonequivalent ways to lift this metric to one on the tangent space . We select here the computationally simplest. The operator
(15) 
is positive definite on the subspace of , and induces a metric on that space. The point of this formulation is to make it easily to compute . Note that, since and do not commute, . However, is wellconditioned, so that computing the action of on a vector can be performed efficiently by an iterative algorithm involving repeated applications of the operators and . The same holds for . Furthermore, practical numerical results are typically not very sensitive to these issues, so that other (nonequivalent) reasonable alternatives to (15) yield similar results.
The metric on immediately induces a metric on given by, in the orbital representation associated with ,
for . This defines an operator on through the relationship when . Similarly to , we can compute powers and inverses of easily.
This formalism has the disadvantage that the same metric is used for every orbital variation. In practice this may not be sensible, as different orbitals can correspond to different energy ranges. Therefore we slightly modify the above formalism by applying a different metric on each individual orbital variations, following standard practice used in preconditioners for planewave density functional theory [24]. Introducing a family of coercive Hermitian operators on , we set
(16) 
3. The periodic Kohn–Sham problem
3.1. The continuous problem
We consider a periodic system, being a Bravais lattice with unit cell and reciprocal lattice (the set of vectors such that for all ). For the sake of simplicity, we present here the formalism for the (artificial) Kohn–Sham model for a finite system of electrons on the unit cell with periodic boundary conditions. This is distinct from the more physical periodic Kohn–Sham problem for an infinite crystal with electrons by unit cell, which is usually treated by using the Bloch theorem. Practical computations are performed for the latter model using MonkhorstPack Brillouin zone sampling [22] (see also [6] for a mathematical analysis of this method). The mathematical framework is very similar, with additional sums over points.
At the continuous level, a Kohn–Sham state is described by a density matrix , a rank orthogonal projector acting on the space of square integrable periodic functions. Ignoring constant terms modeling interactions between ions (i.e. atomic nuclear and frozen core electrons), the Kohn–Sham energy of is given by (the superscript Hxc stands for Hartreeexchangecorrelation), with
In the above expressions, is the density associated with the traceclass operator (formally where is the integral kernel of ), a given exchangecorrelation energy, and the Hartree potential, defined as the unique periodic solution with zero mean of the Poisson equation . Here, we ignore spin for simplicity; in practice however, spin is taken into account by multiplying by the density (for spinunpolarized systems) and modifying accordingly. In the pseudopotential approximation that we use in our numerical results, is a local potential given by
(17) 
and a nonlocal potential in KleinmannBylander [17] form given by
(18) 
where is the number of atoms in , the ’s are the positions of the atoms inside the unit cell , is a local radial potential, denotes the number of projectors for atom , and the are given smooth functions. We use in particular the Goedecker–Teter–Hutter (GTH) pseudopotentials [12, 14] whose functional forms for the and are analytic ( is a radial Gaussian function multiplied by a radial polynomial, and is a radial Gaussian function multiplied by a solid spherical harmonics).
The Kohn–Sham Hamiltonian associated to a density matrix is given by
where and are interpreted as local (multiplication) operators. Similarly, we have
3.2. Discretization
For each vector of the reciprocal lattice , we denote by the Fourier mode with wavevector :
where is the Lebesgue measure of the unit cell . The family is an orthonormal basis of , the space of locally square integrable periodic functions (and an orthogonal basis of the periodic Sobolev space , endowed with its usual inner product, for any ). In the socalled planewave discretization methods, the Kohn–Sham model is discretized using the finitedimensional approximation spaces
where is a given energy cutoff chosen by the user.
The connection with the formalism introduced in Section 2 is the following:

we choose a large reference energy cutoff and set

we identify with by labelling the reciprocal lattice vectors from to in such a way that for all , ;

the set of rank orthogonal projectors on such that can then be identified with the manifold defined in (3) through the mapping

the noninteracting Hamiltonian matrix has entries
and the nonlinear component of the energy is any extension of the function defined on by
The entries of the core Hamiltonian matrix can be computed explicitly:
where the above inner products can be computed exactly through the Fourier transforms of the
and (known exactly for GTH pseudopotentials). Note also that the density can be expanded on a finite number of Fourier modes and can therefore be easily stored in memory. Since the Poisson equation is trivially solvable in the planewave basis, this enables the exact computation of the Hartree energy. The exchangecorrelation energy however cannot be computed explicitly, and is approximated using numerical quadrature. In all the numerical results, we select the parameters of this numerical quadrature such that it does not affect too much the results, see Remark 5.3.3. Forces
The total groundstate energy depends on the atomic positions both explicitly (ionion interaction energy and ionelectron interaction potentials and ) and through the fact that the solution depends on :
The force acting on atom is defined as . Because of the Hellman–Feynman theorem, the term involving the derivative of with respect to vanishes[21], and the final result is
(19) 
This involves the partial derivatives of the matrix elements of with respect to the atomic positions, which can be computed analytically from (17) and (18).
3.4. Numerical setup
For all the computations and examples on silicon, we use the DFTK software [16] within the LDA approximation, with Teter 93 exchangecorrelation functional [12] and a point grid, and a reference solution computed with Ha, to which we compare results obtained with smaller ’s. We use the usual periodic lattice for the FCC phase of silicon, with lattice constant Bohrs, close to the equilibrium configuration. Note that the discretization grid of the Brillouin zone is not fine enough to have fully converged results, but is still sufficient to illustrate our points.
The two atoms of silicon inside a cell are placed at first at their equilibrium positions with fractional coordinates and , and then the second one is slightly displaced by to get nonzero interatomic forces.
The discretized Kohn–Sham equations are solved by a standard SCF procedure. The main computational bottleneck is the partial diagonalization of the meanfield Hamiltonian at each SCF step. This is done using an iterative eigenvalue solver, which only requires applying meanfield Hamiltonian matrices to a set of trial orbitals and simple operations on vectors. In a planewave basis set of size , the former operation can be done efficiently through the use of the fast Fourier transform for a total cost of . We refer to [21] for more details. The application of the superoperators and to a set of orbital variations (see (13)) involves additional linear algebra operations, for an additional cost of .
4. A first error bound using linearization
Now that the mathematical and numerical frameworks are laid down, we turn to the estimation of the error between the reference solution computed with a large energy cutoff and approximations thereof. We first start by deriving a linearization estimate and illustrating numerically its applicability. We then propose a very coarse bound on the error on the density matrix and the forces, based on the (expensive) evaluation of an operator norm. We will show in the next section how to improve this bound.
4.1. Linearization in the asymptotic regime
We assume that is a nondegenerate local minimizer of in the sense that there exists such that on the tangent space . This implies in particular that is invertible on the invariant subspace .
Recall that for any trial density matrix , the residual of the problem is
so that defines a smooth vector field on (a section of the tangent bundle ) which vanishes at . For in the vicinity of , we have
(20) 
If follows from the definitions (9) of and , the optimality condition , and the above expansions that for all close enough to ,
By continuity, on the tangent space for close enough to , so that the restriction of the superoperator to the invariant subspace is selfadjoint and invertible. Using again (20) and the fact that , we obtain
(21) 
This errorresidual equation is the analog in our case of the linearization (1), which identifies the superoperator as the fundamental object in our study.
Based on this expansion, we can formulate the Newton algorithm to solve the equation :
where and and is a suitable retraction on . A possible retraction is given in [7]. This Newton algorithm is expensive in practice, as it requires to solve iteratively a linear system; the cost of a Newton step is comparable to that of a full selfconsistent field cycle. It is however a useful theoretical tool, and a starting point for further analysis and approximations.
To check the validity of the linearization (21), we focus on three quantities of interest: the ground state energy, the grounddensity density, and the interatomic forces acting on the two atoms in . The reference values , and of these QoIs are those obtained with the very large energy cutoff Ha, defining a “fine grid” in real space via the discrete Fourier transform. For defining a “coarse grid” in real space, we compute two approximations of the three QoIs:

, and denote the approximations obtained from the variational solution of the Kohn–Sham problem on the coarse grid;

, and denote the ones computed from the Kohn–Sham state obtained by one Newton step on the fine grid, starting from the variational solution of the Kohn–Sham problem on the coarse grid.
The errors between these approximations and the reference values are plotted in Figure 1 as functions of . The errors on the groundstate density are measured with the metric, while the errors on the forces are measured with the Euclidean metric on .
For the simple case of a silicon crystal at the LDA level of theory, the linearization works very well, even for very small values of ’s of the order of Ha. Indeed the Kohn–Sham groundstate obtained by variational approximation on a coarse grid is significantly improved by one Newton step: the errors on the QoIs obtained with the latter are orders of magnitude smaller than the ones obtained with the former. This means that if is the variational solution on the coarse grid, is a much better approximation of (for the metrics adapted to the chosen three QoIs).
4.2. A simple error bound based on operator norms
From (21) one can extract an error bound: