 # Gradient Flow Based Discretized Kohn-Sham Density Functional Theory

In this paper, we propose and analyze a gradient flow based Kohn-Sham density functional theory. First, we prove that the critical point of the gradient flow based model can be a local minimizer of the Kohn-Sham total energy. Then we apply a midpoint scheme to carry out the temporal discretization. It is shown that the critical point of the Kohn-Sham energy can be well-approximated by the scheme. In particular, based on the midpoint scheme, we design an orthogonality preserving iteration scheme to minimize the Kohn-Sham energy and show that the orthogonality preserving iteration scheme produces approximations that are orthogonal and convergent to a local minimizer under reasonable assumptions. Finally, we report numerical experiments that support our theory.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Kohn-Sham density functional theory (DFT) is the most widely used model in electronic structure calculations . We see that to solve the Kohn-Sham equation, which is a nonlinear eigenvalue problem, some self-consistent field (SCF) iterations are demanded [7, 9, 22]. However, the convergence of SCF iterations is not guaranteed, especially for large scale systems with small band gaps, for which the performance of the SCF iterations is unpredictable [8, 32]. It has been shown by numerical experiments that the SCF iterations usually converge for systems with larger gap between the occupied orbitals and the remainder . We understand that there are a number of works trying to illustrate this phenomenon and see that SCF iterations do converge if the gap is uniformly large enough locally or globally [4, 16, 17, 31].

In order to obtain approximations of the Kohn-Sham DFT that are convergent, in recent two decades, people pay much attention to study the direct energy minimization model. Instead of solving the Kohn-Sham equation, people minimize the Kohn-Sham total energy under an orthogonality constraint [8, 12, 16, 17, 24, 25, 26, 27, 28, 32, 33]. It is shown in  that a monotonic optimization approach may produce a locally convergent approximations. We observe that the iterations based on the optimization should be carefully carried out due to the orthogonality constraint, for which the existing methods are indeed either retraction (see, e.g., [8, 32]) or manifold path optimization approaches (see, e.g., [8, 28, 32]). We see that some backtracking should be applied in a monotonic optimization method, due to not only theory but also practice.

In this paper, we introduce and analyze a gradient flow based discretized Kohn-Sham DFT for electronic structure calculations. First, we prove that our gradient flow based discretized Kohn-Sham DFT preserves orthogonality and models the ground state well. We then propose a midpoint scheme to carry out the temporal discretization, which is of orthogonality preserving, too. We mention that our numerical scheme avoids a retraction process and does not need any backtracking. Based on the midpoint scheme, finally, we design and analyze an orthogonality preserving iteration scheme for solving the discretized Kohn-Sham energy. It is shown by theory and numerics that our orthogonality preserving iteration scheme is convergent provided some reasonable assumption.

For illustration, we provide Figure 1 to show the differences among the three approaches. In the midpoint scheme of the gradient flow based model (blue dashed line with square symbol endpoint), the auxiliary point of midpoint scheme of the dynamical system is inside the manifold. In the manifold path method (black solid line with circle symbol endpoint), the path is on the manifold and the energy is decreasing when the iteration is moving along the path. In the retraction method (red solid line with triangle symbol endpoint), the auxiliary point is in the tangent space and outside the manifold. Figure 1: Comparison among gradient flow scheme, manifold path method and retraction method: blue dashed line with square symbol endpoint – midpoint scheme of gradient flow model, black solid line with circle symbol endpoint – manifold path method, red solid line with triangle symbol endpoint– retraction method. Diamond symbol – auxiliary point of each method.

We observe that there are some existing works on the gradient flow methods of eigenvalue problems. We refer to [6, 14, 29] and references cited therein for linear eigenvalue problems and 

for the ground state of Bose-Einstein condensate (which requires the smallest eigenvalue and its associated eigenfunction only). We point out that our gradient flow based model is different from the gradient flow model proposed in

 for the Kohn-Sham DFT, in which the numerical scheme is either the retraction approach or the manifold path approach.

We organize the rest of the paper as follows. In section 2, we introduce some necessary notation and the Kohn-Sham DFT models. Then we come up with our gradient flow based model and prove its local convergence and convergence rate of the asymptotic behaviours in section 3. In section 4, we propose a midpoint scheme to realize temporal discretization of the gradient flow based model and investigate the relevant properties including preserving orthogonality automatically, updating inside the manifold as well as the local convergence. Based on the midpoint scheme, in section 5, we design and analyze an orthogonality preserving iteration scheme for solving the discretized Kohn-Sham energy. In section 6, we provide numerical experiments that support our theory. Finally, we present some concluding remarks.

## 2 Preliminaries

In this section, we introduce some basic notation and the Kohn-Sham models.

### 2.1 Basic notation

We apply the standard -inner product , which is defined as

 (u,v)L2(R3)=∫R3u(x)v(x)dx, (1)

denote -norm by , and -norm by

 ∥u∥L1(R3)=∫R3|u(x)|dx. (2)

We define -norm as

 ∥u∥2H1(R3)=∥u∥2L2(R3)+∥∇u∥2L2(R3)

and use Sobolev space

 H1(R3)={u∈L2(R3):∥u∥H1(R3)<+∞}.

Let    and    . Define product matrix

 Ψ⊙Φ=(ψiφj)Ni,j=1∈(L1(R3))N×N

and inner product matrix

 ⟨Ψ⊤Φ⟩=((ψi,φj)L2(R3))Ni,j=1∈RN×N.

For , we set

 ⟨F,Ψ⟩=(⟨Fi,ψj⟩)Ni,j=1∈RN×N. (3)

We then introduce the Stiefel manifold defined as

 MN={U∈(H1(R3))N:⟨U⊤U⟩=IN}.

For and any matrix , we denote

 UP=(N∑j=1pj1uj,N∑j=1pj2uj,…,N∑j=1pjNuj).

We see that

 U∈MN⇔UP∈MN,∀P∈ON,

where

 ON={P∈RN×N:P⊤P=IN}.

We define an equivalent relation “” on as

 U∼^U⇔∃P∈ON, ^U=UP,

and get a Grassmann manifold, which is a quotient of

 GN=MN/∼.

We introduce an equivalent class of by

 [U]={UP:P∈ON},

an inner product as

 (U,^U)=tr(⟨U⊤^U⟩)

together with an associated norm

 |||U|||=(U,U)1/2

on .

Give a finite-dimensional space spanned by . We denote . We see that for any , there exists such that

 U=ΦC=(Ng∑j=1cj1ϕj,Ng∑j=1cj2ϕj,…,Ng∑j=1cjNϕj). (4)

We define a closed -neighborhood of by

 B(U,δ)={^U∈(VNg)N:dist(U,^U)⩽δ}

where

 dist(U,^U)=|||U−^U|||,

and for introduce a closed -neighborhood of on by

 B([U],δ)={[^U]∈GN:^U∈(VNg)N⋂MN,dist([U],[^U])⩽δ},

where

 dist([U],[^U])=infP∈ON|||U−^UP|||.

For simplicity, we use notation

 {U,W}=UW⊤−WU⊤,∀U,W∈(VNg)N (5)

where and denote operators on :

 (UW⊤)V=U⟨W⊤V⟩,(WU⊤)V=W⟨U⊤V⟩,

for any .

Obviously

 {U,W}+{W,U}=0,∀U,W∈(VNg)N. (6)

Namely

is skew-symmetric.

### 2.2 Kohn-Sham models

The energy based Kohn-Sham DFT model for a system of electron orbitals with external potential contributed by M nuclei of charges is the following constrained optimization problem on the Stiefel manifold

 infU∈(H1(R3))NE(U)s.t. U∈MN (7)

where is the Kohn-Sham energy

 E(U)=12N∑i=1fi∫R3|∇ui(r)|2dr+∫R3Vext(r)ρU(r)dr+12∫R3∫R3ρU(r)ρU(r′)|r−r′|drdr′+Exc(ρU). (8)

Here are Kohn-Sham orbitals,

 ρU(r)=N∑i=1fi|ui(r)|2=tr(U⊙UF) (9)

is the associated electron density with being the occupation number of the -th orbital and . is the external potential generated by the nuclei: for full potential calculations,

 Vext(r)=−M∑I=1ZI|r−RI|,

and are the nuclei charge and position of the -th nuclei respectively; while for pseudo potential approximations, the formula for the energy is still (8) (see, e.g., ). The fourth term in (8) is the exchange-correlation energy, to which some approximations, such as LDA(Local Density Approximation), GGA(General Gradient Approximation) and so on[18, 20], should be applied. We assume that is bounded from below with orthogonality constraint of , which is of physics. For simplicity, we consider the case of .

We see that for any and all , there hold

 ρUP=tr((UP)⊙UPF)=2tr(U⊙UPP⊤)=2tr(U⊙U)=ρU

and

 E(UP)=E(U). (10)

Instead we consider an optimization problem on

 infU∈(H1(R3))NE(U)s.t. [U]∈GN (11)

and define level set

 LE={[U]∈GN:E(U)⩽E}.

To introduce the gradient on , we suppose

 Exc(ρU)=∫R3εxc(ρU)(r)ρU(r)dr

and assume that the exchange-correlation energy is differentiable and the exchange-correlation potential

 vxc(ρ)=δ(ρεxc(ρ))δρ.

We may write the gradient of as

 ∇E(U)=(Eu1,Eu2,…,EuN)∈(H−1(R3))N,

where is defined by

 (12)

Obviously

 (13)

We see from  that the gradient on Grassmann manifold of at is

 ∇GE(U)=∇E(U)−U⟨∇E(U),U⟩⊤, ∀U∈MN. (14)

To propose a gradient flow based model preserving orthogonality, we need to extend the domain of from to . We then define extended gradient as follows

 (15)

Note that (15) is consistent with (14) for since .

We see from [8, 10] that the tangent space on is

 T[U]GN={W∈(H1(R3))N:⟨W⊤U⟩=0} (16)

and the Hessian of on is

 HessGE(U)[V,W]=tr(⟨V⊤∇2E(U)W⟩)−tr(⟨V⊤W⟩⟨U⊤∇E(U)⟩),∀V,W∈T[U]GN. (17)

If , then we may view in the sense of isomorphism and

 ⟨∇E(U),V⟩=⟨(∇E(U))⊤V⟩, ∀V∈(VNg)N. (18)

As a result, and we may write

 ∇GE(U)=AUU, ∀U∈(VNg)N, (19)

where

 AU={∇E(U),U}.

## 3 Gradient flow based model

In this section, we propose and analyze a gradient flow based model.

### 3.1 The model

Different from the Kohn-Sham equation and the Kohn-Sham energy minimization model, we propose a gradient flow based model of Kohn-Sham DFT as follows:

 ⎧⎨⎩dUdt=−∇GE(U),0

where and . We see that (20) is different from the standard gradient flow model presented in , which applies the in (14) rather than (19). We point out that whether the solution of (14) keeps on the Stiefel manifold is unclear. However, we see from Proposition 3.1 that our new defined by (19) guarantees that the solution keeps on the Stiefel manifold. Namely, (20) is an orthogonality preserving model whenever the initial is orthogonal.

If and

 A⊤=A,B⊤=−B,

then

 tr(AB)=0.

We see that

 tr(AB)=tr(AB)⊤=tr(B⊤A⊤)=−tr(BA)=−tr(AB),

which indicates

 tr(AB)=0.
###### Proposition

The solution of (20) satisfies . Moreover, there holds

 dE(U(t))dt=−∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣2⩽0,0

A direct calculation shows that

which indicates

 ⟨U(t)⊤U(t)⟩=IN

due to . Consequently, we see from Lemma 3.1 that

 ∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣2−tr⟨∇E(U(t))⊤∇GE(U(t))⟩=tr(⟨∇E(U(t))⊤U(t)⟩⟨U(t)⊤∇GE(U(t))⟩)=0. (22)

As a result,

 dE(U(t))dt=δEδU⋅dUdt=−tr⟨∇E(U(t))⊤∇GE(U(t))⟩=−∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣2⩽0. (23)

### 3.2 Critical points

We denote Lagrange function of (11)

 L(U,Λ)=E(U)−12(⟨U⊤U⟩−IN)Λ (24)

for and , then the corresponding first-order necessary condition is as follows

 ∇UL(U,Λ) ≡ ∇E(U)−UΛ=0, (25) ∇ΛL(U,Λ) ≡ 12(IN−⟨U⊤U⟩)=0. (26)

We call a critical point of (11) if

 ∇GE(U)=0.

Obviously, for such a critical point, we have

 ∇E(U)=U⟨U⊤∇E(U)⟩,

which suggests

 ∇UL(U,⟨U⊤∇E(U)⟩)=0, ∇ΛL(U,⟨U⊤∇E(U)⟩)=0.

Thus we see that such a critical point may be a local minimizer.

As , we know that energy decreases monotonically, thus exists provided that is bounded from below. The following statement tells us the asymptotical behavior of the extended gradient flow(c.f. ). If is a solution of (20), then

 liminft→∞∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣=0. (27)

We see from Proposition 3.1 that

 ∫+∞0∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣2dt=−∫+∞0dE(U)dtdt=E(U(0))−limt→∞E(U(t))<+∞.

Since is nonnegative function, we have

 liminft→∞∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣=0.

Suppose that the local minimizer is the unique critical point of (11) in . For a fixed constant , we define

 E0=min{E([~U]) | [~U]∈¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯B([U∗],δ1)∖B([U∗],δ2)}.

Here and hereafter, we assume that as an operator from to , is continuous in .

If the initial value satisfies

 E(U0)⩽E0+E(U∗)2≡E1,

then

 limt→∞∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣=0, limt→∞E(U(t))=E(U∗), limt→∞dist([U(t)],[U∗])=0.

We obtain from Theorem 3.2 that there exists a sequence so that and . The uniqueness of critical point in implies . Due to we have

 [U(τk)]∈B([U∗],δ2)⋂LE1,

where is the level set. Since set

 S={~U∈(VNg)N:[~U]∈B([U∗],δ)⋂LE1}

is compact, there exist a subsequence and that . Since is continuous, then is also continuous, so . By the uniqueness of critical point in again, we get and

 limt→∞E(U(t))=liml→∞E(U(τkl))=E(U∗).

We claim that . Otherwise, there exists a subsequence that for some fixed , . Since is compact, there exist a subsequence and that . Therefore

 E(¯U)=limq→∞E(U(τpq))=E([U∗]),

and , which contradicts the assumption .

Clearly, there exists that

 |||U(t)P(t)−U∗|||=dist([U(t)],[U∗]), (28)

then

 limt→∞∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣=limt→∞∣∣∣∣∣∣∇GE(U(t)P(t))∣∣∣∣∣∣=∣∣∣∣∣∣∇GE(U∗)∣∣∣∣∣∣=0.

Indeed, we may have some convergence rate. If and

 HessGE(U)[D,D]⩾σ|||D|||2∀[U]∈B([U∗],δ3),∀D∈T[U]GN⋂(VNg)N (29)

for some and , then there exists such that

 ∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣⩽e−σ(t−^T), E(U(t))−E(U∗)⩽12σe−2σ(t−^T)

for all .

We see that

 12ddt∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣2=tr(⟨∇GE(U(t))⊤ddt∇GE(U(t))⟩)=tr(⟨∇GE(U(t))⊤∇2E(U(t))ddtU(t)⟩)−tr(⟨∇GE(U(t))⊤ddtU(t)⟩⟨U(t)⊤∇E(U(t))⟩)−tr(⟨∇GE(U(t))⊤U(t)⟩ddt(⟨U(t)⊤∇E(U(t))⟩)),

which together with Lemma 3.1 leads to

Note that Theorem 3.2 implies that there exists such that

 U(t)∈B([U∗],δ3),∀t⩾^T. (30)

Hence, we obtain from (29) that

 (31)

Using Grönwall’s inequality we arrive at

 ∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣2⩽e−2σ(t−^T),t⩾^T. (32)

Therefore, for all , there hold

 ∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣⩽e−σ(t−^T) (33)

and

 E(U(t))−E(U∗)=∫+∞t∣∣∣∣∣∣∇GE(U(t))∣∣∣∣∣∣2dt⩽12σe−2σ(t−^T). (34)

We understand that (29) has been already applied in [8, 23]. We observe that in (29) is related to the gap between the -th eigenvalue and the -th eigenvalue of the Kohn-Sham equation.

## 4 Temporal discretization

We may apply various temporal discretization approaches to solve (20). In this section, we propose and analyze a midpoint point scheme. Our analysis shows that the midpoint point scheme is quite efficient and recommended.

### 4.1 A midpoint scheme

Let be discrete points such that

 0=t0

and . Set

 Δtn=tn+1−tn, (36)

and consider a midpoint scheme as follows

 Un+1−UnΔtn=−∇GE(Un+1/2), (37)

where . Equivalently

 Un+1−UnΔtn=−AUn+1/2Un+1/2. (38)

Our midpoint scheme is an implicit method and we will propose and analyze a practical scheme to solve (38) in the next section.

First, we investigate the existence of the solution of (38) in a neighborhood of , which requires that is Lipschitz continuous locally

 |||∇E(U1)−∇E(U2)|||⩽L0|||U1−U2|||, ∀U1,U2∈B(U∗,δ1),

which is true for LDA when . However, it is still open whether . There exist such and a unique function which satisfies

 (39)

for some and . We define on by

 H(X,Y,t):=Y−X+t∇GE(Y+X2). (40)

Obviously, and exists. Since

 ∂∂YH(U∗,U∗,0)=I, (41)

we see from implicit function theory that there exists a unique function which satisfies for some . Thus we complete the proof.

Due to Lemma 4.1, we see that is the solution of (37). Then we arrive at the following Algorithm 1 and refer to Theorem 4.2 for the choice of .