1 Introduction
KohnSham Density functional theory (DFT) is one of the most successful approximate models in the study of manybody system. Its application has covered many practical application areas , ranging from chemistry, physics, materials science, chemical engineering, etc. With rapid development of the hardware, new algorithms and acceleration techniques are desired to keep improving the efficiency of the simulations in density functional theory.
So far, lots of numerical methods for solving KohnSham equation have been developed. For instance, planewave method is the most popular method in the computational quantum chemistry community. Owing to the independence of the basis function to the ionic position, planewave method has advantage on calculating intermolecular force. Combined with the pseudopotential method, planewave method plays an important role in the study of the ground and excited states calculations, and geometry optimization of the electronic structures. Although the planewave method is popular in the computational quantum chemistry community, it is inefficient in the treatment of nonperiodic systems like molecules, nanoclusters, etc., or materials systems with defects, where higher basis resolution is often required in some spatial regions and a coarser resolution suffices elsewhere. Furthermore, the planewave method uses the global basis which significantly affect the scalability of computations on parallel computing platforms. The atomicorbitaltype basis sets [16, 19, 42] are also widely used for simulating materials systems such as molecules and clusters. However, they are well suited only for isolated systems with special boundary conditions. It is difficult to develop a systematic basisset for all materials systems. Thus over the past decade, more and more attention have been attracted to develop efficient and scalable realspace techniques for electronic structure calculations. For more information, we refer to [5, 6, 9, 10, 15, 28, 35, 36] and references therein for a comprehensive overview.
Among all those realspace methods for KohnSham equation, the finite element method is a very competitive one. The advantages of finite element method for solving the partial differential equations include that it can use unstructured meshes and local basis sets, and it is scalable on parallel computing platforms. So far, the application of the finite element method on solving KohnSham equation has been studied systematically. Please refer to
[3, 7, 14, 17, 23, 24, 27, 30, 31, 32, 34, 37, 38, 39, 40, 41, 47], and references therein.Efficiency improvement is a key issue in the development of the method towards the practical simulations. Many acceleration techniques such as adaptive mesh methods, multigrid preconditioning in solving the eigenvalue problems, parallelization, have been studied in depth for the purpose. It is noted that Xie and his coworkers proposed and developed a multilevel correction technique for solving eigenvalue problems
[12, 20, 25, 43, 44, 45]. With this technique, solving the nonlinear eigenvalue problem defined on the finest mesh can be transformed to solving the linear boundary value problem on the finest mesh and a fixed and low dimensional nonlinear eigenvalue problem. Fruitful results have been obtained to successfully show the capability of the multilevel correction method on improving the efficiency. In [18], a numerical framework consisting of the multilevel correction method and the adaptive mesh method is proposed for solving the KohnSham equation. Similar to original idea of the multilevel correction method, the nonlinear eigenvalue problem derived from the finite element discretization of the KohnSham equation is fixed in a relatively coarse mesh, while the finite element space built on this coarse mesh is kept enriching by the solutions from a series of boundary value problems derived from the KohnSham equation. Here the adaptive mesh method is used to tailor a fitting nonuniform mesh for the derived boundary value problem, so that the finite element space for the nonlinear eigenvalue problem can be improved well. The performance of proposed solver for the KohnSham equation has been checked by the following works [12, 18, 20, 44, 45].In this paper, the efficiency of the proposed numerical method in [18] will be further improved, based on following two approaches. The first approach is based on an observation of significant difference of contributions for total energy of the system from two nonlinear terms in the hamiltonian, i.e., Hartree potential and exchangecorrelation potential. It is the nonlinearity introduced by these two potentials which makes the analysis and calculation of KohnSham equation nontrivial. Unlike the Hartreen potential which has an exact expression, there is no exact expression for the exchangecorrelation potential. Hence, an approximation such as local density approximation (LDA) is needed for the study. An interesting observation for the difference between Hartree and exchangecorrelation potentials can be made, based on following results from NIST standard reference database 141 [21].
E_tot  E_kin  E_har  E_coul  E_xc 
847.277  846.051  355.232  2008.741  39.319 
It is clearly seen that the magnitude of the exchangecorrelation energy (E_xc) is just around 10% of the Hartree energy (E_har). It also should be noted that similar comparisons exist for all other elements in the period table. Such observation brings us a chance to further improve the efficiency of the algorithm by separately considering two nonlinear terms. The idea is to reduce the computational resource for handling the nonlinearity introduced by the Hartree potential. which is realized by designing a new iteration scheme to replace the traditional selfconsistent field iteration scheme in this paper. The new scheme consists of two nested iteration schemes, i.e., an outer iteration for resolving the nonlinearity from the exchangecorrelation potential, and an inner iteration for resolving the nonlinearity from the Hartree potential. Although there are two iterations in our algorithm, it is noted that the outer iteration process for the nonlinearity of the exchangecorrelation term always be done around 10 times in our numerical experiments for molecules from a lithium hydride (2 atoms) to a sodium cluster (91 atoms). For the inner iteration, dozens of iterations are needed in each outer iteration at the beginning. However, with the increment of the outer iterations, the number of the inner iterations decreases significantly. Hence, the total number of the iterations (inner iterations by outer iterations) of our method is fairly comparable with that of a standard SCF iteration. However, due to the separation of two nonlinear terms and the framework of multilevel correction method, the calculation involving the basis function defined on the fine mesh can be extracted separately and precalculated, so that the computational work of the inner iteration only depends on the dimension of the coarse space. With this strategy, a large amount of CPU time can be saved in the simulations, compared with the original multilevel correction method [18].
To further improve the efficiency, the eigenpairwise parallelization of the algorithm based on message passing interface (MPI) is also studied in this work. One desired feature for the algorithm is an wavefunctionwise parallelization, based on which the calculation can be done for each wavefunction separately, and a significant acceleration for the overall simulation can be expected. However, this is quite nontrivial for a problem containing eigenvalue problem because of the possible orthogonalization for all wavefuntions needed during the simulation. One attractive feature of multilevel correction method is that the wavefunctionwise parallelization can be partially realized in the sense that the correction of the wavefunction can be done individually in the inner iteration. In this work, we have redesigned the algorithm to fully take advantage of this feature. It is worth to rementioning another feature of the multilevel correction method is that the eigenvalue problem is solved on the corasest mesh, which can be solved effectively. By combining these two strategies, a dramatic acceleration for the simulation can be observed clearly from the numerical experiments.
The outline of this paper is as follows. In Section 2, we recall the multilevel correction adaptive finite element method for solving KohnSham equation. In Section 3, we construct an accelerating multilevel correction adaptive finite element method which can further improve the solving efficiency for KohnSham equation. In Section 4, some numerical experiments are presented to demonstrate the efficiency of the presented algorithm. Finally, some concluding remarks are presented in the last section.
2 Multilevel correction adaptive finite element method for KohnSham equation
In this section, we review the multilevel correction adaptive finite element method for KohnSham equation (see [18]). To describe the algorithm, we introduce some notation first. Following [1], we use to denote Sobolev spaces, and and to denote the associated norms and seminorms, respectively. In case , we denote and , where is in the sense of trace, and denote . In this paper, we set for simplicity.
Let be the Hilbert space with the inner product
(1) 
where in this paper. For any and a subdomain , we define and
Let be a subspace of with orthonormality constraints:
(2) 
where .
We consider a molecular system consisting of nuclei with charges , , and locations , , , respectively, and electrons in the nonrelativistic and spinunpolarized setting. The general form of KohnSham energy functional can be demonstrated as follows
(3) 
for . Here, is the Coulomb potential defined by , is the electronelectron Coulomb energy (Hartree potential) defined by
(4) 
and is some real function over denoting the exchangecorrelation energy.
The ground state of the system is obtained by solving the minimization problem
(5) 
and we refer to [11, 8] for the existence of a minimizer under some conditions.
The EulerLagrange equation corresponding to the minimization problem (5) is the well known KohnSham equation: Find such that
(6) 
where is the KohnSham Hamiltonian operator as
(7) 
with and . The variational form of the KohnSham equation can be described as follows: Find such that
(8) 
In the ground state of the electronic system, electrons will occupy the orbitals with the lowest energies. Hence, it corresponds to finding the left most eigenpairs of KohnSham equation. If the spin polarization is considered, the number of the eigenpairs is the same to the number of the electrons. Otherwise, only half eigenpairs are needed. For simplicity, we only consider the spinunpolarized case in this paper.
In order to define an efficient way to treat the nonlinear Hartree potential term, we introduce the mixed formulation of the KohnSham equation. It has been observed from the numerical practice that the term provides the main nonlinearity, where is the Hartree (electrostatic) potential and its analytical form can be obtained by solving the following Poisson equation:
(9) 
with a proper Dirichlet boundary condition.
Now, let us define the finite element discretization of (18). First we generate a shape regular decomposition of the computing domain and let denote the interior edge set of . The diameter of a cell is denoted by and the mesh diameter describes the maximum diameter of all cells . Based on the mesh , we construct the linear finite element space denoted by . Define and it is obvious that .
In order to recover the singularity of KohnSham equation, the adaptive finite element method (AFEM) is the standard way. With the adaptive mesh refinement guided by the a posteriori error estimators, the AFEM can produce an efficient discretization scheme for the singular problems. The total amount of the mesh elements should be controlled well to make the simulation continuable and efficient for the KohnSham equation. Based on the above discussion, adaptive mesh method is a competitive candidate for the refinement strategy. A standard AFEM process can be described by the following way
Solve Estimate Mark Refine.
More precisely, to get from , we first solve the discrete equation on to get the approximate solution and then calculate the a posteriori error estimator on each mesh element. Next we mark the elements with big errors and these elements are refined in such a way that the triangulation is still shape regular and conforming.
In our simulation, the residual type a posteriori error estimation is employed to generate the error indicator. First, we construct the element residual and the jump residual for the eigenpair approximation as follows:
(11)  
(12) 
where is the common side of elements and with the unit outward normals and , respectively. Let be the set of interior faces (edges or sides) of , and be the union of element sharing a side with . For , we define the local error indicator by
(13) 
Given a subset , we define the error estimate by
(14) 
Based on the error indicator (14), we use the Dörfler’s marking strategy [13] to mark all elements in for local refinement.
Since solving largescale nonlinear eigenvalue problem is quite timeconsuming compared to that of boundary value problem, a multilevel correction adaptive method for solving KohnSham equation was designed in [18, Algorithms 1 and 3]. The multilevel correction method transforms solving KohnSham equation on the adaptive refined mesh to the solution of the associated linear boundary value problems on and nonlinear eigenvalue problem in a fixed low dimensional subspace which is build with the finite element space on the coarse mesh and finite element functions . In the multilevel correction adaptive method for solving KohnSham equation, we need to solve linear boundary value problems in each adaptive space and a smallscale KohnSham equation in the correction space . Since there is no eigenvalue problem solving in the fine mesh , the multilevel correction adaptive method has a better efficiency than the direct AFEM. The dimension of the correction space is fixed and small in the multilevel correction adaptive finite element method. But we need to solve a nonlinear eigenvalue problem in the correction space . Always, some type of nonlinear iteration steps are required to solve this nonlinear eigenvalue problems. When the system includes large number of electrons, the number of required nonlinear iteration steps is always very large. Furthermore, the correction space contains basis functions defined in the fine space . In order to guarantee the calculation accuracy, we need to assemble matrices in the fine space when it comes to the basis functions of defined on the fine mesh . This part of work depends on the number of electrons and dimension of as . In order to improve the efficiency of the correction step, a new type of nonlinear iteration method and an eigenpairwise parallel correction method with efficient implementing techniques will be designed in the next section.
3 Further improvement of the method towards the efficiency
In this section, we give a new type of AFEM which is a combination of the multilevel correction scheme, nonlinearity separating technique and an efficient parallel method for the Hartree potential. The nonlinearity separating technique is designed based on the different performance of the exchangecorrelation and Hartree potentials. We will use the self consistent field (SCF) iteration steps to treat the weak nonlinear exchangecorrelation potential and the multilevel correction method for the strong nonlinear Hartree potential. Furthermore, combining the eigenpairwise parallel idea from [46] and the special structure of the Hartree potential, the Hartree potential can be treated by the parallel way and an efficient implementing technique.
3.1 A strategy of separating nonlinear terms
The nonlinearity of the KohnSham equation comes from Hartree and exchangecorrelation potentials. As discussed in Section 1, the Hartree potential has a strong nonlinearity which leads the main nonlinearity of the KohnSham equation, while the nonlinearity of the exchangecorrelation potential is weak. Based on such a property, we modify the standard SCF iteration into a nested iteration which includes outer SCF iteration for the exchangecorrelation potential, and inner multilevel correction iteration for the Hartree potential. Furthermore, we will also design an eigenpairwise parallel multilevel correction method for solving the nonlinear eigenvalue problems associated with the Hartree potential in the inner iterations.
Actually, this section is to define an efficient numerical method to solve the KohnSham equation on the refined mesh with the proposed nested iteration here. Different from the standard SCF iteration, we decompose the SCF iteration into outer iteration for exchangecorrelation potential and inner iteration for the Hartree potential. This type of nonlinear treatment is based on the understanding of the nonlinearity strengthes from the Hartree and exchangecorrelation potentials, respectively. In the practical models and computing experience, the nonlinearity of Hartree Potential is stronger than that of the exchangecorrelation potential.
Given an initial value for the KohnSham equation, we should solve the following nonlinear eigenvalue problem in each step of the outer SCF iteration method for the exchangecorrelation potential: For , find such that
(15) 
where
(16) 
and
(17) 
In the above SCF iteration, only the exchangecorrelation potential is linearized. Thus, we still need to solve a nonlinear eigenvalue problem in each iteration step, and the nonlinearity is caused by the Hartree potential. Because the nonlinearity of the exchangecorrelation potential is weak, so only a few steps are needed for the above SCF iteration. Further, for the involved nonlinear eigenvalue problem whose nonlinearity is caused by the Hartree potential, we can design a new strategy to solve it efficiently.
It is obvious that equation (15) is still a nonlinear eigenvalue problem with the nonlinear term of Hartree potential. Now, we spend the main attentions to design an efficient eigenpairwise parallel way to solve the nonlinear eigenvalue problem (15). The idea and method here come from [18, 46, 48]. This type of method is built based on the low dimensional space defined on the coarse mesh . But the method in this paper is the eigenpairwise parallel way to implement the augmented subspace method which is different from [18]. Compared with the standard SCF iteration method for KohnSham equation, there is no inner products for orthogonalization process in the high dimensional space , which is always the bottle neck for parallel computing.
In order to define the eigenpairwise multilevel correction method for the Hartree potential, we transform the KohnSham equation (8) into the following equivalently mixed form: Find such that
(18) 
where the function set is defined by the trace of Hartree potential on the boundary
In the practical computation, the discrete form of (18) can be described as follows: Find such that
(19) 
where the set includes linear finite element functions with the boundary condition which is computed by the fast multipole method [3, (29)].
The corresponding scheme is defined by Algorithm 1, where we can find that the augmented subspace method transforms solving the nonlinear eigenvalue problem into the solution of the boundary value problem and some small scale nonlinear eigenvalue problems in the low dimensional space .
(20) 
(21) 
(22) 
The linear boundary value problems (20) and (21) in Algorithm 1 are easy to be solved by the efficient and matured linear solvers. For example, the linear solvers based on the multigrid method always give the optimal efficiency for solving the second order elliptic problems. Compared with the multilevel correction method defined in [18], the obvious difference is that Algorithm 1 treats the different orbit associated with , , indepdently. This way can avoid doing inner products for orthogonalization in the high dimensional space . Furthermore, we will design an efficient numerical method for solving the nonlinear eigenvalue problem (22) with a special implemenmting techqniue in the next subsection.
3.2 New algorithm and its parallel implementation
In this subsection, we further study two important characteristics of Algorithm 1, which will help to accelerate the solving efficiency. The first characteristic is that the SCF iteration for the nonlinear eigenvalue problem (22) can be implemented efficiently. The idea here comes from [46, 48]. The second characteristic is that Algorithm 1 is suitable for eigenpairwise parallel computing.
We begin with the first characteristic of Algorithm 1 by introducing an efficient implementation of the iteration method for the nonlinear eigenvalue problem (22). The corresponding implementation scheme is described by Algorithm 2.

For , solve the following linear eigenvalue problem.
(23) Solve (2
) and choose the eigenfunction
that has the largest component in span among all the eigenfunctions. 
Solve the following elliptic problem: Find such that
(24) 
If , stop the iteration. Else set and go to (a) to do the next iteration step.
Different from the standard SCF method, the one in Algorithm 2 use the mixed form of KohnSham equation to do the nonlinear iteration. Even the mixed form is equivalent to the standard one, it provides a chance to design an efficient implementing way to do the nonlinear iteration. This means the remaining part of this subsection is to discuss how to perform Algorithm 2 efficiently based on the special structure of the correction spaces and . The designing process for the efficient implementing way also shows a reason to treat the Hartree potential and exchangecorrelation potential separately.
From the definition of Algorithm 2, we can find that solving eigenvalue problem (2) and linear boundary value problem (24) need very small computation work since the dimensions of and are very small. But both and include finite element functions defined on the finer mesh . In order to guarantee the accuracy, we need to assemble the matrices and right hand side terms in (2) and (24) in the finer mesh which needs the computational work . Based on this consideration, the key point for implementing Algorithm 2 efficiently is to design an efficient way to assemble the concerned matrices and right hand side terms. For the description of implementing technique here, let denotes the Lagrange basis function for the coarse finite element space .
In order to show the main idea here, let us consider the matrix version of eigenvalue problem (2) as follows
(25) 
where , , and , , .
Since it is required to solve linear eigenvalue problems which can be assembled in the same way, we use and to denote and , respectively, for simplicity. We should know that the following process is performed for each SCF iteration. Here, for readability, the upper indices used in our algorithm is also omitted.
It is obvious that the matrix
, the vector
and the scalar will not change during the nonlinear iteration process as long as we have obtained the function . But the matrix , the vector and the scalar will change during the nonlinear iteration process. Then it is required to consider the efficient implementation to update the the matrix , the vector and the scalar since there is a function which is defined on the fine mesh . The aim here is to propose an efficient method to update the matrix , the vector and the scalar without computing on the fine mesh during the nonlinear iteration process. Assume we have a given initial value for Hartree potential. Now, in order to carry out the nonlinear iteration for the eigenvalue problem (25), we come to consider the computation for the matrix , the vector and the scalar .From the definitions of the space and the eigenvalue problem (2), the matrix has the following expansion
(26)  
Here , remain unchanged during the inner nonlinear iteration. During the inner loop, the exchangecorrelation potential remains unchanged, so the matrix also remains unchanged during the inner nonlinear iteration. Thus we can assemble , , in advance, and call these data directly in each iteration step. The matrix will change during the nonlinear iteration because will change. But is defined on the coarse space , which can be assembled by the small computational work .
The vector has the following expansion
(27)  
Because is a basis function of the correction space , so remain unchanged during the inner nonlinear iteration. So we can compute these three vectors in advance. In order to compute , we first assemble a matrix in the fine space
(28) 
which will be fixed during the inner nonlinear iteration and can be computed in advance. Let us define , and . Then, .
The scalar has the following expansion
(29)  