Finite Element Method (FEM) computational electromagnetic models result in very large but sparse linear systems that are often solved iteratively in lieu of direct factorization methods, [direct-methods-for-sparse], that use significantly more memory and have higher asymptotic computational complexity. Yet, when these models involve many independent excitations, or complex materials, or multi-scale or near resonance structures, even the most advanced preconditioned iterative methods can lose effectiveness and efficiency. This problematic behavior, coupled with the recent proliferation of computing cores and RAM memory has lead to a renewed interest in sparse matrix direct solution methods in certain FEM computations in CEM. For all other situations, sparse direct solvers remain indispensable, yet as a ‘hidden’ component of preconditioned iterative methods e.g. corse-grid correction in multi-grid solvers.
Efficient implementations of classic sparse Cholesky or LDL factorizations for symmetric systems, or sparse LU methods, make use of various data and temporal locality improving computational schemes such as the multi-frontal [MUMPS1] or the left/right looking super-nodal [Schenk2004] algorithms to reduce cache misses and significantly improve speed. When coupled with advanced fill-in reducing matrix reordering schemes such as the multilevel nested dissection [METIS] they can result in powerful solvers with impressive performance.
This paper will explore weather the conventional approach of using highly optimized high-performance sparse direct solver libraries is indeed the best available option for the direct solution of FEM problems?
In our attempt to answer this question, an alternative, domain decomposition (DD) based, direct solution strategy (with numerical exact arithmetic
) for FEM models will be outlined and compared to state-of-the-art sparse matrix direct solvers. Unlike conventional approaches that are agnostic of the matrix physical model origins i.e. ‘black-box’ solvers, this direct DD strategy relies on a judicious re-formulation of the boundary value problem (BVP) as a decomposed BVP on a collection of domains. It is noted that the method is not restricted to Maxwell equations or electromagnetics, but is simply presented in the CEM context. The decomposed BVP and its subsequent mixed/hybrid variational formulation must be cast such that all domains are always regular while maintaining the self-adjoint nature of the overall problem. This will prove critical in achieving an efficient direct solution. Subsequent discretization with suitably chosen FEM basis in terms of primal degrees-of-freedom (DoFs) in domain volumes and dual DoF on domain interfaces leads to a large sparse arrowhead matrix that is then reduced in size by eliminating via many independent sparse direct factorizations the primal DoFs. This process is akin to the FETI reduction in iterative DDs, but with key appropriate modifications that will later facilitate a very efficient direct factorization. Formulation of the proposed approach can be found in[moshfegh2016direct, moshfegh2017memory].
The resultant auxiliary matrix is symmetric but orders of magnitude smaller than the original one, and is block-wise sparse (sparse collection of dense blocks), something computational very desirable in direct matrix factorizations because the high degree of data locality enables the use the blazing fast level 3 BLAS [DONGARRA1990]. As such, this matrix is LDL factorized via a very fast symbolic factorization step that is attributed clique graph of the block-wise sparse matrix, followed by a custom block LDL with restricted Bunch-Kaufman pivoting [BUNCH1977]. Once the the auxiliary problem has been efficiently factorized, the solution of all or a subset of primal unknowns are recovered in an embarrassingly parallel manner via many smaller size sparse forward backward substitutions.
A conceptual overview of the proposed approach and its conventional counterpart are depicted in Fig. 1. Although the proposed approach involves more computational steps, much like field computations with auxiliary potentials in electromagnetics, solving the auxiliary problem is considerably easier than the conventional one-shot approach, but also attaining the auxiliary problem and recovering the full solution from it are relatively fast and embarrassingly parallel. In a sense out approach instead of striving to minimize fill-in to achieve better memory and speed, it attempts to maximize the concurrency in sparse matrix computations, as well as the total amount and order of subsequent dense matrix operations, thus promising very favorable parallel and GPU processing prospects.
Two low to moderate complexity unstructured FEM models are used as test cases to compare the proposed method with MUMPS 5.0.2 [MUMPS1], and PARDISO-MKL11.1 [MKL_PARDISO] state-of-the-art ‘black-box’ direct sparse matrix solvers with MeTiS 5.1.0 reordering for all methods. The first model involves a dielectric sphere of progressively larger size, (domain increases in three dimensions), and a monopole antenna array problem again of progressively increasing size but also port number (excitations). The memory, factorization time and forward/backward substitution times of those test cases are presented in Figs 2 and 3. In both cases, the purposed direct DD method shows a clear edge in memory requirements, which is a major improvement on a key direct method weakness . Both factorization and forward backward substitution are on par for all methods with the proposed method being slightly slower.
For the moderate size FEM problems examined here it was found that when memory is at premium, ‘black-box’ direct solvers are not the best direct solution approach, the proposed direct DD method showed clear benefits as its memory appears to be comparable to preconditioned iterative solvers. When speed is important the highly optimized ‘back-box’ libraries maintain a slight edge over the proposed direct DD and possibly iterative solvers. It is believe that direct DD solvers will have very favorable parallel and GPU processing prospects.