I. Introduction
Modern computational electromagnetics FEM tools often rely on memory-efficient iterative solvers such as multigrid [2] and DDM [3], which may experience convergence difficulties near resonances or on multiscale problems, and lose efficiency on runs with multiple excitations. In contrast, direct solvers such as MUMPS [4] or PARDISO are reliable but scale unfavorably and are hard to parallelize. Recent trends in direct solvers [5] therefore strive to reduce workload by leveraging low-rank approximations, at the cost of accuracy and possibly reliability. Yet these solvers are opaque to important underlying physics, leaving room for further improvements.
To achieve an efficient direct solver, one must start from scratch and leverage deep physical and numerical insights, which may require reformulating the BVP and the FEM assembly in addition to the symbolic and numeric factorization stages. This is critical not only to avoid interior resonances at all intermediate factorization separators, but also to produce numerically efficient matrix structures, i.e., reduced-size, blockwise-sparse symmetric matrices.
This work achieves all of these via a direct DDM (D³M) framework. A set of auxiliary variables is used to cast a decomposed BVP that, after an initial reduction/elimination step, leads to an auxiliary blocked matrix that is suitable for factorization. To attain maximal performance, this matrix is factored with a special blocked LDL method with restricted Bunch-Kaufman pivoting [6].
The accuracy and performance of the proposed D³M have been verified and tested on 3D scattering problems involving perfect electric conductor (PEC) plates and dielectric spheres of progressively larger electrical sizes. The proposed D³M solver requires less memory than MUMPS, mainly due to the choice of structured separators and the absence of delayed pivots, which is attributed to the interior-resonance-free formulation. An initial serial implementation of D³M was up to two times slower than MUMPS for small problems but becomes competitive on problems larger than one million unknowns.
II. Theory
Consider a computational domain decomposed into nonoverlapping subdomains. For example, a decomposed problem with four subdomains is shown in Fig. 1. The decomposed BVP reads:
where , and . is the restriction operator from domain to interface , and denotes the neighbor of domain .
After transforming to , casting the variational problem, and expanding the trial and testing function spaces, one obtains
(1) 
where is the block-diagonal matrix of , and is the FEM-ABC matrix for domain with loss or gain at the interfaces. is a blocked matrix of sparse matrices which maps the primal space to the Lagrange multiplier (LM) space. is the number of interfaces.
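A blockwise-sparse symmetric matrix of this kind can be stored by keeping only the nonzero lower-triangular blocks and recovering upper-triangular blocks by transposition. A minimal sketch in Python (the class and method names are illustrative, not the paper's actual data structure):

```python
# Blockwise-sparse symmetric storage: only nonzero blocks of the lower
# triangle are kept, keyed by (block-row, block-col); upper-triangle
# blocks are produced implicitly by transposition. Illustrative only.

def _transpose(blk):
    return [list(col) for col in zip(*blk)]

class BlockSparseSymmetric:
    def __init__(self, nblocks):
        self.n = nblocks
        self.blocks = {}                 # (i, j) with i >= j -> dense block

    def set_block(self, i, j, blk):
        if i < j:                        # mirror into the lower triangle
            i, j, blk = j, i, _transpose(blk)
        self.blocks[(i, j)] = blk

    def get_block(self, i, j):
        if i >= j:
            return self.blocks.get((i, j))
        blk = self.blocks.get((j, i))
        return None if blk is None else _transpose(blk)

    def nnz_blocks(self):                # number of stored (lower) blocks
        return len(self.blocks)
```

Since each stored block is dense (a supernode of moderate order), operations on such a structure map naturally onto Level 3 BLAS kernels.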
The reduced matrix is symmetric and blockwise sparse, but indefinite. Hence, LDL factorization with symmetric partial pivoting, a.k.a. Bunch-Kaufman LDL [6], can be used to save memory and CPU time. Since the reduced matrix is blockwise sparse, we have modified the Bunch-Kaufman LDL factorization to its block, restricted, partially pivoted form. Each block in the reduced matrix corresponds to a supernode of typical order . Therefore, D³M consistently operates in the maximum-performance region of Level 3 BLAS. The main steps of the proposed D³M are:
1) Generate dense domain matrices;
2) Assemble the blockwise-sparse reduced matrix;
3) Reorder the clique graph of the reduced matrix;
4) Symbolically factorize the reordered clique graph;
5) Factorize with the restricted-BK-pivoted block LDL factorization (see Algorithm 1);
6) Forward/backward substitute the reduced system for the auxiliary unknowns;
7) Recover the primal unknowns.
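As background to the restricted-pivoting factorization above, the pivot test at one elimination step of scalar Bunch-Kaufman LDL can be sketched as follows. This is the standard, unrestricted scalar test [6]; the paper's block-restricted variant constrains which rows may be interchanged, and is not reproduced here:

```python
# One elimination step of scalar Bunch-Kaufman partial pivoting on a
# symmetric matrix A (list of lists): decide between a 1x1 pivot and a
# symmetric 2x2 pivot. Generic textbook version, shown as background.
from math import sqrt

ALPHA = (1.0 + sqrt(17.0)) / 8.0   # ~0.6404, bounds element growth

def select_pivot(A, k):
    n = len(A)
    akk = abs(A[k][k])
    # lambda: largest off-diagonal magnitude in column k, found at row r
    r, lam = k, 0.0
    for i in range(k + 1, n):
        if abs(A[i][k]) > lam:
            lam, r = abs(A[i][k]), i
    if lam == 0.0 or akk >= ALPHA * lam:
        return ("1x1", k)                  # use a(k,k) as the pivot
    # sigma: largest off-diagonal magnitude in column r (rows >= k)
    sigma = max(abs(A[i][r]) for i in range(k, n) if i != r)
    if akk * sigma >= ALPHA * lam * lam:
        return ("1x1", k)
    if abs(A[r][r]) >= ALPHA * sigma:
        return ("1x1", r)                  # swap rows/columns k and r
    return ("2x2", r)                      # 2x2 pivot on rows k and r
```

In the blocked setting, the same test is applied within the current supernode's columns, so interchanges never destroy the block sparsity pattern.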
The clique graph of the blocked sparse matrix is reordered using METIS (as in MUMPS). Assuming that the clique graph has levels, the block LDL factorization is given in Algorithm 1. A multifrontal version of the block LDL can be used to further speed up computations.
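The blockwise elimination pattern underlying Algorithm 1 can be sketched in pure Python on a dense grid of blocks. The sketch below omits pivoting and sparsity, so it is not the paper's Algorithm 1; the helper names are illustrative:

```python
# Dense blocked LDL^T: factor a symmetric n-by-n grid of dense blocks as
# A = L * D * L^T, where L has identity diagonal blocks and D is block
# diagonal. No pivoting, no sparsity -- structure illustration only.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def msub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inv(A):
    # Gauss-Jordan inverse with partial pivoting; fine for small blocks.
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def block_ldlt(A):
    n = len(A)
    A = [[[row[:] for row in blk] for blk in brow] for brow in A]  # copy
    m = len(A[0][0])
    eye = [[float(i == j) for j in range(m)] for i in range(m)]
    zero = [[0.0] * m for _ in range(m)]
    L = [[[r[:] for r in (eye if i == j else zero)] for j in range(n)]
         for i in range(n)]
    D = [None] * n
    for k in range(n):
        D[k] = A[k][k]
        Dinv = inv(D[k])
        for i in range(k + 1, n):
            L[i][k] = matmul(A[i][k], Dinv)
        # Schur-complement update of the trailing lower triangle.
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                A[i][j] = msub(A[i][j], matmul(matmul(L[i][k], D[k]),
                                               transpose(L[j][k])))
    return L, D

def assemble(G):
    # Flatten a grid of blocks into one flat matrix (for verification).
    return [sum((blk[r] for blk in brow), [])
            for brow in G for r in range(len(brow[0]))]
```

The three nested loops over blocks are where a production code calls Level 3 BLAS; a multifrontal variant batches the trailing updates per elimination-tree node.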
III. Numerical Results
First, the scattering from progressively larger PEC plates (from up to ) is considered. The computational complexity in factorization time and memory for these problems, using the proposed D³M and MUMPS, is shown in Fig. 2. It is noted that a 3M-unknown problem is solved with only 10 GB of RAM at full double-precision accuracy.
Next, the scattering from progressively larger dielectric spheres is considered. The computational complexity in factorization time and memory for these problems, using the proposed D³M and MUMPS, is shown in Fig. 3. Again, the proposed method uses more than 2.25 times less memory than MUMPS, with surprisingly better time complexity. The relative residual error of all runs using the proposed D³M is around , which was the same as that of MUMPS.
References
 [1] J. Moshfegh, D. G. Makris, and M. N. Vouvakis, "Parallel Direct Domain Decomposition Methods (D³M) for Finite Elements," 2019 IEEE International Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, pp. 777–778.
 [2] Y. Zhu and A. C. Cangellaris, Multigrid finite element methods for electromagnetic field modeling. John Wiley & Sons, Vol. 28, 2006.
 [3] A. Toselli and O. Widlund, Domain decomposition methods: algorithms and theory, Vol. 3. Berlin: Springer, 2005.
 [4] P. R. Amestoy, I. S. Duff, J. Y. L'Excellent, and J. Koster, "A fully asynchronous multifrontal solver using distributed dynamic scheduling," SIAM Journal on Matrix Analysis and Applications, Vol. 23, No. 1, pp. 15–41, 2001.
 [5] P. G. Schmitz and L. Ying, "A fast nested dissection solver for Cartesian 3D elliptic problems using hierarchical matrices," Journal of Computational Physics, Vol. 258, pp. 227–245, 2014.
 [6] J. R. Bunch, L. Kaufman, “Some stable methods for calculating inertia and solving symmetric linear systems,” Math. Comp., Vol. 31, No. 137, pp. 163–179, 1977.