1 Introduction
Topology optimization has became an indispensable tool in the design process, allowing for optimal distribution of a material within a provided space with respect to given criteria (such as minimization of compliance, maximization of output response of mechanisms, or increasing the lowest eigenfrequency) under given constraints (typically limiting the available material volume). In Solid Isotropic Material with Penalization (SIMP) method [bendsoe_topology_2004]—one of the most common approaches to topology optimization—the material distribution is parameterized with a scalar relative density field , with , which consequently governs the distribution of stiffness parameters in a domain such that
(1) 
where is a given penalization coefficient (usually ), helping the optimization procedure to achieve the desired “01” design without regions of intermediate densities, and and
are stiffness tensors of the solid material and voids, respectively. Note that while the stiffness tensor of the bulk material is physical, void stiffness serves only numerical purposes of avoiding an indefinite Hessian matrix. On one hand, it should be small enough in order to model voids properly, on the other hand too small values lead to illposed state problem. The optimal design is then usually soughtfor in an iterative manner, alternately solving the state equations with a fixed densities and updating design density variables according to the sensitivities computed for a current design.
Being a matured technology, topology optimization is now widely used in industrial applications with millions of unknowns, and first applications reaching billions of unknowns are emerging [aage_gigavoxel_2017]. Consequently, such scales of optimization problems often prohibit the use of direct solvers in favour of iterative ones, which are further combined with parallelisation. Domain decomposition (DD) techniques thus seem promising candidates; however, the presence of high contrast in coefficients due to the stiffness parameterization, recall Eq. (1), may cause slower convergence and consequently poor performance of iterative solvers based on DD, as has been already shown in [arul_feti_2020].
To make the task even more challenging, in this contribution, we focus in particular on modular topology optimization problems, recently introduced in [bib:TrussStructures, tyburec_modular_2022], in which the main domain is composed of a limited number of square modules. The multiple occurrence of individual modules can thus be readily exploited to accelerate calculations; on the other hand, the modularity automatically introduces fixed, regular partitioning, which—as will be shown later—is detrimental for the performance of DDbased solvers. Admittedly, the modular grid constitutes only the coarsest partitioning and more refined divisions of each module following e.g. the distribution of a material with the module can be performed, but we did not follow this possibility in our current study.
After recalling the fundamentals of two Finite Element Tearing and Interconnecting methods in Section 2, we review several modifications aimed at improving the performance and robustness of the solution strategy; see Section 3. Finally, in Section 4, the performance of the original methods and their modifications is assessed on two test sets: three academic problems from the literature and three geometries arising from three distinct snapshots of modular topology optimization of the emblematic Messerschmitt–Bölkow–Blohm beam.
2 Dual and dualprimal domain decomposition methods
Characteristically for domain decomposition methods, we assume that the original domain is partitioned into mutually disjoint subdomains with . The solution to the system of linear equations
(2) 
which arises from e.g. numerically discretized partial differential equations valid in
, is then sought for via a series of subdomainwise problems(3) 
with an additional constraint ensuring continuity of the solution across subdomain boundaries. Two major branches of domain decomposition methods can be distinguished based on the manner in which the aforementioned continuity is enforced: primal decomposition methods, such as the Schur Complement method [kruis_domain_2006], satisfy the continuity by construction of an approximation space, while the dual methods, e.g. Finite Element Tearing and Interconnecting (FETI) method [farhat_unconventional_1992], impose the continuity requirement as an equality constraint enforced via Lagrange multipliers. Finally, as its name suggests, the Finite Element Tearing and Interconnecting DualPrimal (FETIDP) method [bib:fetidp] adopted in our work combines both approaches.
In the following two subsections, we briefly summarize the essentials of both dual and dualprimal methods.
2.1 TotalFETI
The Total FETI method (TFETI), introduced by Dostál and coworkers [bib:tfeti] and adopted here as a representative of the dual domain decomposition methods, is closely related to the original FETI method [farhat_unconventional_1992] with the main difference being the way the Dirichlet boundary conditions (BC) are imposed. In TFETI, these BC are handled similarly to enforcing the continuity across subdomain boundaries, i.e. the constraints posed on the domainwise displacements read as
(4) 
where collects the domainwise unknowns such that
(5) 
and combines the individual ’s accordingly. Thus, in contrast to the standard FETI, constraint (4
) features a righthand side vector
that is in general nontrivial; in addition to the zero entries pertinent to the crossboundary continuity, it also contains values related to the Dirichlet BC. As a result, the number of rigid body modes of each domain is only problem specific (e.g. three for twodimensional mechanics problems) and factorization can be recycled for all occurrences of one module irrespective of whether the occurrence is supported or floating.Combining Eqs. (3) and (4) leads to
(6) 
where is a blockdiagonal matrix composed of and arises from concatenated similarly to in Eq. (5).
Assuming the is known and provided that is orthogonal to the nullspace of , can be uniquely determined up to rigid body modes of individual domains expressed as , where is a matrix containing the nullspace vectors of each domain and is a vector of rigid body modes coefficients. Without diving into technical details, for which we refer an interested reader to e.g. [dostal_total_2006], the original formulation can be recast into the dual form
(7) 
with
where denotes the MoorePenrose pseudoinverse of . Problem (7) is traditionally solved with a projected preconditioned conjugate gradient method.
2.2 FetiDp
In Finite Element Tearing and Interconnecting DualPrimal (FETIDP) method [bib:fetidp], the domainwise displacement field is decomposed into two parts:

boundary degrees of freedom
whose continuity is directly enforced in the primal unknowns, and 
remaining degrees of freedom that contain both internal degrees of freedom, which do not directly communicate with other domains, and boundary degrees of freedom whose continuity is enforced via equality constraints, similarly to Eq. (4).
Without loss of generality, we assume that is ordered such that
(8) 
Furthermore, we consider a global vector that stores the shared boundary degrees of freedom (DOFs) and a Boolean matrix such that
(9) 
The equality constraint ensuring continuity of the remaining then reads
(10) 
where—analogously to Eqs. (4) and (5)— and are concatenations of and , respectively.
Expressing the equilibrium (3) and the constraint (10) in terms of the quantities introduced in Eqs. (8)–(10) leads to
(11) 
with
Finally, due to the domainwise nature of the second row in Eq. (11), can be computed in parallel from the known and ; hence, plugging the expression for in terms of the remaining quantities into the first and the third rows of Eq. (11) leads to the final dualprimal form
(12) 
where
Note that DOFs in are chosen such that there is no need for a pseudoinverse of in contrast to TFETI. In FETIDP, DOFs related to corner nodes in a regular subdomain partitioning typically constitute .
Problem (12) is traditionally solved iteratively with a preconditioned conjugate gradient method with being condensed out, resulting in a problem dependent solely on [bib:fetidp].
3 Enhancements for heterogeneous domains
3.1 scaling
The merit of both formulations described above is that they can be conveniently parallelized thanks to their additive structure. In both, the solution is sought with an iterative conjugate gradient method considering only the first block of the corresponding equations (7) and (12), while the remaining blocks are incorporated either via projection (TFETI) or by condensating out the primal unknowns (FETIDP).
Distinctively for domain decomposition methods, the iterative procedure also benefits from a coarse problem preconditioner of either , recall Eq. (7), or , Eq. (12), constructed as a sum of local inverses
(13) 
where either

represents the Schur complement if the optimal Dirichlet preconditioner is used, or

or if a computationally cheaper approximations of is used, denoted as a lumped or superlumped preconditioner, respectively,
where extracts only the diagonal part of . Note that in FETIDP the unknown index sets and are subsets of only, since the unknowns in set are handled directly.
In the simplest setting, one can assume ; however, can be arbitrarily scaled provided that [bib:sfeti]
(14) 
Based on a mechanical reasoning, Rixen and Farhat [rixen_simple_1999] proposed scaling, which compensates for different stiffness coefficients across subdomain boundaries. In their approach, with the entries of the diagonal matrix obtained as
(15) 
where denotes the component of the stiffness matrix pertinent to the th row of and, similarly, is the corresponding coefficient of the stiffness matrix of the adjacent . In the case of multidomain constraints (which typically happen for corner nodes in TFETI), the authors of [rixen_simple_1999] advocate for the use of pairwise constraints despite the introduced redundancies. Consequently, only the denominator in Eq. (15) changes such that it sums the stiffness contributions from all related domains; see [rixen_simple_1999, Eq. 68]. Also note that Eq. (15) naturally falls back to multiplicity scaling in the case of domains composed of the same homogeneous material.
3.2 Full orthogonalization
For practical applications, orthogonalization of the current (projected) preconditioned search direction , which appears in the classical Preconditioned Conjugate Gradient method used to solve problems (7) or (12), with respect to the search directions from the previous iterations is often recommended, e.g. [bib:sfeti]. Consequently, this modification necessitates storing the previous directions and their norms. However, the faster convergence caused by this modification usually compensate for both the increased memory requirements and the orthogonalizationrelated calculations.
3.3 RankRevealing Simultaneous FETI
Finally, the last enhancement investigated in this work, is the simultaneous variant of both TFETI and FETIDP [bib:sfeti]. This modification was particularly developed to handle heterogeneous problems by exploiting the additive structure of the local preconditioners. Instead of considering only one preconditioned search direction in each iteration, it defines an individual search direction for each subdomain. Note that the original search direction can be restored with , where and collecting these individual directions as its columns.
As a result, each iteration requires solving a system of equations related to minimization and conjugation steps; however, the matrix as defined in Algorithm 1 may be illconditioned, e.g. due to a linear dependence of the search directions, and its pseudoinverse might need to be constructed. Here, we employ the rank revealing Cholesky factorization [bib:CholeskyFacotrizationwithPivoting] with permutation matrix to extract only the linearly independent search directions, which are subsequently orthonormalized; see lines 9–12 in Algorithm 1, which summarizes the strategy for TFETI.
4 Numerical tests
4.1 Academic problems
We test both approaches on two sets of illustrative examples. The first suite comprises three test problems studied, e.g., in [bib:sfeti] and shown in Fig. 1. Unlike Fig. 1, however, we keep the regular partitioning and do not investigate the effect of irregularity and bad aspect ratios of the decomposition. We included these problems primarily to verify our implementation against the convergence reported in [bib:sfeti] and also to compare the performance of all considered variants in these academic problems against the problems arising in topology optimization.
The first test is a laminated beam consisting of nine repeating square subdomains, each composed of seven alternating layers of stiff (blue) and compliant (green) material. The second test problem is a square grid of subdomains, again, with alternating layers. This problem features the corner crosspoint and hence different variants of enforcing the continuity of solution is expect to deliver different performance results. Finally, the last test problem is a grid of subdomains with a stiff inclusion immersed in each subdomain. Unlike the previous two problem, there are no jumps in stiffness coefficients and hence the convergence should be the fastest. All structures were clamped on the lefthand side and subjected to tangential and tensile normal traction on their righthand sides.
(a)  
(b)  (c) 
The two solution strategies (TFETI vs. FETIDP) in combination with the scaling options (stiffness vs. multiplicity) and the option for full orthogonalization or simultaneous search directions yield 12 variants in total. Figure 2 plots the history of the residual error defined as
(16) 
where the and are the projected residual and its preconditioned counterpart from Algorithm 1, respectively.
In accordance with expectations, the third problem with stiff square inclusions was indeed the easiest to solve and the performance of all variants did not vary significantly except for the most advanced simultaneous variants of both TFETI and FETIDP which converged in fewer than half of the iterations needed by the other variants. On the other hand, once the heterogeneity in stiffness coefficients reached subdomain boundaries (the first and the second test problems), the performance of the unmodified variants deteriorated rapidly. However, the importance of scaling (solid lines) in contrast to the simple multiplicity scaling (dotted lines) arises only in the second example due to the presence of the crosspoints; in the third problem for instance, the scaling falls back to multiplicity scaling as mentioned in Section 3.1.
(a)  

(b)  
(c) 
4.2 Topology optimization problems
The second test suite comprises several snapshots from the modular topology optimization [bib:TrussStructures] of a beam fixed at both bottom corners and loaded by a unit force acting at the midspan of the top edge; see Fig. 3. The beam is composed of 96 square modules of 16 types (depicted with distinct colours in Fig. 3). The modular composition was directly reflected in the beam’s partitioning into 96 subdomains. Each subdomain was discretized with bilinear quadrilateral finite elements, which resulted in 184,512 DOFs in total and over 10,000 DOFs in the interface problem of both TFETI and FETIDP.
In particular, we tested the described solution strategies on the , , and iteration snapshots of the topology optimization. Distribution of the base material, parameterized by the relative density (recall Eq. (1)), in these snapshots is shown in Fig. 4. As the topology optimization progresses, the initially uniform density gradually concentrates to the most efficient locations, which leads to a highly heterogeneous problem with significant jumps in stiffness coefficients along subdomain boundaries, making the problem difficult to solve with iterative methods.
(a)  
(b)  
(c) 
Similarly to Section 4.1, Figure 5 summarizes the convergence history of all 12 variants for the three topology optimization snapshots from Fig. 4.
(a)  

(b)  
(c) 
The later stages of the topology optimization in particular render challenging problems for TFETI and FETIDP solvers; while in the first snapshot, all methods except for TFETI with full orthogonalization successfully converged, in the last snapshot, only the simultaneous FETIDP with rankrevealing factorization succeeded.
5 Discussion
The numerical tests clearly demonstrate that the linear systems of equations arising in modular topology optimization are indeed challenging for domaindecomposition based iterative methods. The main culprit is the potentially unfavorable partitioning into subdomains, which leads to coefficients jumps across subdomain boundaries, and most importantly the high contrast in stiffness coefficients. These effects are partly present in the academic test problems, however, they are significantly more pronounced in the topology optimization problems up to the point at which most of the considered solution variants break or do not converge in a reasonable time.
The only variant successful in all tests was the simultaneous FETIDP with rankrevealing Cholesky decomposition. With incorporated scaling, this variant was also the fastest to converge, even though the convergence rate of TFETI with rankrevealing was usually comparable when it converged.
On the other side of the convergence spectrum were the original variants of TFETI and FETIDP. However, while featuring the slowest convergence, those methods proved to be relatively robust, unlike the modifications with full orthogonalization, which reduced the needed iterations only for mildly heterogeneous problems and broke completely for the final test problem. We conjecture that the rapid increase in the monitored residual in Fig. 5c is caused by the loss of numerical accuracy due to roundoff errors during the orthonormalization.
6 Summary
In this work, we investigated twelve variants of dual and dualprimal domain decomposition methods with particular emphasis on applications to highly heterogeneous problems with predefined partitioning which arise in modular topology optimization problems [bib:TrussStructures, tyburec_modular_2022]. We considered TotalFETI [bib:tfeti] and FETIDP [bib:fetidp] as the base methods, for which we tested two scaling approaches (scaling and multiplicity scaling), full orthogonalization, and simultaneous search directions with rankrevealing decomposition. We tested the variants on two suites of twodimensional elasticity problems: first one composed of three academic problems, the second suite comprising snapshots of modular topology optimization. All variants were compared in terms of the number of iterations needed to reach a desired residual . We are aware that such a comparison does not paint the whole picture and comparing the total wallclock time would be optimal; however, such a comparison would be heavily biased by the implementation and available hardware.
Based on our results, we conclude that the scaling should be a default setting when dealing with nonhomogeneous problems thanks to its significant impact on the convergence rate and negligible computational overhead. Our results also show that the original formulations of both TFETI and FETIDP exhibit very slow convergence in the presence of severe heterogeneity in stiffness coefficients and hence performance enhancements are needed. The simultaneous search directions with the rankrevealing Cholesky decomposition reduced the number of needed iterations the most and should be therefore preferred to full orthogonalization alone. Finally, FETIDP proved to be more robust than TFETI as it was the only method capable of reaching the desired residual in all test problems.
Following these conclusions, our current research interest lies in (i) an adaptive choice of the nodes whose compatibility should be enforced directly in primal variables (i.e., not necessarily choosing only the corner nodes for the primal part of FETIDP) and (ii) identification of boundary modes responsible for the bad convergence and eliminating from the iterative process in the spirit of [feti_geneo].
Comments
There are no comments yet.