I Introduction
Perfectly matched layer (PML) [1, 2] is often used in finite difference [3], finite element [4], and discontinuous Galerkin timedomain (DGTD) methods [5, 6, 7, 8, 9, 10] to imitate/approximate the radiation boundary condition (i.e., truncate a unbounded physical domain to a finite computation domain) while solving Maxwell equations or the wave equation.
The performance of the PML depends on the attenuation coefficient (which is implemented as conductivity in Maxwell equations) and the thickness of the layer. Increasing either one or both of them increases the absorption inside the PML. However, in practice, one cannot use a constant and high conductivity as it would increase the numerical reflection at the interface between the PML and computation domain, or use a very thick layer since it would increase the computational cost. Therefore, a smoothly increasing conductivity profile is often used to achieve both high absorption and small numerical reflection [1, 2, 11, 3, 4, 12, 13].
In DGTD, the PML conductivity profile can be implemented in two different ways. The first method assumes that the conductivity is constant in a given element. This constant might for example be set to the value of the conductivity profile at the center of the element. Implementation of this method is rather straightforward since the mass matrices of different elements only differ by a constant (for linear elements) [8]. However, the conductivity profile becomes discontinuous between neighboring elements [Fig. 1 (a)] and the element surfaces that are not parallel to the PML interface lead to large reflections and destroy the highorder accuracy of the solution. One workaround is to build layered (tetrahedral) meshes [Fig. 1 (b)] or use orthogonal (hybrid) meshes inside the PML [7, 14], and accordingly set a layered conductivity profile. But this makes the setups of the computation domain and the PML rather tedious since one needs to control the mesh and the conductivity on all face, edge, and corner regions of the PML. Moreover, to reduce the numerical reflection by decreasing the conductivity discontinuity between neighboring mesh (or conductivity) layers, their number has to be increased.
The second method allows the conductivity to vary inside a given element (following the increasing conductivity profile inside the PML). It should be noted here that (higherorder) DGTD allows for sampling of the material properties at the subelemental level. In this case, the behavior of the PML is determined only by the conductivity samples within the elements and therefore the mesh interfaces can be aligned arbitrarily [Fig. 1 (c)] without adversely affecting its performance. Indeed, this second approach can improve the PML performance [15, 16, 17]. However, the conductivity varying within the elements results in an elementdependent mass matrix. Therefore, a direct implementation requires a different mass matrix (or its inverse) to be stored for every element [9, 7, 18, 15, 16, 14]. This significantly increases DGTD’s memory footprint. For example, for the stretchedcoordinate (SC)PML [9], the memory required to store the mass matrices scales with , where
is the number of interpolating nodes in each element,
is the number of elements in the PML, andcomes from the five materialdependent coefficients in the update equations and three Cartesian components of the vector field. In contrast, for the first method, where the conductivity is assumed constant in a given element, this memory requirement scales with
since only the constant conductivity of each element and a single reference mass matrix are stored.In this work, a memoryefficient method to implement the SCPML with smoothlyvarying attenuation coefficients in DGTD is developed. The proposed method allows the conductivity to vary inside the elements and constructs the resulting local mass matrices using a weightadjusted approximation (WAA) [19]. Compared to the direct implementation that is briefly described above, the proposed method reduces the memory requirement to , where , while maintaining the PML performance. It should be noted here that the WAA has been proved to be energystable and preserve the highorder convergence of DGTD [19, 20, 21]. Indeed, numerical examples presented here also show the proposed method maintains the higherorder accuracy of the solution. Additionally, it is demonstrated that the PML with smoothlyincreasing conductivity profile as implemented with the proposed method performs better than the PML implemented using elementwise constant conductivity profile.
Ii Formulation
Iia WAADGTD for SCPML
Maxwell equations in stretchedcoordinates for a sourcefree and lossless medium can be expressed as [2]
(1)  
(2) 
where and are the electric and magnetic fields, and are the permittivity and the permeability, is the frequency, and
(3) 
The coordinatestretching variables , , are defined as [2, 9, 13]
(4) 
where and are onedimensional positive real functions along direction . Here, is the attenuation coefficient that ensures absorption inside the SCPML and changes the propagation speed (and also increases the absorption for evanescent waves). It is well known that using smoothlyincreasing and reduces the numerical reflection from the interface between the PML and the computation domain while maintaining a high absorption rate, and therefore improves the PML performance [11, 12, 13].
The timedomain update equations in SCCPML can be expressed as [9]
(5)  
(6)  
(7)  
(8) 
where and are auxiliary variables introduced to avoid computationally costly temporal convolutions while converting (1) and (2) into time domain [9] and , , , , and
are diagonal tensors with entries defined as
(9) 
In (9) and the rest of the text follows the permutation .
Following the standard nodal discontinuous Galerkin method [8], the computation domain and the PML are discretized into elements with volumetric support and boundary surface , and in each element , , and are expanded using the Lagrange polynomials [8], , where is the number of interpolating nodes and is the order of the Lagrange polynomials. Finally, Galerkin testing yields the semidiscrete system of equations as
(10)  
(11)  
(12)  
(13) 
Here, , , , and are vectors storing the unknown timedependent coefficients of the relevant basis functions, and , , are the mass matrices with entries
(14)  
(15) 
denotes the curl operator with its component along direction is given by
where , , is the component of the numerical flux along direction , which in general involves unknowns from the current element and its neighboring element [8, 22, 23], and are the stiffness and the face mass matrices with entries
(16)  
(17) 
and and are the permittivity and the permeability (assumed constant) in each element. Note that, the nodal DG framework [8] is used here, but the proposed method can be easily extended to vector DG methods [6, 7, 9, 24, 25, 26].
For linear elements, the mass matrices in (14) are simply scaled versions of the mass matrix defined on the reference element, , where is the Jacobian of the coordinate transformation between element and the reference element. Hence, only and (scalar constant) are stored. Similarly, in (15), if is assumed constant inside the elements, i.e., , then and one only needs to store in addition to and . In this case, (10)(13) can be implemented as efficiently as the case without the PML [8, 27, 28].
However, if is allowed to vary inside the elements, are different in different elements and in general there is no simple relationship between these different mass matrices. One has to store every one of these mass matrices (or their inverse). As an alternative, the mass matrix can be recomputed at each time step, but this would significantly increase the cost of time marching [8]. The memory required to store the mass matrices in (10)(13) scales with per element, where is the number of the components of the vector field, is the number of the coefficients , , , , and . Note that this memory requirement is significantly higher than that of storing the unknowns coefficients of the basis functions, which scales with in the PML.
To reduce the memory requirement of implementing (15) with allowed to vary inside the elements, WAA [19] is used. It has been shown that with this approximation DGTD retains provable energystability and highorder accuracy [19, 20, 21]. Note that in the above SCPML formulation, directly multiplying (5) and (6) with on both sides reduces the number of elementdependent mass matrices to . But this would result in a nonconservative form, whose solution is neither provably energystable nor provably highorder accurate [8, 19].
First, a weightadjusted inner product is introduced to approximate the parameterweighted inner product in the expression of the mass matrix [19]. The mass matrix, which is associated with the element and a locally varying coefficient , is approximated as
(18) 
Since is used in (10)(13) (for ), one needs to calculate . Under the nodal DG framework [8],
(19) 
where , , are the Gaussian quadrature nodes corresponding to the quadrature rules of degree and are the corresponding weights. Hence,
(20) 
where is an interpolation matrix defined on the reference element, , and are generalized Vandermonde matrices with entries and , respectively, is the th orthonormal polynomial basis [8], is also elementindependent, and is a diagonal matrix containing the coefficients evaluated at the quadrature nodes.
Inverting (18) and substituting (20) yield
(21) 
Here, is introduced to simplify the implementation. In (21), and are defined on the reference element, and is a scaled version of the reference matrix, .
The update equations (10)(13) contain multiplications between elementdependent mass matrices. To reduce the number of arithmetic operations, the following operators are defined
(22)  
(23)  
(24)  
(25) 
where (21) is used for and (20) is used for . These operators can be directly used on the right hand sides of (10)(13). Substituting (22)(25) into (10)(13) yields
(26)  
(27)  
(28)  
(29) 
Equations (26)(29) can be implemented in a matrixfree manner just like it is done in classical DG implementations [8, 9, 10, 27, 29, 28, 30].
IiB Computational complexity
In DGTD with explicit time marching, all operations are localized within the elements. The memory required to store the mass matrices in the direct implementation of (10)(13) scales with , where comes from the number of unknown components times the number of different mass matrices associated with different coefficients and is the number of elements in the PML. In the WAA formulation (26)(29), the memory requirement reduces to , where comes from the number of unknown components times the number of coefficient samples at the quadrature points and comes from and defined on the reference element. For simplicial quadrature rules that are exact for up to polynomials of degree , [31, 32].
To compare the number of arithmetic operations required by the two implementations, one should first note that the curl operator is the same in both formulations. Computation of requires those of the spatial derivatives and the numerical flux [33, 34, 35]. Here, the memory access time is much more significant than the time required to carry out these computations because data from neighboring elements, which are discontinuous in memory, is required. Therefore, only the times required to complete the arithmetic operations of the remaining terms are compared. For the same reason, in practice, the time required to compute
dominates the overall time required by the time marching, and the difference in the numbers of arithmetic operations as estimated below for the remaining terms is less significant (see the example in Section
III).In (10), the three matrixvector multiplications and two vectorvector additions require multiplication operations and addition operations, respectively. In (26), the multiplication of with a vector of length , and the multiplication of with a vector of length require multiplication operations. The multiplication of a diagonal matrix with a vector (such as ) requires multiplication operations. As a result, excluding the computation of , (26) requires multiplications and additions. For the auxiliary variable, the cost of (12) is multiplications and subtractions, while (28) requires multiplications and subtractions. One can see the number of operations in the WAA implementation is slightly higher than that in the direct implementation. But as mentioned above the time required by these operations is smaller than the time required to compute , and therefore overall times required by the two implementations are not that different.
Iii Numerical Examples
In this section, the accuracy and the efficiency of the proposed WAA formulation are compared to those of the traditional PML implementations using numerical examples. To this end, four PML configurations/implementations are considered in these examples: (i) and/or , , are assumed constant inside the elements on a paved mesh (ECpaved) [Fig. 1(a)], (ii) and/or , , are assumed constant inside the elements on a layered mesh (EClayered) [Fig. 1(b)], (iii) and/or , , are allowed to vary inside the elements on a paved mesh (SVpaved) [Fig. 1(c)], and (iv) same configuration in (iii) but implemented using the proposed method with the WAA (SVWAApaved) [Fig. 1(c)]. In all implementations, the order of the Lagrange polynomials , which results in .
For configuration (i), the constant values in a given element are obtained by sampling and , , at that element’s node that is farthest away from the PML interface (along the direction). For configuration (ii), to ensure that the element surfaces are strictly parallel to the axes, the PML mesh is built layer by layer and constant values in a given layer are obtained by sampling and , , at the outermost surface of that layer (along the direction). For the WAA in implementation (iv), the order of the Gaussian quadrature rule is , resulting in [32].
In all examples, the background medium is free space and the excitation is a plane wave with electric field , where , is the speed of light in free space, and is a baseband Gaussian pulse with and . The average edge lengths of all meshes used under this excitation are .
First, the reflection of a plane wave normally incident on the PML is computed. The computation domain is a rectangular box with dimensions . Perfect electric conductor (PEC) and periodic boundary conditions are used on the outer boundary of the PML that is located perpendicular to the direction and on the computation domain boundaries perpendicular to the and directions, respectively. The plane wave excitation is introduced on surface and propagates in the direction. The domain is long enough to ensure that the reflected field is wellseparated from the incident one, and therefore the reflection from the PML is simply measured by the peak value of the reflected field’s amplitude. The conductivity profile is described by , where is the coordinate on the interface between PML and the computation domain, is the thickness of the PML and is the order of the profile. Note that is nonzero only when . The values of these parameters are , , and , and also both inside the PML and the computation domain.
In this example, four configurations/implementations are considered: ECpaved, EClayered, SVpaved, and SVWAApaved. Their performances are compared for . For all four groups of simulations, is scanned to find the minimum reflection that can be obtained for each case. Fig. 2 shows that with increasing , the reflection first decreases exponentially and then increases gradually. This is observed for all configurations/implementations and all values of . When is small, the overall reflection is dominated by the reflection from the PEC boundary simply because the absorption inside the PML is not high enough. Therefore, in this regime, increasing elevates the absorption and reduces the amplitude of the wave reflected back into the computation domain exponentially. The numerical reflection (which is smaller than the reflection from the PEC boundary for small ) increases with increasing [11], and starts dominating the overall reflection as demonstrated in the figure by the gradual increase after the minimum point.
For the ECpaved configuration, the reflection stays at a high level and does not decrease with increasing . This is because of the large reflections from unoriented internal element surfaces. For the EClayered, SVlayered, and SVWAApaved configurations, highorder convergence is observed, i.e., the reflection keeps on decreasing exponentially with increasing . Still, the reflection for the SVpaved and SVWAApaved configurations is about smaller than that for the EClayered configuration. Note that this higher accuracy comes with the ease of meshing since a layered mesh (and conductivity profile) is not needed. Finally, Fig. 2 also shows that the SVWAApaved implementation performs exactly the same as the SVpaved direct implementation, which verifies the accuracy of the proposed method.
Next, scattering from a PEC sphere of radius is considered. The computation domains and the PMLs for the EClayered and SVpaved and SVWAApaved configurations are shown in Figs. 3 (a) and (b), respectively. The plane wave excitation is introduced on the totalfield scatteredfield (TFSF) surface [shown in green in Figs. 3 (a) and (b)]. The conductivity function is , , , where is the coordinate on the interface between PML and the computation domain, is the thickness of the PML along the direction, and is the order of the profile. Note that is nonzero only when . The values of these parameters are , and .
Because the distance between the sphere surface and the PML is short, possiblyevanescent scattered waves enter the PML with high grazing angles. A varying , , profile is employed to help with the absorption of these evanescent waves [12, 13]: with . Note that inside the computation domain, . In this example, using the PEC or the firstorder absorbing boundary condition [15] on the outer boundary of the PML gives similar results. The results presented here are obtained with the PEC boundary condition.
Three configurations/implementations are considered here: EClayered, SVpaved, and SVWAApaved. For the EClayered configuration, the thickness of each PML mesh layer is [Fig. 3 (a)]. Note that, for this example, generation of these layers is rather tedious since in the corner region one has to align all layer/element surfaces in all three directions. In contrast, for the SVpaved and SVWAApaved configurations [same mesh is used – Fig. 3 (b)], and values are simply obtained by sampling the corresponding profile functions at the nodes of the elements. This significantly simplifies the setups of the computation domain and the PML since even an explicit interface between the computation domain and the PML is not required [see Fig. 3 (b)]. The performances of the three configurations are compared for and .
is scanned to find the minimum reflection that could be reached for each case. Note that in this example “reflection” is defined as the peak value of the absolute difference between the fields computed at a probe point for the above cases and corresponding reference fields computed at the same point. different probe points (placed either in the TF or in the SF region) have been tested and the results are consistent for all of them. The results below correspond to the probe point at . These reference fields are computed under the same excitation but the distance between the sphere surface and the PML is extended to . To ensure that the discretization errors are at the same level, the meshes in the overlapped regions between the actual computation domains and the extended ones are kept exactly the same, the average edge lengths of the meshes in the extended region are kept the same as those in the actual computation domains, and the solutions are obtained using the same value of .
Fig. 4 plots the reflection for the three cases with different values of and versus . Clearly, the SVpaved and SVWAApaved configurations perform better than the EClayered configuration for every value of . The best performance is obtained with . Note that further increasing degrades the PML performance for all configurations/implementations and all values of since high conductivity values only appear at the very end of the PML when is high. Fig. 4 also shows that the SVWAApaved implementation performs exactly the same as the SVpaved direct implementation, which means the error caused by the WAA of the mass matrices is below the level of the discretization error.
memory (KB)  CPU time per step (s)  
SVpaved  SVWAApaved  SVpaved  SVWAApaved  
1  4  4  378,660  267,928  1.652716  2.341525 
2  10  11  1,274,424  498,508  4.083981  5.960960 
3  20  23  4,126,640  894,936  9.606642  15.73330 
4  35  44  11,583,440  1,513,140  19.64900  30.16986 
5  56  74  28,410,000  2,291,608  78.56877  105.1317 

Tested on a workstation with Intel Xeon(R) E52680 v4 CPU and 128GB memory. A single process is used. and .
Table. I compares the computational cost of the SVpaved and SVWAApaved implementations. With increasing , the memory requirement increases dramatically for the SVpaved direct implementation but only modestly for the SVWAA implementation. For , the memory requirement of the SVpaved implementation is times that of the SVWAApaved implementation. The computation time required by the SVWAApaved implementation per time step is slightly larger than that required by the SVpaved implementation due to the increased number of arithmetic operations (see Section. IIB). It should be also be noted here that, in practice, a DGTD algorithm is usually parallelized. The difference in times required for updating different elements is relatively small and can be easily compensated by allocating a smaller number of elements for those MPI processes containing PML elements. In the numerical results presented here, assigning a weight of for PML elements in ParMetis [36, 37] yields a good loadbalance.
Iv Conclusion
A PML implementation that allows the attenuation coefficient to vary inside the discretization elements yields a smaller numerical reflection from the interface between the PML and the computation domain and significantly simplifies the meshing process. However, these advantages come at the cost of increased memory footprint since a different mass matrix has to be stored for every discretization element. In this work, this memory requirement is reduced by applying WAA to the mass matrices without abandoning the advantages listed above. Indeed, numerical results demonstrate that the PML with smoothlyincreasing conductivity profile as implemented with the proposed method performs better than the PML implemented using elementwise constant conductivity profile and that the higherorder accuracy of the solution is maintained.
The proposed method is especially useful for simulations running on sharedmemory systems where the high memory requirement of smoothlyvarying PMLs could be a bottleneck. For simulations running on distributedmemory systems, the memory requirement of a single computing node is also reduced and a better loadbalance could be reached with a slightly adjusted weight in the domain partition.
References
 [1] J.P. Berenger, “A perfectly matched layer for the absorption of electromagnetic waves,” J. Comput. Phys., vol. 114, no. 2, pp. 185–200, 1994.
 [2] W. C. Chew and W. H. Weedon, “A 3D perfectly matched medium from modified Maxwell’s equations with stretched coordinates,” Microw. Opt. Tech. Lett., vol. 7, no. 13, pp. 599–604, 1994.
 [3] A. Taflove and S. C. Hagness, Computational electrodynamics: the finitedifference timedomain method. Artech house, 2005.
 [4] J.M. Jin, The finite element method in electromagnetics. John Wiley & Sons, 2015.
 [5] J. Hesthaven and T. Warburton, “Nodal highorder methods on unstructured grids: I. timedomain solution of Maxwell’s equations,” J. Comput. Phys., vol. 181, no. 1, pp. 186 – 221, 2002.
 [6] B. Cockburn, F. Li, and C.W. Shu, “Locally divergencefree discontinuous Galerkin methods for the Maxwell equations,” J. Comput. Phys., vol. 194, no. 2, pp. 588 – 610, 2004.
 [7] T. Lu, P. Zhang, and W. Cai, “Discontinuous Galerkin methods for dispersive and lossy Maxwell’s equations and PML boundary conditions,” J. Comput. Phys., vol. 200, no. 2, pp. 549–580, 2004.
 [8] J. Hesthaven and T. Warburton, Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications. NY, USA: Springer, 2008.
 [9] S. D. Gedney, C. Luo, J. A. Roden, R. D. Crawford, B. Guernsey, J. A. Miller, T. Kramer, and E. W. Lucas, “The discontinuous Galerkin finiteelement timedomain method solution of Maxwell’s equations,” Appl. Comput. Electromagn. Soc. J., vol. 24, no. 2, p. 129, 2009.
 [10] G. C. Cohen and S. Pernet, Finite element and discontinuous Galerkin methods for transient wave equations. Springer, 2017.
 [11] W. C. Chew and J. Jin, “Perfectly matched layers in the discretized space: An analysis and optimization,” Electromagn., vol. 16, no. 4, pp. 325–340, 1996.
 [12] J.P. Berenger, “Perfectly matched layer (PML) for computational electromagnetics,” Synthesis Lectures on Computational Electromagnetics, vol. 2, no. 1, pp. 1–117, 2007.
 [13] S. D. Gedney, “Introduction to the finitedifference timedomain (fdtd) method for electromagnetics,” Synthesis Lectures on Computational Electromagnetics, vol. 6, no. 1, pp. 1–250, 2011.
 [14] G. Chen, L. Zhao, W. Yu, S. Yan, K. Zhang, and J.M. Jin, “A general scheme for the discontinuous Galerkin timedomain modeling and sparameter extraction of inhomogeneous waveports,” IEEE Trans. Microwave Theory Tech., vol. 66, no. 4, pp. 1701–1712, 2018.
 [15] L. Angulo, J. Alvarez, M. Pantoja, S. Garcia, and A. Bretones, “Discontinuous Galerkin time domain methods in computational electrodynamics: State of the art,” in Forum Electromagn. Res. Methods Appl. Technol., vol. 10, 2015, pp. 1–24.
 [16] L. M. D. Angulo, “Time domain discontinuous Galerkin methods for Maxwell equations,” Ph.D. dissertation, Universidad de Granada, 2014.
 [17] K. Sankaran, “Accurate domain truncation techniques for timedomain conformal methods,” Ph.D. dissertation, ETH Zurich, 2007.
 [18] J. Niegemann, M. Konig, K. Stannigel, and K. Busch, “Higherorder timedomain methods for the analysis of nanophotonic systems,” Photonic. Nanostruct., vol. 7, no. 1, pp. 2–11, 2009.
 [19] J. Chan, R. J. Hewett, and T. Warburton, “Weightadjusted discontinuous Galerkin methods: wave propagation in heterogeneous media,” SIAM J. Sci. Comput., vol. 39, no. 6, pp. A2935–A2961, 2017.
 [20] K. Guo and J. Chan, “BernsteinBezier weightadjusted discontinuous Galerkin methods for wave propagation in heterogeneous media,” J. Comput. Phys., vol. 400, p. 108971, 2020.
 [21] K. Shukla, J. Chan, V. Maarten, and P. Jaiswal, “A weightadjusted discontinuous galerkin method for the poroelastic wave equation: penalty fluxes and microheterogeneities,” J. Comput. Phys., vol. 403, p. 109061, 2020.
 [22] L. Chen and H. Bagci, “Steadystate simulation of semiconductor devices using discontinuous Galerkin methods,” IEEE Access, vol. 8, pp. 16 203–16 215, 2020.
 [23] L. Chen and H. Bagci, “Multiphysics modeling of plasmonic photoconductive devices using discontinuous Galerkin methods,” arXiv preprint arXiv:1912.03639, 2019.
 [24] P. Li, Y. Shi, L. J. Jiang, and H. Bagci, “DGTD analysis of electromagnetic scattering from penetrable conductive objects with IBC,” IEEE Trans. Antennas Propag., vol. 63, no. 12, pp. 5686–5697, 2015.
 [25] P. Li, L. J. Jiang, and H. Bagci, “Transient analysis of dispersive powerground plate pairs with arbitrarily shaped antipads by the DGTD method with wave port excitation,” IEEE Trans. Electromagn. Compat., vol. 59, no. 1, pp. 172–183, 2017.
 [26] ——, “Discontinuous Galerkin timedomain modeling of graphene nanoribbon incorporating the spatial dispersion effects,” IEEE Trans. Antennas Propag., vol. 66, no. 7, pp. 3590–3598, 2018.
 [27] M. Liu, K. Sirenko, and H. Bagci, “An efficient discontinuous Galerkin finite element method for highly accurate solution of Maxwell equations,” IEEE Trans. Antennas Propag., vol. 60, no. 8, pp. 3992–3998, 2012.
 [28] L. Chen and H. Bagci, “A discontinuous Galerkin framework for multiphysics simulation of photoconductive devices,” in Proc. Int. Appl. Comput. Electromagn. Symp. IEEE, 2019, pp. 1–2.
 [29] K. Sirenko, M. Liu, and H. Bagci, “Incorporation of exact boundary conditions into a discontinuous Galerkin finite element method for accurately solving 2d timedependent Maxwell equations,” IEEE Trans. Antennas Propag., vol. 61, no. 1, pp. 472–477, 2012.
 [30] L. Chen and H. Bagci, “A unitcell discontinuous Galerkin scheme for analyzing plasmonic photomixers,” in Proc. IEEE Int. Symp. Antennas Propag., 2019, pp. 1069–1070.
 [31] R. Cools, “Monomial cubature rules since “stroud”: A compilation – part 2,” J. Comput. Appl. Math., vol. 112, no. 12, pp. 21–27, Nov. 1999.
 [32] H. Xiao and Z. Gimbutas, “A numerical algorithm for the construction of efficient quadrature rules in two and higher dimensions,” Comput. Math. with Appl., vol. 59, no. 2, pp. 663–676, 2010.
 [33] L. Chen, M. Dong, and H. Bagci, “Modeling floating potential conductors using discontinuous Galerkin method,” IEEE Access, vol. 8, pp. 7531–7538, 2020.
 [34] K. Sirenko, Y. Sirenko, and H. Bagci, “Exact absorbing boundary conditions for periodic threedimensional structures: Derivation and implementation in discontinuous Galerkin timedomain method,” IEEE J. Multiscale and Multiphys. Comput. Techn., vol. 3, pp. 108–120, 2018.
 [35] L. Chen, M. Dong, P. Li, and H. Bagci, “A hybridizable discontinuous Galerkin method for simulation of electrostatic problems with floating potential conductors,” arXiv preprint arXiv:2006.02549, 2020.
 [36] G. Karypis and V. Kumar, “A parallel algorithm for multilevel graph partitioning and sparse matrix ordering,” J. Parallel Distr. Com., vol. 48, no. 1, pp. 71–95, 1998.
 [37] L. Chen and H. Bagci, “An MPIbased parallel multiphysics discontinuous Galerkin framework for photoconductive devices,” in Prog. in Electromagn. Res. Symp. December 1720, 2019 in Xiamen, China, 2019. [Online]. Available: http://hdl.handle.net/10754/662548