Accuracy Controlled Structure-Preserving H^2-Matrix-Matrix Product in Linear Complexity with Change of Cluster Bases

08/14/2019
by   Miaomiao Ma, et al.
Purdue University
0

H^2-matrix constitutes a general mathematical framework for efficient computation of both partial-differential-equation and integral-equation-based operators. Existing linear-complexity H^2 matrix-matrix product (MMP) algorithm lacks explicit accuracy control, while controlling accuracy without compromising linear complexity is challenging. In this paper, we develop an accuracy controlled H^2 matrix-matrix product algorithm by instantaneously changing the cluster bases during the matrix product computation based on prescribed accuracy. Meanwhile, we retain the computational complexity of the overall algorithm to be linear. Different from the existing H^2 matrix-matrix product algorithm where formatted multiplications are performed using the original cluster bases, in the proposed algorithm, all additions and multiplications are either exact or computed based on prescribed accuracy. Furthermore, the original H^2-matrix structure is preserved in the matrix product. While achieving optimal complexity for constant-rank matrices, the computational complexity of the proposed algorithm is also minimized for variable-rank H^2-matrices. The proposed work serves as a fundamental arithmetic in the development of fast solvers for large-scale electromagnetic analysis. Applications to both large-scale capacitance extraction and electromagnetic scattering problems involving millions of unknowns on a single core have demonstrated the accuracy and efficiency of the proposed algorithm.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 5

04/12/2022

HODLR2D: A new class of Hierarchical matrices

This article introduces HODLR2D, a new hierarchical low-rank representat...
04/16/2019

Fast Commutative Matrix Algorithm

We show that the product of an nx3 matrix and a 3x3 matrix over a commut...
10/03/2019

Orbit Computation for Atomically Generated Subgroups of Isometries of Z^n

Isometries and their induced symmetries are ubiquitous in the world. Tak...
10/17/2019

Reducing the Computational Complexity of Pseudoinverse for the Incremental Broad Learning System on Added Inputs

In this brief, we improve the Broad Learning System (BLS) [7] by reducin...
01/31/2021

Linear Computation Coding

We introduce the new concept of computation coding. Similar to how rate-...
03/23/2021

A fast and oblivious matrix compression algorithm for Volterra integral operators

The numerical solution of dynamical systems with memory requires the eff...
10/12/2015

Penalized estimation in large-scale generalized linear array models

Large-scale generalized linear array models (GLAMs) can be challenging t...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

The -matrix [1, 2] constitutes a general mathematical framework for compact representation and efficient computation of large dense systems. Both partial differential equation (PDE) and integral equation (IE) operators in electromagnetics can be represented as -matrices with controlled accuracy [3, 4, 5].

The development of -matrix arithmetic such as addition, multiplication, and inverse are of critical importance to the development of fast solvers in electromagnetics [7]. Under the -matrix framework, it has been shown that an

-matrix-based addition, matrix-vector product (MVP), and matrix-matrix product (MMP) all can be performed in linear complexity for constant-rank

[1]. However, the accuracy of existing -MMP algorithm like [1] is not controlled. This is because given two -matrices and , the matrix structure and cluster bases of their product are pre-assumed, and a formatted multiplication is performed, whose accuracy is not controlled. For example, the row cluster bases of and the column cluster bases of are assumed to be those of C. This treatment lacks accuracy control since the original cluster basis may not be able to represent the new matrix content generated during the MMP. For instance, when multiplying a full-matrix block F by a low rank block , treating the result as a low-rank block is correct. However, it is inaccurate to use the original row cluster basis as the product’s row cluster basis, since the latter has been changed to . Therefore, the algorithm in [1] can be accurate if the cluster bases of the original matrices can also be used to accurately represent the matrix product. However, this is unknown in general applications, and hence the accuracy of existing linear-complexity MMP algorithm is not controlled. One can find many cases where a formatted multiplication would fail.

The posteriori multiplication in [2] is more accurate than the formatted multiplication in [1]. But it is only suitable for special matrices. Besides, this posteriori multiplication requires much more computational time and memory than the formatted one. It needs to first present the product in -matrix and then convert it into an -matrix, the complexity of which is not linear.

In this work, we propose a new algorithm to do the matrix-matrix multiplication with controlled accuracy. The cluster bases are calculated instantaneously based on the prescribed accuracy during the computation of the matrix-matrix product. Meanwhile, we are able to keep the computational complexity to be linear for constant-rank . For variable-rank cases such as those in an electrically large analysis, the proposed MMP is also efficient since it only involves computations at level , each of which costs only, where is the rank at tree level . This algorithm can be used as a fundamental arithmetic in the error-controlled fast inverse, LU factorization, solution for many right hand sides, etc. Numerical experiments have demonstrated its accuracy and low complexity. In [12, 13], we present a fast algorithm to compute the product of two -matrices in controlled accuracy. However, unlike this work, the original cluster bases are not completely changed, but appended to account for the updates to the original matrix during the MMP. In [8], we present the basic idea of this work. However, it is a one-page abstract. In this paper, we provide a complete algorithm together with a comprehensive analysis of its accuracy and complexity, whose validity and performance are then demonstrated by abundant numerical examples.

Ii Preliminaries

(a)
(b)
Fig. 1: Illustration of a block cluster tree and resulting -matrix partition. (a) Block cluster tree. (b) -matrix structure.

In an -matrix [1], the entire matrix is partitioned into multilevel admissible and inadmissible blocks, where inadmissible blocks are at the leaf level, noted as . An admissible matrix block satisfies the following strong admissibility condition

(1)

where () denotes the geometrical support of the unknown set (), is the Euclidean diameter of a set, denotes the Euclidean distance between two sets, and is a positive parameter that can be used to control the admissibility condition. An admissible matrix block in an -matrix is represented as

(2)

where () is called cluster basis associated with cluster (), is called coupling matrix. The cluster bases V in an -matrix has a nested property. This means the cluster basis for a non-leaf cluster , , can be expressed by its two children’s cluster bases, and , as

(3)

where and are called transfer matrices. Because of such a nested relationship, the cluster bases only need to be stored for leaf clusters. For non-leaf clusters, only transfer matrices need to be stored. The -matrix is stored in a tree structure, with the size of leaf-level clusters denoted by . The number of blocks formed by a single cluster at each tree level is bounded by a constant . In an -matrix, a large matrix block consisting of F and R is called a nonleaf block NL. As an example, a four-level block cluster -tree is illustrated in Fig. 1 (a), where the green link connects a row cluster with a column cluster, which form an admissible block, and the red links are for inadmissible blocks. The resultant -matrix is shown in Fig. 1 (b), where the admissible blocks are marked in green and the inadmissible blocks are marked in red.

Iii Proposed Matrix-Matrix Product Algorithm—Leaf Level

Fig. 2: An -matrix structure. (a) . (b) . (c) .

To compute , unlike the existing formatted MMP [1], which is recursive, we propose to perform a one-way tree traversal from leaf level all the way up to the minimum level that has admissible blocks. Here, the tree is inverted with root level at level 0. While doing the multiplications at each level, we instantaneously compute the new row and column cluster bases based on prescribed accuracy to represent the product matrix accurately. We will use the -matrices shown in Fig. 2 to illustrate the proposed algorithm, but the algorithm is valid for any -matrix. The structures of , , and matrices, i.e., which block is admissible and which is inadmissible, are determined based on the admissibility condition given in (1). During the product calculation, we will keep the structure of product matrix while achieving prescribed accuracy. In this section, we detail proposed algorithm for leaf-level multiplications.

Fig. 3: -matrix at leaf level. (a) . (b) . (c) .

We start from leaf level . Let F denote an inadmissible block, which is stored as a full matrix, and R be an admissible block. At leaf level, there are in total four matrix-matrix multiplication cases, i.e.,

  • Case-1:

  • Case-2:

  • Case-3:

  • Case-4:

The resulting matrix block in C is of two kinds: First, full matrix block, denoted by , marked in red in Fig. 3(c); Second, admissible block of leaf size, which could be located at leaf level, denoted by as marked in green in Fig. 3(c); which could also appear as a subblock in the non-leaf level as marked in blue in Fig. 3(c). The blue blocks in Fig. 3(c) are only for temporary storage, which will be changed to green admissible blocks during the upper level multiplication to preserve the structure of matrix. The white blocks in Fig. 3 denote those blocks that are not involved in the leaf level multiplication. Next we show how to perform each matrix-matrix multiplication based on the two kinds of target blocks.

Iii-a Product is an inadmissible block (full matrix) in C

If the product matrix is a full block , we can perform the four cases of multiplications exactly as they are by full matrix multiplications. For the admissible leaf blocks in four cases, we convert them into full matrices and then compute products. Since the size of these matrices is of , a user-defined constant, the computational cost is constant for each of such computations.

Iii-B Product is an admissible block in C

If the product is admissible in C whether it is a leaf-level block or a subblock of a non-leaf admissible block, case-4 can be performed as it is since the product matrix is obviously admissible, which also preserves the original row and column cluster bases. In other words, the row cluster basis of A is that of C; and the column cluster basis of B is kept in C. To see this point clearly, we can write

(4)

where subscripts , , and denote cluster index, subscript denotes the corresponding cluster is a row cluster, whereas denotes the cluster is a column cluster. For example, denotes the cluster basis of row cluster in A, and denotes the cluster basis of column cluster in B. Eqn. (4) can be written in short as

(5)

in which is the part in between the two cluster bases, which denotes the coupling matrix of the product admissible block in C. Clearly, this case of multiplication does not change the original row and column cluster bases.

For the other three cases, in existing MMP algorithms, a formatted multiplication is performed, which is done in the same way as case-4, i.e., using the original cluster bases of A and B or pre-assumed bases as the cluster bases of the product block. This obviously can be inaccurate since cases-1, 2, and 3, if performed as they are, would result in different cluster bases in the product matrix, which cannot be assumed. Specifically, case-1 results in a different row as well as column cluster bases in the product admissible block because

(6)

case-2 yields a different row cluster basis since

(7)

whereas case-3 results in a different column cluster basis in the product admissible block, because

(8)

If we do not update the cluster bases in the product matrix, the accuracy of the multiplication is not controllable. Therefore, in the proposed algorithm, we update row and column cluster bases for multiplication cases 1, 2, and 3 based on prescribed accuracy. We also have to do so with the nested property taken into consideration so that the computation at nonleaf levels can be performed efficiently.

For case-1, both row and column cluster bases of the product block need to be updated. For case-2, we need to use to update the original row cluster basis . For case-3, we need to use to update column cluster basis . Since there are many case-1, 2 and 3 products encountered at the leaf level for the same row or column cluster, we develop the following algorithm to systematically update the cluster bases. In this procedure, we also have to take the computation at all nonleaf levels into consideration so that the changed cluster bases at the leaf level can be reused at the nonleaf levels. To achieve this goal, when we update the cluster basis due to the case-1, 2, and 3 multiplications associated with this cluster, not only we consider the product admissible block in the leaf level, but also the admissible blocks at all nonleaf levels. In other words, when computing multiplied by , if the block is part of a non-leaf admissible block, we will take the corresponding multiplication into account to update the cluster bases. The detailed algorithms are as follows.

Iii-C Computation of new cluster bases in matrix product

First, we show how to calculate the new row cluster bases of . Take an arbitrary row cluster as an example, let its cluster basis in C be denoted by . This cluster basis is affected by both case-1 and case-2 multiplications, as analyzed in the above. We first find all the case-1 multiplications associated with cluster , i.e., all whose product block is admissible. Again, notice that the can be either admissible at leaf level or be part of a non-leaf admissible block. For any cluster , the number of is bounded by constant , since the number of inadmissible blocks that can be formed by a cluster is bounded by . For the same reason, the number of for cluster is also bounded by constant . Hence, the total number of multiplications is bounded by , thus also a constant. Then we calculate the Gram matrix sum of these products as:

(9)

in which superscript denotes a Hermitian matrix. We also find all case-2 products associated with cluster , which is the number of formed by cluster at leaf level in . This is also bounded by . Since in case-2 products, is multiplied by an admissible block in B, and hence , we compute

(10)

which incorporates all of the new cluster bases information due to case-2 products.

For case-3 and case-4 multiplications, the row cluster bases of matrix are kept to be those of C. So we account for the contribution of as

(11)

The column space spanning , and would be the new cluster basis of , since it takes both the original cluster basis and the change to the cluster basis due to matrix products into consideration. Since the magnitude of the three matrices may differ greatly, we normalize them before summing them up so that each component is captured. We thus obtain

(12)

The    above , and denotes a normalized matrix. We then perform an SVD on to obtain the row cluster bases for cluster of based on prescribed accuracy

. The singular vectors whose normalized singular values are greater than

make the new row cluster basis . It can be used to accurately represent the admissible blocks related to cluster in . Here, notice that the proposed algorithm for computing matrix-product cluster bases keeps nested property of . This is because the Gram matrix sums in (9), (10) and (11) take the upper level admissible products into account.

To compute the column cluster bases in , the steps are similar to the row cluster basis computation. We account for the contributions from all the four cases of products to compute column cluster bases. As can be seen from (25) and (27), in case-1 and case-3 products, the column cluster bases are changed from the original ones; whereas in case-2 and case-4 products, the column cluster bases are kept the same as those in B.

Consider an arbitrary column cluster in . We find all of the case-1 products associated with , which is with target being admissible either at the leaf or non-leaf level. The number of such multiplications is bounded by . We then compute the sum of their Gram matrices as:

(13)

Here, the superscript denotes a complex conjugate. We also find all of the case-3 products associated with , which is with target being admissible either at the leaf or non-leaf level. Hence, the new column cluster basis takes a form of . The number of such multiplications is also bounded by . The sum of their Gram matrices can be computed as:

(14)

For case-2 and case-4 products, the original column cluster bases of are kept in , hence, we compute

(15)

We also normalize these three Gram matrices , and and sum them up as:

(16)

We then perform an SVD on this and truncate the singular values based on prescribed accuracy to obtain the column cluster bases for cluster . Now this new column cluster basis can be used to accurately represent the admissible blocks formed by column cluster in .

Iii-D Computation of the four cases of multiplications with the product block being admissible

After computing the new row and column cluster bases of the product matrix, for the multiplication whose target is an admissible block described in Section III-B, the computation becomes the coupling matrix computation since the cluster bases have been generated. For the four cases of multiplications, their coupling matrices have the following expressions:

(17)

The resulting admissible blocks in are nothing but .

In (17), the is the cluster bases product, which is as shown below:

(18)

Since it is only related to the original cluster bases, it can be prepared in advance before the MMP computation. Using the nested property of the cluster bases, can be computed in linear time for all clusters , be a leaf or a non-leaf cluster.

In (17), the is simply the projection of the original row cluster basis of A onto the new cluster basis of the product matrix C. Similarly, denotes the projection of the original column cluster basis of B onto the newly generated column cluster basis in C. The two cluster basis projections can also be computed for every leaf cluster after the new cluster bases have been generated. Hence, we compute

(19)

for each leaf row cluster , and each column leaf cluster . In this way, it can be reused without recomputation for each admissible block formed by or .

In (17), we can also see that the F block is front and back multiplied by cluster bases. It can be viewed as an F block collected based on the front (row) and back (column) cluster bases, which becomes a matrix of rank size. Specifically, in (17), there are three kinds of collected blocks

(20)

which is used in case-1, 2, and 3 multiplication respectively.

As can be seen from (17), the case-1 multiplication with an admissible block being the target can be performed by first computing the full-matrix product, and then collecting the product onto the new row and column cluster bases of the product matrix. This collect operation is accurate because the newly generated row and column cluster bases have taken such a case-1 multiplication into consideration when being generated. As for the case-2 multiplication, as can be seen from (17), we can use the collected based on the new row cluster basis and the original column cluster basis, the size of which is rank, to multiply the coupling matrix of , and then multiply the column basis projection matrix since the column bases have been changed. Similarly, for case-3, we use the collected block , and front multiply it by the coupling matrix of , and then front multiply a row cluster basis transformation matrix. As for case-4, we multiply the coupling matrix of A’s admissible block by the cluster basis product, and then by the coupling matrix of B’s admissible block. Since the row and column cluster bases have been changed to account for the other cases of multiplications, at the end, we need to front and back multiply the cluster basis transformation matrices to complete the computation of case-4. Summarizing the aforementioned, the coupling matrix in (17) can be efficiently computed as

(21)

Iii-E Summary of overall algorithm at leaf level

Here, we conclude all the operations related to leaf level computation when the target is an admissible block:

  1. Prepare cluster bases product B;

  2. Compute all the leaf-level row and column cluster bases of product matrix ;

  3. Collect the F blocks in and based on the new row and/or column cluster bases, also prepare cluster bases transformation matrix P ;

  4. Perform four cases of multiplications.

After leaf level multiplications, we need to merge four coupling matrices at a non-leaf level admissible block, as shown by the blue blocks in Fig. 3 (c). These matrices correspond to the multiplication case of a nonleaf block NL multiplied by a nonleaf block NL generating an admissible block at next level. The merged block is the coupling matrix of this next-level admissible block. It will be used to update next level transfer matrices. The details will be given in next section.

Iv Proposed Matrix-Matrix Product Algorithm—Non-Leaf Level

Fig. 4: -matrix block at non-leaf level . (a) . (b) . (c) .

After finishing the leaf level multiplication, we proceed to non-leaf level multiplications. In Fig. 4, we use level as an example to illustrate , , and .

At a nonleaf level , there are also in total four matrix-matrix multiplication cases, i.e.,

  • Case-1:

  • Case-2:

  • Case-3:

  • Case-4: ,

where NL denotes a non-leaf block. The resulting matrix block in C is also of two kinds: 1) non-leaf block NL at this level, marked in red in Fig. 4 (c), and 2) admissible block R, marked in green in Fig. 4 (c). Next we show how to perform each case of multiplications based on the two kinds of target blocks.

Iv-a Product is an Nl block in C

The NL target block would not exist for a case-1 multiplication, since if a case-1 multiplication results in an NL block, that computation should have been performed at previous level. As for the other three cases of multiplications, since at least one admissible block is present in the multipliers, the product must be an admissible block. Hence, we compute them as having an admissible block as the product, using the algorithm described in the following subsection, and associate the resulting admissible block with the NL block. After the computation is done at all levels, we perform a backward split operation to split the admissible block associated with each NL block to each leaf block of C based on its structure.

Iv-B Product is an admissible block in C

Similar to the leaf level, if the product is an admissible block in C whether at the same non-leaf level or at an upper level, case-4 can be performed as it is since the product matrix is obviously admissible, which also preserves the original row and column cluster bases. We can write

case-4:
(22)

where T denotes a transfer matrix, and superscript denotes the two children clusters of the non-leaf cluster . If the cluster bases at leaf level and the transfer matrices at non-leaf levels are kept the same as before, then the computation of (IV-B) is to calculate the coupling matrix at level , which is

(23)

It is a product of three small matrices whose size is the rank at this tree level. Rewriting (IV-B) as

case-4:
(24)

If we exclude the children cluster bases in the front and at the back, we can see that T serves as the new cluster basis at this level. In other words, at a non-leaf level , if we treat this level as the bottom level of the remaining tree, then the transfer matrix of the non-leaf cluster is nothing but the leaf cluster basis of the shortened tree.

Similar to the leaf-level computation, the other three cases of multiplications will result in a change of cluster basis in the matrix product. Specifically, case-1 results in a different row as well as column cluster bases in the product admissible block because

(25)

case-2 yields a different row cluster basis since

(26)

whereas case-3 results in a different column cluster basis in the product admissible block, because

(27)

If we do not update the cluster bases in the product matrix, the accuracy of the multiplication is not controllable. However, if we update the cluster basis as they are, it is computationally very expensive since the matrix block size keeps increasing when we proceed from leaf level towards the root level. In addition to the cost of changing cluster bases, if we have to carry out the multiplications at each non-leaf level using the actual matrix block size, then the computation is also prohibitive. Therefore, the fast algorithm we develop here is to perform all computations using the rank size at each tree level, and meanwhile control the accuracy.

In the proposed algorithm, to account for the updates to the original matrix during the MMP procedure, the cluster bases of C are computed level by level, which are manifested by the changed leaf cluster bases and the transfer matrices at nonleaf levels. At a non-leaf level, its children-level cluster bases have already been computed, and they are different from the original ones in A and B. However, the new cluster bases have taken the upper-level multiplications into consideration. Hence, we can accurately represent the multiplication at the current non-leaf level using newly generated children cluster bases.

Take case-1 product as an example, where we perform obtaining an admissible . We can accurately represent this product using the children cluster bases of and as follows:

case-1:
(28)

in which

(29)

This collected block, , is actually the coupling matrix merged from the four small coupling matrices computed at previous level, when dealing with the multiplication case of having a target block as a subblock in the upper-level admissible block. It can be written as

(30)

Each of the four coupling matrices has been obtained at previous level. From (IV-B), it is clear that using the nested property of the cluster bases, the collect operation does not need to start from leaf level, but using the four blocks obtained at previous one level.

For case-2 product, it can also be accurately expanded in the space of the children row cluster bases, and hence

(31)

where

(32)

which is collected based on the children’s new row cluster bases in C and the original column cluster bases in B. From (IV-B), it can be seen that if excluding the children cluster bases, then resembles the in the leaf level case-2 product. In other words, if we treat the current non-leaf level as the leaf level, then is equivalent to a full matrix block, whereas T is the leaf cluster basis. An example of block at level in can be seen below:

(33)

which consists of collected full matrices whose expressions are shown in (20), and projected coupling matrices of admissible blocks. Again, using the nested property of both new and original cluster bases, the collect operation does not need to start from leaf level, but using the four blocks obtained at previous one level. Each collect operation only costs , where is the rank at level .

Since the cluster bases at the previous level have been computed, for case-1 and case-2 products at a non-leaf level, we only need to compute the center block associated with the current non-leaf level, and this computation can be carried out in the same way as how we carry out leaf-level computation, if we treat the current non-leaf level as the leaf level of the remaining tree. The same is true to case-3 product, where we have

(34)

in which

(35)

We can see that resembles the in the leaf level case-3 product. An example of collected NL block in is given as follows

(36)

which consists of collected full matrices whose expressions are shown in (20), and projected coupling matrices of admissible blocks.

Since the cluster bases have been changed at previous level, we also represent the case-4 product using the new children cluster bases of and , thus

case-4:
(37)

and

(38)

which can be written in short as

(39)

where denotes children. Here, there is a cluster basis transformation matrix in the front and at the back.

Iv-C Computation of the new non-leaf level transfer matrices in C

If the target block is an admissible block at a nonleaf level, we need to represent it as in controlled accuracy. Hence, we need to calculate new row and column transfer matrices T of product matrix . First, we introduce how to calculate the row transfer matrices. Similar to leaf level, case-1 and 2 products result in a change in the row cluster basis and hence row transfer matrix. Case-3 and 4 products do not require a change of transfer matrix if the cluster bases have not been changed at previous level. However, since the cluster bases have been changed at previous level, the transfer matrix requires an update as well.

For an arbitrary non-leaf cluster , we first find all of the case-1 products associated with . Each of such a product leads to a coupling matrix merged from the four coupling matrices obtained at previous level computation, denoted by