I Introduction
The matrix [1, 2] constitutes a general mathematical framework for compact representation and efficient computation of large dense systems. Both partial differential equation (PDE) and integral equation (IE) operators in electromagnetics can be represented as matrices with controlled accuracy [3, 4, 5].
The development of matrix arithmetic such as addition, multiplication, and inverse are of critical importance to the development of fast solvers in electromagnetics [7]. Under the matrix framework, it has been shown that an
matrixbased addition, matrixvector product (MVP), and matrixmatrix product (MMP) all can be performed in linear complexity for constantrank
[1]. However, the accuracy of existing MMP algorithm like [1] is not controlled. This is because given two matrices and , the matrix structure and cluster bases of their product are preassumed, and a formatted multiplication is performed, whose accuracy is not controlled. For example, the row cluster bases of and the column cluster bases of are assumed to be those of C. This treatment lacks accuracy control since the original cluster basis may not be able to represent the new matrix content generated during the MMP. For instance, when multiplying a fullmatrix block F by a low rank block , treating the result as a lowrank block is correct. However, it is inaccurate to use the original row cluster basis as the product’s row cluster basis, since the latter has been changed to . Therefore, the algorithm in [1] can be accurate if the cluster bases of the original matrices can also be used to accurately represent the matrix product. However, this is unknown in general applications, and hence the accuracy of existing linearcomplexity MMP algorithm is not controlled. One can find many cases where a formatted multiplication would fail.The posteriori multiplication in [2] is more accurate than the formatted multiplication in [1]. But it is only suitable for special matrices. Besides, this posteriori multiplication requires much more computational time and memory than the formatted one. It needs to first present the product in matrix and then convert it into an matrix, the complexity of which is not linear.
In this work, we propose a new algorithm to do the matrixmatrix multiplication with controlled accuracy. The cluster bases are calculated instantaneously based on the prescribed accuracy during the computation of the matrixmatrix product. Meanwhile, we are able to keep the computational complexity to be linear for constantrank . For variablerank cases such as those in an electrically large analysis, the proposed MMP is also efficient since it only involves computations at level , each of which costs only, where is the rank at tree level . This algorithm can be used as a fundamental arithmetic in the errorcontrolled fast inverse, LU factorization, solution for many right hand sides, etc. Numerical experiments have demonstrated its accuracy and low complexity. In [12, 13], we present a fast algorithm to compute the product of two matrices in controlled accuracy. However, unlike this work, the original cluster bases are not completely changed, but appended to account for the updates to the original matrix during the MMP. In [8], we present the basic idea of this work. However, it is a onepage abstract. In this paper, we provide a complete algorithm together with a comprehensive analysis of its accuracy and complexity, whose validity and performance are then demonstrated by abundant numerical examples.
Ii Preliminaries
In an matrix [1], the entire matrix is partitioned into multilevel admissible and inadmissible blocks, where inadmissible blocks are at the leaf level, noted as . An admissible matrix block satisfies the following strong admissibility condition
(1) 
where () denotes the geometrical support of the unknown set (), is the Euclidean diameter of a set, denotes the Euclidean distance between two sets, and is a positive parameter that can be used to control the admissibility condition. An admissible matrix block in an matrix is represented as
(2) 
where () is called cluster basis associated with cluster (), is called coupling matrix. The cluster bases V in an matrix has a nested property. This means the cluster basis for a nonleaf cluster , , can be expressed by its two children’s cluster bases, and , as
(3) 
where and are called transfer matrices. Because of such a nested relationship, the cluster bases only need to be stored for leaf clusters. For nonleaf clusters, only transfer matrices need to be stored. The matrix is stored in a tree structure, with the size of leaflevel clusters denoted by . The number of blocks formed by a single cluster at each tree level is bounded by a constant . In an matrix, a large matrix block consisting of F and R is called a nonleaf block NL. As an example, a fourlevel block cluster tree is illustrated in Fig. 1 (a), where the green link connects a row cluster with a column cluster, which form an admissible block, and the red links are for inadmissible blocks. The resultant matrix is shown in Fig. 1 (b), where the admissible blocks are marked in green and the inadmissible blocks are marked in red.
Iii Proposed MatrixMatrix Product Algorithm—Leaf Level
To compute , unlike the existing formatted MMP [1], which is recursive, we propose to perform a oneway tree traversal from leaf level all the way up to the minimum level that has admissible blocks. Here, the tree is inverted with root level at level 0. While doing the multiplications at each level, we instantaneously compute the new row and column cluster bases based on prescribed accuracy to represent the product matrix accurately. We will use the matrices shown in Fig. 2 to illustrate the proposed algorithm, but the algorithm is valid for any matrix. The structures of , , and matrices, i.e., which block is admissible and which is inadmissible, are determined based on the admissibility condition given in (1). During the product calculation, we will keep the structure of product matrix while achieving prescribed accuracy. In this section, we detail proposed algorithm for leaflevel multiplications.
We start from leaf level . Let F denote an inadmissible block, which is stored as a full matrix, and R be an admissible block. At leaf level, there are in total four matrixmatrix multiplication cases, i.e.,

Case1:

Case2:

Case3:

Case4:
The resulting matrix block in C is of two kinds: First, full matrix block, denoted by , marked in red in Fig. 3(c); Second, admissible block of leaf size, which could be located at leaf level, denoted by as marked in green in Fig. 3(c); which could also appear as a subblock in the nonleaf level as marked in blue in Fig. 3(c). The blue blocks in Fig. 3(c) are only for temporary storage, which will be changed to green admissible blocks during the upper level multiplication to preserve the structure of matrix. The white blocks in Fig. 3 denote those blocks that are not involved in the leaf level multiplication. Next we show how to perform each matrixmatrix multiplication based on the two kinds of target blocks.
Iiia Product is an inadmissible block (full matrix) in C
If the product matrix is a full block , we can perform the four cases of multiplications exactly as they are by full matrix multiplications. For the admissible leaf blocks in four cases, we convert them into full matrices and then compute products. Since the size of these matrices is of , a userdefined constant, the computational cost is constant for each of such computations.
IiiB Product is an admissible block in C
If the product is admissible in C whether it is a leaflevel block or a subblock of a nonleaf admissible block, case4 can be performed as it is since the product matrix is obviously admissible, which also preserves the original row and column cluster bases. In other words, the row cluster basis of A is that of C; and the column cluster basis of B is kept in C. To see this point clearly, we can write
(4) 
where subscripts , , and denote cluster index, subscript denotes the corresponding cluster is a row cluster, whereas denotes the cluster is a column cluster. For example, denotes the cluster basis of row cluster in A, and denotes the cluster basis of column cluster in B. Eqn. (4) can be written in short as
(5) 
in which is the part in between the two cluster bases, which denotes the coupling matrix of the product admissible block in C. Clearly, this case of multiplication does not change the original row and column cluster bases.
For the other three cases, in existing MMP algorithms, a formatted multiplication is performed, which is done in the same way as case4, i.e., using the original cluster bases of A and B or preassumed bases as the cluster bases of the product block. This obviously can be inaccurate since cases1, 2, and 3, if performed as they are, would result in different cluster bases in the product matrix, which cannot be assumed. Specifically, case1 results in a different row as well as column cluster bases in the product admissible block because
(6) 
case2 yields a different row cluster basis since
(7) 
whereas case3 results in a different column cluster basis in the product admissible block, because
(8) 
If we do not update the cluster bases in the product matrix, the accuracy of the multiplication is not controllable. Therefore, in the proposed algorithm, we update row and column cluster bases for multiplication cases 1, 2, and 3 based on prescribed accuracy. We also have to do so with the nested property taken into consideration so that the computation at nonleaf levels can be performed efficiently.
For case1, both row and column cluster bases of the product block need to be updated. For case2, we need to use to update the original row cluster basis . For case3, we need to use to update column cluster basis . Since there are many case1, 2 and 3 products encountered at the leaf level for the same row or column cluster, we develop the following algorithm to systematically update the cluster bases. In this procedure, we also have to take the computation at all nonleaf levels into consideration so that the changed cluster bases at the leaf level can be reused at the nonleaf levels. To achieve this goal, when we update the cluster basis due to the case1, 2, and 3 multiplications associated with this cluster, not only we consider the product admissible block in the leaf level, but also the admissible blocks at all nonleaf levels. In other words, when computing multiplied by , if the block is part of a nonleaf admissible block, we will take the corresponding multiplication into account to update the cluster bases. The detailed algorithms are as follows.
IiiC Computation of new cluster bases in matrix product
First, we show how to calculate the new row cluster bases of . Take an arbitrary row cluster as an example, let its cluster basis in C be denoted by . This cluster basis is affected by both case1 and case2 multiplications, as analyzed in the above. We first find all the case1 multiplications associated with cluster , i.e., all whose product block is admissible. Again, notice that the can be either admissible at leaf level or be part of a nonleaf admissible block. For any cluster , the number of is bounded by constant , since the number of inadmissible blocks that can be formed by a cluster is bounded by . For the same reason, the number of for cluster is also bounded by constant . Hence, the total number of multiplications is bounded by , thus also a constant. Then we calculate the Gram matrix sum of these products as:
(9) 
in which superscript denotes a Hermitian matrix. We also find all case2 products associated with cluster , which is the number of formed by cluster at leaf level in . This is also bounded by . Since in case2 products, is multiplied by an admissible block in B, and hence , we compute
(10) 
which incorporates all of the new cluster bases information due to case2 products.
For case3 and case4 multiplications, the row cluster bases of matrix are kept to be those of C. So we account for the contribution of as
(11) 
The column space spanning , and would be the new cluster basis of , since it takes both the original cluster basis and the change to the cluster basis due to matrix products into consideration. Since the magnitude of the three matrices may differ greatly, we normalize them before summing them up so that each component is captured. We thus obtain
(12) 
The above , and denotes a normalized matrix. We then perform an SVD on to obtain the row cluster bases for cluster of based on prescribed accuracy
. The singular vectors whose normalized singular values are greater than
make the new row cluster basis . It can be used to accurately represent the admissible blocks related to cluster in . Here, notice that the proposed algorithm for computing matrixproduct cluster bases keeps nested property of . This is because the Gram matrix sums in (9), (10) and (11) take the upper level admissible products into account.To compute the column cluster bases in , the steps are similar to the row cluster basis computation. We account for the contributions from all the four cases of products to compute column cluster bases. As can be seen from (25) and (27), in case1 and case3 products, the column cluster bases are changed from the original ones; whereas in case2 and case4 products, the column cluster bases are kept the same as those in B.
Consider an arbitrary column cluster in . We find all of the case1 products associated with , which is with target being admissible either at the leaf or nonleaf level. The number of such multiplications is bounded by . We then compute the sum of their Gram matrices as:
(13) 
Here, the superscript denotes a complex conjugate. We also find all of the case3 products associated with , which is with target being admissible either at the leaf or nonleaf level. Hence, the new column cluster basis takes a form of . The number of such multiplications is also bounded by . The sum of their Gram matrices can be computed as:
(14) 
For case2 and case4 products, the original column cluster bases of are kept in , hence, we compute
(15) 
We also normalize these three Gram matrices , and and sum them up as:
(16) 
We then perform an SVD on this and truncate the singular values based on prescribed accuracy to obtain the column cluster bases for cluster . Now this new column cluster basis can be used to accurately represent the admissible blocks formed by column cluster in .
IiiD Computation of the four cases of multiplications with the product block being admissible
After computing the new row and column cluster bases of the product matrix, for the multiplication whose target is an admissible block described in Section IIIB, the computation becomes the coupling matrix computation since the cluster bases have been generated. For the four cases of multiplications, their coupling matrices have the following expressions:
(17) 
The resulting admissible blocks in are nothing but .
In (17), the is the cluster bases product, which is as shown below:
(18) 
Since it is only related to the original cluster bases, it can be prepared in advance before the MMP computation. Using the nested property of the cluster bases, can be computed in linear time for all clusters , be a leaf or a nonleaf cluster.
In (17), the is simply the projection of the original row cluster basis of A onto the new cluster basis of the product matrix C. Similarly, denotes the projection of the original column cluster basis of B onto the newly generated column cluster basis in C. The two cluster basis projections can also be computed for every leaf cluster after the new cluster bases have been generated. Hence, we compute
(19) 
for each leaf row cluster , and each column leaf cluster . In this way, it can be reused without recomputation for each admissible block formed by or .
In (17), we can also see that the F block is front and back multiplied by cluster bases. It can be viewed as an F block collected based on the front (row) and back (column) cluster bases, which becomes a matrix of rank size. Specifically, in (17), there are three kinds of collected blocks
(20) 
which is used in case1, 2, and 3 multiplication respectively.
As can be seen from (17), the case1 multiplication with an admissible block being the target can be performed by first computing the fullmatrix product, and then collecting the product onto the new row and column cluster bases of the product matrix. This collect operation is accurate because the newly generated row and column cluster bases have taken such a case1 multiplication into consideration when being generated. As for the case2 multiplication, as can be seen from (17), we can use the collected based on the new row cluster basis and the original column cluster basis, the size of which is rank, to multiply the coupling matrix of , and then multiply the column basis projection matrix since the column bases have been changed. Similarly, for case3, we use the collected block , and front multiply it by the coupling matrix of , and then front multiply a row cluster basis transformation matrix. As for case4, we multiply the coupling matrix of A’s admissible block by the cluster basis product, and then by the coupling matrix of B’s admissible block. Since the row and column cluster bases have been changed to account for the other cases of multiplications, at the end, we need to front and back multiply the cluster basis transformation matrices to complete the computation of case4. Summarizing the aforementioned, the coupling matrix in (17) can be efficiently computed as
(21) 
IiiE Summary of overall algorithm at leaf level
Here, we conclude all the operations related to leaf level computation when the target is an admissible block:

Prepare cluster bases product B;

Compute all the leaflevel row and column cluster bases of product matrix ;

Collect the F blocks in and based on the new row and/or column cluster bases, also prepare cluster bases transformation matrix P ;

Perform four cases of multiplications.
After leaf level multiplications, we need to merge four coupling matrices at a nonleaf level admissible block, as shown by the blue blocks in Fig. 3 (c). These matrices correspond to the multiplication case of a nonleaf block NL multiplied by a nonleaf block NL generating an admissible block at next level. The merged block is the coupling matrix of this nextlevel admissible block. It will be used to update next level transfer matrices. The details will be given in next section.
Iv Proposed MatrixMatrix Product Algorithm—NonLeaf Level
After finishing the leaf level multiplication, we proceed to nonleaf level multiplications. In Fig. 4, we use level as an example to illustrate , , and .
At a nonleaf level , there are also in total four matrixmatrix multiplication cases, i.e.,

Case1:

Case2:

Case3:

Case4: ,
where NL denotes a nonleaf block. The resulting matrix block in C is also of two kinds: 1) nonleaf block NL at this level, marked in red in Fig. 4 (c), and 2) admissible block R, marked in green in Fig. 4 (c). Next we show how to perform each case of multiplications based on the two kinds of target blocks.
Iva Product is an Nl block in C
The NL target block would not exist for a case1 multiplication, since if a case1 multiplication results in an NL block, that computation should have been performed at previous level. As for the other three cases of multiplications, since at least one admissible block is present in the multipliers, the product must be an admissible block. Hence, we compute them as having an admissible block as the product, using the algorithm described in the following subsection, and associate the resulting admissible block with the NL block. After the computation is done at all levels, we perform a backward split operation to split the admissible block associated with each NL block to each leaf block of C based on its structure.
IvB Product is an admissible block in C
Similar to the leaf level, if the product is an admissible block in C whether at the same nonleaf level or at an upper level, case4 can be performed as it is since the product matrix is obviously admissible, which also preserves the original row and column cluster bases. We can write
case4:  
(22) 
where T denotes a transfer matrix, and superscript denotes the two children clusters of the nonleaf cluster . If the cluster bases at leaf level and the transfer matrices at nonleaf levels are kept the same as before, then the computation of (IVB) is to calculate the coupling matrix at level , which is
(23) 
It is a product of three small matrices whose size is the rank at this tree level. Rewriting (IVB) as
case4:  
(24) 
If we exclude the children cluster bases in the front and at the back, we can see that T serves as the new cluster basis at this level. In other words, at a nonleaf level , if we treat this level as the bottom level of the remaining tree, then the transfer matrix of the nonleaf cluster is nothing but the leaf cluster basis of the shortened tree.
Similar to the leaflevel computation, the other three cases of multiplications will result in a change of cluster basis in the matrix product. Specifically, case1 results in a different row as well as column cluster bases in the product admissible block because
(25) 
case2 yields a different row cluster basis since
(26) 
whereas case3 results in a different column cluster basis in the product admissible block, because
(27) 
If we do not update the cluster bases in the product matrix, the accuracy of the multiplication is not controllable. However, if we update the cluster basis as they are, it is computationally very expensive since the matrix block size keeps increasing when we proceed from leaf level towards the root level. In addition to the cost of changing cluster bases, if we have to carry out the multiplications at each nonleaf level using the actual matrix block size, then the computation is also prohibitive. Therefore, the fast algorithm we develop here is to perform all computations using the rank size at each tree level, and meanwhile control the accuracy.
In the proposed algorithm, to account for the updates to the original matrix during the MMP procedure, the cluster bases of C are computed level by level, which are manifested by the changed leaf cluster bases and the transfer matrices at nonleaf levels. At a nonleaf level, its childrenlevel cluster bases have already been computed, and they are different from the original ones in A and B. However, the new cluster bases have taken the upperlevel multiplications into consideration. Hence, we can accurately represent the multiplication at the current nonleaf level using newly generated children cluster bases.
Take case1 product as an example, where we perform obtaining an admissible . We can accurately represent this product using the children cluster bases of and as follows:
case1:  
(28) 
in which
(29) 
This collected block, , is actually the coupling matrix merged from the four small coupling matrices computed at previous level, when dealing with the multiplication case of having a target block as a subblock in the upperlevel admissible block. It can be written as
(30) 
Each of the four coupling matrices has been obtained at previous level. From (IVB), it is clear that using the nested property of the cluster bases, the collect operation does not need to start from leaf level, but using the four blocks obtained at previous one level.
For case2 product, it can also be accurately expanded in the space of the children row cluster bases, and hence
(31) 
where
(32) 
which is collected based on the children’s new row cluster bases in C and the original column cluster bases in B. From (IVB), it can be seen that if excluding the children cluster bases, then resembles the in the leaf level case2 product. In other words, if we treat the current nonleaf level as the leaf level, then is equivalent to a full matrix block, whereas T is the leaf cluster basis. An example of block at level in can be seen below:
(33) 
which consists of collected full matrices whose expressions are shown in (20), and projected coupling matrices of admissible blocks. Again, using the nested property of both new and original cluster bases, the collect operation does not need to start from leaf level, but using the four blocks obtained at previous one level. Each collect operation only costs , where is the rank at level .
Since the cluster bases at the previous level have been computed, for case1 and case2 products at a nonleaf level, we only need to compute the center block associated with the current nonleaf level, and this computation can be carried out in the same way as how we carry out leaflevel computation, if we treat the current nonleaf level as the leaf level of the remaining tree. The same is true to case3 product, where we have
(34) 
in which
(35) 
We can see that resembles the in the leaf level case3 product. An example of collected NL block in is given as follows
(36) 
which consists of collected full matrices whose expressions are shown in (20), and projected coupling matrices of admissible blocks.
Since the cluster bases have been changed at previous level, we also represent the case4 product using the new children cluster bases of and , thus
case4:  
(37) 
and
(38) 
which can be written in short as
(39) 
where denotes children. Here, there is a cluster basis transformation matrix in the front and at the back.
IvC Computation of the new nonleaf level transfer matrices in C
If the target block is an admissible block at a nonleaf level, we need to represent it as in controlled accuracy. Hence, we need to calculate new row and column transfer matrices T of product matrix . First, we introduce how to calculate the row transfer matrices. Similar to leaf level, case1 and 2 products result in a change in the row cluster basis and hence row transfer matrix. Case3 and 4 products do not require a change of transfer matrix if the cluster bases have not been changed at previous level. However, since the cluster bases have been changed at previous level, the transfer matrix requires an update as well.
For an arbitrary nonleaf cluster , we first find all of the case1 products associated with . Each of such a product leads to a coupling matrix merged from the four coupling matrices obtained at previous level computation, denoted by
Comments
There are no comments yet.