1 Introduction
Scientific computing with functions, vectors and matrices in low rank formats allows us to tackle even highdimensional and extreme scale problems, provided the underlying mathematical objects allow for a low rank approximation. In this article we focus on the task of postprocessing low rank matrices and tensors. Our guiding problem is to find the entry of maximum absolute value in a low rank object without accessing every entry. Such a task is the discrete analogue to finding global optima and thus of general interest. While this is straightforward for rank one, it is surprisingly difficult for ranks beyond one. The seminal idea for finding the maximum absolute value efficiently stems from the PhD thesis of Mike Espig
[5]where the problem is reformulated as a diagonal (extreme scale) eigenvalue problem. The latter can e.g. be solved by a simple power iteration, and in case that there is a unique maximizer, the corresponding eigenvector is rank one. Hence, it makes sense to approximate the eigenvector by low rank (cf.
[6] for a recent approach independently of this article). However, if several entries are of similar size, then the power iteration can be arbitrarily slow, the iterates cannot be approximated by low rank and a truncation to low rank may destroy convergence altogether.In this article we provide much more advanced approaches for finding the maximum by subspace acceleration and a squaring trick. Without truncation we are able to prove a uniformly good convergence rate of for computing the norm and based on this we devise a divide and conquer strategy to find the corresponding entry.
The article is structured as follows: In Section 2 we introduce the HTformat followed by the problem for rank one or elementary tensors in Section 3. In Section 4 we summarize the power iteration approach and give examples that underline the involved difficulties. In Section 5 we provide the subspace acceleration and in Section 6 the squaring trick. Finally, in Section 7 we explain how one can determine the entry of maximum absolute value based on the so far introduced iterative methods. We conclude by giving numerical examples in Section 8 that show the benefits and limitations of the algorithms.
2 Lowrank formats for tensors
We denote the entry of an order tensor , by
For larger tensor order
a full (entrywise) representation is not feasible (cf. the curse of dimensionality
[3, 2]) and additional structure is required. In the following we use low rank tensors that generalize the matrix rank. [Elementary tensors] A tensor is called elementary tensor or rank tensor if(1) 
The CPrank (Canonical Polyadic) [14] of a tensor is defined as the minimum number such that
(2) 
For computational purposes the minimum number of terms is not necessary, instead any representation of the form (2) with reasonably small is sufficient. Such a representation is called CPformat with representation rank . The corresponding set is defined by
(3) 
These sets are numerically difficult and thus a different type of rank is commonly used which is based on matricizations. [matricization] Let . We denote
(4) 
and we use the short notation
(5) 
Then the matricization of is defined by
i.e. a rearrangement of into a matrix with row index set and column index set . For the CPformat, instead of all tensor entries of some tensor , the matrices from the righthand side of (2) are stored. Assuming all tensor directions to be of the same size , this sums up to a storage complexity of , which is linear in the tensor order . Despite of this desirable linear complexity in , the CPformat has two drawbacks: on the one hand, the CPrank is in general NPhard to determine, cf. [15], on the other hand, the set (3) is not closed for tensors of higher order , cf. [4, 12]. This fact can render the numerical treatment of tensors in CPformat difficult.
For the Tucker format the corresponding set is closed, cf. [12], but the storage complexity w.r.t. the tensor order is exponential.
The above mentioned complexities and shortcomings transfer to many arithmetic operations, such as dot products or Hadamard products, which are involved in the algorithms presented in this article. All of the shortcomings can be resolved by using the Hierarchical Tucker format (HTformat) introduced in the following.
2.1 The Hierarchical Tucker format
An HTformat for a tensor , , is based on a hierarchy of all tensor directions , which can be expressed by a binary tree, referred to as dimension tree, cf. [8, 12]:
[Dimension tree] For a tensor space , , by we denote the set of all tensor directions. A corresponding dimension tree is a tree with the following properties:

all vertices of are nonempty subsets of ,

the root vertex of is the set : ,

all vertices with have two disjoint successors with
A dimension tree for a tensor order , , is thus a binary tree with singletons , , at the leaf vertices. Two examples of dimension trees for tensor order are shown in Figure 1.
The HTformat is again defined by matricizations, cf. Definition 3, where the subsets are the subsets of the underlying dimension tree which determines the structure of the HTformat: [Hierarchical Tucker format (HTformat)] Let , , be a tensor and , , a dimension tree. Furthermore, for all let
(6) 
, as in (4), be a decomposition of the matricization of , where the , , are referred to as frames: the columns of span the range of , especially for we assume (6) to be chosen such that
(7) 
holds, i.e. . A Hierarchical Tucker format (HTformat) for is then defined by

the above matrices for all leaf vertices , , of and

transfer tensors at all nonleaf vertices , , with at the root, which fulfill for all
(8) i.e. the transfer tensor holds the coefficients of the linear combination (8) by which the frame can be obtained from the frames and .
The vector containing the numbers of columns of the frames , , is called representation rank of the HTformat. Notice that we always assume at the root, i.e. contains only one column, namely , cf. (7).
[Hierarchical Tucker rank (HTrank)] The Hierarchical Tucker rank (HTrank) of a tensor , , w.r.t. some dimension tree , , is defined as the vector containing the smallest possible numbers , , such that an HTformat of with representation rank exists.
Notice that by (6) the components , , of the HTrank of , , w.r.t to some dimension tree are the ranks of the respective matricizations , denoted as
According to (3) for , , we define the set as
(9) 
i.e. the set of all tensors for which an HTrepresentation with representation rank exists. It can be shown that is a closed subset of , cf. [12].
Evaluation of tensor entries
Let , , be represented in the HTformat w.r.t. some dimension tree , i.e. at the leaf vertices the matrices are stored and at the nonleaf vertices the transfer tensors , , are stored. The tensor entries , , are thus not available explicitly but can be computed recursively: by (7) we have
which can be evaluated by using (8):
(10) 
where we use the notation (5) for subindices of corresponding to some subset . The columns and are either stored explicitly (if the corresponding vertex or is a leaf of ) or can again be evaluated recursively by using (8). By this means, the tensor entries , , of can be evaluated with a computational work growing like , i.e. linearly with the tensor order , when is an upper bound for the components of the representation rank .
Storage complexity
The storage complexity for a tensor , , in HTrepresentation w.r.t. some dimension tree also grows linearly with the tensor order (as for the CPformat): assuming all tensor directions to be of the same size and a bound for the components of the representation rank of , real numbers have to be stored at each leaf vertex , , cf. Definition 1 (a), and real numbers have to be stored at each interior vertex (neither leaf nor root) and real numbers at the root. Taking into account that a dimension tree , , consists of leaf vertices and nonleaf vertices, this sums up to a storage complexity of .
Arithmetic operations in the HTformat
Two arithmetic operations for tensors, which are of particular importance for this article, are the dot product and the Hadamard product of two tensors , defined by
These operations can be computed directly in the HTformat, cf. [12, 11]. The corresponding complexities grow again linearly with the tensor order : assuming all tensor directions to be of the same size and an upper bound for the components of the representation ranks of and , dot products can be computed with a complexity bounded by flops whereas flops are needed for the computation of Hadamard products (cf. [11] for a parallel algorithm).
A more detailed description of different lowrank formats for tensors can e.g. be found in [12, 10]. For this article we use the HTformat but other formats can be used as well, e.g. the TTformat [20, 12] or MPS format [21, 22]. Since the computation of the Hadamard product of two lowrank tensors typically yields a result of higher representation rank, one needs a truncation procedure which truncates tensors back to lower ranks. For the HTformat we use the truncation techniques introduced in [8] the complexity of which is comparable to that of the computation of dot products above.
3 Maximum entries of elementary tensors
Let , . [Maximum norm of a tensor] The maximum norm of a tensor is defined by .
Maximum norm of a matrix
Maximum norm of an elementary tensor
Let , , , be an elementary tensor. The maximum norm of is then given by
which can be maximized by maximizing each of the factors , , in complexity . Unfortunately, no such algorithm is known for tensors of higher rank (i.e. nonelementary tensors). Even if a good approximation of a matrix by a rank matrix
exists in the Euclidean sense (e.g. a truncated singular value decomposition), the corresponding maximum norms
and may differ extremely as the following example indicates:Example
Consider a matrix defined as
(11) 
Then has the singular values and is the rank best approximation of in the Frobenius (and spectral) norm with an approximation error
(12) 
in the Frobenius norm. Assuming we get a corresponding error
(13) 
for the approximation of the maximum norm, which tends to for , independently of the error (12).
4 Estimating the maximum norm by a power iteration
We use a power iteration to approximate the maximum norm of a tensor , , as introduced in [5]: consider the diagonal matrix
(14) 
Then the maximum norm is the maximum absolute value of an eigenvalue of :
(15) 
which can be computed by a standard power iteration if there exists only one eigenvalue of with , i.e. there is only one index with . Taking into account that the matrix multiplication , , can be written as the Hadamard product , this results in Algorithm 1, where denotes the Euclidean norm of .
Improvement of the Rayleigh quotient
Due to the diagonal structure of the eigenvalue problem in (15), the Rayleigh quotient in line 1 of Algorithm 1 can be improved, which results in a new estimator , , cf. Algorithm 2. [Improved estimator for the power iteration] The new estimator , , of Algorithm 2 is at least as good as the Rayleigh quotient , , from the standard power iteration, cf. Algorithm 1:
(16) 
and always converges to from below: , . Using the CauchySchwarz inequality and in line 1 of Algorithm 1 yields
i.e. (16). By induction one easily verifies that , , holds in line 2 of Algorithm 2, where denotes the fold Hadamard product of . This implies
(17) 
in line 2, i.e. , , are lower bounds of . We now use
(18) 
and the norm equivalence for norms on which follows from the Hölder inequality:
(19) 
as well as
(20) 
in order to get a lower bound for , :
(21) 
Combining the bounds (17) and (21) for yields
(22) 
i.e. for since , .
The following example shows that the estimator from Algorithm 2 is indeed better than the Rayleigh quotient from the standard power iteration:
Example
Error estimator for Algorithm 2
Since Algorithm 2 is a power iteration for with an improved estimator for the eigenvalue, cf. Lemma 4, any error estimator known for the standard power iteration can also be applied to Algorithm 2. Since in our framework the underlying matrix is diagonal, cf. (14), results for power iteration with symmetric matrices may be applied. E.g. in [18] the error , , is proven to be bounded by
(23) 
if the corresponding matrix is symmetric positive definite, which in our framework means that all entries of are positive, cf. (14).
However, for our special case of a power iteration on a diagonal matrix , the bound
(24) 
can directly be obtained from (22) in the proof of Lemma 4. Asymptotically the bounds (23) and (24) show the same behavior, where the bound in (24) is always better than the one in (23). Assuming an error which grows like the bound (23) leads to a slow convergence:
(25) 
Complexity of the power iteration
In the HTformat all tensor operations involved in Algorithm 2 can be carried out directly in the HTformat with a complexity depending linearly on the tensor order and the mode sizes. We can thus expect such a linear complexity for the approximation of by Algorithm 2 with , , in HTrepresentation, which is verified by the experiments in Section 8.6, cf. Figure 7.
Truncation
Since the computation of Hadamard products increases the representation ranks of the iterates, we include truncations in each step of Algorithm 2 in order to keep the representation ranks of the iterates small enough. We compute the truncations by a hierarchical singular value decomposition (HSVD) in the HTformat [8]. A detailed analysis of truncations in iterative fixedpoint like processes is given by [13]: if the truncation errors are small enough and the starting tensor is chosen appropriately, the iterative process still converges to the original fixed point, provided that the latter allows for a respective lowrank representation. In our numerical experiments, cf. Section 8, we truncate w.r.t. fixed prescribed HTranks in each step of the algorithms, which already worked well. Another option would be to truncate w.r.t. a given tolerance for the relative truncation error, which, however, may lead to high representation ranks in the beginning of the algorithms, since the underlying error estimations for the HTformat can be rather pessimistic, cf. [8].
Notice that it is possible to construct counterexamples where any truncation leads to the loss of that entry of which corresponds to , i.e. to a wrong approximation of in the end. Such counterexamples can be constructed as in (11): take (order tensor) as defined in (11) with and , . A rank best approximation of is given by , cf. (11), i.e. after truncation, instead of , the maximum norm would be computed. This example shows that it may happen that a single entry of larger absolute value (the entry of ) is ”hidden” by many entries of smaller absolute value (the entries of ). The reason for this is that the truncation, cf. [8], is based on the singular value decomposition, which corresponds to a best approximation in the Euclidean sense and not w.r.t. the maximum norm. Nevertheless this does not stand in contradiction to the convergence analysis developed in [13] since the truncation of does not meet the underlying assumptions: the starting tensor of the iterative process is far away from the fixed point (), cf. (11):
5 Estimating the maximum norm by Ritz values
In order to accelerate the convergence of the power iteration, we make use of an orthogonal projection technique which uses not only the iterate of the current step for the estimation of , cf. line 2 in Algorithm 2, but the latest iterates, i.e. , for some , . The resulting method is often referred to as RayleighRitz method, cf. [1]:
RayleighRitz method
The idea is to approximate an exact eigenvector of the underlying eigenvalue problem , , cf. (14), (15), by a linear combination of the latest iterates , , which means that an approximation of is searched for in the corresponding subspace
(26) 
The approximation and a corresponding approximation of are obtained by imposing a Galerkin condition:
Computing an orthonormal basis of and defining the symmetric matrix as
(27) 
yields
i.e. is the eigenvalue of which has maximum absolute value, and . The eigenvalues of are called Ritz values of , i.e. the eigenvalue of fulfilling is approximated by the Ritz value of which maximizes .
5.1 Computation of orthonormal bases in the HTformat
Since we use the HTformat for the representation of lowrank tensors, the iterates spanning the subspace are tensors in the HTformat. We thus want to compute an orthonormal basis of
directly in the HTformat. This can e.g. be achieved by a GramSchmidt process which only involves the computation of sums and dot products and can thus be carried out directly in the HTformat.
For our experiments in Section 8 we used a different procedure which computes a set of orthonormal HTtensors together with a right factor , such that
(28) 
We now briefly describe this procedure which we refer to as HTQR decomposition. In
[12] the computation of joint orthonormal bases (ONB) is described for a set of tensors in HTrepresentation: at each nonroot vertex of the underlying dimension tree , a QRdecomposition of the matrix containing the corresponding frames of the tensors is computed:(29) 
The orthonormal columns of are kept as new (joint, orthonormal) basis at the vertex for all tensors and the right factors are multiplied to the father vertices, where the procedure continues recursively. By this, joint ONB are installed at all nonroot vertices , i.e. the new HTrepresentations of only differ at the root vertex. At the leaf vertices, a QRdecomposition (29) can be computed directly since the frame matrices are stored explicitly. At the interior vertices, (29) can be computed by a respective QRdecomposition of
Notice that the computation of joint ONB for the set only affects their HTrepresentations while the tensors themselves stay invariant. After the HTrepresentations of have been transformed into joint ONB, an HTQR decomposition (28) can easily be obtained by computing a respective QRdecomposition of the root transfer tensors .
6 Acceleration by squaring
Even though the RayleighRitz method (Algorithm 3), cf. Section 5, may lead to a notable acceleration of the power iteration (Algorithm 2), cf. Figure 3(a) and Figure 3, there are also cases where the order of convergence stays invariant, cf. Figure 3(b). However, the power iteration including a RayleighRitz method can well be used to generate sufficiently good starting tensors for a faster (but less stable) method, which is presented in this section.
Power iteration with squaring
As stated in the proof of Lemma 4, the th iterate of Algorithm 2 is the corresponding normalized power of the original tensor (if truncations down to smaller HTrepresentation rank are neglected):
(30) 
The power iteration can be sped up by taking the new iterate
instead of line 2 in Algorithm 2, which results in Algorithm 4, hereinafter referred to as power iteration with squaring.
Without truncations, the iterates of Algorithm 4 are also normalized powers of the original tensor :
(31) 
which yields the corresponding new expression for in line 4 of Algorithm 4.
The comparison of (30) and (31) shows the immense acceleration which can be expected by using Algorithm 4 instead of Algorithm 2: the iterate in step of Algorithm 4 corresponds to the iterate in step of Algorithm 2. Assuming the slow convergence (25) for Algorithm 2, cf. Figures 1(a) and 2(a), the corresponding convergence behavior of Algorithm 4 would be
(32) 
i.e. linear convergence with a convergence rate of or better, which is verified by our experiments in Section 8.3, cf. Figure 5.
Truncation
As in Algorithm 2, in practice truncations are included into Algorithm 4, since otherwise the complexity of the involved tensor arithmetic quickly becomes too large, cf. Section 2.1. Algorithm 4, however, is less stable than Algorithm 2, which can be explained as follows: consider two tensors , , the Hadamard product of which is to be computed. If only is perturbed by a relative error (due to truncation), the computed Hadamard product will have the same relative error (notice that for any norm, , if ):
(33) 
whereas the relative error doubles if and are both afflicted with the relative error:
Comments
There are no comments yet.