Scientific computing with functions, vectors and matrices in low rank formats allows us to tackle even high-dimensional and extreme scale problems, provided the underlying mathematical objects allow for a low rank approximation. In this article we focus on the task of postprocessing low rank matrices and tensors. Our guiding problem is to find the entry of maximum absolute value in a low rank object without accessing every entry. Such a task is the discrete analogue to finding global optima and thus of general interest. While this is straight-forward for rank one, it is surprisingly difficult for ranks beyond one. The seminal idea for finding the maximum absolute value efficiently stems from the PhD thesis of Mike Espig
where the problem is reformulated as a diagonal (extreme scale) eigenvalue problem. The latter can e.g. be solved by a simple power iteration, and in case that there is a unique maximizer, the corresponding eigenvector is rank one. Hence, it makes sense to approximate the eigenvector by low rank (cf. for a recent approach independently of this article). However, if several entries are of similar size, then the power iteration can be arbitrarily slow, the iterates cannot be approximated by low rank and a truncation to low rank may destroy convergence altogether.
In this article we provide much more advanced approaches for finding the maximum by subspace acceleration and a squaring trick. Without truncation we are able to prove a uniformly good convergence rate of for computing the -norm and based on this we devise a divide and conquer strategy to find the corresponding entry.
The article is structured as follows: In Section 2 we introduce the HT-format followed by the problem for rank one or elementary tensors in Section 3. In Section 4 we summarize the power iteration approach and give examples that underline the involved difficulties. In Section 5 we provide the subspace acceleration and in Section 6 the squaring trick. Finally, in Section 7 we explain how one can determine the entry of maximum absolute value based on the so far introduced iterative methods. We conclude by giving numerical examples in Section 8 that show the benefits and limitations of the algorithms.
2 Low-rank formats for tensors
We denote the entry of an order tensor , by
For larger tensor order
a full (entrywise) representation is not feasible (cf. the curse of dimensionality[3, 2]) and additional structure is required. In the following we use low rank tensors that generalize the matrix rank. [Elementary tensors] A tensor is called elementary tensor or rank tensor if
The CP-rank (Canonical Polyadic)  of a tensor is defined as the minimum number such that
For computational purposes the minimum number of terms is not necessary, instead any representation of the form (2) with reasonably small is sufficient. Such a representation is called CP-format with representation rank . The corresponding set is defined by
These sets are numerically difficult and thus a different type of rank is commonly used which is based on matricizations. [-matricization] Let . We denote
and we use the short notation
Then the -matricization of is defined by
i.e. a rearrangement of into a matrix with row index set and column index set . For the CP-format, instead of all tensor entries of some tensor , the matrices from the right-hand side of (2) are stored. Assuming all tensor directions to be of the same size , this sums up to a storage complexity of , which is linear in the tensor order . Despite of this desirable linear complexity in , the CP-format has two drawbacks: on the one hand, the CP-rank is in general NP-hard to determine, cf. , on the other hand, the set (3) is not closed for tensors of higher order , cf. [4, 12]. This fact can render the numerical treatment of tensors in CP-format difficult.
For the Tucker format the corresponding set is closed, cf. , but the storage complexity w.r.t. the tensor order is exponential.
The above mentioned complexities and shortcomings transfer to many arithmetic operations, such as dot products or Hadamard products, which are involved in the algorithms presented in this article. All of the shortcomings can be resolved by using the Hierarchical Tucker format (HT-format) introduced in the following.
2.1 The Hierarchical Tucker format
[Dimension tree] For a tensor space , , by we denote the set of all tensor directions. A corresponding dimension tree is a tree with the following properties:
all vertices of are non-empty subsets of ,
the root vertex of is the set : ,
all vertices with have two disjoint successors with
A dimension tree for a tensor order , , is thus a binary tree with singletons , , at the leaf vertices. Two examples of dimension trees for tensor order are shown in Figure 1.
The HT-format is again defined by matricizations, cf. Definition 3, where the subsets are the subsets of the underlying dimension tree which determines the structure of the HT-format: [Hierarchical Tucker format (HT-format)] Let , , be a tensor and , , a dimension tree. Furthermore, for all let
holds, i.e. . A Hierarchical Tucker format (HT-format) for is then defined by
the above matrices for all leaf vertices , , of and
transfer tensors at all non-leaf vertices , , with at the root, which fulfill for all
i.e. the transfer tensor holds the coefficients of the linear combination (8) by which the frame can be obtained from the frames and .
The vector containing the numbers of columns of the frames , , is called representation rank of the HT-format. Notice that we always assume at the root, i.e. contains only one column, namely , cf. (7).
[Hierarchical Tucker rank (HT-rank)] The Hierarchical Tucker rank (HT-rank) of a tensor , , w.r.t. some dimension tree , , is defined as the vector containing the smallest possible numbers , , such that an HT-format of with representation rank exists.
Notice that by (6) the components , , of the HT-rank of , , w.r.t to some dimension tree are the ranks of the respective matricizations , denoted as
According to (3) for , , we define the set as
i.e. the set of all tensors for which an HT-representation with representation rank exists. It can be shown that is a closed subset of , cf. .
Evaluation of tensor entries
Let , , be represented in the HT-format w.r.t. some dimension tree , i.e. at the leaf vertices the matrices are stored and at the non-leaf vertices the transfer tensors , , are stored. The tensor entries , , are thus not available explicitly but can be computed recursively: by (7) we have
which can be evaluated by using (8):
where we use the notation (5) for sub-indices of corresponding to some subset . The columns and are either stored explicitly (if the corresponding vertex or is a leaf of ) or can again be evaluated recursively by using (8). By this means, the tensor entries , , of can be evaluated with a computational work growing like , i.e. linearly with the tensor order , when is an upper bound for the components of the representation rank .
The storage complexity for a tensor , , in HT-representation w.r.t. some dimension tree also grows linearly with the tensor order (as for the CP-format): assuming all tensor directions to be of the same size and a bound for the components of the representation rank of , real numbers have to be stored at each leaf vertex , , cf. Definition 1 (a), and real numbers have to be stored at each interior vertex (neither leaf nor root) and real numbers at the root. Taking into account that a dimension tree , , consists of leaf vertices and non-leaf vertices, this sums up to a storage complexity of .
Arithmetic operations in the HT-format
Two arithmetic operations for tensors, which are of particular importance for this article, are the dot product and the Hadamard product of two tensors , defined by
These operations can be computed directly in the HT-format, cf. [12, 11]. The corresponding complexities grow again linearly with the tensor order : assuming all tensor directions to be of the same size and an upper bound for the components of the representation ranks of and , dot products can be computed with a complexity bounded by flops whereas flops are needed for the computation of Hadamard products (cf.  for a parallel algorithm).
A more detailed description of different low-rank formats for tensors can e.g. be found in [12, 10]. For this article we use the HT-format but other formats can be used as well, e.g. the TT-format [20, 12] or MPS format [21, 22]. Since the computation of the Hadamard product of two low-rank tensors typically yields a result of higher representation rank, one needs a truncation procedure which truncates tensors back to lower ranks. For the HT-format we use the truncation techniques introduced in  the complexity of which is comparable to that of the computation of dot products above.
3 Maximum entries of elementary tensors
Let , . [Maximum norm of a tensor] The maximum norm of a tensor is defined by .
Maximum norm of a matrix
Maximum norm of an elementary tensor
Let , , , be an elementary tensor. The maximum norm of is then given by
which can be maximized by maximizing each of the factors , , in complexity . Unfortunately, no such algorithm is known for tensors of higher rank (i.e. non-elementary tensors). Even if a good approximation of a matrix by a rank- matrix
exists in the Euclidean sense (e.g. a truncated singular value decomposition), the corresponding maximum normsand may differ extremely as the following example indicates:
Consider a matrix defined as
Then has the singular values and is the rank- best approximation of in the Frobenius (and spectral) norm with an approximation error
in the Frobenius norm. Assuming we get a corresponding error
for the approximation of the maximum norm, which tends to for , independently of the error (12).
4 Estimating the maximum norm by a power iteration
We use a power iteration to approximate the maximum norm of a tensor , , as introduced in : consider the diagonal matrix
Then the maximum norm is the maximum absolute value of an eigenvalue of :
which can be computed by a standard power iteration if there exists only one eigenvalue of with , i.e. there is only one index with . Taking into account that the matrix multiplication , , can be written as the Hadamard product , this results in Algorithm 1, where denotes the Euclidean norm of .
Improvement of the Rayleigh quotient
Due to the diagonal structure of the eigenvalue problem in (15), the Rayleigh quotient in line 1 of Algorithm 1 can be improved, which results in a new estimator , , cf. Algorithm 2. [Improved estimator for the power iteration] The new estimator , , of Algorithm 2 is at least as good as the Rayleigh quotient , , from the standard power iteration, cf. Algorithm 1:
in line 2, i.e. , , are lower bounds of . We now use
and the norm equivalence for -norms on which follows from the Hölder inequality:
as well as
in order to get a lower bound for , :
i.e. for since , .
The following example shows that the estimator from Algorithm 2 is indeed better than the Rayleigh quotient from the standard power iteration:
Error estimator for Algorithm 2
Since Algorithm 2 is a power iteration for with an improved estimator for the eigenvalue, cf. Lemma 4, any error estimator known for the standard power iteration can also be applied to Algorithm 2. Since in our framework the underlying matrix is diagonal, cf. (14), results for power iteration with symmetric matrices may be applied. E.g. in  the error , , is proven to be bounded by
if the corresponding matrix is symmetric positive definite, which in our framework means that all entries of are positive, cf. (14).
However, for our special case of a power iteration on a diagonal matrix , the bound
can directly be obtained from (22) in the proof of Lemma 4. Asymptotically the bounds (23) and (24) show the same behavior, where the bound in (24) is always better than the one in (23). Assuming an error which grows like the bound (23) leads to a slow convergence:
Complexity of the power iteration
In the HT-format all tensor operations involved in Algorithm 2 can be carried out directly in the HT-format with a complexity depending linearly on the tensor order and the mode sizes. We can thus expect such a linear complexity for the approximation of by Algorithm 2 with , , in HT-representation, which is verified by the experiments in Section 8.6, cf. Figure 7.
Since the computation of Hadamard products increases the representation ranks of the iterates, we include truncations in each step of Algorithm 2 in order to keep the representation ranks of the iterates small enough. We compute the truncations by a hierarchical singular value decomposition (HSVD) in the HT-format . A detailed analysis of truncations in iterative fixed-point like processes is given by : if the truncation errors are small enough and the starting tensor is chosen appropriately, the iterative process still converges to the original fixed point, provided that the latter allows for a respective low-rank representation. In our numerical experiments, cf. Section 8, we truncate w.r.t. fixed prescribed HT-ranks in each step of the algorithms, which already worked well. Another option would be to truncate w.r.t. a given tolerance for the relative truncation error, which, however, may lead to high representation ranks in the beginning of the algorithms, since the underlying error estimations for the HT-format can be rather pessimistic, cf. .
Notice that it is possible to construct counterexamples where any truncation leads to the loss of that entry of which corresponds to , i.e. to a wrong approximation of in the end. Such counterexamples can be constructed as in (11): take (order- tensor) as defined in (11) with and , . A rank- best approximation of is given by , cf. (11), i.e. after truncation, instead of , the maximum norm would be computed. This example shows that it may happen that a single entry of larger absolute value (the entry of ) is ”hidden” by many entries of smaller absolute value (the entries of ). The reason for this is that the truncation, cf. , is based on the singular value decomposition, which corresponds to a best approximation in the Euclidean sense and not w.r.t. the maximum norm. Nevertheless this does not stand in contradiction to the convergence analysis developed in  since the truncation of does not meet the underlying assumptions: the starting tensor of the iterative process is far away from the fixed point (), cf. (11):
5 Estimating the maximum norm by Ritz values
In order to accelerate the convergence of the power iteration, we make use of an orthogonal projection technique which uses not only the iterate of the current step for the estimation of , cf. line 2 in Algorithm 2, but the latest iterates, i.e. , for some , . The resulting method is often referred to as Rayleigh-Ritz method, cf. :
The idea is to approximate an exact eigenvector of the underlying eigenvalue problem , , cf. (14), (15), by a linear combination of the latest iterates , , which means that an approximation of is searched for in the corresponding subspace
The approximation and a corresponding approximation of are obtained by imposing a Galerkin condition:
Computing an orthonormal basis of and defining the symmetric matrix as
i.e. is the eigenvalue of which has maximum absolute value, and . The eigenvalues of are called Ritz values of , i.e. the eigenvalue of fulfilling is approximated by the Ritz value of which maximizes .
5.1 Computation of orthonormal bases in the HT-format
Since we use the HT-format for the representation of low-rank tensors, the iterates spanning the subspace are tensors in the HT-format. We thus want to compute an orthonormal basis of
directly in the HT-format. This can e.g. be achieved by a Gram-Schmidt process which only involves the computation of sums and dot products and can thus be carried out directly in the HT-format.
For our experiments in Section 8 we used a different procedure which computes a set of orthonormal HT-tensors together with a right factor , such that
We now briefly describe this procedure which we refer to as HT-QR decomposition. In the computation of joint orthonormal bases (ONB) is described for a set of tensors in HT-representation: at each non-root vertex of the underlying dimension tree , a QR-decomposition of the matrix containing the corresponding frames of the tensors is computed:
The orthonormal columns of are kept as new (joint, orthonormal) basis at the vertex for all tensors and the right factors are multiplied to the father vertices, where the procedure continues recursively. By this, joint ONB are installed at all non-root vertices , i.e. the new HT-representations of only differ at the root vertex. At the leaf vertices, a QR-decomposition (29) can be computed directly since the frame matrices are stored explicitly. At the interior vertices, (29) can be computed by a respective QR-decomposition of
Notice that the computation of joint ONB for the set only affects their HT-representations while the tensors themselves stay invariant. After the HT-representations of have been transformed into joint ONB, an HT-QR decomposition (28) can easily be obtained by computing a respective QR-decomposition of the root transfer tensors .
6 Acceleration by squaring
Even though the Rayleigh-Ritz method (Algorithm 3), cf. Section 5, may lead to a notable acceleration of the power iteration (Algorithm 2), cf. Figure 3(a) and Figure 3, there are also cases where the order of convergence stays invariant, cf. Figure 3(b). However, the power iteration including a Rayleigh-Ritz method can well be used to generate sufficiently good starting tensors for a faster (but less stable) method, which is presented in this section.
Power iteration with squaring
The power iteration can be sped up by taking the new iterate
Without truncations, the iterates of Algorithm 4 are also normalized powers of the original tensor :
The comparison of (30) and (31) shows the immense acceleration which can be expected by using Algorithm 4 instead of Algorithm 2: the iterate in step of Algorithm 4 corresponds to the iterate in step of Algorithm 2. Assuming the slow convergence (25) for Algorithm 2, cf. Figures 1(a) and 2(a), the corresponding convergence behavior of Algorithm 4 would be
As in Algorithm 2, in practice truncations are included into Algorithm 4, since otherwise the complexity of the involved tensor arithmetic quickly becomes too large, cf. Section 2.1. Algorithm 4, however, is less stable than Algorithm 2, which can be explained as follows: consider two tensors , , the Hadamard product of which is to be computed. If only is perturbed by a relative error (due to truncation), the computed Hadamard product will have the same relative error (notice that for any -norm, , if ):
whereas the relative error doubles if and are both afflicted with the relative error: