 # Randomized algorithms for the low multilinear rank approximations of tensors

In this paper, we develop efficient methods for the computation of low multilinear rank approximations of tensors based on randomized algorithms. Combining the random projection with the singular value decomposition, the rank-revealing QR decomposition and the rank-revealing LU factorization, respectively, we obtain three randomized algorithms for computing the low multilinear rank approximations. Based on the singular values of sub-Gaussian matrices, we derive the error bounds for each algorithm with probability. We illustrate the proposed algorithms via several numerical examples.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

An increasing number of applications, such as in chemometrics, signal processing and high order statistics [9, 10, 11, 29, 55]

, involve the manipulation of quantities with elements addressed by more than two indices. In the literature these higher-order equivalents of vectors (first-order) and matrices (second-order) are called higher-order tensors, multidimensional matrices, or multiway arrays.

The symbol represents a -dimensional array of real numbers with entries given by for all and . For notational simplicity, we illustrate our results using third-order tensors whenever generalizations to higher-order cases are straightforward. Subtle differences will be mentioned when they exist.

In this paper, we consider the low multilinear rank approximation of a tensor, which is defined as follows.

###### Problem 1.1.

Suppose that . The goal is to require three column full rank matrices with , such that

 ai1i2i3≈I1,I2,I3∑j1,j2,j3=1aj1j2j3p(1)i1j1p(2)i2j2p(3)i3j3,

where is a projected matrix.

When all the matrices are columnwise orthogonal, Problem 1.1 can be solved using a number of recently developed algorithms, such as higher-order orthogonal iteration , the Newton-Grassmann method , the Riemannian trust-region method , the Quasi-Newton method , semi-definite programming (SDP) , and Lanczos-type iteration [22, 45]. The readers can refer to two surveys [24, 29] for relevant information. When the columns of each are extracted from the mode- unfolding matrix , then, the solution of Problem 1.1 is called as the CUR-type decomposition of , which can be obtained using different versions of the cross approximation method. We can refer to [5, 16, 23, 34, 38, 39] for more details about a CUR-type decomposition of tensors. On the other hand, for Problem 1.1, when we restrict the entries of the tensor and the matrices to be nonnegative and allow the matrices to be not columnwise orthogonal, the solution of Problem 1.1 is sometimes called a nonnegative Tucker decomposition [19, 58, 59, 61].

Low-rank matrix approximations, such as the truncated singular value decomposition [21, page 291] and the rank-revealing QR decomposition , play a central role in data analysis and scientific computing. Halko, Rohwedder, and Tropp  present a modular framework to construct randomized algorithms for computing partial matrix decompositions. We can refer to three surveys [17, 33, 56] for more results about the randomized algorithms for the low rank matrix approximations.

Randomized algorithms have recently been applied to tensor decompositions. Drineas and Mahoney  presented and analyzed randomized algorithms for computing the CUR-type decomposition of a tensor, which can be viewed as the generalization of the Linear-Time-SVD algorithm  and the Fast-Approximate-SVD algorithm  for the low rank approximations of matrices to tensors, which were originally for matrices. Battaglino et al.  extended randomized least squares methods to tensors and show the workload of CANDECOMP/PARAFAC-ALS can be drastically reduced without sacrifice in quality. Vervliet and De Lathauwer  presented the randomized block sampling canonical polyadic decomposition method, which combines increasingly popular ideas from randomization and stochastic optimization to tackle the computational problems.

Zhou et al.  proposed a distributed randomized Tucker decomposition for arbitrarily big tensors but with relatively low multilinear rank. Che and Wei  designed adaptive randomized algorithms for computing the low multilinear rank approximation of tensors and the approximate tensor train decomposition. More results about this topic can be found in [3, 37, 50] and their references.

As shown in [8, 60], comparison with the deterministic algorithms for low multilinear rank approximations, randomized algorithms are often faster and more robust. On the other hand, the algorithms in [8, 60] still have some deficiencies. Hence, the main work in this paper is to design more effective randomized algorithms for the computation of low multilinear rank approximations of tensors.

Our proposed algorithms can be divided into two stages. Suppose that . In the first stage, for each , the Kronecker product of two standard Gaussian matrices of suitable dimensions are applied to the mode- unfolding of , which is an matrix . In the second stage, we use a basic matrix decomposition, such as singular value decomposition (SVD), the rank-revealing QR decomposition (RRQR) and the rank-revealing LU factorization (RRLU), to obtain a full column rank matrix, satisfying the requirement that the column space of the matrix can be used to approximate . Note that Algorithm 4.1 with “FactType”=“SVD” and “FactType”=“RRLU” can be viewed as the generalization of the core idea of the randomized algorithm in  and Algorithm 4.1 in  with , respectively. As shown in Section 6, in terms of CPU times, the proposed algorithms are faster than the existed algorithms for low multilinear rank approximations; and in terms of RLNE, the proposed algorithms are sometimes less than the existed algorithms.

### 1.1 Notations and organizations

Throughout this paper, we assume that , , and denote the index upper bounds, unless stated otherwise. We use lower case letters for scalars, lower case bold letters for vectors, bold capital letters for matrices, and calligraphic letters for tensors. This notation is consistently used for lower-order parts of a given structure. For example, the entry with row index and column index in a matrix , i.e., , is represented as (also and ).

For a vector we use and to denote its 2-norm and transpose, respectively. denotes the zero vector in . We use to denote the Kronecker product of matrices and . We use to denote the Khatri–Rao product of matrices and . represents the Moore-Penrose pseudoinverse of .

The rest of our paper is organized as follows. In Section 2, we introduce basic tensor algebra, such as, tensor operation, RRQR, RRLU and singular values of general and random matrices. We present the higher-order singular value decomposition and higher-order orthogonal iteration for the low multilinear rank approximation in Section 3. The randomized algorithms for the low multilinear rank approximation are presented in Section 4. In the same section, we also analyze probabilistic error bounds and computational complexity of these three algorithms. The probabilistic error bounds are analyzed in Section 5. We illustrate our algorithms via numerical examples in Section 6. We conclude this paper and discuss future research topics in Section 7.

## 2 Preliminaries

### 2.1 Basic definitions

We review the basic notations and concepts involving tensors which will be used in this paper. The mode- product [10, 29] of a real tensor by a matrix , denoted by :

 n=1: cji2i3=I1∑i1=1ai1i2i3bji1; n=2: ci1ji3=I2∑i2=1ai1i2i3bji2; n=3: ci1i2j=I3∑i3=1ai1i2i3bji3.

For any given tensor and three matrices , and , one has 

 (A×nF)×mG=(A×mG)×nF=A×nF×mG,(A×nF)×nH=A×n(H⋅F),

where ‘’ represents the multiplication of two matrices with appropriate sizes.

Scalar products and the Frobenius norm of a tensor are extensions of respective definitions from a matrix to a tensor of an arbitrary order [12, 29]. For two tensors , the Frobenius norm of a tensor is given by and the scalar product is defined as ,

 ⟨A,B⟩=I1,I2,I3∑i1,i2,i3=1ai1i2i3bi1i2i3.

Generally speaking, the mode- unfolding matrix of a third-order tensor can be understood as the process of the construction of a matrix containing all the mode- vectors of the tensor. The order of the columns is not unique and the unfolding matrix of , denoted by , arranges the mode- fibers into columns of this matrix. More specifically, a tensor element maps on a matrix element , where

 n=1: j=i2+(i3−1)I2;n=2: j=i1+(i3−1)I1;n=3: j=i1+(i2−1)I1.

### 2.2 Rank revealing QR (RRQR)

For a given with , QR factorization with column pivoting makes use of a column pivoting strategy  to determine a permutation matrix such that is the QR factorization of , with being columnwise orthogonal and the upper triangular matrix partitioned as

 R=(R11R120(J−K)×KR22)

where and is small in norm.

The QR factorization of , where is a permutation matrix chosen to yield a “small” , is referred to as the rank-revealing QR (RRQR) factorization of . The definition of the RRQR factorization is given below.

###### Definition 2.1.

([4, Definition 2]) For a matrix and an integer such that and , assume that there exists a permutation matrix such that

 AP=QR=Q(R11R120(J−K)×KR22) (2.1)

holds, where is columnwise orthogonal and is upper triangular. The above factorization is called a RRQR factorization if it satisfies

 σK(A)p1(K,J)≤σmin(R11)≤σK(A),σK+1(A)≤∥R22∥2≤p2(K,J)σK+1(A),

where and are functions bounded by low degree polynomials in and .

Most researchers improved RRQR factorizations by focusing on improving the functions and in Definition 2.1. We recommend [6, 7, 20, 25, 27, 41, 49] and their references for different expressions of and (see Table 2 in ). To be specific, the following theorem is adapted from [7, 25, 27].

###### Theorem 2.1.

For a matrix and an integer such that and , there exists a permutation matrix such that (2.1) holds, where and are bounded by

 σK(A)≥σmin(R11)≥1/√K(J−K)+1σK(A),σK+1(A)≤∥R22∥2≤√K(J−K)+1σK+1(A).

Based on Theorem 2.1, we have the following definition.

###### Definition 2.2.

(RRQR Approximation denoted ) Given a RRQR factorization of a matrix with and an integer , as in (2.1), such that , where is a permutation matrix, the RRQR rank approximation is defined by taking columns from and rows from such that

 RRQRK(AP)=Q(:,1:K)(R11R12), (2.2)

where , , and are defined in (2.1).

###### Lemma 2.1.

(RRQR Approximation Error) The error of the approximation of is

 ∥AP−RRQRK(AP)∥2≤√K(J−K)+1σK+1(A).
###### Proof.

The proof follows from directly from (2.1) and (2.2). ∎

### 2.3 Rank revealing LU (RRLU)

The following theorem is adopted from [40, Theorem 1.2]:

###### Theorem 2.2.

Let with . Given an integer , the following factorization

 PAQ=(L110R×(I−R)L21II−R)(U11U120(J−R)×RU22) (2.3)

holds, where is a unit lower triangular matrix, is upper triangular, and are orthogonal permutation matrices. Let , then

 σR(A)≥σmin(L11U11)≥σR(A)R(J−R)+1,σR+1(A)≤∥U22∥2≤(R(J−R)+1)σR+1(A).

This is called RRLU decomposition. Based on Theorem 2.2, we have the following definition.

###### Definition 2.3.

(RRLU Approximation denoted [47, Definition 3.1]) Given a RRLU decomposition (Theorem 2.2) of a matrix with and an integer , as in (2.3), such that . The RRLU rank approximation is defined by taking columns from and rows from such that

where , , , , and are defined in Theorem 2.2.

###### Lemma 2.2.

(RRLU Approximation Error [47, Lemma 3.2]) The error of the approximation of is

 ∥PAQ−RRLUR(PAQ)∥2≤(R(J−R)+1)σR+1(A).

### 2.4 Singular values of random matrices

We first introduce the definition of the sub-Gaussian random variable. Sub-Gaussian variables are an important class of random variables that have strong tail decay properties.

###### Definition 2.4.

([47, Definition 3.2]) A real valued random variable is called sub-Gaussian if there exist such that for all we have . A random variable is centered if .

We review several results adapted from [32, 42] about random matrices whose entries are sub-Gaussian. We focus on the case where is an matrix with . Similar results can be found in  for the square and almost square matrices.

###### Definition 2.5.

Assume that , and . The set consists of all random matrices whose entries are the centered independent identically distributed real valued random variables satisfying the following conditions: (a)moments: ; (b) norm: ; (c)variance: .

It is shown in  that if is sub-Gaussian, then . For a Gaussian matrix with zero mean and unit variance, we have . Theorems 2.3 and 2.4 are taken from Section 2 in .

###### Theorem 2.3.

() Suppose that is sub-Gaussian with , and . Then

 P(∥A∥2>a1√J)≤e−a2J

where .

Theorem 2.3 provides an upper bound for the largest singular value that depends on the desired probability. Theorem 2.4 is used to bound from the upper below the smallest singular value of a random sub-Gaussian matrices.

###### Theorem 2.4.

() Let , and . Suppose that with . Then, there exist positive constants and such that

 P(σI(A)≤c1√J)≤e−J+e−c′′J/(2μ6)+e−a2J≤e−c2J.
###### Remark 2.1.

For Theorem 2.4, the exact values of constants , and are discussed in .

## 3 HOSVD and HOOI

A Tucker decomposition  of a tensor is defined as

 A≈G×1U(1)×2U(2)×3U(3), (3.1)

where are called the mode- factor matrices and is called the core tensor of the decomposition with the set .

The Tucker decomposition is closely related to the mode- unfolding matrix with . In particular, the relation (3.1) implies

 A(1)≈U(1)G(1)(U(2)⊗U(3))⊤; A(2)≈U(2)G(2)(U(1)⊗U(3))⊤; A(3)≈U(3)G(3)(U(1)⊗U(2))⊤.

It follows that the rank of is less than or equal to , as the mode- factor at most has rank . This motivates us to define the multilinear rank of as the tuple , where the rank of is equal to .

By applying the singular value decomposition (SVD) to with , we obtain a special form of the Tucker decomposition of a given tensor, which is referred to as the higher-order singular value decomposition (HOSVD) .

When for one or more , the decomposition is called the truncated HOSVD. Note that the truncated HOSVD is not optimal in terms of giving the best fitting as measured by the Frobenius norm of the difference, but it is used to initialize iterative algorithms to compute the best approximation of a specified multilinear rank [13, 18, 28, 46]. With respect to the Frobenius norm of tensors, the low multilinear rank approximation of can be rewritten as the optimization problem

 minG,U(1),U(2),U(3)∥∥A−G×1U(1)×2U(2)×3U(3)∥∥2F,subject toG∈RR1×R2×R3,U(n)∈RIn×Rn is % columnwise orthogonal.

If is a solution of the above maximization problem, then we call as a low multilinear rank approximation of , where .

## 4 The proposed algorithm and its analysis

In this section, we present our randomized algorithm for the low multilinear rank approximations of tensors, summarized in Algorithm 4.1. We also give a slight modification of Algorithm 4.1 to reduce the computational complexity of Algorithm 4.1.

### 4.1 Framework for the algorithm

For each , Algorithm 4.1 begins by projecting the mode- unfolding of the input tensor on the Kronecker product of random matrices. The result matrix captures most of the range of the mode- unfolding of the tensor. Then we compute a basis for this matrix by Lemma 5.3, RRQR or RRLU, respectively. Finally, we project the input tensor on it.

###### Remark 4.1.

Note that for the cases of “FactType”=“SVD” and “FactType”=“RRQR”, , where all the matrices are obtained from Algorithm 4.1.

In Algorithm 4.1, we use the computer science interpretation of to refer to the class of functions whose growth is bounded and below up to a constant.

Suppose that all the matrices are derived from Algorithm 4.1, then we have

 A−A×1(Q1Q†1)×2(Q2Q†2)×3(Q3Q†3)=A−A×1(Q1Q†1)+A×1(Q1Q†1)−A×1(Q1Q†1)×2(Q2Q†2)+A×1(Q1Q†1)×2(Q2Q†2)−A×1(Q1Q†1)×2(Q2Q†2)×3(Q3U†3). (4.1)

According to (4.1), we have

 ∥∥A−A×1(Q1Q†1)×2(Q2Q†2)×3(Q3Q†3)∥∥2F≤3∑n=1∥∥A−A×n(QnQ†n)∥∥2F. (4.2)

The result relies on the orthogonality of the projector in the Frobenius norm , i.e., for any ,

 ∥A∥2F=∥∥A×n(UnU⊤n)∥∥2F+∥∥A×n(IIn−UnU⊤n)∥∥2F,

and the fact that with , where the orthogonal projection satisfies 

 P2=P,P⊤=P,P∈RJ×J.

Hence, when obtaining the error bound of , we present an error bound for Algorithm 4.1, summarized in the following theorem.

###### Theorem 4.1.

Suppose that , and . Let , and be integers such that . Let , and be integers such that . Let , and be integers such that . For each , we define , , , and as in Theorems 2.3 and 2.4 with .

For a given tensor , three full column rank matrices are obtained by Algorithm 4.1. Then

 ∥∥A−A×1(Q1Q†1)×2(Q2Q†2)×3(Q3Q†3)∥∥F≤23∑n=1CnΔμn+1(A(n)) (4.3)

with probability at least

 1−(e−c′1L2L3+e−c′2L1L3+e−c′3L1L2+e−a′1I2I3+e−a′2I1I3+e−a′3I1I2),

where for “FactType”=“SVD”, , and are given by

 C1= ⎷a21I2I3c212L2L3+1+ ⎷a21I2I3c21L2L3,C2= ⎷a22I1I3c22L1L3+1+ ⎷a22I1I3c22L1L3,C3= ⎷a23I1I2c23L1L2+1+ ⎷a23I1I2c23L1L2;

for “FactType”=“RRQR”, , and are given by

 C1= ⎷a21I2I3c21L2L3+1+√I1(μ1(I1−μ1)+1) ⎷a21I2I3c21L2L3,C2= ⎷a22I1I3c22L1L3+1+√I2(μ2(I2−μ2)+1) ⎷a22I1I3c22L1L3,C3= ⎷a23I1I2c23L1L2+1+√I3(μ3(I3−μ3)+1) ⎷a23I1I2c23L1L2;

and for “FactType”=“RRLU”, , and are given by

 C1= ⎷a21I2I3c21L2L3+1+√I1(μ1(I1−μ1)+1) ⎷a21I2I3c21L2L3,C2= ⎷a22I1I3c22L1L3+1+√I2(μ2(I2−μ2)+1) ⎷a22I1I3c22L1L3,C3= ⎷a23I1I2c23L1L2+1+√I3(μ3(I3−μ3)+1) ⎷a23I1I2c23L1L2.
###### Remark 4.2.

Note that for the case of “FactType”=“RRLU”, (4.3) can be rewritten as

 ∥∥˜A−˜A×1(Q1Q†1)×2(Q2Q†2)×3(Q3Q†3)∥∥F≤23∑n=1CnΔμn+1(A(n)),

with , where all the matrices are derived from Step 11 in Algorithm 4.1.

Suppose that is the mode-1 unfolding of . Let be the singular value decomposition of , where and are orthogonal and is diagonal with positive diagonal elements. If , where are orthogonal with , then we have

 B(1)=(Q1U)Σ(V(Q3⊗Q2))⊤,

where is the mode-1 unfolding of . It implies that the singular values of are the same as that of . Similarly, the singular values of the mode- unfolding of are the same as that of the mode- unfolding of with . Thus, the upper bound in Theorem 4.1 is orthogonal invariant.

For the case of , we set in Algorithm 4.1 and