# Nonnegative Low Rank Tensor Approximation and its Application to Multi-dimensional Images

The main aim of this paper is to develop a new algorithm for computing a Nonnegative Low Rank Tensor (NLRT) approximation for nonnegative tensors that arise in many multi-dimensional imaging applications. Nonnegativity is an important property, as each pixel value refers to a nonnegative light intensity in image data acquisition. Our approach is different from classical nonnegative tensor factorization (NTF), which has been studied for many years. For a given nonnegative tensor, the classical NTF approach determines a nonnegative low rank tensor by computing factor matrices or tensors (for example, CPD finds factor matrices while the Tucker decomposition finds a core tensor and factor matrices) such that the distance between this nonnegative low rank tensor and the given tensor is as small as possible. The proposed NLRT approach is different from classical NTF. It determines a nonnegative low rank tensor without using decomposition or factorization methods. The distance minimized by the proposed NLRT method can be smaller than that attained by the NTF method, which implies that the proposed NLRT method can obtain a better low rank tensor approximation. The proposed NLRT approximation algorithm is derived by using alternating averaged projections onto the product of low rank matrix manifolds and the set of nonnegative tensors. We show the convergence of the alternating projection algorithm. Experimental results for synthetic data and multi-dimensional images are presented to demonstrate that the performance of the proposed NLRT method is better than that of existing NTF methods.


## 1 Introduction

Nonnegative data matrices appear in many data analysis applications. For instance, in image analysis, image pixel values are nonnegative and the associated nonnegative image data matrices can be formed for clustering and recognition [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. In text mining, the frequencies of terms in documents are nonnegative and the resulting nonnegative term-to-document data matrices can be constructed for clustering [13, 14, 15, 16]. In bioinformatics, nonnegative gene expression values are studied and nonnegative gene expression data matrices are generated for disease and gene classification [17, 18, 19, 20, 21]. Low rank matrix approximation for nonnegative matrices plays a key role in all these applications. Its main purpose is to identify a latent feature space for object representation. The classification, clustering or recognition analysis can then be done by using these latent features.

Nonnegative Matrix Factorization (NMF) emerged in 1994 with the work of Paatero and Tapper [22] on environmental data analysis. The purpose of NMF is to decompose an input $n$-by-$m$ nonnegative matrix $A$ into an $n$-by-$r$ nonnegative matrix $B$ and an $r$-by-$m$ nonnegative matrix $C$: $A \approx BC$, and more precisely

$$\min_{B, C \ge 0} \|A - BC\|_F^2, \qquad (1)$$

where $B, C \ge 0$ means that each entry of $B$ and $C$ is nonnegative, $\|\cdot\|_F$ is the Frobenius norm of a matrix, and $r$ (the low rank value) is smaller than $n$ and $m$. Several researchers have proposed and developed algorithms for determining such nonnegative matrix factorizations in the literature [1, 8, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]. Lee and Seung [1] proposed and developed NMF algorithms, and demonstrated that NMF gives a parts-based representation which can be used for intuitive perceptual interpretation. For the development of NMF, we refer to the recently edited book [33].
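As an illustration, problem (1) is commonly attacked with the Lee-Seung multiplicative updates. The following NumPy sketch is ours, not from the paper; the function name, iteration count, and the small `eps` safeguard are illustrative assumptions:

```python
import numpy as np

def nmf_mu(A, r, iters=200, eps=1e-9, seed=0):
    """Illustrative NMF by Lee-Seung multiplicative updates:
    minimize ||A - B C||_F^2 subject to B, C >= 0 entrywise."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    B = rng.random((n, r))
    C = rng.random((r, m))
    for _ in range(iters):
        # Updates keep B and C nonnegative because every factor is nonnegative;
        # eps guards against division by zero.
        C *= (B.T @ A) / (B.T @ B @ C + eps)
        B *= (A @ C.T) / (B @ C @ C.T + eps)
    return B, C

A = np.abs(np.random.default_rng(1).random((20, 15)))
B, C = nmf_mu(A, r=5)
err = np.linalg.norm(A - B @ C)
```

The objective $\|A - BC\|_F^2$ is nonincreasing under these updates, which is why they are a popular baseline for (1).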

Nowadays, data from many fields are more naturally represented as multi-dimensional arrays, referred to as tensors; examples include video data, hyperspectral data and fMRI data. For more details and applications, we refer to the review paper [34]. To handle tensor data, there are two well-known and widely used decompositions, the Canonical Polyadic decomposition (CPD) and the Tucker decomposition. Because of the interpretability of NMF, many nonnegative tensor decompositions and models have been proposed and developed. Most models focus on the CPD and the Tucker decomposition with nonnegativity constraints.

For the CPD with nonnegativity constraints, which is known as Nonnegative Tensor Factorization (NTF), Welling et al. introduced a fixed point algorithm as early as 2001 [35]. Cichocki et al. [26, 36] developed a class of optimized local algorithms known as Hierarchical ALS (HALS), and also extended the approach to other cost functions using the Alpha and Beta divergences for NTF computation. In [37], Park et al. proposed an NTF algorithm based on the alternating nonnegativity constrained least squares method. In [38], the block principal pivoting method is further introduced to solve nonnegativity constrained least squares (NNLS) problems, to overcome some difficulties of NNLS problems with a large number of variables. Xu et al. [39] studied regularized block multiconvex optimization and proposed a block coordinate descent method which can be applied to the NTF problem.

In [40], Kim and Choi studied the Tucker decomposition with nonnegativity constraints. Multiplicative updating algorithms for NMF are extended to solve the nonnegative Tucker decomposition. Zhou et al. [41] also transformed this problem into a series of NMF problems, and used MU and HALS algorithms on the unfolding matrices to compute the Tucker decomposition. Other constraints, such as orthogonality of the factor matrices, have also been considered and studied [42, 43]. For instance, in [43], Pan et al. proposed an orthogonal nonnegative Tucker decomposition and applied the alternating direction method of multipliers (ADMM), aiming to obtain clustering information from the factor matrices and their joint connection weights from the core tensor. Besides these two standard and well-known decompositions, other models have been designed for specific applications. For instance, for the blind unmixing of hyperspectral images, Qian et al. [44] proposed a model derived from the block term decomposition (BTD) and applied an alternating least squares method to solve it.

### 1.1 The Contribution

The main aim of this paper is to propose and study Nonnegative Low Rank Tensor Approximation (NLRT) for applications involving multi-dimensional images. Our approach is completely different from classical NTF, which has been studied for many years. For a given nonnegative tensor, the classical NTF approach determines a nonnegative low rank tensor by computing factor matrices or tensors (for example, CPD finds factor matrices while the Tucker decomposition finds a core tensor and factor matrices) such that the distance between this nonnegative low rank tensor and the given tensor is as small as possible. The proposed NLRT approach is different from classical NTF. It determines a nonnegative low rank tensor without using decomposition or factorization methods. The distance minimized by the proposed NLRT method can be smaller than that attained by the NTF method, which implies that the proposed NLRT method can obtain a better low rank tensor approximation. The proposed NLRT approximation algorithm is derived by using alternating projections on the product of low rank matrix manifolds and the set of nonnegative tensors. We show the convergence of the alternating projections by constructing two projections: one combining the averaging of tensors with the nonnegative projection, and one projecting onto the low rank matrix manifolds. Experimental results for synthetic data and multi-dimensional images are presented to demonstrate the above mentioned advantages of the proposed NLRT method compared with the NTF methods.

The paper is organized as follows. In Section 2, we review tensor operations and decompositions, and present our algorithm. In Section 3, we show the convergence of the proposed alternating projections. In Section 4, numerical results are presented to demonstrate the proposed algorithm. Finally, some concluding remarks and future research directions are given in Section 5.

## 2 Nonnegative Low Rank Tensor Approximation

### 2.1 Tensor Operations and Decompositions

An $m$-dimensional tensor $\mathcal{A}$ is a multi-dimensional array, i.e., $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_m}$. We denote its $(i_1, i_2, \cdots, i_m)$-th entry by $\mathcal{A}_{i_1 i_2 \cdots i_m}$. The inner product of two same-sized tensors $\mathcal{A}$ and $\mathcal{B}$ is defined as

$$\langle \mathcal{A}, \mathcal{B} \rangle := \sum_{i_1, i_2, \cdots, i_m} \mathcal{A}_{i_1 i_2 \cdots i_m} \cdot \mathcal{B}_{i_1 i_2 \cdots i_m}.$$

The Frobenius norm of an $m$-dimensional tensor $\mathcal{A}$ is then defined as

$$\|\mathcal{A}\|_F := \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle} = \Big( \sum_{i_1, i_2, \cdots, i_m} \mathcal{A}_{i_1 i_2 \cdots i_m}^2 \Big)^{\frac{1}{2}}.$$

The $k$-th unfolding of $\mathcal{A}$ is the matrix $\mathcal{A}_k \in \mathbb{R}^{n_k \times \prod_{j \neq k} n_j}$ obtained by arranging the mode-$k$ fibers of $\mathcal{A}$ as its columns. In the following discussion, we introduce two well-known tensor decompositions.
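These operations are straightforward to realize in NumPy; the sketch below (our illustration, with an unfolding whose column ordering is one of several conventions used in the literature) computes the inner product, the Frobenius norm, and mode-$k$ unfoldings:

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding: mode k indexes the rows, the remaining modes are
    flattened into columns (column-ordering conventions vary by author)."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

A = np.arange(24.0).reshape(2, 3, 4)
B = np.ones((2, 3, 4))

inner = np.sum(A * B)          # <A, B>: sum of entrywise products
fro = np.sqrt(np.sum(A * A))   # ||A||_F = sqrt(<A, A>)

A1, A2, A3 = unfold(A, 0), unfold(A, 1), unfold(A, 2)
```

Note that each mode-$k$ unfolding has $n_k$ rows and $\prod_{j \neq k} n_j$ columns, matching the definition above.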

CP decomposition (CPD) [45, 46, 47]: Given a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_m}$, the CANDECOMP/PARAFAC decomposition (CPD) is defined as follows:

$$\mathcal{A} = \sum_{z=1}^{Z} \lambda_z \, a^{z,1} \otimes a^{z,2} \otimes \cdots \otimes a^{z,m}. \qquad (2)$$

Note that

$$\mathcal{A}_{i_1, i_2, \cdots, i_m} = \sum_{z=1}^{Z} \lambda_z \, a^{z,1}_{i_1} a^{z,2}_{i_2} \cdots a^{z,m}_{i_m}, \qquad (3)$$

where $\otimes$ denotes the outer product of vectors, $\lambda_z \in \mathbb{R}$, and $a^{z,k} \in \mathbb{R}^{n_k}$. The minimal value of $Z$ such that the CP decomposition exists is called the CP rank of $\mathcal{A}$.
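A CP tensor as in (2)-(3) can be assembled from its factors by accumulating rank-one outer products. The following NumPy sketch is ours (the data layout, with factors stored columnwise, is an assumption):

```python
import numpy as np

def cp_reconstruct(lam, factors):
    """Assemble sum_z lam[z] * a^{z,1} o ... o a^{z,m}, where factors[k]
    is an n_k-by-Z matrix whose z-th column is the vector a^{z,k}."""
    shape = tuple(U.shape[0] for U in factors)
    A = np.zeros(shape)
    for z, l in enumerate(lam):
        term = np.array(l)
        for U in factors:
            # Successive outer products build the z-th rank-one term.
            term = np.multiply.outer(term, U[:, z])
        A += term
    return A

# A rank-1 example: 2 * [1, 0] (outer) [1, 1]
A = cp_reconstruct([2.0], [np.array([[1.0], [0.0]]),
                           np.array([[1.0], [1.0]])])
```

With a single term ($Z = 1$) the result is a rank-one matrix, consistent with (3).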

Tucker decomposition [48, 49, 47]: Given a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_m}$, the Tucker decomposition is defined as follows:

$$\mathcal{A} = \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 \cdots \times_m U^{(m)}, \qquad (4)$$

i.e.,

$$\mathcal{A}_{i_1, \cdots, i_m} = \sum_{j_1, \cdots, j_m} \mathcal{G}_{j_1, \cdots, j_m} U^{(1)}_{i_1, j_1} \cdots U^{(m)}_{i_m, j_m}, \qquad (5)$$

where $\mathcal{G} \in \mathbb{R}^{J_1 \times \cdots \times J_m}$ is the core tensor, $U^{(k)}$ is an $n_k$-by-$J_k$ matrix whose columns are mutually orthogonal, and $\times_k$ denotes the $k$-mode matrix product of a tensor defined by

$$\big( \mathcal{G} \times_k U^{(k)} \big)_{j_1 \cdots j_{k-1} i_k j_{k+1} \cdots j_m} = \sum_{j_k = 1}^{J_k} \mathcal{G}_{j_1 \cdots j_{k-1} j_k j_{k+1} \cdots j_m} U^{(k)}_{i_k, j_k}.$$

The Tucker (or multilinear) rank of $\mathcal{A}$ is defined by $\big( \mathrm{rank}(\mathcal{A}_1), \cdots, \mathrm{rank}(\mathcal{A}_m) \big)$.
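The $k$-mode product above can be sketched with `np.tensordot`; the helper name and the random example are our illustration, not the paper's code:

```python
import numpy as np

def mode_k_product(G, U, k):
    """G x_k U: multiply the mode-k fibers of G by the matrix U,
    where U has shape n_k-by-J_k and J_k is the size of mode k of G."""
    return np.moveaxis(np.tensordot(U, G, axes=(1, k)), 0, k)

rng = np.random.default_rng(0)
G = rng.random((2, 3, 4))                               # core tensor
U = [rng.random((5, 2)), rng.random((6, 3)), rng.random((7, 4))]

A = G
for k in range(3):                                      # A = G x_1 U1 x_2 U2 x_3 U3
    A = mode_k_product(A, U[k], k)
```

The resulting tensor has shape (5, 6, 7) and multilinear rank at most (2, 3, 4), since each mode-$k$ unfolding of $\mathcal{A}$ factors through the corresponding $U^{(k)}$.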

When nonnegativity constraints are imposed on the factor matrices generated by the CP or Tucker decompositions, the models given in (2) and (4) are referred to as nonnegative tensor factorization and nonnegative Tucker decomposition, respectively. Nonnegative tensor factorization [26, 36, 38, 39] and nonnegative Tucker decomposition [40, 41, 42, 43, 44] models and algorithms were developed to deal with nonnegative tensor applications.

### 2.2 The Proposed Model

In [50, 51], Song and Ng proposed a new algorithm for computing a nonnegative low rank matrix approximation for nonnegative matrices. Instead of using the matrix factorization in (1), their method determines a rank-$r$ nonnegative matrix $X$ such that the distance between $X$ and the given nonnegative matrix $A$ is as small as possible:

$$\min_{\mathrm{rank}(X) = r, \; X \ge 0} \|A - X\|_F^2. \qquad (6)$$

They have shown the convergence of the proposed algorithm based on the projection onto the manifold of rank-$r$ matrices and the projection onto the set of nonnegative matrices.
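The following NumPy sketch illustrates the alternating-projection idea behind (6); the helper names, fixed iteration count, and projection order are our assumptions, and [50, 51] should be consulted for the exact method:

```python
import numpy as np

def proj_rank_r(X, r):
    """Project onto matrices of rank at most r via truncated SVD
    (justified by the Eckart-Young-Mirsky theorem)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def proj_nonneg(X):
    """Project onto the nonnegative orthant by clipping negatives."""
    return np.maximum(X, 0.0)

def nlrm(A, r, iters=100):
    """Sketch of alternating projections for (6): rank-r truncation,
    then nonnegative clipping, repeated from X = A."""
    X = A.copy()
    for _ in range(iters):
        X = proj_nonneg(proj_rank_r(X, r))
    return X

A = np.abs(np.random.default_rng(0).random((10, 8)))
X = nlrm(A, r=3)
```

Each sweep applies one projection per set; the tension between the two constraints (low rank versus nonnegativity) is exactly what the convergence analysis in [50, 51] addresses.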

The main purpose of this paper is to develop a nonnegative low rank tensor approximation for nonnegative tensors. The idea is to find a nonnegative low rank tensor $\mathcal{X}$ such that the distance between $\mathcal{X}$ and $\mathcal{A}$ is as small as possible. Mathematically, it can be formulated as the following optimization problem:

$$\min_{\mathrm{rank}(\mathcal{X}_k) = r_k \, (k = 1, ..., m), \; \mathcal{X} \ge 0} \; \sum_{k=1}^{m} \|\mathcal{A}_k - \mathcal{X}_k\|_F^2, \qquad (7)$$

where $\mathcal{A}_k$ and $\mathcal{X}_k$ are the $k$-th mode unfolding matrices of $\mathcal{A}$ and $\mathcal{X}$, respectively. The sizes of $\mathcal{A}_k$ and $\mathcal{X}_k$ are $n_k$-by-$\prod_{j \neq k} n_j$. According to the singular value decomposition of $\mathcal{X}_k$:

$$\mathcal{X}_k = P_k \Sigma_k Q_k^t,$$

where $P_k$ is $n_k$-by-$r_k$ with orthonormal columns, $\Sigma_k$ is a diagonal matrix of size $r_k$-by-$r_k$, and $Q_k$ is $\prod_{j \neq k} n_j$-by-$r_k$ with orthonormal columns ($Q_k^t$ is the transpose of $Q_k$), a nonnegative low rank tensor $\mathcal{X}$ can be expressed as follows:

$$\mathcal{X} = \mathcal{S} \times_1 P_1 \times_2 P_2 \times_3 \cdots \times_m P_m.$$

Here:

$$\mathcal{S} = \mathcal{X} \times_1 P_1^t \times_2 P_2^t \times_3 \cdots \times_m P_m^t,$$

and thus $\mathcal{X}$ has a multilinear rank $(r_1, r_2, \cdots, r_m)$.

### 2.3 The Proposed Algorithm

We note from (7) that there are $m$ manifolds of low rank matrices and the constraint of nonnegativity to be considered in the optimization. In this subsection, we propose to solve (7) by using the averaged projection method, which is a replacement for the alternating projections on many manifolds and the nonnegativity constraint. Let

$$\mathcal{M}_k = \{ \mathcal{W} \in \mathbb{R}^{n_1 \times \cdots \times n_m} \;|\; \mathrm{rank}(\mathcal{W}_k) = r_k \}, \quad k = 1, ..., m, \qquad (8)$$

denote the sets of tensors whose $k$-mode unfolding matrices have fixed rank $r_k$, and

$$\mathcal{M} = \{ \mathcal{W} \in \mathbb{R}^{n_1 \times \cdots \times n_m} \;|\; \mathcal{W}_{i_1 i_2 \cdots i_m} \ge 0 \} \qquad (9)$$

denote the set of nonnegative tensors.

Denote by $\mathrm{fold}_k$ the operator that folds a matrix into a tensor along the $k$-mode. Then by the Eckart-Young-Mirsky theorem [52], the projection that projects a given tensor $\mathcal{W}$ onto $\mathcal{M}_k$ can be expressed as follows:

$$\pi_k(\mathcal{W}) = \mathrm{fold}_k \Big( \sum_{i=1}^{r_k} \sigma_i(\mathcal{W}_k) \, u_i(\mathcal{W}_k) \, v_i(\mathcal{W}_k)^T \Big), \quad k = 1, ..., m, \qquad (10)$$

where $\mathcal{W}_k$ is the $k$-mode unfolding matrix of $\mathcal{W}$, $\sigma_i(\mathcal{W}_k)$ is the $i$-th singular value of $\mathcal{W}_k$, and $u_i(\mathcal{W}_k)$ and $v_i(\mathcal{W}_k)$ are the corresponding left and right singular vectors. The projection that projects a given tensor $\mathcal{W}$ onto the set of nonnegative tensors can be expressed entrywise as follows:

$$\pi(\mathcal{W})_{i_1 i_2 \cdots i_m} = \begin{cases} \mathcal{W}_{i_1 i_2 \cdots i_m}, & \text{if } \mathcal{W}_{i_1 i_2 \cdots i_m} \ge 0, \\ 0, & \text{if } \mathcal{W}_{i_1 i_2 \cdots i_m} < 0. \end{cases} \qquad (11)$$
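The two projections (10) and (11) can be sketched in NumPy as follows; the `fold`/`unfold` pair and its column ordering are our illustrative convention:

```python
import numpy as np

def unfold(W, k):
    return np.moveaxis(W, k, 0).reshape(W.shape[k], -1)

def fold(M, k, shape):
    """Inverse of unfold: place the rows of M back as mode k."""
    lead = (shape[k],) + tuple(s for i, s in enumerate(shape) if i != k)
    return np.moveaxis(M.reshape(lead), 0, k)

def pi_k(W, k, r):
    """Projection (10): rank-r truncated SVD of the mode-k unfolding."""
    U, s, Vt = np.linalg.svd(unfold(W, k), full_matrices=False)
    return fold((U[:, :r] * s[:r]) @ Vt[:r, :], k, W.shape)

def pi_nonneg(W):
    """Projection (11): set negative entries to zero."""
    return np.maximum(W, 0.0)

W = np.random.default_rng(1).standard_normal((4, 5, 6))
X = pi_k(W, 1, 2)
```

Each `pi_k` only controls the rank of one unfolding, which is why the algorithm works in a product space with one copy of the tensor per mode.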

In our proposed algorithm, we consider the following two sets of tensors in the product space $\mathbb{R}^{n_1 \times \cdots \times n_m} \times \cdots \times \mathbb{R}^{n_1 \times \cdots \times n_m}$ ($m$ times):

$$\Omega_1 = \{ \mathbf{W} = (\mathcal{W}^1, \mathcal{W}^2, \cdots, \mathcal{W}^m) : \mathcal{W}^1 = \mathcal{W}^2 = \cdots = \mathcal{W}^m \in \mathcal{M} \} \qquad (12)$$

and

$$\Omega_2 = \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_m = \{ \mathbf{W} = (\mathcal{W}^1, \mathcal{W}^2, \cdots, \mathcal{W}^m) : \mathcal{W}^1 \in \mathcal{M}_1, \mathcal{W}^2 \in \mathcal{M}_2, \cdots, \mathcal{W}^m \in \mathcal{M}_m \}. \qquad (13)$$

Note that $\mathcal{M}$ is a convex set and the diagonal $\{ \mathbf{W} : \mathcal{W}^1 = \cdots = \mathcal{W}^m \}$ is an affine manifold, thus $\Omega_1$ is convex as well. As the $\mathcal{M}_k$ are manifolds (Example 2 in [53]), $\Omega_2$ is a product of manifolds. Now we define the two projections onto $\Omega_1$ and $\Omega_2$ in the alternating projections algorithm. The first projection is given by

$$\pi_{\Omega_1}(\mathcal{W}^1, \cdots, \mathcal{W}^m) = \big( \pi(\overline{\mathcal{W}}), \cdots, \pi(\overline{\mathcal{W}}) \big), \quad \overline{\mathcal{W}} = \frac{1}{m} \sum_{k=1}^{m} \mathcal{W}^k, \qquad (14)$$

where $\pi$ is defined in (11). The second projection is given by

$$\pi_{\Omega_2}(\mathcal{W}^1, \cdots, \mathcal{W}^m) = \big( \pi_1(\mathcal{W}^1), \cdots, \pi_m(\mathcal{W}^m) \big), \qquad (15)$$

where $\pi_k$ ($k = 1, ..., m$) are defined in (10). The proposed algorithm is summarized in Algorithm 1. With an input nonnegative tensor $\mathcal{A}$, the algorithm computes the projections $\pi_{\Omega_1}$ and $\pi_{\Omega_2}$ alternately until convergence. Note that the dominant computational cost of Algorithm 1 per iteration comes from the truncated SVDs of the $m$ unfolding matrices of sizes $n_k$-by-$\prod_{j \neq k} n_j$, $k = 1, ..., m$, which leads to a total of $O\big( (\sum_{k=1}^{m} r_k) \prod_{j=1}^{m} n_j \big)$ flops.
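As a rough illustration of the scheme just described (Algorithm 1 itself is not reproduced in this excerpt), the alternating averaged projections can be sketched as follows; helper names, the fixed iteration count, and the final averaging step are our assumptions:

```python
import numpy as np

def unfold(W, k):
    return np.moveaxis(W, k, 0).reshape(W.shape[k], -1)

def fold(M, k, shape):
    lead = (shape[k],) + tuple(s for i, s in enumerate(shape) if i != k)
    return np.moveaxis(M.reshape(lead), 0, k)

def trunc(W, k, r):
    # Projection onto M_k: rank-r truncation of the mode-k unfolding.
    U, s, Vt = np.linalg.svd(unfold(W, k), full_matrices=False)
    return fold((U[:, :r] * s[:r]) @ Vt[:r, :], k, W.shape)

def nlrt(A, ranks, iters=50):
    """Sketch of the alternating averaged projections between
    Omega_1 (m equal nonnegative copies) and Omega_2 (per-mode low rank)."""
    m = A.ndim
    Ws = [A.copy() for _ in range(m)]
    for _ in range(iters):
        # pi_{Omega_1} (14): average the m copies, then clip negatives.
        W_bar = np.maximum(sum(Ws) / m, 0.0)
        # pi_{Omega_2} (15): truncate each copy along its own mode.
        Ws = [trunc(W_bar, k, ranks[k]) for k in range(m)]
    return np.maximum(sum(Ws) / m, 0.0)

A = np.abs(np.random.default_rng(2).random((5, 6, 7)))
X = nlrt(A, ranks=(2, 2, 2), iters=20)
```

The per-iteration cost is dominated by the $m$ SVDs inside `trunc`, matching the flop count stated above.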

## 3 The Convergence Analysis

The framework of this algorithm is the same as in the convex case of finding a point in the intersection of several closed sets, while the projected sets here are two product manifolds. In [53], Lewis and Malick proved that a sequence of alternating projections converges locally linearly if the two projected sets are smooth manifolds intersecting transversally. Lewis et al. [54] proved local linear convergence when the two projected sets intersect nontangentially in the sense of linear regularity, and one of the sets is super-regular. Later, Bauschke et al. [55, 56] investigated the case of nontangential intersection further and proved linear convergence under weaker regularity and transversality hypotheses. In [57], Noll and Rondepierre generalized the existing results by studying the intersection condition of the two projected sets. They established local convergence of alternating projections between subanalytic sets under a mild regularity hypothesis on one of the sets. Here we analyze the convergence of the alternating projections algorithm by using the results in [57].

We remark that the sets $\Omega_1$ and $\Omega_2$ given in (12) and (13), respectively, are two smooth manifolds which are not closed. The convergence cannot be derived directly by applying the convergence results of alternating projections between two closed subanalytic sets. By using results in variational analysis and differential geometry, the main convergence results are shown in the following theorem.

###### Theorem 1.

Let $\mathcal{M}_k$ ($k = 1, ..., m$) and $\mathcal{M}$ be the manifolds given in (8) and (9), respectively, and let $\mathbf{X}^* \in \Omega_1 \cap \Omega_2$. Then there exists a neighborhood $U$ of $\mathbf{X}^*$ such that whenever a sequence derived by Algorithm 1 enters $U$, it converges to some point of $\Omega_1 \cap \Omega_2$ with rate $O(k^{-\rho})$ for some $\rho > 0$.

In order to prove Theorem 1, it is necessary to study Hölder regularity and the separable intersection property. For a detailed discussion, we refer to Noll and Rondepierre [57].

###### Definition 1.

[57] Let $A$ and $B$ be two sets of points in a Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle$ and the norm $\|\cdot\|$. Let $\bar{x} \in A \cap B$. The set $A$ is $\sigma$-Hölder regular with respect to $B$ at $\bar{x}$ if there exist a neighborhood $U$ of $\bar{x}$ and a constant $c > 0$ such that for every $y^+ \in A \cap U$ and every $r > 0$, one has

$$\mathrm{Ball}(y^+, (1+c)r) \cap \{ x \;|\; y^+ = p_A(x), \; \langle y^+ - x^+, x - x^+ \rangle > \sqrt{c} \, r^{\sigma + 1} \|x - x^+\| \} \cap B = \emptyset,$$

where $x^+ = p_B(x)$. Note that $p_A(x)$ is the projection of $x$ onto $A$ and $p_B(x)$ is the projection of $x$ onto $B$, with respect to the norm $\|\cdot\|$. We say that $A$ is Hölder regular with respect to $B$ if it is $\sigma$-Hölder regular with respect to $B$ for every $\sigma \in [0, 1)$.

Hölder regularity is mild compared with some other regularity concepts such as prox-regularity [58], Clarke regularity [59] and super-regularity [60].

###### Definition 2.

[57] Let $A$ and $B$ be two sets of points in a Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle$ and the norm $\|\cdot\|$. We say that $B$ intersects $A$ separably at $\bar{x} \in A \cap B$ with exponent $\omega \in [0, 2)$ and constant $\gamma > 0$ if there exists a neighborhood $U$ of $\bar{x}$ such that for every building block $z \to y^+ \to z^+$ in $U$, the condition

$$\langle z - y^+, z^+ - y^+ \rangle \le \big( 1 - \gamma \|z^+ - y^+\|^{\omega} \big) \, \|z - y^+\| \, \|z^+ - y^+\| \qquad (16)$$

holds; i.e., it is equivalent to

$$\frac{1 - \cos \alpha}{\|z^+ - y^+\|^{\omega}} \ge \gamma,$$

where $y^+$ is a projection point of $z$ onto $A$, $z^+$ is a projection point of $y^+$ onto $B$, and $\alpha$ is the angle between $z - y^+$ and $z^+ - y^+$.

This separable intersection definition is a new geometric concept which generalizes the transversal intersection [53], the linearly regular intersection [54], and the intrinsically transversal intersection [61]. It has been shown that each of these three kinds of intersection implies the separable intersection.

Proof of Theorem 1. It is clear that finding a point in $\mathcal{M}_1 \cap \cdots \cap \mathcal{M}_m \cap \mathcal{M}$ is equivalent to finding a point in the intersection of $\Omega_1$ and $\Omega_2$.

The first task is to show that $\Omega_2$ intersects $\Omega_1$ separably at $\mathbf{X}^*$ with some exponent $\omega \in [0, 2)$. Define $f$ as

$$f(\mathbf{X}) = \delta_{\Omega_1}(\mathbf{X}) + \frac{1}{2} d^2_{\Omega_2}(\mathbf{X}), \quad \mathbf{X} = (\mathcal{X}^1, \mathcal{X}^2, ..., \mathcal{X}^m) \in \Omega_1, \qquad (17)$$

with

$$\delta_{\Omega_1}(\mathbf{X}) = \begin{cases} 0 & \text{if } \mathbf{X} \in \Omega_1, \\ +\infty & \text{otherwise}, \end{cases}$$

and

$$d_{\Omega_2}(\mathbf{X}) = \min \{ \|\mathbf{X} - \mathbf{W}\|_F : \mathbf{W} \in \Omega_2 \}.$$

It follows from the definition of $f$ that $f \ge 0$ and $\mathbf{X}^*$ is a critical point of $f$.

Recall that $\Omega_1$ and $\Omega_2$ are two manifolds. Then $f$ is locally Lipschitz continuous, i.e., for each $\mathbf{X} \in \Omega_1$, there is an $\epsilon > 0$ such that $f$ is Lipschitz continuous on the open ball of center $\mathbf{X}$ and radius $\epsilon$. Assume that $\psi$ is a local smooth chart of $\Omega_1$ around $\mathbf{X}^*$ with bounded $\psi$ and $\psi^{-1}$. Therefore, $f \circ \psi^{-1}$ is bounded by the fact that $f$ is locally Lipschitz continuous. According to the definition of semi-algebraic functions [62], we can deduce that $f \circ \psi^{-1}$ is also semi-algebraic. Then the Kurdyka-Łojasiewicz inequality [63] holds for $f \circ \psi^{-1}$. It implies that there exist $\eta \in (0, +\infty]$ and a concave function $\tau: [0, \eta) \to [0, +\infty)$ such that

1. $\tau(0) = 0$;

2. $\tau$ is $C^1$ on $(0, \eta)$;

3. $\tau' > 0$ on $(0, \eta)$;

4. for all $\mathbf{W}$ with $0 < f \circ \psi^{-1}(\mathbf{W}) - f \circ \psi^{-1}(\overline{\mathbf{W}}) < \eta$, we have

$$\tau'\big( f \circ \psi^{-1}(\mathbf{W}) - f \circ \psi^{-1}(\overline{\mathbf{W}}) \big) \, \mathrm{dist}\big( 0, \partial (f \circ \psi^{-1})(\mathbf{W}) \big) \ge 1.$$

Moreover, $\psi$ is analytic on $\Omega_1$, thus $D\psi$ is continuous on $\Omega_1$, where $D$ is the differential operator. For every compact subset $K$ of $\Omega_1$, there exists $C_K = \max_{\mathbf{X} \in K} \|D\psi(\mathbf{X})\|$, where $\|\cdot\|$ denotes the operator norm. Suppose that $U(\mathbf{X}^*)$ is an open set containing $\mathbf{X}^*$ in $\Omega_1$ such that its closure $\overline{U(\mathbf{X}^*)}$ is compact. Then, for every $\mathbf{X} \in U(\mathbf{X}^*)$ with $0 < f(\mathbf{X}) - f(\mathbf{X}^*) < \eta$, we have

$$C_K \, \tau'\big( f(\mathbf{X}) - f(\mathbf{X}^*) \big) \, \mathrm{dist}\big( 0, \hat{\partial} f(\mathbf{X}) \big) \ge 1, \qquad (18)$$

where $\hat{\partial} f$ is the Fréchet subdifferential of $f$. We see that the Kurdyka-Łojasiewicz inequality is satisfied for $f$ given in (17).

Here we construct a function $\tau(t) = t^{1 - \theta}$ with $\theta \in (0, 1)$, which satisfies (i)-(iv). Because $f(\mathbf{X}^*) = 0$, (18) becomes

$$C_K \, \tau'\big( f(\mathbf{X}) \big) \, \mathrm{dist}\big( 0, \hat{\partial} f(\mathbf{X}) \big) \ge 1.$$

Since $\tau'(t) = (1 - \theta) t^{-\theta}$, there always exists a neighborhood $U(\mathbf{X}^*)$ of $\mathbf{X}^*$ such that

$$|f(\mathbf{X})|^{-\theta} \, \|\mathbf{g}\|_F \ge c, \quad \text{with } c = \frac{1}{C_K (1 - \theta)}, \qquad (19)$$

for all $\mathbf{X} \in U(\mathbf{X}^*)$ and every $\mathbf{g} \in \hat{\partial} f(\mathbf{X})$.

In Algorithm 1, we construct the following sequence of iterates according to Definition 2:

$$\mathbf{Z} \to \mathbf{Y}^+ \to \mathbf{Z}^+.$$

Here $\mathbf{Y}^+$ is the projection $\pi_{\Omega_1}(\mathbf{Z})$ and $\mathbf{Z}^+$ is the projection $\pi_{\Omega_2}(\mathbf{Y}^+)$. Suppose $\mathbf{Z}$ and $\mathbf{Y}^+$ are in $U(\mathbf{X}^*)$. We get the proximal normal cone to $\Omega_1$ at $\mathbf{Y}^+$:

$$N^p_{\Omega_1}(\mathbf{Y}^+) = \{ \lambda \mathbf{V} : \lambda \ge 0, \; \mathbf{Y}^+ \in \pi_{\Omega_1}(\mathbf{Y}^+ + \mathbf{V}) \}.$$

According to the definition of the Fréchet subdifferential, $\mathbf{g} \in \hat{\partial} f(\mathbf{Y}^+)$ if and only if $\mathbf{g}$ is of the form $\lambda (\mathbf{Z} - \mathbf{Y}^+) + (\mathbf{Y}^+ - \mathbf{Z}^+)$ with $\lambda \ge 0$, since $\mathbf{Z} - \mathbf{Y}^+ \in N^p_{\Omega_1}(\mathbf{Y}^+)$.

Note that $f(\mathbf{Y}^+) = \frac{1}{2} d^2_{\Omega_2}(\mathbf{Y}^+)$ from (17). Substituting into (19) gives

$$2^{\theta} \, d_{\Omega_2}(\mathbf{Y}^+)^{-2\theta} \, \| \lambda (\mathbf{Z} - \mathbf{Y}^+) + (\mathbf{Y}^+ - \mathbf{Z}^+) \|_F \ge c > 0,$$

for every $\lambda \ge 0$. It follows that

$$d_{\Omega_2}(\mathbf{Y}^+)^{-2\theta} \min_{\lambda \ge 0} \| \lambda (\mathbf{Z} - \mathbf{Y}^+) + (\mathbf{Y}^+ - \mathbf{Z}^+) \|_F \ge 2^{-\theta} c. \qquad (20)$$

Let $\alpha$ be the angle between the iterates, which can be defined as the angle between $\mathbf{Z} - \mathbf{Y}^+$ and $\mathbf{Z}^+ - \mathbf{Y}^+$.

Let us consider two cases.

(i) When $\alpha \in [0, \pi/2]$,

$$\min_{\lambda \ge 0} \| \lambda (\mathbf{Z} - \mathbf{Y}^+) + (\mathbf{Y}^+ - \mathbf{Z}^+) \|_F = \|\mathbf{Y}^+ - \mathbf{Z}^+\|_F \sin \alpha.$$

Substituting this into (20) and noting that $d_{\Omega_2}(\mathbf{Y}^+) = \|\mathbf{Y}^+ - \mathbf{Z}^+\|_F$, we obtain

$$\frac{\sin \alpha}{d_{\Omega_2}(\mathbf{Y}^+)^{2\theta - 1}} \ge 2^{-\theta} c.$$

Note that $1 - \cos \alpha \ge \frac{1}{2} \sin^2 \alpha$; squaring the above inequality gives

$$\frac{1 - \cos \alpha}{d_{\Omega_2}(\mathbf{Y}^+)^{4\theta - 2}} \ge 2^{-2\theta - 1} c^2. \qquad (21)$$

When the numerator tends to zero, the denominator has to go to zero as well, which requires $4\theta - 2 > 0$, i.e., $\theta \in (1/2, 1)$. Therefore, we get that $\Omega_2$ intersects $\Omega_1$ separably with exponent $\omega = 4\theta - 2 \in (0, 2)$, and the corresponding constant can be set to $\gamma = 2^{-2\theta - 1} c^2$.

(ii) When $\alpha \in (\pi/2, \pi]$, which means $\cos \alpha < 0$, i.e., $1 - \cos \alpha > 1$, the infimum in (20) is attained at $\lambda = 0$, and (20) becomes $d_{\Omega_2}(\mathbf{Y}^+)^{1 - 2\theta} \ge 2^{-\theta} c$. Therefore,

$$\frac{1}{d_{\Omega_2}(\mathbf{Y}^+)^{4\theta - 2}} \ge 2^{-2\theta} c^2 > 2^{-2\theta - 1} c^2,$$

so (21) is also satisfied. According to Definition 2, $\Omega_2$ intersects $\Omega_1$ separably.

On the other hand, that $\Omega_1$ intersects $\Omega_2$ separably can be proved by a similar argument.

It follows from Theorem 1 and Corollary 4 in [57] that there exists a neighborhood $U$ of $\mathbf{X}^*$ such that every sequence of alternating projections that enters $U$ converges to some point of $\Omega_1 \cap \Omega_2$, with rate $O(k^{-\rho})$ for some $\rho > 0$ depending on the exponent $\omega$. The result follows.

In the next section, we test our method and nonnegative tensor decomposition methods on synthetic and real-world data, and show that the performance of the proposed alternating projections method is better than that of the other methods.

## 4 Experimental Results

The state-of-the-art methods for the nonnegative tensor decompositions described in Section 2.1 are used as follows.

• Nonnegative CP decomposition (NCPD). For the CP scheme approximation, we will use the hierarchical ALS algorithm (referred to as “NCPD-HALS”) proposed in [26, 36], a fixed point (FP) algorithm (referred to as “NCPD-FP”) in [35], and a block coordinate descent (BCD) method (referred to as “NCPD-BCD”) from [39].

• Nonnegative Tucker decomposition (NTD). For the Tucker scheme approximation, we apply the HALS algorithm (referred to as “NTD-HALS”), the accelerated proximal gradient algorithm (referred to as “NTD-APG”), and the plain multiplicative updating algorithm (referred to as “NTD-MU”) in [41], as well as the block coordinate descent (BCD) method (referred to as “NTD-BCD”) from [39].

We now comment on the computational cost of these methods. The cost of the proposed NLRT method per iteration is about the same as that of the NTD-type methods. As they involve the calculation of nonnegative vectors only, the cost of the NCPD-type methods per iteration is smaller than that of the proposed NLRT method.

The stopping criterion of the proposed method and the other comparison methods is that the relative difference between successive iterates is smaller than a prescribed tolerance. All the experiments are conducted on an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with 32GB of RAM using Matlab.

### 4.1 Synthetic Datasets

We first test different methods on synthetic datasets. We generate two kinds of synthetic data as follows:

• Test 1 (Low-rank nonnegative tensor): We generate low rank nonnegative tensors in two steps. First, a core tensor (whose size equals the prescribed multilinear rank) and factor matrices of matching sizes are generated with entries uniformly distributed in [0, 1]. Second, these factor matrices are multiplied with the core tensor via the tensor-matrix product, and the resulting low rank nonnegative tensors are normalized to [0, 1]. Finally, we add truncated Gaussian noise (i.e., negative noisy values are set to 0) to the generated tensor with different signal-to-noise ratios (SNR). Given the multilinear rank $(r_1, r_2, r_3)$, the CP rank cannot be larger than $\min_k \prod_{j \neq k} r_j$. Therefore, we set the CP rank in the NCPD methods from three possible values, and we report the best relative approximation error among them.

• Test 2 (Nonnegative tensor): We randomly generate nonnegative tensors whose entries follow a uniform distribution between 0 and 1. The low rank minimizer is unknown in this setting. In the CP decomposition methods, the CP rank is set to $r$. In the Tucker decomposition methods, the multilinear rank is set to $(r, r, r)$.

We report the relative approximation error, which is defined as

$$\frac{\| \mathcal{X} - \mathcal{X}_{\mathrm{groundtruth}} \|_F}{\| \mathcal{X}_{\mathrm{groundtruth}} \|_F},$$

to measure the approximation quality, where $\mathcal{X}$ is the computed approximation. The ground truth tensor is the generated tensor without noise. The relative approximation errors of the results by different methods in Test 1 are reported in Table 2. The reported entries for all the comparison methods are averages over ten different initial guesses of the CP decomposition vectors and Tucker decomposition matrices. In contrast, the results of the proposed NLRT method are deterministic when the given nonnegative tensor is fixed. We see from Table 2 that the proposed NLRT method achieves the best performance and is also quite robust to the SNR. On the other hand, the relative approximation errors in Test 2 are plotted in Fig. 1 with respect to different values of the rank $r$. Again, the reported entries are averages over ten different initial guesses for the comparison methods. From Fig. 1, we can see that the proposed NLRT method and NTD-BCD perform better than the other methods, and the superiority of our method over NTD-BCD is evident when the rank is between 27 and 39.

### 4.2 Video Data

In this subsection, we select four videos¹ to test our method on the approximation task. Three videos (named “foreman”, “coastguard”, and “news”) are of the same size (height × width × frame), and one (named “basketball”) is of a different size. First, we set the multilinear rank to $(r, r, r)$ and the CP rank to $r$. We test our method on approximating these four videos with $r$ varying from 5 to 100. Moreover, we add Gaussian noise to the video “coastguard” with different noise levels (SNR = 20, 30, 40, 50) and test the approximation ability on the noisy data. We plot the relative approximation errors with respect to $r$ on the four videos in Fig. 2. It can be seen that the approximation errors of the results by our method are the lowest. Fig. 3 shows the relative approximation errors on the noisy video “coastguard” with respect to $r$. Similarly, our method achieves the lowest approximation errors.

¹Videos are available at http://trace.eas.asu.edu/yuv/ and https://sites.google.com/site/jamiezeminzhang/publications.

### 4.3 Hyperspectral Data

In this subsection, we test different methods on hyperspectral data. We consider four hyperspectral images (HSIs), each of size height × width × band: a subimage of the Pavia City Center dataset², a subimage of the Washington DC Mall dataset³, the RemoteImage⁴, and a subimage of the Cuprite dataset⁵.

³Data available at https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html.
⁴Data available at https://www.cs.rochester.edu/~jliu/code/TensorCompletion.zip.
⁵Data available at https://aviris.jpl.nasa.gov/data/free_data.html.

Figs. 4 and 5 report the relative approximation errors with respect to different values of the rank $r$. It is evident that the relative approximation errors of our NLRT are the lowest among all the methods. It is interesting to note that the difference between our method and NTD-BCD (the second best comparison method) is more significant than that on the synthetic data. In Fig. 6, we display the pseudo-color images of the results on the Washington DC Mall dataset with multilinear rank (100, 100, 100) and CP rank 100. The pseudo-color image is composed of the 145-th, 127-th, and 92-nd bands as the red, green, and blue channels, respectively. In Fig. 6, we also report two image quality assessments: the mean peak signal-to-noise ratio (MPSNR) and the mean structural similarity index (MSSIM) over all the spectral bands [64]. It can be seen in Fig. 6 that both the visual and quality assessments of the proposed NLRT method are better than those of the other comparison methods.

### 4.4 Selection of Features

One advantage of the proposed NLRT method is that it provides a significance index based on the singular values of the unfolding matrices [50] that can be used to identify important singular basis vectors in the approximation. Here we take the HSI Washington DC Mall as an example. We compute the low-rank approximations of the proposed NLRT method and the other comparison methods with multilinear rank $(r, r, r)$ and CP rank $r$ for several values of $r$. For the approximation results of the NCPD methods, we normalize the basis vectors in (2) such that the norms of $a^{z,1}$, $a^{z,2}$ and $a^{z,3}$ are equal to 1, and rearrange the resulting weights $\lambda_z$ in descending order in the CP decomposition. In Fig. 7, we plot the relative residual with respect to the number of leading components used. Similarly, for the results of the NTD methods, we plot the relative residual with respect to the number of retained columns of the third-mode factor matrix, ordered by their largest column norms. For the results of our method, we plot the relative residual with respect to the number of leading singular values of the third-mode unfolding matrix of $\mathcal{X}$. The third mode is chosen in NTD and our NLRT because we are interested in observing how many indices are required in the spectral mode of the given hyperspectral data.

In Fig. 7, we can see that as the number of components increases, the relative residual decreases. Our NLRT provides a significance index based on singular values to identify important singular basis vectors for the approximation. Thus, the relative residuals of the proposed NLRT algorithm are significantly smaller than those of the tested NTD and NCPD algorithms. Similar phenomena can be found in Fig. 8, in which the residuals are computed using the number of indices in the first or second modes of $\mathcal{X}$.

### 4.5 Image Classification

The advantage of the proposed NLRT method is that the important singular basis vectors can be identified. Such basis vectors provide useful information for image recognition tasks such as classification. Here we conduct hyperspectral image classification experiments on the Indian Pines dataset⁷. This dataset was captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines test site in North-western Indiana in June 1992. We remove 20 bands, which cover the region of water absorption, and use the resulting HSI. The ground truth contains 16 land cover classes, as shown in Fig. 9. Therefore, we set the CP rank to 16 (and the multilinear rank correspondingly) for all the testing methods. We randomly choose a portion of the available labelled samples, as exhibited in Table 3; the chosen labelled samples from each class are used for training and the remaining samples are used for testing.

⁷Data available at https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html.

After obtaining the low rank approximations, the 16 singular vectors corresponding to the largest 16 singular values of the unfolding matrix of the tensor approximation along the spectral mode (the third mode) are employed for classification. We apply the $k$-nearest neighbour ($k$-NN, $k$ =