Nonnegative Low Rank Tensor Approximation and its Application to Multi-dimensional Images

07/28/2020
by   Tai-Xiang Jiang, et al.

The main aim of this paper is to develop a new algorithm for computing a Nonnegative Low Rank Tensor (NLRT) approximation for the nonnegative tensors that arise in many multi-dimensional imaging applications. Nonnegativity is an important property here, as each pixel value corresponds to a nonnegative light intensity in image data acquisition. Our approach is different from classical nonnegative tensor factorization (NTF), which has been studied for many years. For a given nonnegative tensor, the classical NTF approach determines a nonnegative low rank tensor by computing factor matrices or tensors (for example, CPD finds factor matrices while Tucker decomposition finds a core tensor and factor matrices), such that the distance between this nonnegative low rank tensor and the given tensor is as small as possible. The proposed NLRT approach is different: it determines a nonnegative low rank tensor without using decomposition or factorization methods. The distance minimized by the proposed NLRT method can be smaller than that minimized by NTF methods, which implies that the proposed NLRT method can obtain a better low rank tensor approximation. The proposed NLRT approximation algorithm is derived by alternating averaged projections onto the product of low rank matrix manifolds and the set of nonnegative tensors. We show the convergence of the alternating projection algorithm. Experimental results for synthetic data and multi-dimensional images are presented to demonstrate that the performance of the proposed NLRT method is better than that of existing NTF methods.



1 Introduction

Nonnegative data matrices appear in many data analysis applications. For instance, in image analysis, image pixel values are nonnegative, and the associated nonnegative image data matrices can be formed for clustering and recognition [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. In text mining, the frequencies of terms in documents are nonnegative, and the resulting nonnegative term-to-document data matrices can be constructed for clustering [13, 14, 15, 16]. In bioinformatics, nonnegative gene expression values are studied, and nonnegative gene expression data matrices are generated for disease and gene classification [17, 18, 19, 20, 21]. Low rank matrix approximation for nonnegative matrices plays a key role in all these applications. Its main purpose is to identify a latent feature space for object representation. Classification, clustering, or recognition analysis can then be done by using these latent features.

Nonnegative Matrix Factorization (NMF) emerged in 1994, introduced by Paatero and Tapper [22] for environmental data analysis. The purpose of NMF is to decompose an input $m$-by-$n$ nonnegative matrix $A$ into an $m$-by-$r$ nonnegative matrix $B$ and an $r$-by-$n$ nonnegative matrix $C$ such that $A \approx BC$, and more precisely

$$\min_{B \geq 0,\ C \geq 0} \| A - BC \|_F^2, \tag{1}$$

where $B, C \geq 0$ means that each entry of $B$ and $C$ is nonnegative, $\| \cdot \|_F$ is the Frobenius norm of a matrix, and $r$ (the low rank value) is smaller than $m$ and $n$. Several researchers have proposed and developed algorithms for computing such nonnegative matrix factorizations in the literature [1, 8, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]. Lee and Seung [1] proposed and developed NMF algorithms, and demonstrated that NMF yields a parts-based representation which can be used for intuitive perceptual interpretation. For the development of NMF, we refer to the recently edited book [33].
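For illustration, here is a minimal numpy sketch of the classical Lee-Seung multiplicative updates for (1). This is background on the factorization approach that this paper contrasts against, not the proposed method; the function name and parameters are our own.

```python
import numpy as np

def nmf_multiplicative(A, r, n_iter=200, eps=1e-10):
    """Lee-Seung multiplicative updates for min ||A - B C||_F^2, B, C >= 0."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    B = rng.random((m, r))
    C = rng.random((r, n))
    for _ in range(n_iter):
        # Each update keeps the factors entrywise nonnegative by construction.
        C *= (B.T @ A) / (B.T @ B @ C + eps)
        B *= (A @ C.T) / (B @ C @ C.T + eps)
    return B, C
```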

Nowadays, data from many fields are more naturally represented as multi-dimensional arrays, referred to as tensors; examples include video data, hyperspectral data, and fMRI data. For more details and applications, we refer to the review paper [34]. To handle tensor data, there are two well-known and widely used decompositions: the Canonical Polyadic decomposition (CPD) and the Tucker decomposition. Because of the interpretability of NMF, many nonnegative tensor decompositions and models have been proposed and developed. Most models focus on CPD and Tucker decomposition with nonnegativity constraints.

For the CPD with nonnegativity constraints, known as Nonnegative Tensor Factorization (NTF), Welling et al. introduced a fixed point algorithm as early as 2001 [35]. Cichocki et al. [26, 36] developed a class of optimized local algorithms known as Hierarchical ALS (HALS), and also extended the approach to other cost functions using the Alpha and Beta divergences for NTF computation. In [37], Park et al. proposed an NTF algorithm based on the alternating nonnegativity constrained least squares method. In [38], the block principal pivoting method was further introduced to solve nonnegativity constrained least squares (NNLS) problems, overcoming some difficulties of NNLS problems with a large number of variables. Xu et al. [39] studied regularized block multiconvex optimization and proposed a block coordinate descent method which can be applied to the NTF problem.

In [40], Kim and Choi studied the Tucker decomposition with nonnegativity constraints, extending multiplicative updating algorithms for NMF to solve the nonnegative Tucker decomposition. Zhou et al. [41] also transformed this problem into a series of NMF problems, and used MU and HALS algorithms on the unfolding matrices to compute the Tucker decomposition. Other constraints, such as orthogonality of the factor matrices, have also been considered and studied [42, 43]. For instance, in [43], Pan et al. proposed an orthogonal nonnegative Tucker decomposition and applied the alternating direction method of multipliers (ADMM), aiming to obtain clustering information from the factor matrices and their joint connection weights from the core tensor. Besides these two standard and well-known decompositions, other models have been designed for specific applications. For instance, for the blind unmixing problem in hyperspectral imaging, Qian et al. [44] proposed a model derived from the block term decomposition (BTD) and applied an alternating least squares method to solve it.

1.1 The Contribution

The main aim of this paper is to propose and study Nonnegative Low Rank Tensor Approximation (NLRT) for applications to multi-dimensional images. Our approach is completely different from classical NTF, which has been studied for many years. For a given nonnegative tensor, the classical NTF approach determines a nonnegative low rank tensor by computing factor matrices or tensors (for example, CPD finds factor matrices while Tucker decomposition finds a core tensor and factor matrices), such that the distance between this nonnegative low rank tensor and the given tensor is as small as possible. The proposed NLRT approach is different from classical NTF: it determines a nonnegative low rank tensor without using decomposition or factorization methods. The distance minimized by the proposed NLRT method can be smaller than that minimized by NTF methods, which implies that the proposed NLRT method can obtain a better low rank tensor approximation. The proposed NLRT approximation algorithm is derived by using alternating projections on the product of low rank matrix manifolds and the nonnegativity constraint. We show the convergence of the alternating projections by constructing two projections: one combining the projection onto the low rank matrix manifolds with the nonnegative projection, and one taking the average of tensors. Experimental results for synthetic data and multi-dimensional images are presented to demonstrate the above-mentioned advantages of the proposed NLRT method compared with NTF methods.

The paper is organized as follows. In Section 2, we review tensor operations and decompositions, and present our algorithm. In Section 3, we show the convergence of the proposed alternating projections. In Section 4, numerical results are presented to demonstrate the proposed algorithm. Finally, some concluding remarks and future research directions are given in Section 5.

2 Nonnegative Low Rank Tensor Approximation

2.1 Tensor Operations and Decompositions

An $N$-dimensional tensor is a multi-dimensional array, i.e., $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_N}$. We denote the $(i_1, i_2, \ldots, i_N)$-th entry of $\mathcal{X}$ by $x_{i_1 i_2 \cdots i_N}$. The inner product of two same-sized tensors $\mathcal{X}$ and $\mathcal{Y}$ is defined as

$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1, i_2, \ldots, i_N} x_{i_1 i_2 \cdots i_N} \, y_{i_1 i_2 \cdots i_N}.$$

The Frobenius norm of an $N$-dimensional tensor $\mathcal{X}$ is then defined as

$$\| \mathcal{X} \|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}.$$

The $k$-th unfolding of $\mathcal{X}$ is defined as the matrix $\mathbf{X}_{(k)} \in \mathbb{R}^{n_k \times \prod_{j \neq k} n_j}$ whose columns are the mode-$k$ fibers of $\mathcal{X}$. In the following discussion, we introduce two well-known tensor decompositions.
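Since the $k$-th unfolding is used repeatedly below, a minimal numpy sketch of the unfolding and its inverse may be helpful. The helper names `unfold` and `fold` are our own; the shape matches the $n_k \times \prod_{j \neq k} n_j$ convention above, although column orderings differ across the literature.

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding: an n_k-by-(product of the other dims) matrix."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def fold(M, k, shape):
    """Inverse of unfold: rebuild the tensor of the given shape."""
    rest = [s for j, s in enumerate(shape) if j != k]
    return np.moveaxis(M.reshape([shape[k]] + rest), 0, k)

X = np.arange(24.0).reshape(2, 3, 4)
assert np.allclose(fold(unfold(X, 1), 1, X.shape), X)  # round trip
```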

CP decomposition (CPD) [45, 46, 47]: Given a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_N}$, the CANDECOMP/PARAFAC decomposition (CPD) is defined as follows:

$$\mathcal{X} \approx \sum_{i=1}^{r} a_i^{(1)} \circ a_i^{(2)} \circ \cdots \circ a_i^{(N)}. \tag{2}$$

Note that

$$x_{i_1 i_2 \cdots i_N} \approx \sum_{i=1}^{r} a_{i_1 i}^{(1)} a_{i_2 i}^{(2)} \cdots a_{i_N i}^{(N)}, \tag{3}$$

where $\circ$ denotes the outer product of vectors, and $a_i^{(k)} \in \mathbb{R}^{n_k}$, $k = 1, \ldots, N$. The minimal value of $r$ such that the CP decomposition (2) exists is called the CP rank of $\mathcal{X}$.

Tucker decomposition [48, 49, 47]: Given a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_N}$, the Tucker decomposition is defined as follows:

$$\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 \cdots \times_N A^{(N)}, \tag{4}$$

i.e.,

$$x_{i_1 i_2 \cdots i_N} \approx \sum_{j_1=1}^{r_1} \cdots \sum_{j_N=1}^{r_N} g_{j_1 j_2 \cdots j_N} \, a_{i_1 j_1}^{(1)} a_{i_2 j_2}^{(2)} \cdots a_{i_N j_N}^{(N)}, \tag{5}$$

where $\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times \cdots \times r_N}$, $A^{(k)}$ is an $n_k$-by-$r_k$ matrix whose columns are mutually orthogonal, and $\times_k$ denotes the $k$-mode matrix product of a tensor, defined by

$$\left( \mathcal{G} \times_k A \right)_{j_1 \cdots j_{k-1}\, i\, j_{k+1} \cdots j_N} = \sum_{j_k=1}^{r_k} g_{j_1 j_2 \cdots j_N} \, a_{i j_k}.$$

The Tucker (or multilinear) rank of $\mathcal{X}$ is defined by $\big( \mathrm{rank}(\mathbf{X}_{(1)}), \mathrm{rank}(\mathbf{X}_{(2)}), \ldots, \mathrm{rank}(\mathbf{X}_{(N)}) \big)$.
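To make the $k$-mode product in (4) concrete, here is a small numpy sketch that builds a Tucker tensor from a random nonnegative core and orthonormal factor matrices. The helper name `mode_product` and the sizes are our own illustrative choices.

```python
import numpy as np

def mode_product(G, A, k):
    """k-mode product: multiply the matrix A into the k-th mode of G."""
    Gk = np.moveaxis(G, k, 0)                # bring mode k to the front
    out = np.tensordot(A, Gk, axes=(1, 0))   # contract A's columns with mode k
    return np.moveaxis(out, 0, k)

rng = np.random.default_rng(0)
r, n = (2, 3, 4), (5, 6, 7)
G = rng.random(r)                                                   # core tensor
Us = [np.linalg.qr(rng.random((n[k], r[k])))[0] for k in range(3)]  # orthonormal factors
X = G
for k in range(3):
    X = mode_product(X, Us[k], k)   # X = G x_1 U1 x_2 U2 x_3 U3
print(X.shape)                      # (5, 6, 7); multilinear rank at most (2, 3, 4)
```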

When nonnegativity constraints are imposed on the factors generated by the CPD or Tucker decomposition, the models given in (2) and (4) are referred to as nonnegative tensor factorization and nonnegative Tucker decomposition, respectively. Nonnegative tensor factorization [26, 36, 38, 39] and nonnegative Tucker decomposition [40, 41, 42, 43, 44] models and algorithms have been developed to deal with nonnegative tensor applications.

2.2 The Proposed Model

In [50, 51], Song and Ng proposed a new algorithm for computing a nonnegative low rank matrix approximation of a nonnegative matrix. Instead of using the matrix factorization in (1), their method determines a rank-$r$ nonnegative matrix $X$ such that the distance between $X$ and the given nonnegative matrix $A$ is as small as possible:

$$\min_{X \geq 0, \ \mathrm{rank}(X) = r} \| A - X \|_F^2. \tag{6}$$

They have shown the convergence of the proposed algorithm based on the projection onto the manifold of rank-$r$ matrices and the projection onto the set of nonnegative matrices.
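For intuition, here is a minimal sketch of an alternating-projections scheme for the matrix problem (6), in the spirit of [50, 51]. Their algorithm includes refinements such as tangent-space techniques; this plain version only alternates a truncated SVD with the entrywise nonnegative part.

```python
import numpy as np

def nlrm(A, r, n_iter=200):
    """Plain alternating projections for min ||A - X||_F, rank(X) = r, X >= 0."""
    X = A.copy()
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]   # project onto the rank-r set
        X = np.maximum(X, 0.0)            # project onto nonnegative matrices
    return X
```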

The main purpose of this paper is to develop a nonnegative low rank tensor approximation for nonnegative tensors. The idea is to find a nonnegative low rank tensor $\mathcal{X}$ such that the distance between $\mathcal{X}$ and the given nonnegative tensor $\mathcal{A}$ is as small as possible. Mathematically, it can be formulated as the following optimization problem:

$$\min_{\mathcal{X} \geq 0, \ \mathrm{rank}(\mathbf{X}_{(k)}) = r_k, \ k = 1, \ldots, N} \| \mathcal{A} - \mathcal{X} \|_F^2, \tag{7}$$

where $\mathbf{X}_{(k)}$ and $\mathbf{A}_{(k)}$ are the $k$-th mode unfolding matrices of $\mathcal{X}$ and $\mathcal{A}$, respectively. The sizes of $\mathbf{X}_{(k)}$ and $\mathbf{A}_{(k)}$ are $n_k$-by-$J_k$ with $J_k = \prod_{j \neq k} n_j$. According to the singular value decomposition of $\mathbf{X}_{(k)}$,

$$\mathbf{X}_{(k)} = U_k \Sigma_k V_k^T,$$

where $U_k$ is $n_k$-by-$r_k$ with orthonormal columns, $\Sigma_k$ is a diagonal matrix of size $r_k$-by-$r_k$, and $V_k$ is $J_k$-by-$r_k$ with orthonormal columns ($V_k^T$ is the transpose of $V_k$), a nonnegative low rank tensor $\mathcal{X}$ can be expressed as follows:

$$\mathcal{X} = \mathcal{C} \times_1 U_1 \times_2 U_2 \times_3 \cdots \times_N U_N.$$

Here $\mathcal{C} := \mathcal{X} \times_1 U_1^T \times_2 U_2^T \times_3 \cdots \times_N U_N^T$, and thus $\mathcal{X}$ has multilinear rank $(r_1, r_2, \ldots, r_N)$.

2.3 The Proposed Algorithm

We note from (7) that there are $N$ manifolds of low rank matrices and the nonnegativity constraint to be considered in the optimization. In this subsection, we propose to solve (7) by using the averaged projection method, which is a replacement for alternating projections on many manifolds and the nonnegativity constraint. Let

$$T_k = \left\{ \mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_N} : \mathrm{rank}(\mathbf{X}_{(k)}) = r_k \right\}, \quad k = 1, \ldots, N, \tag{8}$$

denote the sets of tensors whose $k$-mode unfolding matrices have fixed rank $r_k$, and let

$$\Omega = \left\{ \mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_N} : \mathcal{X} \geq 0 \right\} \tag{9}$$

denote the set of nonnegative tensors.

Denote by $\mathrm{fold}_k$ the operator that folds a matrix into a tensor along the $k$-mode. Then, by the Eckart-Young-Mirsky theorem [52], the projection of a given tensor $\mathcal{X}$ onto $T_k$ can be expressed as follows:

$$\pi_k(\mathcal{X}) = \mathrm{fold}_k \Big( \sum_{i=1}^{r_k} \sigma_i u_i v_i^T \Big), \tag{10}$$

where $\mathbf{X}_{(k)}$ is the $k$-mode unfolding matrix of $\mathcal{X}$, $\sigma_i$ is the $i$-th singular value of $\mathbf{X}_{(k)}$, and $u_i$ and $v_i$ are the corresponding left and right singular vectors. The projection of a given tensor $\mathcal{X}$ onto the set of nonnegative tensors can be expressed as follows:

$$\left[ \pi_+(\mathcal{X}) \right]_{i_1 i_2 \cdots i_N} = \max\left( x_{i_1 i_2 \cdots i_N}, 0 \right). \tag{11}$$

In our proposed algorithm, we consider the following two sets of tensors in the product space $\mathbb{R}^{n_1 \times \cdots \times n_N} \times \cdots \times \mathbb{R}^{n_1 \times \cdots \times n_N}$ ($N$ times):

$$\mathcal{D} = \left\{ (\mathcal{X}_1, \ldots, \mathcal{X}_N) : \mathcal{X}_1 = \cdots = \mathcal{X}_N \geq 0 \right\} \tag{12}$$

and

$$\mathcal{T} = \left\{ (\mathcal{X}_1, \ldots, \mathcal{X}_N) : \mathcal{X}_k \in T_k, \ k = 1, \ldots, N \right\}. \tag{13}$$

Note that $\Omega$ is a convex set and the diagonal of the product space is an affine manifold, thus $\mathcal{D}$ is convex as well. As the $T_k$ are manifolds (Example 2 in [53]), $\mathcal{T}$ is a product of manifolds. Now we define the two projections onto $\mathcal{D}$ and $\mathcal{T}$ used in the alternating projections algorithm. The first projection is given by

$$\mathcal{P}_1(\mathcal{X}_1, \ldots, \mathcal{X}_N) = (\bar{\mathcal{X}}, \ldots, \bar{\mathcal{X}}) \quad \text{with} \quad \bar{\mathcal{X}} = \pi_+\Big( \frac{1}{N} \sum_{k=1}^{N} \mathcal{X}_k \Big), \tag{14}$$

where $\pi_+$ is defined in (11). The second projection is given by

$$\mathcal{P}_2(\mathcal{X}_1, \ldots, \mathcal{X}_N) = \left( \pi_1(\mathcal{X}_1), \ldots, \pi_N(\mathcal{X}_N) \right), \tag{15}$$

where the $\pi_k$ ($k = 1, \ldots, N$) are defined in (10). The proposed algorithm is summarized in Algorithm 1. With an input nonnegative tensor $\mathcal{A}$, the algorithm computes the projections $\mathcal{P}_2$ and $\mathcal{P}_1$ alternately until convergence. Note that the dominant overall computational cost of Algorithm 1 per iteration comes from the SVDs of the $N$ unfolding matrices of sizes $n_k$-by-$\prod_{j \neq k} n_j$, $k = 1, \ldots, N$, which leads to a total of $O\big( \sum_{k=1}^{N} n_k \prod_{j=1}^{N} n_j \big)$ flops.

Input: Given a nonnegative tensor $\mathcal{A}$, this algorithm computes a Tucker rank $(r_1, \ldots, r_N)$ nonnegative tensor $\bar{\mathcal{X}}$ close to $\mathcal{A}$ with respect to (7).
  1: Initialize $\mathcal{X}_k^{(0)} = \mathcal{A}$, $k = 1, \ldots, N$;
  2: for $t = 0, 1, 2, \ldots$ ($t$ is the iteration number)
  3:  $(\mathcal{Y}_1^{(t)}, \ldots, \mathcal{Y}_N^{(t)}) = \mathcal{P}_2(\mathcal{X}_1^{(t)}, \ldots, \mathcal{X}_N^{(t)})$;
  4:  $(\mathcal{X}_1^{(t+1)}, \ldots, \mathcal{X}_N^{(t+1)}) = \mathcal{P}_1(\mathcal{Y}_1^{(t)}, \ldots, \mathcal{Y}_N^{(t)})$;
  5: end
Output: $\bar{\mathcal{X}} = \mathcal{X}_1^{(t+1)}$
when the stopping criterion is satisfied.

Algorithm 1 Alternating Projections Algorithm
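Based on the projections (10)-(15) as reconstructed above, a compact numpy sketch of Algorithm 1 follows. The function names, the initialization with copies of $\mathcal{A}$, and the tolerance value are our own placeholder choices.

```python
import numpy as np

def unfold(X, k):
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def fold(M, k, shape):
    rest = [s for j, s in enumerate(shape) if j != k]
    return np.moveaxis(M.reshape([shape[k]] + rest), 0, k)

def proj_rank(X, k, r):
    """Projection (10): truncate the mode-k unfolding to rank r."""
    U, s, Vt = np.linalg.svd(unfold(X, k), full_matrices=False)
    return fold((U[:, :r] * s[:r]) @ Vt[:r], k, X.shape)

def nlrt(A, ranks, max_iter=500, tol=1e-4):
    """Algorithm 1: alternating (averaged) projections for problem (7)."""
    N = A.ndim
    Xs = [A.copy() for _ in range(N)]
    X_bar = A.copy()
    for _ in range(max_iter):
        Ys = [proj_rank(Xs[k], k, ranks[k]) for k in range(N)]  # P2, eq. (15)
        X_new = np.maximum(sum(Ys) / N, 0.0)                    # P1, eq. (14)
        Xs = [X_new] * N
        if np.linalg.norm(X_new - X_bar) <= tol * np.linalg.norm(X_bar):
            return X_new                                        # stopping rule
        X_bar = X_new
    return X_bar
```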

3 The Convergence Analysis

The framework of this algorithm is the same as in the convex case of finding a point in the intersection of several closed sets, although the projection sets here are two product manifolds. In [53], Lewis and Malick proved that a sequence of alternating projections converges locally linearly if the two projected sets are smooth manifolds intersecting transversally. Lewis et al. [54] proved local linear convergence when the two projected sets intersect nontangentially in the sense of linear regularity, and one of the sets is super-regular. Later, Bauschke et al. [55, 56] investigated the case of nontangential intersection further and proved linear convergence under weaker regularity and transversality hypotheses. In [57], Noll and Rondepierre generalized the existing results by studying the intersection condition of the two projected sets. They established local convergence of alternating projections between subanalytic sets under a mild regularity hypothesis on one of the sets. Here we analyze the convergence of the alternating projections algorithm by using the results in [57].

We remark that the sets $\mathcal{D}$ and $\mathcal{T}$ given in (12) and (13), respectively, are two smooth manifolds which are not closed. The convergence cannot be derived directly by applying the convergence results of alternating projections between two closed subanalytic sets. By using results in variational analysis and differential geometry, the main convergence results are shown in the following theorem.

Theorem 1.

Let $T_k$ ($k = 1, \ldots, N$) and $\Omega$ be the manifolds given in (8) and (9), respectively. Let $\mathcal{X}^* \in \big( \bigcap_{k=1}^{N} T_k \big) \cap \Omega$. Then there exists a neighborhood $U$ of $\mathcal{X}^*$ such that whenever the sequence generated by Algorithm 1 enters $U$, it converges to some $\bar{\mathcal{X}} \in \big( \bigcap_{k=1}^{N} T_k \big) \cap \Omega$ with rate $O(c^t)$ for some $0 < c < 1$.

In order to prove Theorem 1, it is necessary to study Hölder regularity and separable intersection. For a detailed discussion, we refer to Noll and Rondepierre [57].

Definition 1.

[57] Let $A$ and $B$ be two sets of points in a Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle$ and the norm $\| \cdot \|$. Let $\omega \in [0, 2)$. The set $B$ is $\omega$-Hölder regular with respect to $A$ at $\bar{x} \in A \cap B$ if there exist a neighborhood $U$ of $\bar{x}$ and a constant $c > 0$ such that for every $y \in A \cap U$ and every $x \in P_B(y) \cap U$, one has

where $d = \| y - x \|$. Note that $P_B(y)$ is the projection of $y$ onto $B$ and $P_A(x)$ is the projection of $x$ onto $A$, with respect to the norm $\| \cdot \|$. We say that $B$ is Hölder regular with respect to $A$ if it is $\omega$-Hölder regular with respect to $A$ for every $\omega \in [0, 2)$.

Hölder regularity is mild compared with other regularity concepts such as prox-regularity [58], Clarke regularity [59], and super-regularity [60].

Definition 2.

[57] Let $A$ and $B$ be two sets of points in a Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle$ and the norm $\| \cdot \|$. We say that $B$ intersects $A$ separably at $\bar{x} \in A \cap B$ with exponent $\omega \in [0, 2)$ and constant $\gamma$ if there exists a neighborhood $U$ of $\bar{x}$ such that for every building block $y \to x \to y^+$ in $U$, the condition

(16)

holds, i.e., it is equivalent to

where $x$ is a projection point of $y$ onto $B$, $y^+$ is a projection point of $x$ onto $A$, and $\alpha$ is the angle between $y - x$ and $y^+ - x$.

This separable intersection is a new geometric concept which generalizes transversal intersection [53], linearly regular intersection [54], and intrinsically transversal intersection [61]. It has been shown that each of these three kinds of intersection implies separable intersection.

Proof of Theorem 1. It is clear that finding a point in $\big( \bigcap_{k=1}^{N} T_k \big) \cap \Omega$ is equivalent to finding a point in the intersection of $\mathcal{T}$ and $\mathcal{D}$.

The first task is to show that $\mathcal{T}$ intersects $\mathcal{D}$ separably at $\mathcal{X}^*$ with exponent $\omega = 0$. Define $f$ as

(17)

with

and

It follows from the definition of $f$ that $\mathcal{X}^*$ is a critical point of $f$.

Recall that $\mathcal{T}$ and $\mathcal{D}$ are two manifolds. Then $f$ is locally Lipschitz continuous, i.e., for each point there is an open ball centered at that point on which $f$ is Lipschitz continuous. Assume a local smooth chart around the point with bounded derivative; the chart representation of $f$ is then bounded by the fact that $f$ is locally Lipschitz continuous. According to the definition of a semi-algebraic function [62], we can deduce that $f$ is also semi-algebraic. Then the Kurdyka-Łojasiewicz inequality [63] holds for $f$. This implies that there exist $\varepsilon > 0$ and a concave function $\kappa : [0, \varepsilon) \to [0, \infty)$ such that

  1. $\kappa(0) = 0$;

  2. $\kappa$ is $C^1$ on $(0, \varepsilon)$;

  3. $\kappa' > 0$ on $(0, \varepsilon)$;

  4. for all $x$ with $0 < f(x) < \varepsilon$, we have $\kappa'(f(x)) \, \mathrm{dist}(0, \partial f(x)) \geq 1$.

Moreover, $f$ is analytic on its domain, thus $Df$ is continuous, where $D$ is the differential operator. For every compact subset, the operator norm of $Df$ is bounded. Suppose that $U'$ is an open set containing $\mathcal{X}^*$ such that its closure $\overline{U'}$ is compact. Then, for every $x \in \overline{U'}$ with $0 < f(x) < \varepsilon$, we have

(18)

where $\partial f$ is the Fréchet subdifferential of $f$. We see that the Kurdyka-Łojasiewicz inequality is satisfied for $f$ given in (17).

Here we have constructed a function $\kappa$ which satisfies properties 1-4 above. With this choice, (18) becomes

Consequently, there always exists a neighborhood of $\mathcal{X}^*$ on which

(19)

holds for all points in that neighborhood and every corresponding subgradient.

In Algorithm 1, we construct the following sequences according to Definition 2:

Here one step is the projection $\mathcal{P}_1$ and the other is the projection $\mathcal{P}_2$. Supposing that the iterates are in the neighborhood above, we obtain the proximal normal cone to $\mathcal{D}$ at the current iterate:

According to the definition of the Fréchet subdifferential, a vector belongs to it if and only if the subgradient inequality holds for every element of the given form.

Note that, from (17), we have the corresponding value of $f$. Substituting into (19) gives

for every . It follows that

(20)

Let the angle be the angle between the iterations, which can be defined as the angle between and .

Let us consider two cases.

(i) When ,

Substituting this into (20), we obtain

Note that , we have

(21)

when the numerator tends to its limit, the denominator has to go to zero, which implies the required relation. Therefore, we conclude that $\mathcal{T}$ intersects $\mathcal{D}$ separably with exponent $\omega = 0$; the corresponding constant can be set accordingly.

(ii) In the remaining case, the infimum in (20) is attained, and (20) reduces to the required estimate. Therefore,

(21) is also satisfied. According to Definition 2, $\mathcal{T}$ intersects $\mathcal{D}$ separably.

On the other hand, that $\mathcal{D}$ intersects $\mathcal{T}$ separably can be proved by a similar argument.

It follows from Theorem 1 and Corollary 4 in [57] that there exists a neighborhood $U$ of $\mathcal{X}^*$ such that every sequence of alternating projections that enters $U$ converges to a point of $\mathcal{T} \cap \mathcal{D}$, at the rate stated in Theorem 1. The result follows.

In the next section, we test our method and nonnegative tensor decomposition methods on synthetic data and real-world data, and show that the performance of the proposed alternating projections method is better than that of the other methods.

4 Experimental Results

The state-of-the-art methods for the nonnegative tensor decompositions reviewed in Section 2.1 are used for comparison as follows.

  • Nonnegative CP decomposition (NCPD). For the CP scheme approximation, we will use the hierarchical ALS algorithm (referred to as “NCPD-HALS”) proposed in [26, 36], a fixed point (FP) algorithm (referred to as “NCPD-FP”) in [35], and a block coordinate descent (BCD) method (referred to as “NCPD-BCD”) from [39].

  • Nonnegative Tucker decomposition (NTD). For the Tucker scheme approximation, we apply the HALS algorithm (referred to as “NTD-HALS”), the accelerated proximal gradient algorithm (referred to as “NTD-APG”), and the plain multiplicative updating algorithm (referred to as “NTD-MU”) from [41], as well as the block coordinate descent (BCD) method (referred to as “NTD-BCD”) from [39].

In the following, we list the computational cost of these methods. The cost of the proposed NLRT method per iteration is about the same as that of the NTD-type methods. As they involve the calculation of nonnegative vectors only, the cost of the NCPD-type methods per iteration is smaller than that of the proposed NLRT method.

Method | Complexity | Details of the most expensive computations
NCPD-BCD | | Khatri-Rao product, and unfolding matrix times Khatri-Rao product
NCPD-HALS | | Khatri-Rao product, and unfolding matrix times Khatri-Rao product
NCPD-FP | | Khatri-Rao product, and unfolding matrix times Khatri-Rao product
NTD-BCD | | Tensor-matrix multiplication, and the product of the $k$-th unfolding matrix with its transpose
NTD-APG | | Tensor-matrix multiplications among (a) the $k$-th factor matrix, (b) the transpose of the $k$-th unfolding matrix, and (c) the $k$-th unfolding matrix
NTD-HALS | | HALS on the unfolding matrices
NTD-MU | | MU on the unfolding matrices
NLRT | | SVDs of the unfolding matrices
Table 1: The computational cost.

The stopping criterion of the proposed method and the other comparison methods is that the relative difference between successive iterates is smaller than a given tolerance. All the experiments are conducted on an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with 32GB of RAM using Matlab.

4.1 Synthetic Datasets

We first test different methods on synthetic datasets. We generate two kinds of synthetic data as follows:

  • Test 1 (Low-rank nonnegative tensor): We generate low rank nonnegative tensors in two steps (see the sketch after this list). First, a core tensor (whose size determines the multilinear rank) and factor matrices are generated with entries uniformly distributed between 0 and 1. Second, these factor matrices are multiplied into the core tensor via the tensor-matrix product, and the resulting low rank nonnegative tensors are normalized. Finally, we add truncated Gaussian noise (i.e., negative noisy values are set to 0) to the generated tensor at different signal-to-noise ratios (SNR). Given the multilinear rank, the CP rank cannot be larger than a corresponding bound. Therefore, we choose the CP rank in the NCPD methods from three possible values, and we report the best relative approximation error obtained by the NCPD methods.

  • Test 2 (Nonnegative tensor): We randomly generate nonnegative tensors whose entries follow a uniform distribution between 0 and 1. The low rank minimizer is unknown in this setting. In the CP decomposition methods, the CP rank is set accordingly; in the Tucker decomposition methods, the multilinear rank is set correspondingly.
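To make the Test 1 pipeline concrete, here is a minimal numpy sketch (referenced in the list above). The tensor size (50, 50, 50), the multilinear rank (5, 5, 5), the max-normalization, and the variance-based SNR-to-noise conversion are our own illustrative assumptions rather than the exact experimental settings; `mode_product` is as in Section 2.1.

```python
import numpy as np

def mode_product(G, A, k):
    Gk = np.moveaxis(G, k, 0)
    return np.moveaxis(np.tensordot(A, Gk, axes=(1, 0)), 0, k)

def make_test1_tensor(shape, ranks, snr_db, rng):
    """Low-rank nonnegative tensor plus truncated Gaussian noise (Test 1)."""
    X = rng.random(ranks)                                  # nonnegative core
    for k, n in enumerate(shape):
        X = mode_product(X, rng.random((n, ranks[k])), k)  # nonnegative factors
    X /= X.max()                                           # normalize (assumption)
    sigma = np.sqrt(X.var() / 10 ** (snr_db / 10))         # noise level from SNR
    noisy = np.maximum(X + sigma * rng.standard_normal(shape), 0.0)  # truncate at 0
    return X, noisy

def rel_err(X_hat, X_true):
    """Relative approximation error against the noise-free ground truth."""
    return np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)

rng = np.random.default_rng(0)
truth, noisy = make_test1_tensor((50, 50, 50), (5, 5, 5), snr_db=40, rng=rng)
```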

We report the relative approximation error, defined as

$$\mathrm{RelErr} = \frac{\| \hat{\mathcal{X}} - \mathcal{A}_{\mathrm{gt}} \|_F}{\| \mathcal{A}_{\mathrm{gt}} \|_F},$$

to measure the approximation quality, where $\hat{\mathcal{X}}$ is the computed approximation and the ground truth tensor $\mathcal{A}_{\mathrm{gt}}$ is the generated tensor without noise. The relative approximation errors of the results by the different methods in Test 1 are reported in Table 2. The reported entries of all the comparison methods are the average results over ten different initial guesses for the CP decomposition vectors and the Tucker decomposition matrices; by contrast, the results of the proposed NLRT method are deterministic once the given nonnegative tensor is fixed. We see from Table 2 that the proposed NLRT method achieves the best performance and is also quite robust across SNRs. The relative approximation errors in Test 2 are plotted in Fig. 1 with respect to different values of the rank. Again, the reported entries are the average results over ten different initial guesses for the comparison methods. From Fig. 1, we can see that the proposed NLRT method and NTD-BCD perform better than the other methods. For the larger tensors, the superiority of our method over NTD-BCD is evident when the rank is between 27 and 39.

First tensor size and multilinear rank setting:
SNR (dB) | Noisy | NCPD-BCD | NCPD-HALS | NCPD-FP | NTD-BCD | NTD-APG | NTD-HALS | NTD-MU | NLRT
30 | 3.162e-02 | 1.860e-02 | 1.876e-02 | 2.007e-02 | 1.818e-02 | 2.167e-02 | 1.996e-02 | 2.546e-02 | 1.796e-02
40 | 1.000e-02 | 6.066e-03 | 7.358e-03 | 1.070e-02 | 6.411e-03 | 1.303e-02 | 1.150e-02 | 1.974e-02 | 5.663e-03
50 | 3.162e-03 | 2.122e-03 | 5.039e-03 | 8.861e-03 | 3.245e-03 | 1.227e-02 | 9.539e-03 | 1.895e-02 | 1.793e-03
Second tensor size and multilinear rank setting:
SNR (dB) | Noisy | NCPD-BCD | NCPD-HALS | NCPD-FP | NTD-BCD | NTD-APG | NTD-HALS | NTD-MU | NLRT
30 | 3.162e-02 | 1.813e-02 | 1.845e-02 | 2.055e-02 | 1.766e-02 | 2.278e-02 | 2.023e-02 | 3.086e-02 | 1.747e-02
40 | 1.000e-02 | 6.308e-03 | 7.441e-03 | 1.203e-02 | 6.816e-03 | 1.437e-02 | 1.141e-02 | 2.721e-02 | 5.519e-03
50 | 3.162e-03 | 2.817e-03 | 4.888e-03 | 1.029e-02 | 2.454e-03 | 1.311e-02 | 1.173e-02 | 2.576e-02 | 1.750e-03
Table 2: The relative approximation errors of the results by different methods in Test 1. The best values are highlighted in bold.
Figure 1: Relative approximation errors on the randomly generated tensors in Test 2 with respect to the different rank settings (panels (a) and (b)).

4.2 Video Data

In this subsection, we select 4 videos (available at http://trace.eas.asu.edu/yuv/ and https://sites.google.com/site/jamiezeminzhang/publications) to test our method on the task of approximation. Three videos (respectively named “foreman”, “coastguard”, and “news”) are of the same size (height × width × frame), and one (named “basketball”) is of a different size. First, we set the multilinear rank and the CP rank accordingly, and test our method on approximating these four videos with the rank varying from 5 to 100. Moreover, we add Gaussian noise to the video “coastguard” at different noise levels (SNR = 20, 30, 40, 50) and test the approximation ability on the noisy data. We plot the relative approximation errors with respect to the rank on the four videos in Fig. 2. It can be seen that the approximation errors of the results by our method are the lowest. Fig. 3 shows the relative approximation errors on the noisy video “coastguard” with respect to the rank. Similarly, our method achieves the lowest approximation errors.

Figure 2: Relative approximation errors on the videos with respect to the different rank settings (panels: “foreman”, “coastguard”, “news”, “basketball”).
Figure 3: Relative approximation errors on the noisy video “coastguard” with respect to the different rank settings (panels: SNR = 20, 30, 40, 50).

4.3 Hyperspectral Data

In this subsection, we test the different methods on hyperspectral data. We consider four hyperspectral images (HSIs): a subimage of the Pavia City Center dataset (available at http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes) of size height × width × band, a subimage of the Washington DC Mall dataset (available at https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html), the RemoteImage (available at https://www.cs.rochester.edu/~jliu/code/TensorCompletion.zip), and a subimage of the Cuprite dataset (available at https://aviris.jpl.nasa.gov/data/free_data.html).

Figs. 4 and 5 report the relative approximation errors with respect to different values of the rank. It is evident that the relative approximation errors by our NLRT are the lowest among all the methods. It is interesting to note that the difference between our method and NTD-BCD (the second best comparison method) is more significant than on the synthetic data. In Fig. 6, we display the pseudo-color images of the results on the Washington DC Mall dataset with multilinear rank (100, 100, 100) and CP rank 100. The pseudo-color image is composed of the 145th, 127th, and 92nd bands as the red, green, and blue channels, respectively. In Fig. 6, we also report two image quality assessments: the mean value of the peak signal-to-noise ratio over all spectral bands (MPSNR, see https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio) and the mean value of the structural similarity index of all the spectral bands (MSSIM) [64]. It can be seen in Fig. 6 that both the visual quality and the quality assessments of the proposed NLRT method are better than those of the other comparison methods.

Figure 4: Relative approximation errors on 4 HSIs with respect to the different rank settings (panels: Pavia City Center, Washington DC Mall, RemoteImage, Cuprite).
Figure 5: Relative approximation errors on the HSV with respect to the different rank settings.
NCPD-BCD: MPSNR 29.16, MSSIM 0.84 | NCPD-HALS: MPSNR 29.15, MSSIM 0.84 | NCPD-FP: MPSNR 28.37, MSSIM 0.81
NTD-BCD: MPSNR 29.74, MSSIM 0.85 | NTD-APG: MPSNR 28.25, MSSIM 0.80 | NTD-HALS: MPSNR 27.25, MSSIM 0.76
NTD-MU: MPSNR 28.77, MSSIM 0.82 | NLRT: MPSNR 34.56, MSSIM 0.94 | Original: MPSNR Inf, MSSIM 1
Figure 6: The pseudo-color images composed of the 145th, 127th, and 92nd bands of the nonnegative low rank approximations by the different methods, with the rank set as above, on the Washington DC Mall dataset.

4.4 Selection of Features

One advantage of the proposed NLRT method is that it provides a significance index based on the singular values of the unfolding matrices [50], which can be used to identify important singular basis vectors in the approximation. Here we take the HSI Washington DC Mall as an example. We compute the low rank approximations of the proposed NLRT method and the other comparison methods with a given multilinear rank and CP rank. For the approximation results by the NCPD methods, we normalize the basis vectors in (2) such that their norms are equal to 1, and rearrange the resulting weights of the CP decomposition in descending order; in Fig. 7, we plot the relative residual with respect to the number of components used. Similarly, for the results of the NTD methods, we plot the relative residual with respect to the number of third-mode factor columns, ordered by their largest column norms. For the results by our method, we plot the relative residual with respect to the number of leading singular values of the third-mode unfolding matrix of the approximation. The third mode is chosen for NTD and our NLRT because we are interested in observing how many indices are required in the spectral mode of the given hyperspectral data.
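To make the NLRT curve in Fig. 7 concrete, the following sketch computes the relative residual of the mode-3 unfolding of an approximation as the number of leading singular triplets grows. The function name and interface are our own; `unfold` is as in Section 2.1.

```python
import numpy as np

def unfold(X, k):
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def mode3_residual_curve(X):
    """Relative residual of X_(3) truncated to its s leading singular triplets."""
    M = unfold(X, 2)                       # third-mode unfolding
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    curve = []
    for t in range(1, len(s) + 1):
        Mt = (U[:, :t] * s[:t]) @ Vt[:t]   # best rank-t approximation of M
        curve.append(np.linalg.norm(M - Mt) / np.linalg.norm(M))
    return curve                           # decreasing in t, as in Fig. 7
```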

Figure 7: The comparison of relative residuals with respect to the number of mode-3 components to be used in the tensor approximation with for the hyperspectral image Washington DC Mall.

In Fig. 7, we can see that when the number of components increases, the relative residual decreases. Our NLRT provides a significance index based on singular values to identify important singular basis vectors for the approximation. Thus, the relative residuals by the proposed NLRT algorithm are significantly smaller than those by the tested NTD and NCPD algorithms. Similar phenomena can be found in Fig. 8, in which the residuals are computed using the number of indices in the first or second modes of the approximation.

Figure 8: The comparison of relative residuals with respect to the number of the first mode (upper two rows) and the second mode (bottom two rows) components to be used in the tensor approximation with for the hyperspectral image Washington DC Mall.
Figure 9: Indian Pines image and related ground truth categorization information. Left: the 10th band of the original HSI. Right: the ground truth categorization map.
No. | Name | Samples
1 | Alfalfa | 46
2 | Corn-no till | 1428
3 | Corn-min till | 830
4 | Corn | 237
5 | Grass-pasture | 483
6 | Grass-trees | 730
7 | Grass-pasture-mowed | 28
8 | Hay-windrowed | 478
9 | Oat | 20
10 | Soybean-no till | 972
11 | Soybean-min till | 2455
12 | Soybean-clean | 593
13 | Wheat | 205
14 | Woods | 1265
15 | Buildings-Grass-Trees-Drives | 386
16 | Stone-Steel-Towers | 93
Table 3: The number of labelled samples in each class.

4.5 Image Classification

The advantage of the proposed NLRT method is that the important singular basis vectors can be identified. Such basis vectors can provide useful information for image recognition tasks such as classification. Here we conduct hyperspectral image classification experiments on the Indian Pines dataset (available at https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html). This dataset was captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines test site in North-western Indiana in June 1992. After removing 20 bands that cover the region of water absorption, this HSI is of size 145 × 145 × 200. The ground truth contains 16 land cover classes, as shown in Fig. 9. Therefore, we set the multilinear rank accordingly and the CP rank to 16 for all the testing methods. We randomly choose a fraction (from 10% to 50%, see Table 4) of the available labelled samples from each class for training, and the remaining samples are used for testing; the numbers of labelled samples are exhibited in Table 3.

Training (%) | Classifier | NCPD-BCD | NCPD-HALS | NCPD-FP | NTD-BCD | NTD-APG | NTD-HALS | NTD-MU | NLRT
10 | 1-NN | 69.07 | 66.49 | 71.52 | 74.66 | 69.08 | 70.56 | 62.66 | 74.90
10 | 3-NN | 64.31 | 61.46 | 66.77 | 69.22 | 64.73 | 65.62 | 58.94 | 70.11
10 | 5-NN | 62.12 | 59.65 | 65.75 | 67.18 | 63.69 | 64.60 | 58.30 | 68.39
20 | 1-NN | 76.91 | 73.96 | 79.01 | 81.95 | 75.90 | 78.33 | 69.90 | 82.05
20 | 3-NN | 72.27 | 69.58 | 74.13 | 76.51 | 72.37 | 74.07 | 65.76 | 77.49
20 | 5-NN | 70.26 | 67.64 | 72.35 | 73.72 | 70.27 | 71.92 | 64.12 | 75.60
30 | 1-NN | 80.92 | 78.18 | 81.93 | 85.52 | 79.77 | 81.77 | 74.59 | 85.71
30 | 3-NN | 76.37 | 73.93 | 77.79 | 80.69 | 75.90 | 78.00 | 69.69 | 81.61
30 | 5-NN | 74.39 | 71.86 | 75.90 | 77.97 | 73.93 | 76.04 | 68.23 | 79.17
40 | 1-NN | 83.86 | 81.54 | 84.54 | 88.17 | 82.04 | 84.56 | 77.28 | 88.53
40 | 3-NN | 79.70 | 77.49 | 80.98 | 84.11 | 78.94 | 80.81 | 73.05 | 84.87
40 | 5-NN | 77.17 | 75.17 | 79.42 | 81.23 | 76.86 | 79.08 | 70.78 | 82.98
50 | 1-NN | 85.51 | 83.81 | 86.02 | 89.88 | 84.48 | 86.27 | 80.66 | 90.15
50 | 3-NN | 81.55 | 79.82 | 82.90 | 86.35 | 81.40 | 82.80 | 75.59 | 86.52
50 | 5-NN | 79.37 | 77.67 | 81.34 | 84.15 | 79.39 | 80.97 | 73.03 | 84.81
Table 4: The accuracy (in percentage) of the classification results on the approximations by different methods. The best values are highlighted in bold.

After obtaining the low rank approximations, the 16 singular vectors corresponding to the largest 16 singular values of the unfolding matrix of the tensor approximation along the spectral mode (the third mode) are employed for classification. We apply the $k$-nearest neighbour ($k$-NN, $k = 1, 3, 5$) classifier to these spectral features; the resulting accuracies are reported in Table 4.
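A sketch of the classification protocol just described follows (scikit-learn based). Details beyond what the text states, such as the stratified random split and the projection-based feature construction, are our assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def classify(approx, labels, train_ratio=0.1, k=1, seed=0):
    """k-NN on spectral features from the 16 leading mode-3 singular vectors."""
    h, w, b = approx.shape
    M = approx.reshape(h * w, b)          # pixels x bands (transposed mode-3 unfolding)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    feats = M @ Vt[:16].T                 # project spectra onto 16 leading vectors
    mask = labels.ravel() > 0             # keep labelled pixels only
    Xtr, Xte, ytr, yte = train_test_split(
        feats[mask], labels.ravel()[mask],
        train_size=train_ratio, stratify=labels.ravel()[mask], random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=k).fit(Xtr, ytr)
    return clf.score(Xte, yte)            # overall accuracy, as in Table 4
```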