# Enhanced 3DTV Regularization and Its Applications on Hyper-spectral Image Denoising and Compressed Sensing

The 3-D total variation (3DTV) is a powerful regularization term, which encodes the local smoothness prior structure underlying a hyper-spectral image (HSI), for general HSI processing tasks. This term is calculated by assuming identical and independent sparsity structures on all bands of gradient maps calculated along spatial and spectral HSI modes. This, however, is always largely deviated from the real cases, where the gradient maps are generally with different while correlated sparsity structures across all their bands. Such deviation tends to hamper the performance of the related method by adopting such prior term. To this is- sue, this paper proposes an enhanced 3DTV (E-3DTV) regularization term beyond conventional 3DTV. Instead of imposing sparsity on gradient maps themselves, the new term calculated sparsity on the subspace bases on the gradient maps along their bands, which naturally encode the correlation and difference across these bands, and more faithfully reflect the insightful configurations of an HSI. The E-3DTV term can easily replace the previous 3DTV term and be em- bedded into an HSI processing model to ameliorate its performance. The superiority of the proposed methods is substantiated by extensive experiments on two typical related tasks: HSI denoising and compressed sensing, as compared with state-of-the-arts designed for both tasks.

• 3 publications
• 16 publications
• 41 publications
• 78 publications
• 78 publications
• 3 publications
04/27/2022

### Low-rank Meets Sparseness: An Integrated Spatial-Spectral Total Variation Approach to Hyperspectral Denoising

Spatial-Spectral Total Variation (SSTV) can quantify local smoothness of...
05/30/2020

### Hyperspectral Image Denoising via Global Spatial-Spectral Total Variation Regularized Nonconvex Local Low-Rank Tensor Approximation

Hyperspectral image (HSI) denoising aims to restore clean HSI from the n...
05/29/2021

### Compressed Sensing for Photoacoustic Computed Tomography Using an Untrained Neural Network

Photoacoustic (PA) computed tomography (PACT) shows great potentials in ...
12/11/2018

### Non-local Meets Global: An Integrated Paradigm for Hyperspectral Denoising

Non-local low-rank tensor approximation has been developed as a state-of...
10/24/2020

### Non-local Meets Global: An Iterative Paradigm for Hyperspectral Image Restoration

Non-local low-rank tensor approximation has been developed as a state-of...
05/18/2021

### Make $\ell_1$ Regularization Effective in Training Sparse CNN

Compressed Sensing using 𝓁1 regularization is among the most powerful an...
04/18/2018

### Shaking Acoustic Spectral Sub-bands Can Better Regularize Learning in Affective Computing

In this work, we investigate a recently proposed regularization techniqu...

## 1 Introduction

The radiance of a real scene is distributed across a wide range of spectral bands. A hyperspectral image (HSI) consists of various intensities that represent the integrals of the radiance captured by sensors over hundreds of discrete bands. As compared with traditional image systems, HSI facilitates delivering more faithful representation for real scenes, and thus tends to be better performed on various computer vision tasks, such as classification

Zhao2016High Dong2016HSI , compressed sensing Zhang2015HSI ; Mart2015HYCA , and mineral exploration W2013S ; Goetz2009 .

In real cases, however, an HSI is always significantly corrupted by noises that are generally conducted by sensor sensitivity, photon effects, light condition and/or calibration error Goetz2009 . The HSI denoising problem is thus a critical issue and the resolving of the problem could greatly ameliorate the performance of the subsequent HSI processing tasks. Besides, the data need to be transmitted to ground station to deal with, but an HSI practically collected from an airborne sensor or a satellite is always with hundreds of image bands, which makes the HSI requiring a large storage space. This will conduct severe issues of low efficiency and high cost on transmitting them to the ground stations. Thus it is necessary to design effective techniques on HSI compressed sensing to satisfy the on-time transmit.

One of the most widely utilized priors for HSI denoising/compressed sensing is the local smoothness prior. The spatial local smoothness prior refers to the fact that similar objects/scenes (with shape) are often adjacently distributed with similar spectrum wave, and the spectral smoothness prior delivers the fact that adjacent bands of images of an HSI are usually collected with similar sensor parameter settings, and thus with similar values. Such local smoothness prior structure possessed by an HSI can be equivalently understood as the sparsity of the gradient maps calculated along both the spatial and spectral modes of the HSI, as shown in Fig. 1

. The local smoothness prior of HSI space and spectrum can then be naturally encoded as the total variances along different modes of the HSI

( are the sizes of the spatial height, width and spectrum of the HSI, respectively), that is, the norm on the gradient maps (), where , and represent the gradient maps of calculated along its spatial height, width and spectrum modes, respectively111The definitions for will be provided in detail in Section 3.. This term is called 3D-total variation regularization, or 3DTV briefly, with the form:

 ∥X∥3DTV=3∑n=1∥Gn∥1. (1)

It has been extensively shown that this term can be very helpful for various HSI processing issues He2016 ; Wang2017Hyperspectral ; Hes2016 . Some even attain state-of-the-art performance for certain tasks.

Although being successfully utilized in various tasks, the 3DTV regularization still has not sufficiently considered the insightful sparsity structure knowledge underlying the gradient maps of an HSI. The most typical limitation reflects on its similar sparsity consideration (i.e., identical and independently distributed Laplacian prior distribution set on gradient maps) imposed on all bands of the gradient maps in spectrum, spatial width and height. That is, the sparsity extents on all bands of these gradient maps are implicitly assumed to be similar and unrelated. This, however, is always deviated from the real cases. On one hand, in different bands of the gradient maps, instead of identical, the sparsity exist evident variations, which can be easily observed from Fig. 1(c). On the other hand, instead of independent, there are evident correlations across different bands of the gradient maps of an HSI, as clearly shown in Fig. 1(e), which are naturally succeeded from the band correlation property of the original HSI ZHY2014HSI ; XY2016Hyperspectral . Such a prior knowledge deviation from the real HSIs makes the capability of such a useful regularization term still has a large room to be further strengthened.

To alleviate this issue, this paper proposes an enhanced 3DTV regularization term. While similar to the conventional 3DTV, the proposed term is also operated on gradient maps of height, width and spectrum mode of HSI, the new term applies the sparsity measure (

norm, just like the conventional 3DTV) on their subspace basis maps along bands instead of the gradient maps themselves. Each basis is obtained by a linear combination of the original gradient map vectors along bands:

, where is the transformation matrix, with size , facilitating to find the () bases of . Such new regularization term rationally reflects practical correlated and different sparsity extents across different bands of the gradient maps. Specifically, representing the gradients by such calculated less bases delivers the sparsity correlation insight of an HSI, and their different coefficients (which will be introduced in detail in Section 4) represents the difference among bands of these gradient maps. Besides, just like other subspace learning methods, the gradient bases always contain much less bands as compared with original gradient maps, making them less negatively influenced by embedded corruptions, and more robust for different HSI processing tasks in practical scenarios.

In summary, this paper makes mainly three-fold contributions as follows:

1. We propose an enhanced 3DTV (E-3DTV, briefly) regularization term, to better reflect the sparsity insight of the gradient maps underlying an HSI. The non-i.i.d. sparse distribution prior characteristic of gradient maps along all HSI bands is more faithfully represented by the new regularization, and thus it is expected to replace the 3DTV term of a general HSI processing model to improve its performance.

2. By easily using the proposed E-3DTV as the unique regularization term, we formulate two models for two typical HSI processing tasks: HSI denoising and compressed sensing. The ADMM algorithms are readily designed to solve the corresponding models. The closed-form updating equations are deduced for each step of both algorithms, and they can thus be implemented efficiently.

3. Comprehensive experimental results substantiate the superiority of the proposed method beyond state-of-the-art methods previously designed on both tasks. It is verified that such easy substitution of E-3DTV to 3DTV can always bring significant performance improvements on most of experiments.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces some notations and preliminaries related to this work. Section 4 presents the new regularization term, as well as the rationality of its formulation. Section 5 gives the HSI denoising model by involving E-3DTV regularization term, and Section 6 demonstrates the corresponding experimental results. Section 7 provides the model for HSI compressed sensing using the E-3DTV as its regularization, and Section 8 reports the related experimental results. Conclusions are finally made in Section 9. Throughout the paper, we denote scalars, vectors, matrices and tensors as the non-bold, bold lower case, bold upper case and upper cursive letters, respectively.

## 2 Related Work

In the following, we will review the most relevant studies on two investigated HSI processing tasks, HSI denoising and HSI compressed sensing, respectively.

### 2.1 HSI Denoising

A number of methods such as K-SVD Elad2006Image , non-local means Buades2005A , BM3D Dabov2007Image and wavelet shrinkage Othman2006Noise have been developed for 2D image denoising. These methods can be directly applied to HSI denoising by denoising each band of image independently. However, such an easy manner ignores the correlations among the spectral bands or spatial pixels, and thus usually could not provide satisfactory results.

More substantial progress on HSI denoising have been made by directly constructing models on HSIs, especially considering their spectral information. Most of them considered to construct a rational prior to deliver intrinsic structures underlying HSI. One representative prior is global spectral correlation. This prior indicates that the images located at different HSI bands are generally closely correlated. A generally adopted trick to formulate such a prior is first to unfold the HSI along its spectral mode to form a HSI matrix, and then use low-rank regularization terms to encode this prior Chen2011Denoising , Gu2014Weighted , He2015Hyperspectral , Peng2014Reweighted , XY2016Hyperspectral , ZHY2014HSI . Such a low-rank structure can be formulated implicitly into an HSI denoising model by a nuclear norm (e.g., the NNM method Candes2009Robust , Wright2009Robust ) or its non-convex variations, like the weighted nuclear norm (e.g., the WNNM method Gu2014Weighted , Peng2014Reweighted ) and the Schatten -norm (e.g., the WSNM method XY2016Hyperspectral ). Besides, it can also be directly encoded as an explicit low rank matrix factorization form, like that used in the LRMR method ZHY2014HSI ; Zhou2011GoDec . Very recently, to more faithfully deliver the multi-factor affiliation underlying HSI, some HSI denoising methods Lu2017Tensor , Wang2017Hyperspectral treated an HSI directly as its original 3-mode tensor form, and use certain low-rank tensor approximation techniques Kolda2009Tensor , e.g., the Tucker and CP decompositions, to characterize such low-rank characteristic under tensor expressions.

The spatial nonlocal similarity prior is another widely utilized prior for the task. A practical HSI always contains a collection of similar local patches all over the space, composing of homologous aggregation of micro-structures. Such a non-local spatial similarity prior among HSI patches is also commonly used in the HSI denoising model. Specifically, by extracting the common patterns among these nonlocally similar patches, the spatial noise is expected to be prominently alleviated. There are mainly two manners to encode such prior knowledge. One is by dictionary learning, which aims to build each local patch similar group of an HSI by the linear combination of a small number of atoms from a dictionary Mairal2009 , Liu2013A , Qian2013Hyperspectral , Xing2011Dictionary . It has been empirically verified that such modeling can lead to a good performance of HSI denoising. Typical methods designed in this way include BM4D Maggioni2013Nonlocal and TDL Peng2014Decomposable . Another rational way to encode such non-local similarity prior is to specify a low-rank regularization term on these similar groups, and use low-rank matrix/tensor approximation techniques to realize HSI recovery Dong2015Denoising , Mairal2009 , xie2016multispectral .

Besides, spatial local-smoothness prior is also a powerful prior utilized for HSI denoising. The gray values of an HSI collected from natural scenes are generally continuous along its spatial and spectral modes. Such prior structure can be readily characterized by the well known total variation (TV) regularization term Rudin1992Nonlinear , Chambolle2004An . Multiple HSI denoising methods Bouali2011Toward , He2016 , Wu2017Structure , Wang2017Hyperspectral , Jiang2016Hyperspectral have been raised by utilizing such a regularization, and the utilization of such prior has been verified to be helpful for the fine denoising performance. Besides, due to the convexity and conciseness of such a regularization term, it is easy to integrate this term with other prior terms into a unified model to enhance the capability of the model on HSI recovery. A typical example is the LRTV method He2016 , which combines this smoothness prior with the global correlation prior along spectrum, and obtained superior performance compared with other TV-based HSI denoising approaches.

### 2.2 HSI Compressed Sensing

HSI compressed sensing aims to possibly precisely reconstruct an HSI from a small set of compressed measurements imposed on it Candes2006Robust ; Donoho2006Compressed , which is a typical inverse problem, and needs to make it calculable by setting regularization/prior terms on the to-be-recovered HSI variables.

Based on the prior knowledge that an HSI can be sparsely represented under an appropriate redundant dictionary, Duarte and Baraniuk Duarte2010Kronecker proposed a compressed sensing method based on Kronecker product. They considered to use Kronecker products from sparse basis and sensing matrix. For a multi-dimensional signal, its -section is defined to be the part where all dimension indicators other than the -th dimension are fixed, than a sparse base of a multi-dimensional signal may consist of Kronecker products of sparse bases for each -section.

Similar to the HSI denoising method, the global spectral correlation and spatial smoothness priors have also been considered for the task. In 2011, Waters et al.Waters2011SpaRCS proposed a low rank and sparse matrix recovery method under compressed sensing, and applied the method to video and HSI compression recovery. Golbabaee et al. Golbabaee2012Hyperspectral further discussed the correlation of HSIs in the spectral dimension, and shows certain joint sparseness in the spatial dimension under a wavelet representation and proposed a method based on low rank and joint sparse matrix recovery. In 2013, Golbabaee et al. Golbabaee2013Joint proposed the joint norm and TV norm minimization model to encode the correlation of the spectral bands and local smoothness prior. Gogna et al. Gogna2015Split designed the split-Breggman algorithm with low rank and sparse or joint sparse signal recovery, combined with the Kronecker product method, and applied it to the compression recovery of HSI images.

Besides, spectral correlation priors have also been used for the task by directly setting the HSI as a tensor and constructing tensor processing models Yang2015Compressive , Wang2017Compressive for the problem in the recent years. For example, Yang et al. Yang2015Compressive used a tensor sparse representation of a HSI cube, under nonlinear compressed operator, which got some comparable results. To further ameliorate the performance especially under lower sampling rates, Wang et al. proposed the JTenDTV Wang2017Compressive model by jointing the Tucker decomposition and 3DTV norm to character the correlation of spectral bands as well as the spatial local smoothness prior, and got the comparable result.

Additional, there are also some methods proposed based on combining the compressed sensing with other HSI tasks to improve the performance of compressed sensing. The common way of fusion is to use unmixing Dias2012A , Li2012A to improve the performance of HSI compressed sensing.

## 3 Notations and Preliminaries

For a given HSI , where , and denote the sizes of spatial height, width and number of spectrum in the HSI, respectively. We denote the unfolding matrix of along the spectral mode as , which satisfies and .

Let denote the intensity at the voxel , and let

 G1(i,j,k)=X(i,j,k)−X(i+1,j,k)G2(i,j,k)=X(i,j,k)−X(i,j+1,k)G3(i,j,k)=X(i,j,k)−X(i,j,k+1) (2)

denote three difference operations at the voxel along the spatial height, width and the spectrum, respectively. We can now introduce the difference operations calculated three different modes on an HSI , i.e., , as follows:

 Gn=Dn(Xn),∀n=1,2,3, (3)

where , are the gradient-map tensors corresponds to spatial height, weight and spectrum modes respectively222

Here, we use zero padding for

before applying difference operation on it, which can keep the size of the same as and make calculation convenient in the later sections.. Then, we denote the unfolding matrices of these gradient maps as:

 Gn=unfold3(Gn),∀n=1,2,3. (4)

One can also see Fig. 1 (b) (c) and (d) for easy understanding these operations.

It is easy to find that the difference operations and on is equivalent to applying subtractions between rows in , and on is equivalent to applying subtractions between columns in . This means all the three different operations are linear333Subtractions between rows of a matrix is equivalent to pre-multiply the matrix by a proper matrix, and subtractions between columns of a matrix is equivalent to post-multiply the matrix by a proper matrix.. We can then denote following three linear operations:

 ∇nX=unfold3(Dn(fold(X)))=Gn,∀n=1,2,3. (5)

Note that this linear operations encode the relationship between an HSI and its gradient maps s, facilitating us to construct HSI processing method with priors on the gradient maps.

The most commonly used sparsity measure on the gradients maps, s, is the so call norm sparse measure:

 S(Gn)=∥Gn∥1. (6)

By adopting this sparsity measure, the widely used smoothness regularization term for HSI, 3DTV Wang2017Hyperspectral , Wang2017Compressive , can be constructed by

 (7)

One can also see Fig. 1 for easy understanding this term. We can then present the enhanced 3DTV term beyond this conventional one in the next section.

## 4 Enhanced 3DTV Regularization

We first discuss how to construct an enhanced sparsity term for each , which will help us construct the E-3DTV norm in the following sections.

### 4.1 Enhanced TV Regularization Term

For an given HSI whose unfolding matrix is , we seek to model the correlated sparse structure along its gradient map, i.e., our goal is to construct a rational sparsity measure for each gradient map , .

As studied in many previous works, the unfolded HSI can be rationally assumed to be of low rank Gu2014Weighted , He2015Hyperspectral , Peng2014Reweighted , XY2016Hyperspectral , ZHY2014HSI , He2016 . Then, it is easily to find that also possesses low-rankness property444 Since is a linear operation, we have . . This means can be expressed in following low-rank matrix factorization form:

 Gn=UnVTn, (8)

where and , () is the rank of the HSI. In addition, it easy to find the columns in is in fact a set of basis to the gradient maps . We can also rationally assume to make the subspace bases contain possibly compensate while non-repetitive information underlying the original . In this case, we can rewrite equation (8) in following formulation:

 Un=GnVn, (9)

where is now represented as a linear combination of all columns of gradient maps. Furthermore, to guarantee the bases keep sufficient information of the original , we further assume , i.e., , is of approximately similar capacity with the original .

Fig. 2 shows the bases calculated on a clean HSI and it noisy observation. It is easy to observe from Fig. 2 (b) and (c) that by adding a little simulated Gaussion noise to the HSI, the quality of gradient maps, as well as its sparsity structures, can be badly damaged, which makes the corrupted images not easily to be restored by only using the original 3DTV regularization term (7) directly imposed on it. However, we can easily observe form Fig. 2 (d) and (e) that is evidently less corrupted than original gradient maps, and its sparse structure is evidently more evident and less hampered by noises. It is thus expected to be easier to recover the bases from the corrupted ones by E-3DTV than to do this task on the original gradient maps by conventional 3DTV.

Thus, instead to design sparse regularization term for itself, we seek to design sparse regularization for , i.e. for . Based on the aforementioned analysis, the E-3DTV measure for each , , is of the following implicit expression formulation:

 (10)

Here, can be seen as a coefficient matrix that relates different gradient maps together.

It should be noted that the proposed E-3DTV measure is actually a linear transformed sparsity on

, with the transforming matrix to be determined by the input data automatically. Compared with the traditional sparsity measure (6), the proposed (40) takes the correlation prior among gradient maps into consideration when constructing sparse regularization term. It takes the advance that there exist some low-rank basis for s, which is sparse but can represent all information in with a coefficient matrix , and these bases are always more stable to be calculated than the gradient maps themselves when the HSI is corrupted.

Since it is not easy to solve (40) directly, we reformulate (40) as the following equivalent formulation:

 SE-TV(Gn)=minUn,Vn∥Un∥1s.t. Gn=UnVTn,VTnVn=I,Un∈Rhw×r,Vn∈Rs×r. (11)

Here, we call two problems equivalent if from a solution of one, a solution of the other is readily found, and vice versa. The proof of the equivalence here is presented in theorem 9.1 in the supplementary material.

### 4.2 Enhanced 3DTV Regularization Term

By adopting the proposed sparsity measure (41), we can now construct E-3DTV regularization term imposed on gradient maps calculated along all its spectral, spatial width and height modes. In similar way as 3DTV regularization (7), we model our regularization term by summing the sparsity measure of the gradient maps along different modes:

 ∥X∥E-3DTV=3∑n=1SE-TV(∇nX)=3∑n=1SE-TV(Gn). (12)

One can also see Fig. 1 for easy understanding this term. Moreover, this term is equivalent to:

 ∥X∥E-3DTV=3∑n=1minUn,Vn∥Un∥1s.t.∇nX=UnVTn,VTnVn=I,Un∈Rhw×r,Vn∈Rs×r,n=1,2,3. (13)

In the following sections, we will verify that the proposed E-3DTV regularization term (13) can be easily embedded into HSI processing models and the corresponding resolving algorithms can be easily designed.

## 5 E-3DTV methods for HSI denoising

### 5.1 The E-3DTV based HSI denoising model

The denoising task is to separate the clean data and noise from the noisy data. In many of the denoising tasks, the noise is often assumed to be Gaussian, and a quadratic loss function between the clean data and noisy data is thus often exploited. However, in real hyperspectral scenes, the types of noise are various, such as sparse noise, stripes noise, dead lines, missing pixels, and so on

XY2016Hyperspectral ; He2016

, which obviously do not obey a Gaussian distribution. Therefore, we choose the

norm, a more robust loss function to general heavy noises Wright2009Robust ; Candes2009Robust , to model the noise and construct our denoising model as:

 minX,Eτ∥X∥E-3DTV+∥E∥1 s.t. Y=X+E, (14)

where , , denote the input noisy HSI, to-be-reconstructed HSI and the noise embedded on it, respectively. is tradeoff parameter between the proposed E-3DTV measure and the noise level. Based on (13), it is easy to find that this model is equivalent to:

 minX,Un,Vn,Eτ3∑n=1∥Un∥1+∥E∥1 s.t. Y=X+E,∇nX=UnVTn,VTnVn=IVn∈Rs×r,Un∈Rhw×r,n=1,2,3. (15)

This optimization problem (15) can be readily solved by the ADMM strategy as follows.

To slove the proposed HSI denoising model by ADMM algorithm, the following augmented Lagrangian function is firstly required to be minimized:

 L(X,E,Un,Vn,Mn,Γ)=3∑n=1∥Un∥1+λ∥E∥1+3∑n=1⟨Mn,∇nX−UnVTn⟩+μ2∥∇nX−UnVTn∥2F+⟨Γ,Y−X−E⟩+μ2∥Y−X−E∥2F, (16)

where , and are the Lagrange multipliers and is a positive scalar in ADMM algorithm.

In the ADMM framework, we need to alternatively optimize each variable involved in (16) with all others fixed. We then discuss how to solve its sub-problems with respect to each involved variable.

Update . Extracting all items containing from Eq. (16), we can obtain the sub-problem as:

 Xk+1:=argminX3∑n=1⟨Mkn,∇nX−UknVkTn⟩+⟨Γk,Y−X−Ek⟩+μ23∑n=1∥∇nX−UknVkTn∥2F+μ2∥Y−X−Ek∥2F. (17)

Optimizing (17) with respect to can thus be equivalently treated as solving the following linear system:

 (μI+μ3∑n=1∇∗n∇n)(X)=μ(Y−E(k))+Γk+μ3∑n=1∇∗n(UknVkTn)−∇∗n(Mkn) (18)

where indicates the adjoint operator of . Attributed to the block-circulant of matrix corresponding to the operator , it can be diagonalized by using the FFT matrix Ng1999A . Then, can be efficiently computed by

 (19)

where fftn and ifftn indicate fast Fourier transform and its inverse transform, respectively.

is the elements-wise square, and the division is also performed element-wisely.

Update . Extracting all items containing from Eq. (16), we can obtain:

 U(k+1)n:=argminUn∥Un∥1+μ2∥Un−(∇nX+Mkn/μ)Vkn∥2. (20)

This sub-problem can be efficiently solved by calling the known soft-thresholding operator Donoho2002De :

 SΔ(x)=⎧⎪⎨⎪⎩x−Δifx>Δ,x+Δ,ifx<−Δ,0,otherwise, (21)

where and , is the threshold value. Then the solution of (20) can be calculated in close-form as follows:

 Uk+1n=S1μ((∇nX+Mkn/μ)Vkn). (22)

Update . Extracting all items containing orthogonal from Eq. (16), we have:

 Vk+1n:=argminVnμ2∥∥ ∥∥UknVTn−(∇nX+Mknμ)∥∥ ∥∥2F=argminVn⟨(∇nX+Mknμ)TUkn,Vn⟩. (23)

The global solution of this sub-problem can be achieved in close-form by following theorem xie2016multispectral :

###### Theorem 5.1

For any , the global solution of the problem:

 minVVT=I⟨A,V⟩ (24)

is =, where denotes the condensed SVD of .

We can then obtain the updating equation of as:

 {[B,D,C]=svd% ((∇nX+Mkn/μ)TUkn)Vk+1n=BCT. (25)

Update . Extracting all items containing from Eq. (16), we can get:

 Ek+1:=argminEλ∥E∥1+μ2∥∥ ∥∥E−(Y−X+Γkμ)∥∥ ∥∥2=S1μ(Y−X+Γkμ). (26)

Based on the general ADMM principle, the multipliers are further updated by the following equations:

 (27)

Summarizing the aforementioned descriptions, we can get the entire ADMM algorithm, as summarized in , for solving the model (15).

## 6 Experimental Results on HSI denoising

In this section, extensive experiments are presented to evaluate the performance of the proposed method. We applied our proposed E-3DTV prior to HSI denoising in comparison with 8 state-of-the-art methods. In order to give an overall evaluation, three quantitative quality indices are employed: PSNR, SSIMWang2004Image , ERGASWald2010Data . PSNR and SSIM are two conventional spatial-based metrics, while ERGAS is spectral-based evaluation measure. The larger PSNR and SSIM values are, and the smaller ERGAS values is, the better quality the restored images tends to be of.

To demonstrate the effectiveness of the proposed algorithm, we compared our denoising results with 8 recently developed state-of-the-art denoising methods, including non-local transform domain filter for volumetric data (BM4D) Maggioni2013Nonlocal , TDL Peng2014Decomposable , NNM Wright2009Robust , WNNM Gu2014Weighted , LRMR ZHY2014HSI , WSNM XY2016Hyperspectral , LRTV He2016 and LRTDTV Wang2017Hyperspectral . All the involved parameters in these competing methods are fine-tuned by default settings or following the rules in their papers to guarantee their possibly optimal performance. Before the experiments, the gray value of data were rescaled to [0,1]. All the experiments were performed using MATLAB (R2014a) on Windows 7 OS with dual-core Inter 3.60-GHz processor and 64 GB of RAM.

### 6.1 Synthetic Data Experiments Setting

#### 6.1.1 Experiments Setting

We adopted two synthetic HSIs data. One was considered by He2016 , generated from the Indian Pines dataset. The size of the Indian pines synthetic HSI is , with evident local smoothness property. The other one is ARIVIS DCMall HSI dataset, whose size is , with complex structure and texture information. To simulate noisy HSI data, we added six different types of noises to both original HSI data. The details are listed as follows:

Case a: Zero-mean Gaussian noise is added to each band. The variances of the Gaussian noise is 0.1.

Case b: Gaussian noise is added to each band just as Case a. In addition, some deadlines are added from band 91 to band 130, with the number of stripes randomly selected from 3 to 10, and the width of the stripes randomly generated from 1 to 3.

Case c: The mixture of zero-mean Gaussian noise and impulse noise are added to each band. The variance of the Gaussian noise and percentage of the impulse noise are set as 0.075 and 0.15, respectively.

Case d: The mixture of Gaussian and impulse noises are added like Case c, and the deadlines are added as Case b.

Case e: Zero-mean Gaussian noises with different variances and impulse noise with different percentage are added to each band, with the variance value being randomly selected from 0 to 0.2, and percentage value being randomly selected from 0 to 0.2. In addition, the deadlines are added to some bands as Case d.

Case f: The mixture of Gaussian noise, impulse noise, and deadlines are added as Case e did. In addition, some stripes are added from band 161 to band 190 in Indian pines, and some stripes are added from band 141 to band 160 in DC mall, with the number of stripes being randomly selected from 20 to 40.

As for parameter settings, there are three parameters in the model that need to be set. They are the desired rank , the noise coefficient , and the sparsity basis coefficient . Since we used the norm to model the noise, inspired by the parameter settings of the RPCA, we set , and set the sparse basis coefficient as . Note that we should manual adjust the value, because the sparse basis coefficient and data is relevant, different data have different coefficient. In our synthetic data experiments, the value of Indian pines was set as 0.004, and that of DC mall was set as 0.0004. The rank is set as 13 in all Indian pines experiments, and 6 in DC mall experiments.

#### 6.1.2 The Synthetic Indian pines HSI experiments

In terms of visual quality, two representative bands of restored HSIs in two typical Case c and Case e obtained by different methods are shown. Fig. 12 shows the restoration results of band 220 under Case c, and Fig. 4 shows the restoration results of band 120 under Case e. Compared with other methods, it can be easily observed that the proposed method recovers more meaningful details including textures and edges, and meanwhile produces better reconstruction results in smooth regions with higher PSNR and SSIM values. And the mean overall quantitative assessment results by all competing denoising methods are listed in Table. 1, and the distribution of quality indices under Case e is depicted in Fig. 5 to show the each band’s restoration performance. In order to clearly depict the top of curve in the (a-2) and (b-2) pictures, we enlarge this area in the (a-1) and (b-1) figures. From the figure, we can easily see that the evaluation index curve of E-3DTV is higher than other methods, which verifies the better recovery performance of the E-3DTV in almost all bands.

To further compare the performance of all competing methods, it would be necessary to show the spectral signatures before and after restoration. As such, Fig. 6 shows one simple example namely the spectral signatures of pixel (50, 50) in Case e. It can be clearly seen that the LRTDTV as well as E-3DTV methods show the better restoration results than other methods. Especially, the E-3DTV method can preserve some detailed changes of spectral curve. Combined with the quality indices values shown in Table 1, our proposed E-3DTV method attains the best restoration results among all compared methods.

#### 6.1.3 The Synthetic DC Mall HSI Experiments

We then show the performance of all competing methods on the DC mall data, which have more complex structure than previous one. We also provide the two visual restoration images in Figs. 7 and 8, respectively. The average of all quantitative assessment results are provided in Table. 2, the distribution of quality indices under Case e is shown in Fig. 9, and the signature restoration of pixel (100, 100) in case e are provided in Fig. 10.

From the table and figures, it can be easily seen that our proposed E-3DTV method is also of superior performance among other competing methods both in quantity and visualization. The effectiveness of the proposed E-3DTV regularizer is further substantiated.

### 6.2 Real Data Experiments

Four typical real-world HSI datasets were further used for experiments. These datasets include ARIVIS Indian, ARIVIS Low altitude, HYDICE Urbanpart and Terrain. These real HSIs all contain serious pollutions including water pollution, deadlines, strips and sparse noise, and their shapes and structures are very different. It is thus a big challenge to recover the clean HSIs from them. In order to comprehensively display the restoration effects of all methods, we select two representative bands to show the visual restoration effects for all competing methods.

From Figs. 11-18, it can be clearly observed that the methods based on utilizing the local smoothness prior and low rank prior such as LRTV and LRTDTV, together with the proposed E-3DTV method, perform better than other competing methods. Specifically, the E-3DTV method can better recover the overall structure information of HSI and suppress the noises as shown in Figs. 11, and get the best restoration result in preserving the local edge/texture information. The E-3DTV method can also be seen to better remove the structure noises (such as deadlines and stripes) and preserve the local texture of the HSIs from the Figs. 13 to 16. The proposed method can better remove the artificial shadows and overlap to effectively avoid false error information caused by low-rank frames as depicted in Fig. 18.

In all, we can see that the proposed E-3DTV method clearly outperforms all other competing methods in both its better noise suppressing effect and its finer meaningful structure preservation capabilities. This substitutes the usefulness of the proposed E-3DTV regularization on such task in real complicated cases.

## 7 E-3DTV Method for HSI Compressed Sensing

### 7.1 The E-3DTV Model

The compressed sensing task is to reconstruct the HSI from a small number of compressed measurements with global information. Then the compressed measurements can be obtained by

 y=Ψ(Z), (28)

where denote the to-be-reconstructed HSI. is generally considered to contain the clean HSI term and noise term mixed in the signals during the acquisition. Denote the clean HSI and noise as , , respectively, and then we can get . The compressive operator can be instantiated as , where is a random permutation matrix, is the WalshHadamard transform, and is a random down sampling operator. Such a compressive operator has been shown to satisfy the restricted isometry property and successfully used in dealing with various compressed sensing problems Jiang2017Surveillance ; Gogna2015Split ; CaoW2016H .

Recovering based on (28) is an ill-posed inverse problem, and thus prior information should necessarily be exploited. Here we simply adopted the proposed E-3DTV regularization as the unique prior term in the model. In addition, considering the actual transmission, some seriously polluted bands have little or no information. In order to save the transmission resources, these bands can be thrown away at transmission. So here, we assume that the noise obeys a Gaussian distribution as previous works did Wang2017Compressive , and a norm term is adopted on the noise term. The proposed HSI compressed sensing model is then formulated as follows:

 minX,Eτ∥X∥E-3DTV+12∥E∥2 s.t. y=Ψ(X+E), (29)

where is the trade-off parameter for controlling the sparsity of the gradient map of the HSI and noise regularization parameter. By adopting (13), and introducing a auxiliary matrix , we can rewrite (29) as:

 minX,Z,E,Un,Vnτ3∑n=1∥Un∥1+12∥E∥2 s.t. y=Ψ(Z),Z=X+E∇nX=UnVTn,VTnVn=I,n=1,2,3. (30)

The proposed optimization problem (30) can be readily solved by the ADMM strategy. Firstly, the following augmented Lagrangian function is required to be minimized:

 L(Z,X,E,Un,Vn,Mn,Γ1,Γ2)=3∑n=1τ∥Ui∥1+3∑n=1⟨Mn,∇nX−UnVTn⟩+μn2∥∇nX−UnVTn∥2F+⟨Γ1,y−Ψ(Z)⟩+μ42∥y−Ψ(Z)∥2F+12∥E∥2+⟨Γ2,Z−X−E⟩+μ52∥Z−X−E∥2F, (31)

where , , and are the Lagrange multipliers and s are a positive scalar in ADMM algorithm. We then alternatively optimize each variable involved in (31) with all others fixed, by following steps:

Update . Extracting all items containing from Eq. (31), we can obtain that:

 Z:=argminZμ42∥∥∥Ψ(Z)−(y+Γ1μ4)∥∥∥2F+μ52∥∥∥Z−(X+E+Γ2μ5)∥∥∥2F. (32)

Optimizing (32) with respect to can thus be equivalently treated as solving the following linear system:

 μ4Ψ∗Ψ(Z)+μ5Z=      μ5(X+E)+μ4Ψ∗y+Ψ∗Γ1−Γ2, (33)

where indicates the adjoint of . This linear system can be solved by off-the-shelf techniques such as the preconditioned conjugate gradient method.

Update . Extracting all items containing from Eq. (31), similar to the process of solving , we can obtain that optimizing is equivalent to solve the following linear system:

 3∑n=1μnD∗D(X)+μ5X=3∑n=1μnD∗(UnVTn+Mnμn)+μ5(Z−E+Γ2μ5). (34)

Similar to the process of solving in the GCS denosing model, we can get that can be efficiently computed by

 ⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩Hx=μ5(Z−E)+Γ2+3∑n=1D∗n(μnUnVTn+Mn),Tx=μ1|fftn(D1)|2+μ2|fftn(D2)|2+μ3|fftn(D3)|2,X=ifftn(fftn(% Fold(Hx))μ1+μTx), (35)

where fftn and ifftn indicate fast Fourier transform and its inverse transform, respectively.

Update . Extracting all items containing from Eq. (31), similar to the process of solving in the GCS denosing model, we get that can be efficiently computed by

 Un=Sτμn((∇nX+Mn/μn)Vn). (36)

Update . Extracting all items containing orthogonal from Eq. (31), similar to the process of solving in the GCS denosing model, we get that can be efficiently computed by

 {[B,D,C]=svd% ((∇nX+Mn/μ)TUn),Vn=BCT. (37)

Update . Extracting all items containing from Eq.(31), similar to the process of solving in the GCS denosing model, we can get that can be efficiently computed by

 E=μ5(Z−X)+M51+μ5. (38)

Based on the general ADMM principle, the multipliers are further updated by the following equations:

 ⎧⎪ ⎪⎨⎪ ⎪⎩Mn=Mn+μn(∇nX−UnVTn),n=1,2,3,Γ1=Γ1+μ4(y−Ψ(Z)),Γ2=Γ2+μ5(Z−X−E). (39)

The above steps can then be summarized as , for solving the model (30).

## 8 Experimental Results on HSI Compressed Sensing

In this section, several real HSI data experiments were carried out to substantiate the effectiveness of the E-3DTV regularizer on HSI compressed sensing.

Four popular real-world HSI data sets were used in our experiments, including the HYDICE Urban, the HYDICE Washington DC Mall, the AVIRIS Moffett Field, and the AVIRIS Lowal altitude data sets. After removing the seriously polluted bands and cropping images for each data set, the HSI cube used for the experiments are of sizes , , and , respectively.

To thoroughly evaluate the performance of the proposed method, we implemented five state-of-the-art methods for comparison, including Kronecker compressed sensing (KCS) Li2012A , low-rank and joint-sparse recovery (SparCS) Golbabaee2012Hyperspectral , joint nuclear/TV norm minimization (JTTV) Golbabaee2013Joint , self-learning tensor nonlinear compressed sensing (SL-TNCS) Yang2015Compressive and Joint tensor/reweight 3DTV norm minimization (JTenRe3-DTV) Wang2017Compressive . As previously stated, we adopted the random permuted Hadamard transform as the compressive operator. Five different sampling ratios, namely , , , , were considered. To sufficiently assess the performance of all compared approaches, we also employed the above three quantitative picture quality indices (PQIs), PSNR, SSIM and ERGAS, for performance evaluation.

As for parameter settings, there are two parameters in the model that need to be set: the desired rank and the sparsity basis coefficient . Note that with the increase of the sampling rate, the ideal and would also increase. Specifically, we set the value of rank as 4 or 5, and the sparsity basis coefficient as 0.0075 at the sampling rate are , . With the increase of the sampling rate, the small value of rank and sparsity basis coefficient will cause the algorithm to lose some information, so as the sampling rates are , , , we increase the rank as a value from to , and set the sparsity basis coefficient as 0.015. As for the competing methods’ parameters, we use the default settings or tuned all their values to possibly guarantee their optimal performance for fair comparison.

Table. 3 compares the reconstruction results by all the compared methods. It can be easily seen that, the proposed method achieves a significantly improved performance under all such five sampling ratios with all PQIs, as compared with other competing methods.

In terms of visual quality, two representative bands of four HSIs in sampling ratio were presented in Fig. 19-26. Some demarcated areas in the subfigure are enlarged in the right bottom corner and left top corner for better visualization. From these figures and the quantitative indices numerical table, we can see that the method utilizing the local smoothness property of image, such as the JTTV, SL-NCS, JTenRe3DTV, can relatively more effectively eliminate the non-smoothness caused by compression reconstruction as the restoration image of SpaRCS show. Meanwhile, it is evident that the method based on the E-3DTV regularizer can preserve the best detail edge, texture information and image fidelity than other competing method.

We also show the PSNR and SSIM values by reconstructing each band of three data sets under sampling ratio in Fig. 27. It is easy to see that the proposed method can get higher SSIM and PSNR values than other ones for almost all HSI bands. This comprehensively illustrates the robustness of our proposed method.

From the above presentation, we can see that the structures of these four real HSIs are different, while our proposed method can get satisfactory performance. Combined the performance of denoising, it should be expected to use the E-3DTV regularizer as a general tool to ameliorate the performance for general HSI processing tasks.

## 9 Conclusion

In this paper, we have introduced an enhanced 3DTV regularization (E-3DTV, briefly), to better reflect the sparsity characteristic of the gradient maps of a natural HSI. It is surprising to us that by simply embedding such regularization term as the unique regularizer into the models designed for HSI denoising and compressed sensing, respectively, it attains state-of-the-art performance superior to the methods previously used for the tasks. This makes it hopeful further extend such a regularizer to more general HSI processing tasks, which is the main task of our future research.

There are still some limitations on the practical use of the proposed E-3DTV regularizer. For example, it should be necessary to consider to design an automatic parameter tuning strategy based on certain HSI as well as its size and structure complexity. Besides, the E-3DTV regularizer is still imposed on the unfolding matrices of the considered HSI, while not its multi-dimensional structure (i.e., tensor) itself. It should be another meaningful investigation on reformulating the proposed regularizer directly on the tensor to more sufficiently utilizing the structural knowledge of the HSI. We will further investigate these issues in our future research.

## References

• (1) Buades, A., Coll, B., & Morel, J. M. (2005). A non-local algorithm for image denoising.

Computer Vision and Pattern Recognition,c2

, 60-65.
• (2) Bouali, M., & Ladjal, S. (2011). Toward Optimal Destriping of MODIS Data Using a Unidirectional Variational Model. IEEE Transactions on Geoscience and Remote Sensing, 49(8), 259-268.
• (3) Chambolle, A. (2004). An Algorithm for Total Variation Minimization and Applications. Journal of Mathematical imaging and vision, 20(1-2), 89-97.
• (4) Candes, E. J., Romberg, J., & Tao, T. (2006). Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Press, 489-509.
• (5) Cao, W. F., Wang, Y., Sun, J., & et al. (2016). Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements. IEEE Transactions on Image Processing, 25(9),4075-4090.
• (6)

Chen, G. Y., & Qian, S. E. (2011). Denoising of Hyperspectral Imagery Using Principal Component Analysis and Wavelet Shrinkage.

IEEE Transactions on Geoscience and Remote Sensing, 49(3), 973-980.
• (7) Candes, E. J., Li, X., Ma, Y., & et al. (2009). Robust principal component analysis? IEEE Journal of the Acm, 58(3).
• (8) Donoho, D. L. (2002). De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3), 613-627.
• (9) Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289-1306.
• (10) Dabov, K., Foi, A., Katkovnik, V., & et al. (2007). Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Transactions on Image Processing, 16(8), 2080-2095.
• (11) Dias, J. M. B., & Plaza, A. J. (2012). A new technique for hyperspectral compressive sensing using spectral unmixing. SPIE Optical Engineering + Applications 85140N.
• (12) Dong, W. S., Li, G. Y., Shi, G. M., & et al. (2015). Low-Rank Tensor Approximation with Laplacian Scale Mixture Modeling for Multiframe Image Denoising. IEEE Conference on Computer Vision and Pattern Recognition, 442-449.
• (13) Duarte, M. F., & Baraniuk, R. G. (2010). Kronecker product matrices for compressive sensing. IEEE International Conference on Acoustics Speech and Signal Processing, 3650-3653.
• (14) Dabov K., Foi A. , Katkovnik V., & et al. (2007). Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Transactions on Image Processing, 16(8), 2080-2095.
• (15) Mairal, J., Bach, F., Ponce, J., & et al. (2010). Non-local sparse models for image restoration. IEEE International Conference on Computer Vision, 2272-2279.
• (16) Dong, W., Fu, F., Shi, G., & et al. (2016). Hyperspectral Image Super-resolution via Non-negative Structured Sparse Representation. IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society, 25(5), 2337-2352.
• (17) Elad, M., & Aharon, M. (2006). Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Transactions on Image Processing, 15(12), 3736-3745.
• (18) Gogna, A., Shukla, A., Agarwal H. K., & et al. (2015). Split Bregman algorithms for sparse / joint-sparse and low-rank signal recovery: Application in compressive hyperspectral imaging. IEEE International Conference on Image Processing, 1302-1306.
• (19) Golbabaee, M., & Vandergheynst, P. (2012). Hyperspectral image compressed sensing via low-rank and joint-sparse matrix recovery. IEEE International Conference on Acoustics, Speech and Signal Processing, 2741-2744,.
• (20) Golbabaee, M., & Vandergheynst, P. (2013). Joint trace/TV norm minimization: A new efficient approach for spectral compressive imaging. IEEE International Conference on Image Processing, 933-936.
• (21) Goetz, A. F. H., Ustin, S. L., & Schaepman, M. E. (2009). Three decades of hyperspectral remote sensing of the Earth: a personal view. Remote Sensing of Environment, 113(9), 5-16.
• (22) Gu, S. H., Zhang, L., Zuo, W. M., & et al. (2014). Weighted Nuclear Norm Minimization with Application to Image Denoising. IEEE Conference on Computer Vision and Pattern Recognition, 2862-2869.
• (23) He, W., Zhang, H. Y., Zhang, L. P., & et al. (2015). Hyperspectral Image Denoising via Noise-Adjusted Iterative Low-Rank Matrix Approximation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(6), 3050-3061.
• (24) He, S., Zhou, H., Wang, Y., & et al. (2016). Super-resolution reconstruction of hyperspectral images via low rank tensor modeling and total variation regularization. International Geoscience and Remote Sensing Symposium, 6962-6965.
• (25) He, W., Zhang, H. Y., Zhang, L. P., & et al. (2016). Total-Variation-Regularized Low-Rank Matrix Factorization for Hyperspectral Image Restoration. IEEE Transactions on Geoscience and Remote Sensing, 54(1), 178-188.
• (26) Jiang, H., Deng, W.,& Shen, Z. W. (2017). Surveillance Video Processing Using Compressive Sensing. Inverse Problems and Imaging, 6(2), 201-214.
• (27) Jiang, C., Zhang, H. Y., Zhang, Li. P., & et al. (2016). Hyperspectral Image Denoising with a Combined Spatial and Spectral Weighted Hyperspectral Total Variation Model. Canadian Journal of Remote Sensing, 42(1), 53-72.
• (28) Kolda, T. G., & Bader, B. W. Tensor Decompositions and Applications. SIAM Review, 51(3), 455-500.
• (29) Lu, C., Feng, J., Chen, Y., & et al. (2016). Tensor Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Tensors via Convex Optimization. IEEE Computer Vision and Pattern Recognition,5249-5257.
• (30) Liu, J., Tai, X. C., Huang, H., & et al. (2013). A weighted dictionary learning model for denoising images corrupted by mixed noise. IEEE Transactions on Image Processing, 22(3), 1108-1120.
• (31) Li, C., Sun, T., Kelly, K. F.,& et al. (2012). A compressive sensing and unmixing scheme for hyperspectral data processing. IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society, 21(3),1200-1210.
• (32) Martín, G., Bioucas-Dias, J. M., & Plaza, A. (2015). HYCA: A New Technique for Hyperspectral Compressive Sensing. IEEE Transactions on Geoscience and Remote Sensing, 53(5), 2819-2831.
• (33) Mairal, J., Bach, F., Ponce, J., & et al. (2009). Online dictionary learning for sparse coding,

International Conference on Machine Learning

, 689-696.
• (34) Maggioni, M., Katkovnik, V., Egiazarian, K., & et al. (2013). Nonlocal transform-domain filter for volumetric data denoising and reconstruction, IEEE Transactions on Image Processing, 22(1), 119-133.
• (35) Ng, M. K., Chan, R. H., & Tang, W. C. (1999). A Fast Algorithm for Deblurring Models with Neumann Boundary Conditions. Society for Industrial and Applied Mathematics, 21(3), 851-866.
• (36) Othman, H., & Qian, S. E. (2006). Noise reduction of hyperspectral imagery using hybrid spatial-spectral derivative-domain wavelet shrinkage. IEEE Transactions on Geoscience and Remote Sensing, 44(2), 397-408.
• (37) Peng, Y., Meng, D. Y., Xu, Z. B., & et al. (2014). Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising. IEEE Conference on Computer Vision and Pattern Recognition, 2949-2956.
• (38) Peng, Y., Suo, J., Dai, Q., & et al. (2014). Reweighted Low-Rank Matrix Recovery and its Application in Image Restoration. IEEE Transactions on Cybernetics, 44(12), 2418-2430.
• (39)

Qian, Y. T., & Ye, M. (2013). Hyperspectral Imagery Restoration Using Nonlocal Spectral-Spatial Structured Sparse Representation With Noise Estimation.

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 6(2), 499-515.
• (40) Rudin, L. I., Osher, S. F., & Emad, H. F. (1992). Nonlinear total variation based noise removal algorithms. Eleventh International Conference of the Center for Nonlinear Studies on Experimental Mathematics : Computational Issues in Nonlinear Science: Computational Issues in Nonlinear Science, 259-268.
• (41) Wright, J., Ganesh, A., Rao, S., & et al. (2009). Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices. Advances in Neural Information Processing Systems, 87(4), 2080-2088.
• (42) Wu, Z. J., Wang, Q., Jin, J., & et al. (2017). Structure tensor total variation-regularized weighted nuclear norm minimization for hyperspectral image mixed denoising. Signal Processing, 131(1), 202-219.
• (43) Wang, Z., Bovik, A. C., Sheikh, H. R., & et al. (2004). Image quality assessment: from error visibility to structural similarity, IEEE Trans Image Process, 13(4), 600-612.
• (44) Wald, L. (2002). Data Fusion. Definitions and Architectures - Fusion of Images of Different Spatial Resolutions. Presses des MINES
• (45) Waters, A. E., Sankaranarayanan, A. C., & Baraniuk, R. G. (2011). SpaRCS: recovering low-rank and sparse matrices from compressive measurements, International Conference on Neural Information Processing Systems, 1089-1097.
• (46) Wang, Y., Lin, L., Zhao, Q., & et al. (2017). Compressive Sensing of Hyperspectral Images via Joint Tensor Tucker Decomposition and Weighted Total Variation Regularization. IEEE Geoscience and Remote Sensing Letters, 14(12), 2457-2461.
• (47) Wang, Y., Peng, J. J., Zhao, Q., & et al. (2018). Hyperspectral Image Restoration Via Total Variation Regularized Low-Rank Tensor Decomposition, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(4), 1227-1243.
• (48) Willett R. M., Duarte, M. F., Davenport M., &, et al. (2013). Sparsity and Structure in Hyperspectral Imaging : Sensing, Reconstruction, and Target Detection. IEEE Signal Processing Magazine, 31(1), 116-126.
• (49) Xing, Z., Zhou, M., Castrodad, A., & et al. (2011). Dictionary Learning for Noisy and Incomplete Hyperspectral Images. SIAM Journal on Imaging Sciences, 5(1), 33-56.
• (50) Xie, Y., Qu, Y. Y., Tao, D. C., & et al. (2016). Hyperspectral Image Restoration via Iteratively Regularized Weighted Schatten -Norm Minimization. IEEE Transactions on Geoscience and Remote Sensing, 54(8), 4642-4659.
• (51) Xie, Q., Zhao, Q., Meng, D. Y., & et al. (2016). Multispectral images denoising by intrinsic tensor sparsity regularization, IEEE Conference on Computer Vision and Pattern Recognition, 1692–1700.
• (52) Yang, S., Wang, M., Li, P., & et al. (2015). Compressive Hyperspectral Imaging via Sparse Tensor and Nonlinear Compressed Sensing. IEEE Transactions on Geoscience and Remote Sensing, 53(11), 5943-5957.
• (53) Zhang, L., Wei, W., Zhang, Y. N., & et al. (2015). Hyperspectral Compressive Sensing Using Manifold-Structured Sparsity Prior, IEEE International Conference on Computer Vision, 3550-3558.
• (54) Zhao, J., Zhong, Y., Shu, H., & et al. (2016). High-Resolution Image Classification Integrating Spectral-Spatial-Location Cues by Conditional Random Fields. IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society, 25(9), 4033-4045.
• (55) Zhang, H. Y., He, W., Zhang, L. P., & et al. (2014). Hyperspectral Image Restoration Using Low-Rank Matrix Recovery. IEEE Transactions on Geoscience and Remote Sensing, 52(8), 4729-4743.
• (56) Zhou, T. Y., & Tao, D. C. (2011). GoDec: Randomized Lowrank and Sparse Matrix Decomposition in Noisy Case. International Conference on Machine Learning, 33-40.

## Appendix

In this supplementary material, we provide the proof of the equivalence between problem (10) and problem (11) in the maintext.

### Proof of the equivalence

We call two problems equivalent if from a solution of one, a solution of the other is readily found, and vice versa. We can then give the following theorem:

###### Theorem 9.1

For any , the problem

 minV∈Rs×r∥GV∥1s.t. ∥GV∥F=∥G∥F,VTV=I, (40)

is equivalent to

 minU,V∥U∥1s.t. G=UVT, VTV=I,      U∈Rhw×r, V∈Rs×r. (41)
###### Proof

(a). We first prove that for any satisfies the constrains of (41), satisfies the constrains of (40).

By and , we can obtain that,

 ∥GV∥2F=∥UVTV∥2F=∥U∥2F=tr(UTU)=tr(U∗TUVTV)=tr(VUTUVT)=∥UVT∥2F=∥G∥2F. (42)

This implies that satisfies the constrains of (40).

(b). We then prove that for any satisfies the constrains of (40), by letting