1 Introduction
Sparse coding (SC) is a classical unsupervised feature extraction technique for finding concise representations of data, and it has been successfully applied to numerous areas across computer vision and pattern recognition [1]. Conventional SC [9] aims to approximate vector-valued inputs by linear combinations of a few bases. Such bases correspond to patterns that represent elementary objects, and together they compose a dictionary. The conventional SC model [9] suffers from two major problems: 1) the vectorization preprocessing breaks the local proximity and destroys the object structures of images; and 2) the high computational complexity restricts its applications, so only small patch sizes are usually used. Usually, the dictionary is overcomplete, i.e., the number of bases is larger than the dimension of the input image data. The dictionary size is therefore very large for high-dimensional data, which makes conventional SC prohibitively expensive.
However, existing approaches do not solve these two problems satisfactorily. Two kinds of SC models have been proposed to preserve the spatial proximity of images: tensor sparse coding (TenSR) [2, 3] and convolutional sparse coding (CSC) [12, 13]. TenSR models [2, 3] adopt a series of separable dictionaries to approximate the structures in each mode of the input data. Though the sizes of the dictionaries are significantly reduced, the relationships among the modes are ignored, while object structures usually spread across all modes of the data. CSC models [12, 13] use dictionaries to capture local patterns and introduce the convolution operator to learn shift-invariant patterns. However, optimizing such models with the convolution operator is computationally challenging. Moreover, each feature map (sparse representation) has nearly the same size as the input image, which is much larger than in conventional SC and increases both storage requirements and computational complexity.
The basic idea that motivates us to address these challenges has two aspects: 1) tensor representation preserves the local proximity and captures the elementary object structures; and 2) we exploit the tensor-product operation under a novel algebraic framework in which tensor-linear combinations approximate the images, instead of the Tucker decomposition used in TenSR [2, 3] or the convolution operation in CSC [12, 13]. On the one hand, the tensor-product is based on the circular convolution operation, so it can generate the data from shifted versions of the bases; Fig. 1 shows the shifted versions generated by the tensor-product without storing them. On the other hand, the tensor-linear combination (see Definition 2) is a generalization of the standard linear combination, so the number of required bases can be significantly reduced, which also reduces the computational complexity.
In this paper, we propose a novel sparse coding model, two-dimensional sparse coding (2DSC), in which input images are represented as third-order tensors and tensor-linear combinations are used for data approximation. To solve the 2DSC problem, we present an alternating minimization algorithm consisting of a sparse coding step and a dictionary learning step. For sparse coding, we propose a new iterative shrinkage thresholding algorithm based on the tensor-product, which operates directly in the tensor space. For dictionary learning, we show that it can be solved efficiently by transforming it into a Lagrange dual problem in the frequency domain.
The rest of this paper is organized as follows: Section 3 introduces the notation and preliminaries used in this paper. Section 4 presents the proposed 2DSC model, followed by the novelties of our model. In Section 5, an efficient alternating minimization algorithm for 2DSC is proposed. We demonstrate the effectiveness of our 2DSC model by a simulation experiment and multispectral image denoising in Section 6. Finally, we conclude the paper in Section 7. To summarize, this paper makes the following contributions:

We propose a novel two-dimensional sparse coding model (2DSC) for image representation, which preserves the local proximity of pixels and the elementary object structures of images. 2DSC is superior in dealing with high-dimensional data.

We discuss the geometric properties of the dictionary returned by 2DSC and show that there exists an equivalent sum space spanned by the corresponding vectorizations of the bases. Therefore, the dictionary of 2DSC has a stronger representation capability than that of conventional SC.

We propose an efficient alternating minimization algorithm. For coefficient learning, we provide a novel iterative shrinkage thresholding algorithm based on the tensor-product, which exploits the optimization mechanism in the tensor space. For dictionary learning, we convert the problem into a corresponding one in the frequency domain and solve for the dictionary via its Lagrange dual, which significantly reduces the number of variables to be optimized.
2 Related Work
In this section, we briefly review related work on sparse coding models that consider the spatial structures of images, including sparse coding based on tensor representations [2, 3] and convolutional sparse coding [12, 13].
The first stream is the tensor-based sparse coding (TenSR) models [2, 3], which preserve the spatial structures of images through tensor representations. The data is represented by a Tucker decomposition. Instead of using one dictionary as in conventional SC, a series of separable dictionaries is adopted to model the data, with each dictionary corresponding to one mode of the data. The sizes of the dictionaries are thus significantly reduced, since each mode is small compared with the size of the whole data. However, the relationships between the modes are ignored, while object structures are usually distributed across all modes of the data.
The second stream is the convolutional sparse coding (CSC) models [12, 13], which represent an image as a sum of convolutions of feature maps with the corresponding filters. The convolution operator models the similarities of nearby pixels, which preserves the spatial structures of images. However, optimizing CSC models is computationally challenging. Moreover, each feature map (sparse representation) has nearly the same size as the image, which is much larger than in conventional SC and increases both storage requirements and computational complexity.
3 Notation and Preliminary
A third-order tensor is denoted as $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. The expansion of $\mathcal{A}$ along the third dimension is represented by its frontal slices $\mathcal{A}^{(k)} \in \mathbb{R}^{n_1 \times n_2}$, $k \in [n_3]$. The transpose of tensor $\mathcal{A}$ is denoted as $\mathcal{A}^{T} \in \mathbb{R}^{n_2 \times n_1 \times n_3}$, where $(\mathcal{A}^{T})^{(1)} = (\mathcal{A}^{(1)})^{T}$ and $(\mathcal{A}^{T})^{(k)} = (\mathcal{A}^{(n_3 - k + 2)})^{T}$ for $k = 2, \dots, n_3$, and the superscript "T" represents the transpose of matrices. The discrete Fourier transform (DFT) along the third dimension of $\mathcal{A}$ is denoted as $\widehat{\mathcal{A}} = \mathrm{fft}(\mathcal{A}, [\,], 3)$. $[n]$ denotes the set $\{1, 2, \dots, n\}$. The $\ell_1$ and Frobenius norms of tensors are denoted as $\|\mathcal{A}\|_1 = \sum_{i,j,k} |a_{ijk}|$ and $\|\mathcal{A}\|_F = \sqrt{\sum_{i,j,k} a_{ijk}^2}$. Furthermore, we need the following definitions.
Definition 1
[14] The tensor-product $\mathcal{C} = \mathcal{A} * \mathcal{B}$ between $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{n_2 \times n_4 \times n_3}$ is a tensor $\mathcal{C} \in \mathbb{R}^{n_1 \times n_4 \times n_3}$ where $\mathcal{C}(i, l, :) = \sum_{j=1}^{n_2} \mathcal{A}(i, j, :) \circledast \mathcal{B}(j, l, :)$, and $\circledast$ denotes the circular convolution operation between tubes.
Remark 1
The tensor-product can be efficiently computed in the frequency domain as:

$\widehat{\mathcal{C}}^{(k)} = \widehat{\mathcal{A}}^{(k)} \widehat{\mathcal{B}}^{(k)}, \quad k \in [n_3].$ (1)
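As a sanity check of Remark 1, the slice-wise Fourier-domain product can be compared against the circular-convolution definition directly. The following is a minimal NumPy sketch (function names are ours, not from the paper):

```python
import numpy as np

def t_product(A, B):
    """Tensor-product of A (n1 x n2 x n3) and B (n2 x n4 x n3) via the DFT
    along the third dimension, i.e. slice-wise matrix products (Remark 1)."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)  # matrix product per frequency slice
    return np.real(np.fft.ifft(Ch, axis=2))

def t_product_direct(A, B):
    """Reference implementation from Definition 1: sums of circular
    convolutions between tubes. Slow; for verification only."""
    n1, n2, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3))
    for i in range(n1):
        for l in range(n4):
            for j in range(n2):
                for k in range(n3):
                    # (a circ-conv b)[k] = sum_t a[t] * b[(k - t) mod n3]
                    C[i, l, k] += A[i, j, :] @ B[j, l, (k - np.arange(n3)) % n3]
    return C
```

Both routines agree up to floating-point error, illustrating why the frequency-domain form is the one used throughout the algorithm.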
Lemma 1
[14] The tensor-product has an equivalent matrix-product as:

$\mathrm{unfold}(\mathcal{C}) = \mathrm{circ}(\mathcal{A}) \cdot \mathrm{unfold}(\mathcal{B}),$ (2)

where $\mathrm{unfold}(\mathcal{A}) = [\mathcal{A}^{(1)}; \mathcal{A}^{(2)}; \dots; \mathcal{A}^{(n_3)}]$ stacks the frontal slices and $\mathrm{circ}(\mathcal{A})$ is the circular matrix of $\mathcal{A}$ defined as follows:

$\mathrm{circ}(\mathcal{A}) = \begin{bmatrix} \mathcal{A}^{(1)} & \mathcal{A}^{(n_3)} & \cdots & \mathcal{A}^{(2)} \\ \mathcal{A}^{(2)} & \mathcal{A}^{(1)} & \cdots & \mathcal{A}^{(3)} \\ \vdots & \vdots & \ddots & \vdots \\ \mathcal{A}^{(n_3)} & \mathcal{A}^{(n_3-1)} & \cdots & \mathcal{A}^{(1)} \end{bmatrix}.$ (3)
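Lemma 1 can likewise be checked numerically. Below is a sketch of `circ` and `unfold` under our reading of (2)-(3) (block $(r, c)$ of the block-circulant matrix is the frontal slice with index $(r - c) \bmod n_3$, zero-based):

```python
import numpy as np

def unfold(A):
    """Stack the frontal slices of A (n1 x n2 x n3) into an (n1*n3) x n2 matrix."""
    return np.concatenate([A[:, :, k] for k in range(A.shape[2])], axis=0)

def circ(A):
    """Block-circulant matrix of A as in Eq. (3): block (r, c) holds the
    frontal slice A^{((r - c) mod n3) + 1} (1-based slice numbering)."""
    n1, n2, n3 = A.shape
    M = np.zeros((n1 * n3, n2 * n3))
    for r in range(n3):
        for c in range(n3):
            M[r * n1:(r + 1) * n1, c * n2:(c + 1) * n2] = A[:, :, (r - c) % n3]
    return M
```

The matrix form is what connects the t-product back to ordinary sparse coding in (8), and it is also the reason the frequency-domain route is cheaper: $\mathrm{circ}(\mathcal{A})$ is $n_3$ times larger in each dimension than a single slice.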
Definition 2
[14] The tensor-linear combination of the tensor bases $\vec{\mathcal{D}}_1, \dots, \vec{\mathcal{D}}_m$ with the corresponding tensor coefficients $\vec{c}_1, \dots, \vec{c}_m$ is defined as:

$\vec{\mathcal{X}} = \sum_{j=1}^{m} \vec{\mathcal{D}}_j * \vec{c}_j,$ (4)

where $\vec{\mathcal{D}}_j \in \mathbb{R}^{n_1 \times 1 \times n_3}$ with $j \in [m]$, and $\vec{c}_j \in \mathbb{R}^{1 \times 1 \times n_3}$ with $j \in [m]$.
Remark 2
The tensor-linear combination is a generalization of the standard linear combination.
Definition 3
The tensor space spanned by the set of tensor bases $\{\vec{\mathcal{D}}_j\}_{j=1}^{m}$ is defined as $\mathrm{span}\{\vec{\mathcal{D}}_j\}_{j=1}^{m} = \{\sum_{j=1}^{m} \vec{\mathcal{D}}_j * \vec{c}_j : \vec{c}_j \in \mathbb{R}^{1 \times 1 \times n_3}\}$.
4 Problem Statement
4.1 Problem Formulation
Instead of preprocessing images into vectors, we represent images of size $n_1 \times n_3$ as tensors $\vec{\mathcal{X}} \in \mathbb{R}^{n_1 \times 1 \times n_3}$, and propose a novel sparse coding model, named two-dimensional sparse coding (2DSC), as follows:

$\min_{\mathcal{D}, \mathcal{C}} \ \frac{1}{2} \|\vec{\mathcal{X}} - \mathcal{D} * \mathcal{C}\|_F^2 + \beta \|\mathcal{C}\|_1, \quad \text{s.t. } \|\mathcal{D}(:, j, :)\|_F^2 \le 1, \ j \in [m],$ (5)

where $\mathcal{D} \in \mathbb{R}^{n_1 \times m \times n_3}$ is the tensor dictionary in which each lateral slice $\mathcal{D}(:, j, :)$ is a basis, and $\mathcal{C} \in \mathbb{R}^{m \times 1 \times n_3}$ is the tensor coefficient. The parameter $\beta$ balances the approximation error and the sparsity of the tensor coefficients, and $m$ is the number of atoms. Conventional SC is a special case of (5) when $n_3 = 1$.
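To make the objective concrete, here is a small sketch that evaluates the cost of (5) for one image and enforces the unit-norm constraint on the lateral slices. It assumes the reconstructed shapes above; `t_product`, `objective`, and `project_atoms` are our illustrative names:

```python
import numpy as np

def t_product(A, B):
    """Tensor-product via slice-wise products in the Fourier domain."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', Ah, Bh), axis=2))

def objective(X, D, C, beta):
    """0.5 * ||X - D * C||_F^2 + beta * ||C||_1, the cost of (5) for one image."""
    R = X - t_product(D, C)
    return 0.5 * np.sum(R ** 2) + beta * np.sum(np.abs(C))

def project_atoms(D):
    """Rescale each lateral slice so that ||D(:, j, :)||_F <= 1."""
    norms = np.sqrt(np.sum(D ** 2, axis=(0, 2), keepdims=True))
    return D / np.maximum(norms, 1.0)
```

The constraint projection is the usual device for keeping the scale ambiguity between dictionary and coefficients from collapsing the sparsity penalty.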
4.2 Novelties of 2DSC
The proposed 2DSC is not simply an extension of conventional SC to two-dimensional data; it has two novel properties. First, the size of the dictionary in 2DSC can be significantly reduced without damaging the reconstruction accuracy, owing to the tensor-linear combination. Second, 2DSC is shift-invariant, meaning that the data can be generated by the model from shifted versions of the bases without explicitly storing them.
4.2.1 “Slim” Dictionary
Lemma 2
[14] The tensor space $\mathbb{R}^{n_1 \times 1 \times n_3}$ can be generated by $n_1$ tensors from an orthogonal set.
Fig. 2 shows four kinds of sparse coding models for image representation. In SC [9], the image space is generated by $n_1 n_3$ bases via linear combination, while in 2DSC it can be spanned with only $n_1$ tensor bases. In TenSR [2] and CSC [13], it is not easy to determine the number of bases needed for the image space. Though CSC [13] is also based on the convolution operator, its sparse representations are nearly the same size as the input images, which is much larger than ours. Lemma 2 means that an $n_1 \times n_3$ image can be generated by only $n_1$ elements of the same size via the tensor-linear combination, whereas $n_1 n_3$ elements are required via linear combinations. Using far fewer atoms for data representation significantly reduces the computational complexity, which shows the potential of the 2DSC model for high-dimensional data.
4.2.2 Shifting Invariance
Theorem 1
The tensor space spanned by $\{\vec{\mathcal{D}}_j\}_{j=1}^{m}$ as defined in Definition 3 is equivalent to a sum space of $n_3$ vector subspaces in $\mathbb{R}^{n_1 n_3}$.
As shown in (2), (4) is equivalent to a linear combination in the vector space, which is actually a sum space:

$\mathrm{vec}(\vec{\mathcal{X}}) = \sum_{j=1}^{m} \mathrm{circ}(\vec{\mathcal{D}}_j)\, \mathrm{unfold}(\vec{c}_j) = \sum_{k=1}^{n_3} D_k \mathbf{c}_k,$ (6)

where $\mathrm{vec}(\vec{\mathcal{X}})$ is the vectorization of $\vec{\mathcal{X}}$, $D_1$ collects the vectorized bases, the bases in $D_k$ for $k \ge 2$ are circularly shifted versions of those in $D_1$, and $\mathbf{c}_1, \dots, \mathbf{c}_{n_3}$ are the coefficients corresponding to the dictionaries $D_1, \dots, D_{n_3}$. Fig. 1 explicitly shows the shifted versions of a base generated by the tensor-product when the image of a helicopter is viewed as a basis. The space generated by the tensor-linear combination can thus be transferred to a sum space of vector spaces generated by linear combinations, which includes the shifted versions of the original atoms. Fig. 3 explicitly shows the equivalent linear combination and the shifted versions of an atom generated from the tensor-product of its twisted form and a coefficient.
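The shifting-invariance claim is easy to verify numerically: multiplying a basis by a tubal coefficient that is a unit impulse at lag $s$ reproduces the basis circularly shifted by $s$ along the third dimension. A minimal sketch (our own variable names, t-product via FFT as in Remark 1):

```python
import numpy as np

def t_product(A, B):
    """Tensor-product via slice-wise products in the Fourier domain."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', Ah, Bh), axis=2))

rng = np.random.default_rng(0)
base = rng.standard_normal((4, 1, 5))   # one tensor basis (a twisted image)
tube = np.zeros((1, 1, 5))
tube[0, 0, 2] = 1.0                     # tubal coefficient: unit impulse at lag 2
shifted = t_product(base, tube)
# `shifted` is the base circularly shifted by 2 along the third dimension,
# so tensor-linear combinations implicitly contain all shifted atoms.
```

This is exactly why the shifted atoms never need to be stored: they are generated on the fly by the circular convolution inside the t-product.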
5 Alternating Minimization Algorithm
Problem (5) is quite challenging due to the nonconvex objective function and the convolution operator. Instead of transforming (5) into the conventional SC formulation based on Lemma 1, we propose an efficient algorithm that alternately optimizes the coefficients and the dictionary in the tensor space, as shown in Algorithm 1.
5.1 Learning Tensor Coefficient
For clarity, we discuss how to solve the tensor sparse representation for a single image of size $n_1 \times n_3$, represented as $\vec{\mathcal{X}} \in \mathbb{R}^{n_1 \times 1 \times n_3}$. Given the dictionary $\mathcal{D}$, solving for the tensor sparse representation reduces to the following problem:

$\min_{\mathcal{C}} \ \frac{1}{2} \|\vec{\mathcal{X}} - \mathcal{D} * \mathcal{C}\|_F^2 + \beta \|\mathcal{C}\|_1.$ (7)

By Lemma 1, (7) can be solved by conventional sparse coding algorithms, since it is equivalent to

$\min_{\mathbf{c}} \ \frac{1}{2} \|\mathbf{x} - \tilde{D} \mathbf{c}\|_2^2 + \beta \|\mathbf{c}\|_1,$ (8)

where $\mathbf{x} = \mathrm{vec}(\vec{\mathcal{X}})$, $\tilde{D} = \mathrm{circ}(\mathcal{D}) \in \mathbb{R}^{n_1 n_3 \times m n_3}$, and $\mathbf{c} = \mathrm{unfold}(\mathcal{C})$. However, the size of the dictionary in (8) grows significantly with the size of the images, which also increases the computational complexity.
To alleviate this problem, we propose a novel Iterative Shrinkage Thresholding algorithm based on the Tensor-product (ISTA-T) to solve (7) directly. We first rewrite (7) as:

$\min_{\mathcal{C}} \ f(\mathcal{C}) + \beta g(\mathcal{C}),$ (9)

where $f(\mathcal{C}) = \frac{1}{2} \|\vec{\mathcal{X}} - \mathcal{D} * \mathcal{C}\|_F^2$ stands for the data reconstruction term and $g(\mathcal{C}) = \|\mathcal{C}\|_1$ stands for the sparsity constraint term. An iterative shrinkage algorithm is used to solve (9), in which the smooth term is replaced by a linearized function around the previous estimate $\mathcal{C}^{t-1}$ with a proximal regularization, while the nonsmooth regularization is kept. Thus, at the $t$-th iteration, $\mathcal{C}^{t}$ is updated by

$\mathcal{C}^{t} = \arg\min_{\mathcal{C}} \ f(\mathcal{C}^{t-1}) + \langle \nabla f(\mathcal{C}^{t-1}), \mathcal{C} - \mathcal{C}^{t-1} \rangle + \frac{L}{2} \|\mathcal{C} - \mathcal{C}^{t-1}\|_F^2 + \beta \|\mathcal{C}\|_1,$ (10)
where $L$ is a Lipschitz constant of $\nabla f$, and $\nabla f$ is the gradient defined in the tensor space. Then, (10) is equivalent to

$\mathcal{C}^{t} = \arg\min_{\mathcal{C}} \ \frac{1}{2} \left\| \mathcal{C} - \left( \mathcal{C}^{t-1} - \frac{1}{L} \nabla f(\mathcal{C}^{t-1}) \right) \right\|_F^2 + \frac{\beta}{L} \|\mathcal{C}\|_1.$ (11)
To solve (11), we first give the gradient of the data reconstruction term $f$:

$\nabla f(\mathcal{C}) = \mathcal{D}^{T} * (\mathcal{D} * \mathcal{C} - \vec{\mathcal{X}}).$ (12)
Secondly, we discuss how to determine the Lipschitz constant $L$ in (11). For every $\mathcal{C}_1, \mathcal{C}_2 \in \mathbb{R}^{m \times 1 \times n_3}$, we have

$\|\nabla f(\mathcal{C}_1) - \nabla f(\mathcal{C}_2)\|_F = \|\mathcal{D}^{T} * \mathcal{D} * (\mathcal{C}_1 - \mathcal{C}_2)\|_F \le \max_{k \in [n_3]} \|(\widehat{\mathcal{D}}^{(k)})^{H} \widehat{\mathcal{D}}^{(k)}\|_2 \, \|\mathcal{C}_1 - \mathcal{C}_2\|_F,$ (13)

where the superscript "H" represents the conjugate transpose. Thus the Lipschitz constant of $\nabla f$ used in our algorithm is $L = \max_{k \in [n_3]} \|(\widehat{\mathcal{D}}^{(k)})^{H} \widehat{\mathcal{D}}^{(k)}\|_2$.
Lastly, (11) can be solved by the proximal operator $\mathcal{C}^{t} = \mathrm{prox}_{\beta/L}\left( \mathcal{C}^{t-1} - \frac{1}{L} \nabla f(\mathcal{C}^{t-1}) \right)$, where $\mathrm{prox}_{\tau}(\cdot)$ is the soft-thresholding operator $\mathrm{prox}_{\tau}(x) = \mathrm{sign}(x) \max(|x| - \tau, 0)$ applied entry-wise.
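The coefficient update of this subsection can be sketched as a short ISTA loop under the equations as we have reconstructed them; the fixed iteration count and all names are our own simplifications, not the paper's exact implementation:

```python
import numpy as np

def t_product(A, B):
    """Tensor-product via slice-wise products in the Fourier domain."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', Ah, Bh), axis=2))

def t_transpose(A):
    """Tensor transpose: transpose each frontal slice, reverse slices 2..n3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, 1:][:, :, ::-1]], axis=2)

def ista_t(X, D, beta, n_iter=100):
    """Solve min_C 0.5*||X - D*C||_F^2 + beta*||C||_1 by iterative shrinkage."""
    Dh = np.fft.fft(D, axis=2)
    # Lipschitz constant: largest squared spectral norm over Fourier slices
    L = max(np.linalg.norm(Dh[:, :, k], 2) ** 2 for k in range(D.shape[2]))
    C = np.zeros((D.shape[1], X.shape[1], D.shape[2]))
    for _ in range(n_iter):
        grad = t_product(t_transpose(D), t_product(D, C) - X)   # gradient step
        Z = C - grad / L
        C = np.sign(Z) * np.maximum(np.abs(Z) - beta / L, 0.0)  # soft-thresholding
    return C
```

Every operation here stays in the tensor space; the only place the vectorized dictionary of (8) would appear is implicitly, inside the FFT-based t-products.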
5.2 Tensor Dictionary learning
For learning the dictionary $\mathcal{D}$ with $\mathcal{C}$ fixed, the optimization problem is:

$\min_{\mathcal{D}} \ \sum_{i=1}^{N} \|\vec{\mathcal{X}}_i - \mathcal{D} * \mathcal{C}_i\|_F^2, \quad \text{s.t. } \|\mathcal{D}(:, j, :)\|_F^2 \le 1, \ j \in [m],$ (14)

where the atoms are coupled together by the circular convolution operator. Therefore, we first decompose (14) into $n_3$ nearly-independent problems (coupled only through the norm constraint) by the DFT as follows:

$\min_{\widehat{\mathcal{D}}} \ \sum_{k=1}^{n_3} \|\widehat{\mathcal{X}}^{(k)} - \widehat{\mathcal{D}}^{(k)} \widehat{\mathcal{C}}^{(k)}\|_F^2, \quad \text{s.t. } \sum_{k=1}^{n_3} \|\widehat{\mathcal{D}}^{(k)}(:, j)\|_2^2 \le n_3, \ j \in [m].$ (15)
Then, we adopt the Lagrange dual [9] to solve (14) in the frequency domain. The advantage of the Lagrange dual is that the number of optimization variables is $m$, which is much smaller than the $n_1 m n_3$ variables of the primal problem (14).
To use the Lagrange dual algorithm, we first consider the Lagrangian of (15):

$\mathcal{L}(\widehat{\mathcal{D}}, \boldsymbol{\lambda}) = \sum_{k=1}^{n_3} \|\widehat{\mathcal{X}}^{(k)} - \widehat{\mathcal{D}}^{(k)} \widehat{\mathcal{C}}^{(k)}\|_F^2 + \sum_{j=1}^{m} \lambda_j \left( \sum_{k=1}^{n_3} \|\widehat{\mathcal{D}}^{(k)}(:, j)\|_2^2 - n_3 \right),$ (16)

where $\boldsymbol{\lambda} = [\lambda_1, \dots, \lambda_m]^{T}$, each $\lambda_j \ge 0$ is a dual variable, and $j \in [m]$.
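For a fixed dual vector $\boldsymbol{\lambda}$, minimizing the Lagrangian over $\widehat{\mathcal{D}}$ separates into a closed-form least-squares solve per frequency slice. The sketch below shows only this inner step, assuming our reconstruction of (16); the outer maximization over $\boldsymbol{\lambda}$ (e.g. by Newton's method as in [9]) is omitted, and the function name is hypothetical:

```python
import numpy as np

def dict_update_given_dual(X, C, lam):
    """Closed-form dictionary slices in the Fourier domain for a FIXED dual
    vector lam (length m). X: n1 x N x n3 (stacked images), C: m x N x n3."""
    Xh, Ch = np.fft.fft(X, axis=2), np.fft.fft(C, axis=2)
    m, n3 = C.shape[0], C.shape[2]
    Dh = np.empty((X.shape[0], m, n3), dtype=complex)
    for k in range(n3):
        Ck = Ch[:, :, k]
        # D_hat^(k) = X_hat^(k) C_hat^(k)^H (C_hat^(k) C_hat^(k)^H + diag(lam))^{-1}
        Dh[:, :, k] = Xh[:, :, k] @ Ck.conj().T @ np.linalg.inv(
            Ck @ Ck.conj().T + np.diag(lam))
    return np.real(np.fft.ifft(Dh, axis=2))
```

Each slice solve involves only an $m \times m$ inverse, which is the source of the efficiency claim: the dual couples the $n_3$ subproblems through just $m$ scalars.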
5.3 Complexity Analysis
Table 1 shows the computational complexity of our proposed 2DSC and of TenSR [2]; the definitions of the quantities involved, and the dictionary sizes of both models, can be found in [2]. The computational complexity of TenSR [2] is higher than ours. As shown in [2], for 21168 patches, TenSR takes 189 seconds for tensor coefficient learning, while our model takes only about 23 seconds. The speedup comes from the fast computation of the tensor-product in the frequency domain, which divides the original large problem into much smaller ones.
6 Evaluation
6.1 Dictionaries for Large Patches
We analyze the dictionaries learned by conventional SC and 2DSC under the same number of bases and the same parameter settings. Patches extracted from the Natural Images data are normalized to have zero mean. The sparsity regularizer is set to 0.1, and the number of bases is 64, which ensures that the dictionary for 2DSC is overcomplete while that for SC is not. Fig. 4 shows the dictionaries learned by 2DSC and conventional SC for different patch sizes. For small patches, both models learn meaningful dictionaries (Gabor-like features) such as edges and corners. For large patches, the performance of SC degenerates significantly, and the bases even become sparse, while 2DSC can still learn meaningful features.
Method        | PSNR (five noise levels, low to high)  | SSIM (five noise levels, low to high)
------------- | -------------------------------------- | ---------------------------------------------
Noisy Image   | 34.16  28.13  22.11  18.59  14.15      | 0.8177  0.5664  0.2998  0.1912  0.0983
BwK-SVD [19]  | 36.92  32.72  28.90  26.83  24.21      | 0.9107  0.8242  0.7136  0.6487  0.5617
3DK-SVD [19]  | 38.60  34.96  31.28  29.00  26.15      | 0.9388  0.8903  0.8238  0.7775  0.7261
LRTA [18]     | 42.98  39.33  33.20  32.99  30.23      | 0.9664  0.9424  0.9096  0.8622  0.7904
PARAFAC [17]  | 36.20  33.96  32.96  32.11  27.49      | 0.9230  0.9099  0.8555  0.7897  0.5636
TenSR [2]     | 41.34  36.55  32.24  31.25  30.67      | 0.9700  0.9283  0.8623  0.8032  0.7768
2DSC (Ours)   | 41.32  36.56  33.21  33.09  34.46      | 0.9766  0.9466  0.9274  0.8690  0.9248
6.2 Multispectral Image Denoising
We apply 2DSC to multispectral images from the Columbia MSI Database [20]. Each dataset contains 31 real-world bands of size 512 × 512, collected from 400nm to 700nm at 10nm steps. We first scale these images to [0, 255], and then add Gaussian white noise at different noise levels. In our 2DSC model, we extract patches from each noisy multispectral image and save each patch as a third-order tensor. The dictionary is randomly initialized and trained iteratively, and the learned dictionaries are then used to denoise the MSI images, with the parameters set separately for each noise level. Table 2 shows the comparison results in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [21]. Five state-of-the-art MSI denoising methods are involved: the band-wise K-SVD (BwK-SVD) method [19], the 3D-cube K-SVD (3DK-SVD) method [19], LRTA [18], PARAFAC [17], and TenSR [2]. As shown in Table 2, our 2DSC outperforms all the compared algorithms under SSIM, which measures the structural consistency between the target image and the reference image. For PSNR, our 2DSC works much better at the higher noise levels; at the lower noise levels, LRTA [18] is the best, while ours and TenSR [2] are comparable.
We add CSC [13] to the visualization of the denoising performance. Due to the high memory required by CSC [13] for high-resolution images, we resize the spectral images of the balloons scene and consider only the first five bands. Fig. 5 shows the denoising results; it is easy to observe that our method achieves the best results. Note that for CSC [13], we denoise each image separately, which does not exploit the information along the bands of the spectral images, and the sparsity parameters are adjusted within [0.1, 10]. We conjecture that the weaker denoising results of CSC [13] stem from the small image size: due to the powerful representation ability of the convolution operation, the filters also learn much of the noise.
7 Conclusion
In this paper, we propose a novel tensor-based sparse coding algorithm, which learns an efficient tensor representation of images with a much smaller dictionary than conventional SC. Moreover, a much more efficient algorithm for learning the tensor sparse coefficients in the tensor space is proposed. The effectiveness of our model has been demonstrated by dictionary learning for large patch sizes. A follow-up work [23] applies the 2DSC scheme to image clustering by incorporating a graph regularizer.
References
 [1] Wright, John and Ma, Yi and Mairal, Julien and Sapiro, Guillermo and Huang, Thomas S. and Yan, Shuicheng, Sparse representation for computer vision and pattern recognition, Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
 [2] Qi, Na and Shi, Yunhui and Sun, Xiaoyan and Yin, Baocai, TenSR: Multi-Dimensional Tensor Sparse Representation, IEEE CVPR, pp. 5916–5925, 2016.
 [3] Qi, Na and Shi, Yunhui and Sun, Xiaoyan and Wang, Jingdong and Yin, Baocai, Two dimensional synthesis sparse model, IEEE ICME, pp. 1–6, 2013.
 [4] Kernfeld, Eric and Aeron, Shuchin and Kilmer, Misha, Clustering multi-way data: a novel algebraic approach, arXiv preprint arXiv:1412.7056, 2014.
 [5] Kernfeld, Eric and Kilmer, Misha and Aeron, Shuchin, Tensor–tensor products with invertible linear transforms, Linear Algebra and its Applications, vol. 485, pp. 545–570, 2015.
 [6] Qiu, Qiang and Chellappa, Rama, Compositional dictionaries for domain adaptive face recognition, IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5152–5165, 2015.
 [7] Wong, Wai Keung and Lai, Zhihui and Xu, Yong and Wen, Jiajun and Ho, Chu Po, Joint tensor feature analysis for visual object recognition, IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2425–2436, 2015.
 [8] Zhao, Qibin and Zhou, Guoxu and Zhang, Liqing and Cichocki, Andrzej, Tensor-variate Gaussian processes regression and its application to video surveillance, IEEE ICASSP, pp. 1265–1269, 2014.
 [9] Lee, Honglak and Battle, Alexis and Raina, Rajat and Ng, Andrew Y., Efficient sparse coding algorithms, NIPS, pp. 801–808, 2006.
 [10] Ayyala, Deepak Nag, Least Angle Regression, LARS, vol. 2, no. 23, 2008.
 [11] Donoho, David L. and Tsaig, Yaakov and Drori, Iddo and Starck, Jean-Luc, Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit, IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1094–1121, 2012.
 [12] Bristow, Hilton and Eriksson, Anders and Lucey, Simon, Fast convolutional sparse coding, IEEE CVPR, pp. 391–398, 2013.
 [13] Heide, Felix and Heidrich, Wolfgang and Wetzstein, Gordon, Fast and flexible convolutional sparse coding, IEEE CVPR, pp. 5135–5143, 2015.
 [14] Kilmer, Misha E. and Braman, Karen and Hao, Ning and Hoover, Randy C., Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging, SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 1, pp. 148–172, 2013.
 [15] Peng, Yi and Meng, Deyu and Xu, Zongben and Gao, Chenqiang and Yang, Yi and Zhang, Biao, Decomposable nonlocal tensor dictionary learning for multispectral image denoising, IEEE CVPR, pp. 2949–2956, 2014.
 [16] Maggioni, Matteo and Katkovnik, Vladimir and Egiazarian, Karen and Foi, Alessandro, Nonlocal transform-domain filter for volumetric data denoising and reconstruction, IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 119–133, 2013.
 [17] Liu, Xuefeng and Bourennane, Salah and Fossati, Caroline, Denoising of hyperspectral images using the PARAFAC model and statistical performance analysis, IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 10, pp. 3717–3724, 2012.
 [18] Renard, Nadine and Bourennane, Salah and Blanc-Talon, Jacques, Denoising and dimensionality reduction using multilinear tools for hyperspectral images, IEEE Geoscience and Remote Sensing Letters, vol. 5, no. 2, pp. 138–142, 2008.
 [19] Elad, Michael and Aharon, Michal, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
 [20] Yasuma, Fumihito and Mitsunaga, Tomoo and Iso, Daisuke and Nayar, Shree K., Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum, IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2241–2253, 2010.
 [21] Wang, Zhou and Bovik, Alan C. and Sheikh, Hamid R. and Simoncelli, Eero P., Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
 [22] Xu, Yangyang and Yin, Wotao, A fast patch-dictionary method for whole image recovery, arXiv preprint arXiv:1408.3740, 2014.
 [23] Jiang, Fei and Liu, Xiao-Yang and Lu, Hongtao and Shen, Ruimin, Graph regularized tensor sparse coding for image representation, IEEE ICME, 2017.