Efficient Two-Dimensional Sparse Coding Using Tensor-Linear Combination

03/28/2017 ∙ by Fei Jiang, et al.

Sparse coding (SC) is an automatic feature extraction and selection technique widely used in unsupervised learning. However, conventional SC vectorizes the input images, which breaks apart the local proximity of pixels and destroys the elementary object structures of images. In this paper, we propose a novel two-dimensional sparse coding (2DSC) scheme that represents the input images as tensor-linear combinations under a novel algebraic framework. 2DSC learns much more concise dictionaries because its circular-convolution-based tensor-product treats the shifted versions of an atom as the atom itself, whereas conventional SC must learn each shift separately. We apply 2DSC to natural images and demonstrate that 2DSC returns meaningful dictionaries for large patches. Moreover, for multi-spectral image denoising, the proposed 2DSC reduces computational costs while achieving performance competitive with state-of-the-art algorithms.







1 Introduction

Sparse coding (SC) is a classical unsupervised feature extraction technique for finding concise representations of the data, which has been successfully applied to numerous areas across computer vision and pattern recognition

[1]. Conventional SC [9] aims to approximate vector-valued inputs by linear combinations of a few bases. Such bases correspond to patterns that physically represent elementary objects, and they compose a dictionary.

The conventional SC model [9] suffers from two major problems: 1) the vectorization preprocessing breaks apart the local proximity of pixels and destroys the object structures of images; and 2) its high computational complexity restricts its applications, so patches of small size are usually used. The dictionary is typically overcomplete, i.e., the number of bases is larger than the dimension of the input data. The dictionary therefore becomes very large for high-dimensional data, which makes conventional SC prohibitively expensive.

However, existing approaches cannot solve these two problems satisfactorily. Two kinds of SC models have been proposed to preserve the spatial proximity of images: tensor sparse coding (TenSR) [2, 3] and convolutional sparse coding (CSC) [12, 13]. TenSR models [2, 3] adopt a series of separable dictionaries to approximate the structures in each mode of the input data. Although the sizes of the dictionaries are significantly reduced, the relationships among the modes are ignored, even though object structures usually spread across all modes of the data. CSC models [12, 13] use dictionaries of local filters and introduce the convolution operator to learn shift-invariant patterns. However, optimizing such models with the convolution operator is computationally challenging. Moreover, each feature map (sparse representation) has nearly the same size as the input image, which is considerably larger than in conventional SC and increases both the storage requirements and the computational complexity.

Figure 1: Shifting invariance involved in tensor-product. The shifted versions correspond to a dynamic flight in a counter-clockwise direction.

The basic idea that motivates us to address these challenges has two aspects: 1) tensor representation preserves the local proximity of pixels and captures the elementary object structures; and 2) we exploit the tensor-product operation under a novel algebraic framework in which images are approximated by tensor-linear combinations, instead of the Tucker decomposition used in TenSR [2, 3] or the convolution operation in CSC [12, 13]. On the one hand, the tensor-product is based on circular convolution, so the data can be generated from shifted versions of the bases; Fig. 1 shows the shifted versions generated by the tensor-product without storing them explicitly. On the other hand, the tensor-linear combination (see Definition 2) generalizes the standard linear combination, so the number of required bases, and hence the computational complexity, can be significantly reduced.

In this paper, we propose a novel sparse coding model, two-dimensional sparse coding (2DSC), in which the input images are represented as third-order tensors, and the tensor-linear combinations are used for data approximation. To solve the 2DSC problem, a novel alternating minimization algorithm is presented which includes a sparse coding learning step and a dictionary learning step. For sparse coding, we propose a new iterative shrinkage thresholding algorithm based on tensor-product, which is directly implemented in the tensor space. For dictionary learning, we show that it can be solved efficiently by transforming to a Lagrange dual problem in the frequency domain.

The rest of this paper is organized as follows: Section 3 introduces the notation and preliminaries used in this paper. Section 4 presents the proposed 2DSC model, followed by the novelties of our model. Section 5 proposes an efficient alternating minimization algorithm for 2DSC. We demonstrate the effectiveness of the 2DSC model on a simulation experiment and on multi-spectral image denoising in Section 6. Finally, we conclude the paper in Section 7. To summarize, this paper makes the following contributions:

  • We propose a novel two-dimensional sparse coding model (2DSC) for image representation, which preserves the local proximity of pixels and elementary object structures of images. 2DSC is superior in dealing with high dimensional data.

  • We discuss the geometric properties of the dictionary returned by 2DSC, and show that there exists an equivalent sum space spanned by the corresponding vectorizations of the bases. Therefore, the dictionary of 2DSC has a stronger representation capability than that of the conventional SC.

  • We propose an efficient alternating minimization algorithm. For coefficient learning, we provide a novel iterative shrinkage thresholding algorithm based on the tensor-product, which performs the optimization directly in the tensor space. For dictionary learning, we convert the problem into the frequency domain and solve its Lagrange dual, which significantly reduces the number of variables to be optimized.

2 Related Work

In this section, we briefly review the related work on sparse coding models with consideration of spatial structures of the images, including sparse coding based on tensor representations [2, 3] and convolutional sparse coding [12, 13].

The first stream is the tensor-based sparse coding (TenSR) models [2, 3], which preserve the spatial structures of images through tensor representations. The data are represented via Tucker decomposition. Instead of the single dictionary of conventional SC, a series of separable dictionaries is adopted, each corresponding to one mode of the data. The dictionary sizes are therefore significantly reduced, since each mode is small compared with the whole data. However, the relationships between the modes are ignored, even though object structures are usually distributed across all modes of the data.

The second stream is the convolutional sparse coding (CSC) models [12, 13], which represent an image as a sum of convolutions of feature maps with corresponding filters. The convolution operator models the similarity of neighboring pixels, thereby preserving the spatial structures of images. However, optimizing CSC models is computationally challenging. Moreover, each feature map (sparse representation) has nearly the same size as the image, which is considerably larger than in conventional SC and increases both the storage requirements and the computational complexity.

Figure 2: Sparse coding models. (i) An image. (ii) SC is based on the linear combination, where bases can represent the space. (iii) TenSR [2] is based on the Tucker decomposition and (iv) CSC [13] on the convolution operator; for these two it is not easy to determine the number of bases in their spaces. (v) Our model is based on the tensor-linear combination, where tensor bases can represent the tensor space.

3 Notation and Preliminary

A third-order tensor is denoted as $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. The expansion of $\mathcal{A}$ along the third dimension is represented as $\mathrm{unfold}(\mathcal{A}) = [\mathcal{A}(:,:,1); \mathcal{A}(:,:,2); \dots; \mathcal{A}(:,:,n_3)]$. The transpose of a tensor is denoted as $\mathcal{A}^{T} \in \mathbb{R}^{n_2 \times n_1 \times n_3}$, where $\mathcal{A}^{T}(:,:,1) = \mathcal{A}(:,:,1)^{T}$ and $\mathcal{A}^{T}(:,:,k) = \mathcal{A}(:,:,n_3+2-k)^{T}$ for $k = 2, \dots, n_3$, and the superscript “T” represents the transpose of matrices. The discrete Fourier transform (DFT) along the third dimension of $\mathcal{A}$ is denoted as $\widehat{\mathcal{A}} = \mathrm{fft}(\mathcal{A}, [\,], 3)$.

$[n]$ denotes the set $\{1, 2, \dots, n\}$. The $\ell_1$ and Frobenius norms of tensors are denoted as $\|\mathcal{A}\|_1 = \sum_{i,j,k} |a_{ijk}|$ and $\|\mathcal{A}\|_F = \big(\sum_{i,j,k} a_{ijk}^2\big)^{1/2}$. Furthermore, we need the following definitions.

Definition 1

[14] The tensor-product $\mathcal{C} = \mathcal{A} * \mathcal{B}$ between $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{n_2 \times n_4 \times n_3}$ is a tensor $\mathcal{C} \in \mathbb{R}^{n_1 \times n_4 \times n_3}$ where $\mathcal{C}(i, l, :) = \sum_{j=1}^{n_2} \mathcal{A}(i, j, :) \circledast \mathcal{B}(j, l, :)$, and $\circledast$ denotes the circular convolution operation.

Remark 1

The tensor-product can be efficiently computed in the frequency domain as $\widehat{\mathcal{C}}(:,:,k) = \widehat{\mathcal{A}}(:,:,k) \, \widehat{\mathcal{B}}(:,:,k)$ for $k \in [n_3]$.
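As a hedged NumPy sketch (function names and the $n_1 \times n_2 \times n_3$ array layout are our own assumptions, not from the paper), the tensor-product can be computed either directly from Definition 1 or slice-wise in the Fourier domain; the two must agree:

```python
import numpy as np

def t_product_fft(A, B):
    """Tensor-product via the DFT along the third dimension:
    transform, multiply the frontal slices, transform back."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)  # slice-wise matrix products
    return np.real(np.fft.ifft(Ch, axis=2))

def t_product_direct(A, B):
    """Tensor-product from the definition: each (i, l) tube is the sum
    over j of circular convolutions A(i, j, :) ⊛ B(j, l, :)."""
    n1, n2, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3))
    for i in range(n1):
        for l in range(n4):
            for j in range(n2):
                for t in range(n3):
                    # circular convolution: sum_s A[i,j,s] * B[j,l,(t-s) mod n3]
                    C[i, l, t] += np.dot(A[i, j, :],
                                         B[j, l, (t - np.arange(n3)) % n3])
    return C
```

The FFT route is the one that matters in practice: it replaces the quadruple loop with $n_3$ independent matrix products.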

Lemma 1

[14] The tensor-product has an equivalent matrix-product form:

$$\mathrm{unfold}(\mathcal{A} * \mathcal{B}) = \mathrm{circ}(\mathcal{A}) \, \mathrm{unfold}(\mathcal{B}),$$

where $\mathrm{circ}(\mathcal{A})$ is the block-circulant matrix of $\mathcal{A}$ defined as follows:

$$\mathrm{circ}(\mathcal{A}) = \begin{bmatrix} \mathcal{A}(:,:,1) & \mathcal{A}(:,:,n_3) & \cdots & \mathcal{A}(:,:,2) \\ \mathcal{A}(:,:,2) & \mathcal{A}(:,:,1) & \cdots & \mathcal{A}(:,:,3) \\ \vdots & \vdots & \ddots & \vdots \\ \mathcal{A}(:,:,n_3) & \mathcal{A}(:,:,n_3-1) & \cdots & \mathcal{A}(:,:,1) \end{bmatrix}.$$
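For the tubal ($1 \times 1 \times n_3$) case, the circulant structure behind Lemma 1 can be sketched as follows (NumPy; `circ` is an illustrative helper name of ours): multiplying the circulant matrix of one tube by another tube realizes their circular convolution.

```python
import numpy as np

def circ(a):
    """Circulant matrix of a tube a of length n: entry (t, s) is
    a[(t - s) mod n], i.e. column s is a shifted down by s positions."""
    n = len(a)
    return np.column_stack([np.roll(a, s) for s in range(n)])
```

The same construction, applied block-wise to the frontal slices, yields the block-circulant matrix of a full tensor; since the DFT diagonalizes circulant matrices, this is exactly why the tensor-product factors into per-frequency matrix products.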

Definition 2

[14] The tensor-linear combination of the tensor bases $\overrightarrow{\mathcal{D}}_1, \dots, \overrightarrow{\mathcal{D}}_m$ with the corresponding tensor coefficients $\vec{c}_1, \dots, \vec{c}_m$ is defined as:

$$\overrightarrow{\mathcal{X}} = \overrightarrow{\mathcal{D}}_1 * \vec{c}_1 + \overrightarrow{\mathcal{D}}_2 * \vec{c}_2 + \dots + \overrightarrow{\mathcal{D}}_m * \vec{c}_m,$$

where $\overrightarrow{\mathcal{D}}_j \in \mathbb{R}^{n_1 \times 1 \times n_3}$ and $\vec{c}_j \in \mathbb{R}^{1 \times 1 \times n_3}$ for $j \in [m]$.

Remark 2

The tensor-linear combination is a generalization of the linear combination.
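As a small illustration of this generalization (NumPy; `tube_mult` is our own helper name), the role of scalar multiplication is played by tubal multiplication, i.e. circular convolution of tubes; when $n_3 = 1$ it reduces to ordinary scalar multiplication, recovering the standard linear combination:

```python
import numpy as np

def tube_mult(c, b):
    """Tubal multiplication: circular convolution of two tubes,
    computed as a pointwise product in the frequency domain."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(b)))
```

For length-1 tubes, `tube_mult(np.array([3.]), np.array([2.]))` returns `[6.]`, the plain scalar product.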

Definition 3

The tensor space spanned by the tensor basis set $\{\overrightarrow{\mathcal{D}}_j\}_{j=1}^{m}$ is defined as $\mathrm{span}\{\overrightarrow{\mathcal{D}}_j\}_{j=1}^{m} = \big\{ \sum_{j=1}^{m} \overrightarrow{\mathcal{D}}_j * \vec{c}_j \;\big|\; \vec{c}_j \in \mathbb{R}^{1 \times 1 \times n_3} \big\}$.

4 Problem Statement

4.1 Problem Formulation

Instead of preprocessing images into vectors, we represent $n$ images of size $n_1 \times n_3$ by their twisted forms, collected in a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n \times n_3}$, and propose a novel sparse coding model, named two-dimensional sparse coding (2DSC), as follows:

$$\min_{\mathcal{D}, \mathcal{C}} \; \frac{1}{2} \|\mathcal{X} - \mathcal{D} * \mathcal{C}\|_F^2 + \beta \|\mathcal{C}\|_1 \quad \text{s.t.} \; \|\mathcal{D}(:, j, :)\|_F^2 \le 1, \; j \in [m], \tag{4.1}$$

where $\mathcal{D} \in \mathbb{R}^{n_1 \times m \times n_3}$ is the tensor dictionary in which each lateral slice $\mathcal{D}(:, j, :)$ is a basis, and $\mathcal{C} \in \mathbb{R}^{m \times n \times n_3}$ is the tensor coefficient. The parameter $\beta$ balances the approximation error against the sparsity of the tensor coefficients, and $m$ is the number of atoms. Conventional SC is a special case of (4.1) when $n_3 = 1$.

4.2 Novelties of 2DSC

The proposed 2DSC is not simply an extension of conventional SC to two-dimensional data; it has novel properties. First, the dictionary size in 2DSC can be significantly reduced without damaging the reconstruction accuracy, thanks to the tensor-linear combination. Second, 2DSC is shifting invariant: the data can be generated from the model by shifted versions of the bases without explicitly storing them.

4.2.1 “Slim” Dictionary

Lemma 2

[14] The tensor space $\mathbb{R}^{n_1 \times 1 \times n_3}$ can be generated by $n_1$ tensors from an orthogonal set.

Fig. 2 shows four kinds of sparse coding models for image representation. In SC [9], the space $\mathbb{R}^{n_1 n_3}$ is generated by $n_1 n_3$ bases via linear combination, while in 2DSC the tensor space $\mathbb{R}^{n_1 \times 1 \times n_3}$ can be spanned with only $n_1$ bases. In TenSR [2] and CSC [13], it is not easy to determine the number of bases for the image space. Though CSC [13] is also based on the convolution operator, its sparse representations are nearly the same size as the input images, which is considerably larger than ours. Lemma 2 means that an $n_1 \times n_3$ datum can be generated by only $n_1$ elements of the same size via tensor-linear combination, whereas $n_1 n_3$ elements are required via linear combination. Representing data with far fewer atoms significantly reduces the computational complexity, which indicates the potential of the 2DSC model for high-dimensional data.

4.2.2 Shifting Invariance

Theorem 1

The tensor space spanned by a tensor basis set as defined in (3) is equivalent to a sum space of vector subspaces in $\mathbb{R}^{n_1 n_3}$.

As shown in (2), the tensor-linear combination (4) is equivalent in the vector space to a sum space of the form

$$\mathrm{span}(D^{(0)}) + \mathrm{span}(D^{(1)}) + \dots + \mathrm{span}(D^{(n_3 - 1)}),$$

where the columns of $D^{(0)}$ are the vectorizations of the tensor bases, the bases in each $D^{(k)}$, $k \ge 1$, are circularly shifted versions of those in $D^{(0)}$, and each dictionary $D^{(k)}$ has its own coefficients. Fig. 1 explicitly shows the shifted versions of a basis generated by the tensor-product when the image of the helicopter is viewed as a basis. The space generated by the tensor-linear combination can therefore be transferred to a sum space of vector spaces generated by linear combinations, which includes the shifted versions of the original atoms. Fig. 3 explicitly shows the equivalent linear combination and the shifted versions of an atom generated from the tensor-product of its twisted form and a coefficient tube.

Figure 3: The leftmost tensor production is equivalent to the linear combination in middle one, which corresponds to the shifted versions in rightmost one.
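The equivalence illustrated in Fig. 3 can be checked numerically; in this NumPy sketch (where `B`, a basis stored in its twisted $n_1 \times n_3$ matrix form, and the coefficient tube `c` are invented for the illustration), the tensor-product of a basis with a tube equals a linear combination of circularly shifted copies of the basis:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n3 = 5, 4
B = rng.standard_normal((n1, n3))  # basis in twisted (matrix) form
c = rng.standard_normal(n3)        # coefficient tube

# tensor-product of the basis with the tube: row-wise circular convolution
X_tprod = np.real(np.fft.ifft(np.fft.fft(B, axis=1) * np.fft.fft(c), axis=1))

# the same result as a linear combination of circularly shifted copies of B
X_shift = sum(c[j] * np.roll(B, j, axis=1) for j in range(n3))

assert np.allclose(X_tprod, X_shift)
```

This is the shifting invariance in miniature: the $n_3$ shifted copies never need to be stored, yet every combination of them is reachable through a single tube of coefficients.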

5 Alternating Minimization Algorithm

Problem (4.1) is quite challenging due to the non-convex objective function and the convolution operator. Instead of transforming (4.1) into the conventional SC formulation via Lemma 1, we propose an efficient algorithm that alternately optimizes the dictionary and the tensor coefficients directly in the tensor space, as shown in Algorithm 1.

0:   images: , the number of atoms: , sparsity regularizer: , maximum iterative steps: num,
1:  Initialization: Randomly initialize , , and Lagrange dual variables ,
2:  for  to num do
3:     //Tensor Sparse Coding Learning
4:     Solving via Equation (10) in Algorithm 2,
5:     //Tensor Dictionary Learning
6:     , ,
7:     for  to  do
8:        Solving (18) for by Newton’s method,
9:        Calculate from (17),
10:     end for
11:     ,
12:  end for
13:  Output: the learned dictionary and the tensor coefficients.
Algorithm 1 Algorithm for 2DSC

5.1 Learning Tensor Coefficient

For clarity, we discuss how to solve the tensor sparse representation for a single image of size $n_1 \times n_3$, represented as a tensor $\overrightarrow{\mathcal{X}} \in \mathbb{R}^{n_1 \times 1 \times n_3}$.

Given the dictionary $\mathcal{D}$, solving for the tensor sparse representation of the image is converted to the following problem:

$$\min_{\overrightarrow{\mathcal{C}}} \; \frac{1}{2} \|\overrightarrow{\mathcal{X}} - \mathcal{D} * \overrightarrow{\mathcal{C}}\|_F^2 + \beta \|\overrightarrow{\mathcal{C}}\|_1. \tag{7}$$

By Lemma 1, (7) can be solved by conventional sparse coding algorithms, since it is equivalent to

$$\min_{c} \; \frac{1}{2} \|x - \mathrm{circ}(\mathcal{D}) \, c\|_2^2 + \beta \|c\|_1, \tag{8}$$

where $x = \mathrm{unfold}(\overrightarrow{\mathcal{X}})$ and $c = \mathrm{unfold}(\overrightarrow{\mathcal{C}})$. However, the size of the dictionary $\mathrm{circ}(\mathcal{D})$ in (8) grows significantly with the size of the images, which also increases the computational complexity.

To alleviate this problem, we propose a novel Iterative Shrinkage Thresholding algorithm based on the Tensor-product (ISTA-T) to solve (7) directly. We first rewrite (7) as:

$$\min_{\overrightarrow{\mathcal{C}}} \; f(\overrightarrow{\mathcal{C}}) + g(\overrightarrow{\mathcal{C}}), \tag{9}$$

where $f(\overrightarrow{\mathcal{C}}) = \frac{1}{2}\|\overrightarrow{\mathcal{X}} - \mathcal{D} * \overrightarrow{\mathcal{C}}\|_F^2$ stands for the data reconstruction term and $g(\overrightarrow{\mathcal{C}}) = \beta \|\overrightarrow{\mathcal{C}}\|_1$ stands for the sparsity constraint term. An iterative shrinkage algorithm is used to solve (9), in which $f$ is replaced by a linearization around the previous estimate together with a proximal regularization, while the non-smooth term $g$ is kept. Thus, at the $t$-th iteration, $\overrightarrow{\mathcal{C}}$ can be updated by

$$\overrightarrow{\mathcal{C}}^{(t+1)} = \arg\min_{\overrightarrow{\mathcal{C}}} \; f(\overrightarrow{\mathcal{C}}^{(t)}) + \big\langle \nabla f(\overrightarrow{\mathcal{C}}^{(t)}), \overrightarrow{\mathcal{C}} - \overrightarrow{\mathcal{C}}^{(t)} \big\rangle + \frac{L}{2} \big\|\overrightarrow{\mathcal{C}} - \overrightarrow{\mathcal{C}}^{(t)}\big\|_F^2 + \beta \|\overrightarrow{\mathcal{C}}\|_1, \tag{10}$$

where $L$ is a Lipschitz constant of $\nabla f$, and $\nabla f$ is the gradient defined in the tensor space. Then, (10) is equivalent to

$$\overrightarrow{\mathcal{C}}^{(t+1)} = \arg\min_{\overrightarrow{\mathcal{C}}} \; \frac{1}{2} \Big\| \overrightarrow{\mathcal{C}} - \Big( \overrightarrow{\mathcal{C}}^{(t)} - \frac{1}{L} \nabla f(\overrightarrow{\mathcal{C}}^{(t)}) \Big) \Big\|_F^2 + \frac{\beta}{L} \|\overrightarrow{\mathcal{C}}\|_1. \tag{11}$$
To solve (11), we first derive the gradient of the data reconstruction term $f$:

$$\nabla f(\overrightarrow{\mathcal{C}}) = \mathcal{D}^{T} * \big(\mathcal{D} * \overrightarrow{\mathcal{C}} - \overrightarrow{\mathcal{X}}\big). \tag{12}$$

Second, we discuss how to determine the Lipschitz constant $L$ in (11). For every pair of coefficients $\overrightarrow{\mathcal{C}}_1, \overrightarrow{\mathcal{C}}_2$, we have

$$\big\|\nabla f(\overrightarrow{\mathcal{C}}_1) - \nabla f(\overrightarrow{\mathcal{C}}_2)\big\|_F \le \max_{k \in [n_3]} \big\|\widehat{\mathcal{D}}(:,:,k)^{H} \widehat{\mathcal{D}}(:,:,k)\big\|_2 \, \big\|\overrightarrow{\mathcal{C}}_1 - \overrightarrow{\mathcal{C}}_2\big\|_F,$$

where the superscript “H” represents the conjugate transpose.

Thus the Lipschitz constant of $\nabla f$ used in our algorithm is $L = \max_{k \in [n_3]} \|\widehat{\mathcal{D}}(:,:,k)^{H} \widehat{\mathcal{D}}(:,:,k)\|_2$.

Lastly, (11) can be solved by the proximal operator $\overrightarrow{\mathcal{C}}^{(t+1)} = S_{\beta/L}\big(\overrightarrow{\mathcal{C}}^{(t)} - \frac{1}{L}\nabla f(\overrightarrow{\mathcal{C}}^{(t)})\big)$, where $S_{\tau}(\cdot)$ is the elementwise soft-thresholding operator $S_{\tau}(x) = \mathrm{sign}(x)\max(|x| - \tau, 0)$.
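The soft-thresholding step is the standard elementwise shrinkage; a minimal NumPy sketch (`soft_threshold` is our own name for it):

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1: elementwise
    sign(x) * max(|x| - tau, 0), applied to an array of any shape."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```

In ISTA-T it is applied entrywise to the whole tensor coefficient with threshold $\beta / L$.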

To speed up the convergence of the proposed ISTA-T, an extrapolation operator is adopted [22]. Algorithm 2 summarizes the proposed ISTA-T algorithm.

0:   images , dictionary , and the maximum iterations: num,
1:  Initialization: , Set , ,
2:  for  = 1 To num  do
3:     Set ,
4:     Compute via Equation (12),
5:     Compute via ,
6:     ,
7:     ,
8:  end for
9:  Output: the sparse coefficient.
Algorithm 2 Iterative Shrinkage Thresholding Algorithm based on Tensor-product (ISTA-T)
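Putting the pieces together, the coefficient-learning loop can be sketched as follows. This is a simplified version of ISTA-T, plain ISTA in the tensor space without the extrapolation step of Algorithm 2; the function name `ista_t` and all shapes are our own assumptions:

```python
import numpy as np

def ista_t(X, D, beta, num_iters=100):
    """Minimize 0.5*||X - D*C||_F^2 + beta*||C||_1 over the tensor
    coefficient C by iterative shrinkage in the tensor space.
    X: (n1, n, n3) data, D: (n1, m, n3) dictionary; returns C: (m, n, n3)."""
    Xh = np.fft.fft(X, axis=2)
    Dh = np.fft.fft(D, axis=2)
    n3 = X.shape[2]
    # Lipschitz constant: largest squared spectral norm over frequency slices
    L = max(np.linalg.norm(Dh[:, :, k], 2) ** 2 for k in range(n3))
    C = np.zeros((D.shape[1], X.shape[1], n3))
    for _ in range(num_iters):
        Ch = np.fft.fft(C, axis=2)
        # gradient of the data term, computed slice-wise in the Fourier domain
        Rh = np.einsum('imk,mnk->ink', Dh, Ch) - Xh          # residual D*C - X
        Gh = np.einsum('imk,ink->mnk', Dh.conj(), Rh)        # D^T * residual
        G = np.real(np.fft.ifft(Gh, axis=2))
        # gradient step followed by elementwise soft thresholding
        Z = C - G / L
        C = np.sign(Z) * np.maximum(np.abs(Z) - beta / L, 0.0)
    return C
```

Each iteration costs one pair of FFTs plus $n_3$ small matrix products, instead of forming the large block-circulant dictionary of (8); the paper's full ISTA-T additionally applies the extrapolation step of [22] to speed up convergence.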
Table 1: Complexity analysis and memory usage of sparse representation (SR) and dictionary learning (DL) for TenSR [2] and ours.

5.2 Tensor Dictionary learning

For learning the dictionary $\mathcal{D}$ with the coefficient $\mathcal{C}$ fixed, the optimization problem is:

$$\min_{\mathcal{D}} \; \|\mathcal{X} - \mathcal{D} * \mathcal{C}\|_F^2 \quad \text{s.t.} \; \|\mathcal{D}(:, j, :)\|_F^2 \le 1, \; j \in [m], \tag{5.2}$$

where the atoms are coupled together by the circular convolution operator. Therefore, we first decompose (5.2) via the DFT into $n_3$ nearly-independent problems (coupled only through the norm constraints):

$$\min_{\widehat{\mathcal{D}}} \; \sum_{k=1}^{n_3} \big\|\widehat{\mathcal{X}}(:,:,k) - \widehat{\mathcal{D}}(:,:,k) \, \widehat{\mathcal{C}}(:,:,k)\big\|_F^2 \quad \text{s.t.} \; \sum_{k=1}^{n_3} \|\widehat{\mathcal{D}}(:, j, k)\|_2^2 \le n_3, \; j \in [m].$$

Then, we adopt the Lagrange dual [9] to solve (5.2) in the frequency domain. The advantage of the Lagrange dual is that the number of optimization variables is $m$, one per atom, which is much smaller than the $n_1 m n_3$ variables of the primal problem (5.2).

To use the Lagrange dual algorithm, we first consider the Lagrangian of (5.2):

$$L(\widehat{\mathcal{D}}, \lambda) = \sum_{k=1}^{n_3} \big\|\widehat{\mathcal{X}}(:,:,k) - \widehat{\mathcal{D}}(:,:,k) \widehat{\mathcal{C}}(:,:,k)\big\|_F^2 + \sum_{j=1}^{m} \lambda_j \Big( \sum_{k=1}^{n_3} \|\widehat{\mathcal{D}}(:,j,k)\|_2^2 - n_3 \Big), \tag{16}$$

where $\lambda = [\lambda_1, \dots, \lambda_m] \ge 0$ are the dual variables and $\Lambda = \mathrm{diag}(\lambda)$.

Second, minimizing over $\widehat{\mathcal{D}}$ analytically, we obtain the optimal form of each frequency slice of $\widehat{\mathcal{D}}$:

$$\widehat{\mathcal{D}}(:,:,k) = \widehat{\mathcal{X}}(:,:,k) \, \widehat{\mathcal{C}}(:,:,k)^{H} \big( \widehat{\mathcal{C}}(:,:,k) \widehat{\mathcal{C}}(:,:,k)^{H} + \Lambda \big)^{-1}. \tag{17}$$

Substituting (17) into the Lagrangian $L(\widehat{\mathcal{D}}, \lambda)$, we obtain the Lagrange dual function (18), which is solved by Newton’s method. Once the dual variables are obtained, the dictionary can be recovered via Equation (17).
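As a hedged sketch of the dictionary-recovery step (NumPy; `dictionary_from_duals` and the array shapes are our own assumptions, following the closed form of (17) slice by slice in the frequency domain):

```python
import numpy as np

def dictionary_from_duals(X, C, lam):
    """Recover the dictionary from fixed nonnegative dual variables lam
    (one per atom), per frequency slice:
        D_k = X_k C_k^H (C_k C_k^H + diag(lam))^{-1}.
    X: (n1, n, n3) data, C: (m, n, n3) coefficients, lam: (m,)."""
    Xh = np.fft.fft(X, axis=2)
    Ch = np.fft.fft(C, axis=2)
    n3 = X.shape[2]
    m = C.shape[0]
    Dh = np.empty((X.shape[0], m, n3), dtype=complex)
    for k in range(n3):
        Ck = Ch[:, :, k]
        Dh[:, :, k] = Xh[:, :, k] @ Ck.conj().T @ np.linalg.inv(
            Ck @ Ck.conj().T + np.diag(lam))
    return np.real(np.fft.ifft(Dh, axis=2))
```

Note the inverted matrix is only $m \times m$ per frequency; in the actual algorithm, Newton's method on the dual (18) supplies the $\lambda_j$ before this recovery is applied.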

5.3 Complexity Analysis

Table 1 shows the computational complexity of our proposed 2DSC and of TenSR [2]; the definitions of the quantities involved can be found in [2]. The computational complexity of TenSR [2] is higher than ours. As shown in [2], for 21168 patches, TenSR takes 189 seconds for tensor coefficient learning, while our model takes only about 23 seconds. This speedup comes from the faster computation of the tensor-product in the frequency domain, which divides the original large problem into many smaller ones.

6 Evaluation

6.1 Dictionaries for Large Patches

Figure 4: Dictionaries learned from SC and 2DSC. For small patches, both SC and 2DSC can learn Gabor-like features, while for large patches only 2DSC can learn meaningful features.

We analyze the dictionaries learned by conventional SC and by 2DSC with the same number of bases and the same parameter settings. Patches extracted from the Natural Images data are normalized to have zero mean. The sparsity regularizer is set to 0.1, and the number of bases is 64, which ensures that the dictionary is overcomplete for 2DSC but not for SC. Fig. 4 shows the dictionaries learned by 2DSC and conventional SC for different patch sizes. For small patches, both models learn meaningful dictionaries (Gabor-like features), such as edges and corners. For large patches, the performance of SC degenerates significantly, with the bases even becoming sparse and noisy, but 2DSC can still learn meaningful features.

Method         PSNR (five noise levels, increasing)    SSIM (five noise levels, increasing)
Noisy Image    34.16  28.13  22.11  18.59  14.15       0.8177  0.5664  0.2998  0.1912  0.0983
BwK-SVD [19]   36.92  32.72  28.90  26.83  24.21       0.9107  0.8242  0.7136  0.6487  0.5617
3DK-SVD [19]   38.60  34.96  31.28  29.00  26.15       0.9388  0.8903  0.8238  0.7775  0.7261
LRTA [18]      42.98  39.33  33.20  32.99  30.23       0.9664  0.9424  0.9096  0.8622  0.7904
PARAFAC [17]   36.20  33.96  32.96  32.11  27.49       0.9230  0.9099  0.8555  0.7897  0.5636
TenSR [2]      41.34  36.55  32.24  31.25  30.67       0.9700  0.9283  0.8623  0.8032  0.7768
2DSC (Ours)    41.32  36.56  33.21  33.09  34.46       0.9766  0.9466  0.9274  0.8690  0.9248
Table 2: PSNR and SSIM results of the different methods at different noise levels on balloons.
Figure 5: Visual comparison of reconstruction results by different methods on “balloons” in dataset [20]. From left to right: original image at 420nm band, noisy image, BwK-SVD[19], PARAFAC [17], LRTA [18], TenSR [2], and Ours.

6.2 Multi-spectral Image Denoising

We apply 2DSC to multispectral images from the Columbia MSI Database [20]. Each dataset contains 31 real-world images of size 512 × 512, collected from 400nm to 700nm in 10nm steps. We first scale these images to [0, 255], and then add Gaussian white noise at different noise levels. In our 2DSC model, we extract patches from each noisy multi-spectral image and save each patch as a third-order tensor. Dictionaries are randomly initialized and trained iteratively, and the learned dictionaries are then used to denoise the MSI images. The parameters of our scheme are set separately for each noise level.

Table 2 shows the comparison results in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [21]. Five state-of-the-art MSI denoising methods are included: band-wise K-SVD (BwK-SVD) [19], 3D-cube K-SVD (3DK-SVD) [19], LRTA [18], PARAFAC [17], and TenSR [2]. As shown in Table 2, our 2DSC outperforms all comparison algorithms on SSIM, which measures the structural consistency between the target image and the reference image. On PSNR, our 2DSC works much better at higher noise levels. At lower noise levels, LRTA [18] is the best, while ours and TenSR [2] are comparable.

We add CSC [13] to the visual comparison of denoising performance. Due to the high memory required by CSC [13] for high-resolution images, we resize the spectral images of balloons and consider only the first five bands. Fig. 5 shows the denoising results; our method visibly achieves the best reconstruction. Note that for CSC [13], we denoise each image separately, which ignores the information along the bands of the spectral images; the sparsity parameters are adjusted within [0.1, 10]. We conjecture that the weaker denoising results of CSC [13] stem from the small image size: because of the powerful representation capability of the convolution operator, the filters also learn much of the noise.

7 Conclusion

In this paper, we propose a novel tensor-based sparse coding algorithm that learns efficient tensor representations of images with a much smaller dictionary than conventional SC. Moreover, a much more efficient algorithm for learning the tensor sparse coefficients directly in the tensor space is proposed. The effectiveness of our model has been demonstrated by dictionary learning for large patch sizes. A follow-up work [23] applies the 2DSC scheme to image clustering by incorporating a graph regularizer.


  • [1] Wright, John and Ma, Yi and Mairal, Julien and Sapiro, Guillermo and Huang, Thomas S and Yan, Shuicheng, Sparse representation for computer vision and pattern recognition, Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
  • [2] Qi, Na and Shi, Yunhui and Sun, Xiaoyan and Yin, Baocai, TenSR: Multi-Dimensional Tensor Sparse Representation, IEEE CVPR, pp. 5916-5925, 2016.
  • [3] Qi, Na and Shi, Yunhui and Sun, Xiaoyan and Wang, Jingdong and Yin, Baocai, Two dimensional synthesis sparse model, IEEE ICME, pp. 1-6, 2013.
  • [4] Kernfeld, Eric and Aeron, Shuchin and Kilmer, Misha, Clustering multi-way data: a novel algebraic approach, arXiv preprint arXiv:1412.7056, 2014.
  • [5] Kernfeld, Eric and Kilmer, Misha and Aeron, Shuchin, Tensor–tensor products with invertible linear transforms, Linear Algebra and its Applications, vol. 485, pp. 545-570, 2015.
  • [6] Qiu, Qiang and Chellappa, Rama, Compositional dictionaries for domain adaptive face recognition, IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5152-5165, 2015.

  • [7] Wong, Wai Keung and Lai, Zhihui and Xu, Yong and Wen, Jiajun and Ho, Chu Po, Joint tensor feature analysis for visual object recognition, IEEE transactions on cybernetics, vol. 45, no. 11, pp. 2425-2436, 2015.
  • [8] Zhao, Qibin and Zhou, Guoxu and Zhang, Liqing and Cichocki, Andrzej, Tensor-variate gaussian processes regression and its application to video surveillance, IEEE ICASSP, pp. 1265-1269, 2014.
  • [9] Lee, Honglak and Battle, Alexis and Raina, Rajat and Ng, Andrew Y, Efficient sparse coding algorithms, NIPS, pp. 801-808, 2006.
  • [10] Ayyala, Deepak Nag, Least Angle Regression, LARS, vol. 2, no. 23, 2008.
  • [11] Donoho, David L and Tsaig, Yaakov and Drori, Iddo and Starck, Jean-Luc. Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit, IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1094-1121, 2012.
  • [12] Bristow, Hilton and Eriksson, Anders and Lucey, Simon, Fast convolutional sparse coding, IEEE CVPR, pp. 391-398, 2013.
  • [13] Heide, Felix and Heidrich, Wolfgang and Wetzstein, Gordon, Fast and flexible convolutional sparse coding, IEEE CVPR, pp. 5135-5143, 2015.
  • [14] Kilmer, Misha E and Braman, Karen and Hao, Ning and Hoover, Randy C, Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging, SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 1, pp. 148-172, 2013.
  • [15] Peng, Yi and Meng, Deyu and Xu, Zongben and Gao, Chenqiang and Yang, Yi and Zhang, Biao, Decomposable nonlocal tensor dictionary learning for multispectral image denoising, IEEE CVPR, pp. 2949–2956, 2014.
  • [16] Maggioni, Matteo and Katkovnik, Vladimir and Egiazarian, Karen and Foi Alessandro, Nonlocal transform-domain filter for volumetric data denoising and reconstruction, IEEE transactions on image processing, vol. 22, no. 1, pp. 119-133, 2013.
  • [17] Liu, Xuefeng and Bourennane, Salah and Fossati, Caroline, Denoising of hyperspectral images using the PARAFAC model and statistical performance analysis, IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 10, pp. 3717-3724, 2012.
  • [18] Renard, Nadine and Bourennane, Salah and Blanc-Talon, Jacques, Denoising and dimensionality reduction using multilinear tools for hyperspectral images, IEEE Geoscience and Remote Sensing Letters, vol. 5, no. 2, pp. 138-142, 2008.
  • [19] Elad, Michael and Aharon, Michal, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Transactions on Image processing, vol. 15, no. 12, pp. 3736-3745, 2006.
  • [20] Yasuma, Fumihito and Mitsunaga, Tomoo and Iso, Daisuke and Nayar, Shree K, Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum, IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2241-2253, 2010.
  • [21] Wang, Zhou and Bovik, Alan C and Sheikh, Hamid R and Simoncelli, Eero P, Image quality assessment: from error visibility to structural similarity, IEEE transactions on image processing, vol. 13, no. 4, pp. 600-612, 2004.
  • [22] Xu, Yangyang and Yin, Wotao, A fast patch-dictionary method for whole image recovery, arXiv:1408.3740, 2014.
  • [23] Jiang, Fei and Liu, Xiao-Yang and Lu, Hongtao and Shen, Ruimin, Graph regularized tensor sparse coding for image representation, IEEE ICME, 2017.