I Introduction
One of the most important procedures in hyperspectral imagery (HSI) is image classification, where each pixel is assigned to one of the classes based on its spectral characteristics. Driven by numerous demands in mineralogy, agriculture and surveillance, HSI classification is developing very rapidly, and a large number of techniques have been proposed to tackle this problem [2]. Compared with previous approaches, the support vector machine (SVM) has been found highly effective in terms of both computational efficiency and classification results. A wide variety of modifications of SVM have been proposed to improve its performance. Some of them incorporate contextual information into the classifier [3, 4]. Others design a sparse SVM that pursues a sparse decision rule by using the $\ell_1$ norm as the regularizer [5].
Recently, sparse representation-based classification (SRC) has been proposed to solve many computer vision tasks [6, 7], where the use of sparsity as a prior often leads to state-of-the-art performance. SRC has also been applied to HSI classification [8], relying on the observation that hyperspectral pixels belonging to the same class approximately lie in the same low-dimensional subspace. To alleviate the problem introduced by the lack of sufficient training data, Haq et al. [9] proposed the homotopy-based SRC. Another way to address insufficient training data is to employ the contextual information of neighboring pixels in the classifier, as in spectral-spatial constraint classification [10].
In SRC, a test sample $\mathbf{y} \in \mathbb{R}^{P}$, where $P$ is the number of spectral bands, can be written as a sparse linear combination of all the training pixels (atoms in a dictionary) as
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \lambda\|\mathbf{x}\|_1, \tag{1}$$
where $\mathbf{x} \in \mathbb{R}^{N}$ and $\|\mathbf{x}\|_1 = \sum_{i=1}^{N}|x_i|$ is the $\ell_1$ norm. $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_N] \in \mathbb{R}^{P \times N}$ is a structured dictionary formed by concatenating several class-wise sub-dictionaries, $\{\mathbf{a}_i\}_{i=1,\dots,N}$ are the columns of $\mathbf{A}$, $N$ is the total number of training samples from all the classes, and $\lambda$ is a scalar regularization parameter.
The class label for the test pixel $\mathbf{y}$ is determined by the minimum residual between $\mathbf{y}$ and its approximation from each class-wise sub-dictionary:
$$\mathrm{class}(\mathbf{y}) = \arg\min_{g} \|\mathbf{y} - \mathbf{A}\,\delta_g(\hat{\mathbf{x}})\|_2^2, \tag{2}$$
where $g \in \{1, 2, \dots, G\}$ is the group or class index, and $\delta_g(\cdot)$ is the indicator operation zeroing out all elements of $\hat{\mathbf{x}}$ that do not belong to the class $g$.
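The two steps above, sparse coding as in Eq. (1) followed by the minimum-residual rule of Eq. (2), can be sketched as follows. This is only an illustrative sketch: it uses plain iterative soft thresholding (ISTA) rather than any solver used in this paper, and the function names, the toy identity dictionary and the value of `lam` are assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def src_classify(A, atom_labels, y, lam=0.01, n_iter=200):
    """Approximately solve Eq. (1) with ISTA, then apply the rule of Eq. (2)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # inverse Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                # gradient of 0.5 * ||y - A x||_2^2
        x = soft_threshold(x - step * grad, step * lam)
    classes = np.unique(atom_labels)
    residuals = []
    for g in classes:
        mask = atom_labels == g                 # delta_g keeps only the atoms of class g
        residuals.append(np.sum((y - A[:, mask] @ x[mask]) ** 2))
    return classes[int(np.argmin(residuals))], x

# Hypothetical toy dictionary: six orthonormal atoms, three per class.
A = np.eye(6)
atom_labels = np.array([0, 0, 0, 1, 1, 1])
y = np.array([0.7, 0.3, 0.0, 0.0, 0.0, 0.0])   # mixture of two class-0 atoms
label, x_hat = src_classify(A, atom_labels, y)
print(label, np.round(x_hat, 2))
```

With a coherent dictionary, as is typical for HSI, the recovered support is far less clean than in this orthonormal toy case, which is the instability that motivates the structured priors discussed next.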
In the case of HSI, SRC often suffers from non-uniqueness or instability of the sparse coefficients due to the high mutual coherence of the dictionary [11]. Fortunately, a better reconstructed signal and a more robust representation can be obtained by either exploring the dependencies of neighboring pixels or exploiting the inherent dictionary structure. Recently, structured priors have been incorporated into HSI classification [8]; they can be sorted into three categories: (a) priors that only exploit the correlations and dependencies among the neighboring spectral pixels or their sparse coefficient vectors, which include joint sparsity [13], graph regularized Lasso (referred to as the Laplacian regularized Lasso) [14] and the low-rank Lasso [15]; (b) priors that only exploit the inherent structure of the dictionary, such as group Lasso [16]; and (c) priors that enforce structural information on both the sparse coefficients and the dictionary, such as collaborative group Lasso [17] and collaborative hierarchical Lasso (CHiLasso) [18]. Besides SRC, structured sparsity priors can also be incorporated into other classifiers, such as logistic regression classifiers [19].
The main contributions of this paper are (a) to assess the SRC performance using various structured sparsity priors for HSI classification, and (b) to propose a prior that is conceptually similar to CHiLasso, called the low-rank group prior. This prior is based on the assumption that pure or mixed pixels from the same classes are highly correlated and can be represented by a combination of sparse low-rank groups (classes). The proposed prior takes advantage of both the group sparsity prior, which enforces sparsity across the groups, and the low-rank prior, which encourages sparsity within the groups, while using only one regularizer.
In the following sections, we investigate the roles of different structured priors imposed on the SRC optimization algorithm. Starting with the classical $\ell_1$ norm sparsity prior, we then introduce several different priors with experimental results. The structured priors discussed are the joint sparsity, Laplacian sparsity, group sparsity, sparse group sparsity, low-rank and low-rank group priors.
II HSI Classification Via Different Structured Sparse Priors
II-A Joint Sparsity Prior
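In HSI, pixels within a small neighborhood usually consist of similar materials. Thus, their spectral characteristics are highly correlated. The spatial correlation between neighboring pixels can be indirectly incorporated through a joint sparsity model (JSM) [12] by assuming that the underlying sparse vectors associated with these pixels share a common sparsity support. Consider a small neighborhood of $T$ pixels. Let $\mathbf{Y} \in \mathbb{R}^{P \times T}$ represent a matrix whose columns correspond to the pixels in a spatial neighborhood of a hyperspectral image. The columns of $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_T]$ can be represented as linear combinations of dictionary atoms, $\mathbf{Y} = \mathbf{A}\mathbf{X}$, where $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T] \in \mathbb{R}^{N \times T}$ represents a sparse matrix. In JSM, the sparse vectors of neighboring pixels, which are represented by the columns of $\mathbf{X}$, share the same support. Therefore, $\mathbf{X}$ is a sparse matrix with only a few nonzero rows. The row-sparse matrix $\mathbf{X}$ can be recovered by solving the following Lasso problem
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda\|\mathbf{X}\|_{1,2}, \tag{3}$$
where $\|\mathbf{X}\|_{1,2} = \sum_{i=1}^{N}\|\mathbf{x}^i\|_2$ is an $\ell_{1,2}$ norm and $\mathbf{x}^i$ represents the $i$th row of $\mathbf{X}$.
The label for the center pixel $\mathbf{y}_c$ is then determined by the minimum total residual error
$$\mathrm{class}(\mathbf{y}_c) = \arg\min_{g} \|\mathbf{Y} - \mathbf{A}\,\delta_g(\hat{\mathbf{X}})\|_F^2, \tag{4}$$
where $\delta_g(\cdot)$ is the indicator operation zeroing out all the elements of $\hat{\mathbf{X}}$ that do not belong to the class $g$.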
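To make the row-sparsity mechanism concrete, the sketch below solves a problem of the form of Eq. (3) by proximal gradient descent, where the proximal step shrinks each row of the coefficient matrix by its Euclidean norm, and then labels the neighborhood with the total-residual rule of Eq. (4). The solver choice, the function names and the toy data are assumptions for illustration; the experiments in this paper use ADMM instead.

```python
import numpy as np

def row_soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_{1,2}: shrink every row by its l2 norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

def jsm_classify(A, atom_labels, Y, lam=0.01, n_iter=200):
    """Proximal gradient for Eq. (3), then the total-residual rule of Eq. (4)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    X = np.zeros((A.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = A.T @ (A @ X - Y)                 # gradient of 0.5 * ||Y - A X||_F^2
        X = row_soft_threshold(X - step * grad, step * lam)
    classes = np.unique(atom_labels)
    residuals = [np.sum((Y - A[:, atom_labels == g] @ X[atom_labels == g]) ** 2)
                 for g in classes]
    return classes[int(np.argmin(residuals))], X

# Hypothetical neighborhood of T = 2 pixels over six orthonormal atoms.
A = np.eye(6)
atom_labels = np.array([0, 0, 0, 1, 1, 1])
Y = np.array([[0.7, 0.6],
              [0.3, 0.4],
              [0.0, 0.005],    # weak response of a single pixel on one atom
              [0.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
label, X_hat = jsm_classify(A, atom_labels, Y)
active_rows = np.flatnonzero(np.linalg.norm(X_hat, axis=1))
print(label, active_rows)
```

The weak entry in the third row is removed for both pixels at once, which is the shared-support behavior that the $\ell_{1,2}$ norm is meant to enforce.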
II-B Laplacian Sparsity Prior
In sparse representation, due to the high coherence of the dictionary atoms, the recovered sparse coefficient vectors for multiple neighboring pixels could be partially different even when the neighboring pixels are highly correlated, and this may lead to misclassification. As mentioned in the previous section, joint sparsity is able to solve such a problem by forcing multiple pixels to select exactly the same atoms. However, in many cases, when the neighboring pixels fall on the boundary between several homogeneous regions, they belong to several distinct classes (groups) and should use different sets of sub-dictionary atoms. Laplacian sparsity enhances the differences between the sparse coefficient vectors of neighboring pixels that belong to different clusters. We introduce the weighting matrix $\mathbf{W}$, where $w_{ij}$ characterizes the similarity between a pair of pixels $\mathbf{y}_i$ and $\mathbf{y}_j$ within a neighborhood. Optimization with an additional Laplacian sparsity prior can be expressed as
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda_1\|\mathbf{X}\|_1 + \lambda_2\sum_{i,j} w_{ij}\|\mathbf{x}_i - \mathbf{x}_j\|_2^2, \tag{5}$$
where $\lambda_1$ and $\lambda_2$ are the regularization parameters. The matrix $\mathbf{W}$ is used to characterize the similarity among neighboring pixels in the spectral space. Similar pixels possess larger weights, which forces the differences between the corresponding sparse coefficient vectors to become smaller, while the differences between the sparse coefficient vectors of dissimilar pixels are allowed to become larger. Therefore, the Laplacian sparsity prior is more flexible than the joint sparsity prior in that it does not always force all the neighboring pixels to have the same common support. In this paper, the weighting matrix is computed using the sparse subspace clustering method in [20]. Note that this grouping constraint is enforced on the testing pixels instead of the dictionary atoms, which is different from group sparsity. Let $\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}$ be the normalized symmetric Laplacian matrix [20], where $\mathbf{D}$ is the degree matrix computed from $\mathbf{W}$. Up to a constant factor absorbed into $\lambda_2$ and the degree normalization introduced by $\mathbf{L}$, we can rewrite Eq. (5) as
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda_1\|\mathbf{X}\|_1 + \lambda_2\,\mathrm{tr}(\mathbf{X}\mathbf{L}\mathbf{X}^T). \tag{6}$$
The above problem can be solved by a modified feature-sign search algorithm [14].
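The relation between the pairwise penalty and the trace form can be checked numerically. The sketch below builds the normalized symmetric Laplacian from a similarity matrix and verifies the standard identity $\mathrm{tr}(\mathbf{X}\mathbf{L}\mathbf{X}^T) = \frac{1}{2}\sum_{i,j} w_{ij}\|\mathbf{x}_i/\sqrt{d_i} - \mathbf{x}_j/\sqrt{d_j}\|_2^2$, where $d_i$ is the degree of pixel $i$. The similarity values and the random coefficients are hypothetical; in this paper the weights come from sparse subspace clustering [20], which is not reproduced here.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric similarity matrix W."""
    d = W.sum(axis=1)                      # degrees, assumed strictly positive
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def laplacian_penalty(X, L):
    """Trace form tr(X L X^T); the columns of X are per-pixel coefficient vectors."""
    return float(np.trace(X @ L @ X.T))

def pairwise_penalty(X, W):
    """0.5 * sum_ij w_ij * ||x_i / sqrt(d_i) - x_j / sqrt(d_j)||^2, written out."""
    d = W.sum(axis=1)
    Xn = X / np.sqrt(d)                    # degree-normalized columns
    T = W.shape[0]
    return 0.5 * sum(W[i, j] * np.sum((Xn[:, i] - Xn[:, j]) ** 2)
                     for i in range(T) for j in range(T))

# Hypothetical similarities for T = 3 neighboring pixels: pixels 0 and 1 are alike.
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
X = np.random.default_rng(0).standard_normal((5, 3))   # N = 5 atoms, T = 3 pixels
L = normalized_laplacian(W)
trace_value = laplacian_penalty(X, L)
pairwise_value = pairwise_penalty(X, W)
print(trace_value, pairwise_value)
```

Because the weight between pixels 0 and 1 is large, any difference between their (normalized) coefficient vectors is penalized heavily, whereas pixel 2 is left comparatively free to use a different support.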
II-C Group Sparsity Prior
The SRC dictionary has an inherent group-structured property since it is composed of several class sub-dictionaries, i.e., the atoms belonging to the same class are grouped together to form a sub-dictionary. In sparse representation, we classify pixels by measuring how well the pixels are represented by each sub-dictionary. Therefore, it is reasonable to force the pixels to be represented by groups of atoms instead of individual ones. This can be accomplished by encouraging the coefficients of only certain groups to be active while the remaining groups stay inactive. Group Lasso [16], for example, uses a sparsity prior that sums up the Euclidean norms of the coefficients of every group. This prior can dominate the classification performance, especially when the input pixels are inherently mixed pixels. Group Lasso optimization can be represented as
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \lambda\sum_{g=1}^{G} w_g\|\mathbf{x}_g\|_2, \tag{7}$$
where $g \in \{1, 2, \dots, G\}$, $\sum_{g=1}^{G} w_g\|\mathbf{x}_g\|_2$ represents the group sparse prior defined in terms of $G$ groups, and $w_g$ is the weight, which is usually set to the square root of the cardinality of the corresponding group to compensate for the different group sizes. Here, $\mathbf{x}_g$ refers to the coefficients of each group. The above group sparsity can be easily extended to the case of multiple neighboring pixels by extending problem (7) to collaborative group Lasso, which is formulated as
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda\sum_{g=1}^{G} w_g\|\mathbf{X}_g\|_F, \tag{8}$$
where $\sum_{g=1}^{G} w_g\|\mathbf{X}_g\|_F$ represents a collaborative group Lasso regularizer defined in terms of groups and $\mathbf{X}_g$ refers to the coefficient matrix of each group. When the group size is reduced to one, the collaborative group Lasso degenerates into the joint sparsity Lasso.
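The effect of the group regularizer is easiest to see through its proximal operator, which scales a whole group toward zero and switches it off entirely once its norm falls below the weighted threshold. The sketch below is an illustration under the assumption $w_g = \sqrt{|g|}$ stated above; the function name and the toy vector are hypothetical.

```python
import numpy as np

def group_soft_threshold(X, atom_groups, tau):
    """Proximal operator of tau * sum_g w_g * ||X_g|| with w_g = sqrt(|g|).

    Handles a coefficient vector (group Lasso, Eq. (7)) and a coefficient
    matrix (collaborative group Lasso, Eq. (8)): np.linalg.norm returns the
    l2 norm for 1-D input and the Frobenius norm for 2-D input.
    """
    out = np.zeros_like(X, dtype=float)
    for g in np.unique(atom_groups):
        idx = atom_groups == g
        w_g = np.sqrt(idx.sum())                 # weight compensating for group size
        norm_g = np.linalg.norm(X[idx])
        if norm_g > tau * w_g:                   # otherwise the whole group stays zero
            out[idx] = (1.0 - tau * w_g / norm_g) * X[idx]
    return out

# Hypothetical coefficients: a strong group of two atoms and a weak group of three.
x = np.array([3.0, 4.0, 0.1, 0.1, 0.1])
atom_groups = np.array([0, 0, 1, 1, 1])
x_new = group_soft_threshold(x, atom_groups, tau=1.0)
print(x_new)
```

Note that every atom of the surviving group remains active, only rescaled; no sparsity is introduced inside a selected group.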
II-D Sparse Group Sparsity Prior
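In the formulations (7) and (8), the coefficients within each group are not sparse, and all the atoms in the selected groups could be active. If the sub-dictionary is overcomplete, then it is necessary to enforce sparsity within each group. To achieve sparsity within the groups, an $\ell_1$ norm regularizer can be added to the group Lasso (7), which can be written as
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \lambda_1\sum_{g=1}^{G} w_g\|\mathbf{x}_g\|_2 + \lambda_2\|\mathbf{x}\|_1. \tag{9}$$
Similarly, Eq. (9) can be easily extended to the case of multiple neighboring pixels, which can be written as
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda_1\sum_{g=1}^{G} w_g\|\mathbf{X}_g\|_F + \lambda_2\|\mathbf{X}\|_1. \tag{10}$$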
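A useful property of the sparse group regularizer is that its proximal operator is the composition of two simple steps: element-wise soft thresholding followed by group-wise shrinkage. The sketch below illustrates this for a coefficient vector; `tau_group` and `tau_l1` stand for the two thresholds induced by the group and $\ell_1$ terms, and the names, weights $w_g = \sqrt{|g|}$ and toy data are assumptions for the example.

```python
import numpy as np

def soft_threshold(V, tau):
    """Element-wise shrinkage: proximal operator of tau * ||V||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def group_shrink(V, atom_groups, tau):
    """Group-wise shrinkage with weights w_g = sqrt(|g|)."""
    out = np.zeros_like(V, dtype=float)
    for g in np.unique(atom_groups):
        idx = atom_groups == g
        w_g = np.sqrt(idx.sum())
        norm_g = np.linalg.norm(V[idx])
        if norm_g > tau * w_g:
            out[idx] = (1.0 - tau * w_g / norm_g) * V[idx]
    return out

def sparse_group_prox(V, atom_groups, tau_group, tau_l1):
    """Prox of tau_l1 * ||V||_1 + tau_group * sum_g w_g ||V_g||: l1 step, then group step."""
    return group_shrink(soft_threshold(V, tau_l1), atom_groups, tau_group)

# Hypothetical coefficients: group 0 has two strong atoms and one weak atom.
x = np.array([3.5, 4.5, 0.2, 0.3, -0.2])
atom_groups = np.array([0, 0, 0, 1, 1])
x_new = sparse_group_prox(x, atom_groups, tau_group=1.0, tau_l1=0.5)
print(x_new)
```

Compared with the plain group prior, the weak third atom of the selected group is now removed as well, which is the within-group sparsity that Eqs. (9) and (10) add.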
II-E Low Rank/Group Sparsity Prior
Based on the fact that the spectra of neighboring pixels are highly correlated, it is reasonable to enforce a low-rank prior on their coefficient matrix. The low-rank prior is more flexible than the joint sparsity prior, which strictly enforces row sparsity. Therefore, when the neighboring pixels are composed of small nonhomogeneous regions, the low-rank prior outperforms the joint sparsity prior. The low-rank sparse recovery problem has been well studied in [15] and is stated as the following Lasso problem
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda\|\mathbf{X}\|_*, \tag{11}$$
where $\|\mathbf{X}\|_*$ is the nuclear norm [15], i.e., the sum of the singular values of $\mathbf{X}$.
To incorporate the structure of the dictionary, we now extend the low-rank prior to the low-rank group prior, where the regularizer is obtained by summing up the nuclear norms, used as a convex surrogate of the rank, of every group coefficient matrix,
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda\sum_{g=1}^{G} w_g\|\mathbf{X}_g\|_*. \tag{12}$$
The low-rank group prior is able to obtain within-group sparsity by minimizing the nuclear norm of each group. Furthermore, the summation of nuclear norms enables the proposed prior to obtain a group sparsity pattern. Hence, the low-rank group prior is able to achieve sparsity both within and across groups by using only one regularization term.
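The way a sum of nuclear norms acts both within and across groups can be illustrated with singular value thresholding (SVT), the proximal operator of the nuclear norm, applied separately to each group coefficient matrix. This is a sketch rather than the solver used in the experiments; the group weights $w_g = \sqrt{|g|}$, the function names and the toy matrix are assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * ||M||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def group_svt(X, atom_groups, tau):
    """Apply SVT to every group coefficient matrix X_g with weight w_g = sqrt(|g|)."""
    out = np.zeros_like(X, dtype=float)
    for g in np.unique(atom_groups):
        idx = atom_groups == g
        out[idx] = svt(X[idx], tau * np.sqrt(idx.sum()))
    return out

# Hypothetical coefficients for N = 4 atoms (two per class) and T = 3 pixels.
X = np.vstack([np.outer([3.0, 4.0], [1.0, 0.0, 0.0]),   # class 0: rank one, singular value 5
               0.1 * np.ones((2, 3))])                  # class 1: weak response
atom_groups = np.array([0, 0, 1, 1])
X_new = group_svt(X, atom_groups, tau=1.0)
print(np.round(X_new, 3))
```

A group whose singular values all fall below the threshold vanishes completely (sparsity across groups), while a surviving group keeps only its dominant singular directions (low rank within the group).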
Table I: Number of training and test samples for the Indian Pine image
Class  Train  Test
1  6  48 
2  137  1297 
3  80  754 
4  23  211 
5  48  449 
6  72  675 
7  3  23 
8  47  442 
9  2  18 
10  93  875 
11  235  2233 
12  59  555 
13  21  191 
14  124  1170 
15  37  343 
16  10  85 
Total  997  9369 
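Table II: Classification accuracy [%] for the Indian Pine image (SVM is the baseline; columns $\ell_1$ through LRG are solved by ADMM/SpaRSA, and the last two columns by feature sign search)
Optimization Techniques  ADMM/SpaRSA  Feature Sign Search
Class  SVM  $\ell_1$  JS  LS  GS  SGS  LR  LRG  $\ell_1$  LS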
1  77.08  68.75  79.17  85.42  79.17  87.50  75.00  91.67  66.67  83.33 
2  84.96  58.84  81.94  81.34  80.62  79.92  78.60  81.71  74.42  89.90 
3  62.67  24.40  56.67  47.35  62.13  76.13  29.87  89.87  69.87  78.38 
4  8.57  49.52  27.62  49.76  37.14  54.29  15.24  67.62  64.76  88.15 
5  77.18  81.88  85.46  83.96  84.79  82.55  82.10  83.45  91.72  94.43 
6  91.82  96.88  98.36  97.48  98.96  98.36  98.21  98.36  97.02  98.52 
7  13.04  0.00  0.00  0.00  0.00  0.00  0.00  0.00  69.57  0.00 
8  96.59  96.59  100.00  99.55  99.55  99.55  99.77  99.55  99.55  100.00 
9  0.00  5.56  0.00  0.00  22.22  0.00  0.00  0.00  61.11  0.00 
10  71.30  24.00  18.94  31.89  39.95  45.58  8.61  49.60  76.46  87.43 
11  35.25  96.22  91.63  94.58  91.99  93.02  97.12  92.35  87.62  98.84 
12  42.39  32.97  45.29  64.68  69.57  65.58  20.83  82.97  78.26  91.71 
13  91.05  98.95  99.47  99.48  99.47  98.95  98.95  99.47  99.47  100.00 
14  94.85  98.97  98.97  99.49  98.80  99.31  99.83  99.31  97.77  99.57 
15  30.70  49.71  55.85  63.84  50.58  80.99  44.15  89.47  53.80  69.97 
16  27.06  88.24  95.29  97.65  95.29  98.82  97.65  97.65  85.88  97.65 
OA [%]  64.94  71.17  76.41  79.40  80.19  83.19  71.90  86.46  83.74  92.58
AA [%]  56.53  60.72  68.53  64.67  69.39  72.53  59.14  76.43  79.62  79.87
$\kappa$  0.647  0.695  0.737  0.712  0.781  0.807  0.695  0.843  0.833  0.923
Table III: Number of training and test samples for the University of Pavia image
Class  Train  Test
1  139  6713 
2  137  1859 
3  100  2107 
4  133  3303 
5  68  1310 
6  135  4969 
7  95  1261 
8  131  3747 
9  59  967 
Total  997  42926 
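Table IV: Classification accuracy [%] for the University of Pavia image (SVM is the baseline; columns $\ell_1$ through LRG are solved by ADMM/SpaRSA, and the last two columns by feature sign search)
Optimization Techniques  ADMM/SpaRSA  Feature Sign Search
Class  SVM  $\ell_1$  JS  LS  GS  SGS  LR  LRG  $\ell_1$  LS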
1  84.55  57.11  77.04  95.08  94.01  97.90  91.16  94.15  72.14  95.85 
2  82.45  58.22  67.98  66.70  70.04  68.04  69.73  69.32  59.62  64.28 
3  77.08  57.33  44.32  77.55  79.45  73.56  75.80  79.73  66.21  76.51 
4  94.19  95.94  95.13  95.19  95.31  95.55  95.94  98.46  97.67  98.97 
5  99.01  100.00  99.85  100.00  100.00  100.00  100.00  100.00  99.85  100.00 
6  23.55  89.60  88.31  96.60  100.00  99.74  100.00  99.96  80.60  98.63 
7  2.06  83.27  84.38  96.59  95.24  95.56  95.06  95.24  86.76  94.69 
8  33.89  48.65  65.20  67.36  62.24  44.84  65.24  63.06  75.95  95.76 
9  53.05  93.69  99.59  99.59  93.38  93.28  93.57  94.00  90.69  98.35 
OA [%]  69.84  66.51  74.05  80.82  81.15  79.07  80.81  81.02  71.41  81.84
AA [%]  61.09  75.98  80.06  88.80  87.73  85.36  87.35  87.93  81.05  91.45
$\kappa$  0.569  0.628  0.681  0.758  0.675  0.624  0.611  0.66  0.672  0.781
III Results and Discussion
III-A Datasets
We evaluate various structured sparsity priors on two different hyperspectral images and one toy example. The first hyperspectral image is the Indian Pine image, acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), which generates 220 bands, of which 20 noisy bands are removed before classification. The spatial dimension of this image is $145 \times 145$, and it contains 16 ground-truth classes, as shown in Table I. We randomly choose 997 pixels (roughly 10% of all the labelled pixels) for constructing the dictionary and use the remaining pixels for testing. The second image is the University of Pavia, an urban image acquired by the Reflective Optics System Imaging Spectrometer (ROSIS), which contains $610 \times 340$ pixels. It generates 115 spectral bands, of which 12 noisy bands are removed. There are nine ground-truth classes of interest. For this image, we choose 997 pixels (roughly 2% of all the labelled pixels) for constructing the dictionary and use the remaining pixels for testing, as shown in Table III. The toy example consists of two different classes (classes 2 and 14 of the Indian Pine test set), and each class contains 30 pixels. The dictionary is the same as that for the Indian Pine image. The toy example is used to evaluate the various sparsity patterns generated by the different structured priors.
III-B Models and Methods
The tested structured sparse priors are: (i) joint sparsity (JS), (ii) Laplacian sparsity (LS), (iii) collaborative group sparsity (GS), (iv) sparse group sparsity (SGS), (v) low-rank prior (LR) and (vi) low-rank group prior (LRG), corresponding to Eqs. (3), (5), (8), (10), (11) and (12), respectively. For SRC, the regularization parameters of the different structured priors are varied over a range of values. Performance on the toy example is examined visually through the difference between the desired sparsity regions and the recovered ones. For the two hyperspectral images, classification performance is evaluated by the overall accuracy (OA), the average accuracy (AA), and the $\kappa$ coefficient measure on the test set. For each structured prior, we present the result with the highest overall accuracy using cross validation. A linear SVM is implemented for comparison, whose parameters are set in the same fashion as in [8].
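For reference, the three evaluation measures can be computed from a confusion matrix as sketched below, using the standard definitions: OA is the fraction of correctly labelled test pixels, AA is the mean of the per-class accuracies, and $\kappa$ corrects OA for chance agreement. The function name and the two-class confusion matrix are hypothetical and are not taken from the experiments.

```python
import numpy as np

def classification_scores(confusion):
    """Return (OA, AA, kappa) for a confusion matrix with true classes as rows."""
    C = np.asarray(confusion, dtype=float)
    n = C.sum()
    oa = np.trace(C) / n                                  # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))              # mean per-class accuracy
    p_e = np.sum(C.sum(axis=0) * C.sum(axis=1)) / n ** 2  # expected chance agreement
    kappa = (oa - p_e) / (1.0 - p_e)
    return oa, aa, kappa

# Hypothetical two-class result: a large class (100 pixels) and a small one (10 pixels).
C = [[90, 10],
     [5, 5]]
oa, aa, kappa = classification_scores(C)
print(round(oa, 4), round(aa, 4), round(kappa, 4))
```

In this toy case OA stays high although half of the small class is misclassified, while AA and $\kappa$ drop noticeably; the same effect explains why classes with very few test pixels influence AA much more than OA.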
In the experiments, the joint sparsity, group sparsity and low-rank priors are solved by ADMM [21], while CHiLasso and the Laplacian prior are solved by combining SpaRSA [22] and ADMM. In addition, in conformity with previous work [14], the Laplacian regularized Lasso is also solved by a modified feature sign search (FSS) method. In this paper, we try to present a fair comparison among all priors. According to the optimization technique, we sort the structured priors into two categories: (i) priors solved by ADMM and SpaRSA and (ii) priors solved by the FSS-based method. The first rows of Table II and Table IV show the methods used to implement the sparse recovery for each structured prior.
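As a point of reference for the ADMM-based solvers, the sketch below shows the textbook ADMM iteration for the plain $\ell_1$ problem of Eq. (1), following the general scheme in [21]: a ridge-type linear solve, a shrinkage step and a dual update. It is not the implementation used in the experiments; the penalty parameter `rho`, the iteration count and the toy data are assumptions. For the structured priors, only the shrinkage step would change to the proximal operator of the corresponding regularizer.

```python
import numpy as np

def admm_lasso(A, y, lam, rho=1.0, n_iter=200):
    """ADMM for min_x 0.5 * ||y - A x||_2^2 + lam * ||x||_1 with splitting x = z."""
    n = A.shape[1]
    M = A.T @ A + rho * np.eye(n)        # system matrix of the x-update (fixed)
    Aty = A.T @ y
    z = np.zeros(n)
    u = np.zeros(n)                      # scaled dual variable
    for _ in range(n_iter):
        x = np.linalg.solve(M, Aty + rho * (z - u))                # quadratic step
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)    # shrinkage step
        u = u + x - z                                              # dual update
    return z

# Hypothetical check on an orthonormal dictionary, where the solution of Eq. (1)
# is simply the soft-thresholded observation.
A = np.eye(4)
y = np.array([1.0, -0.05, 0.0, -2.0])
x_hat = admm_lasso(A, y, lam=0.1)
print(np.round(x_hat, 4))
```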
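Table V: Computational time [s] of the structured priors for the Indian Pine image (columns $\ell_1$ through LRG are solved by ADMM/SpaRSA, and the last two columns by FSS)
Optimization Techniques  ADMM/SpaRSA  FSS
Prior  $\ell_1$  JS  LS  GS  SGS  LR  LRG  $\ell_1$  LS
Time [s]  1124  1874  4015  2811  2649  4403  2904  1124  11628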
III-C Results
Sparsity patterns of the toy example are shown in Fig. 1. The expected sparsity regions are shown in Fig. 1(a), where the y-axis labels the dictionary atom index and the x-axis labels the test pixel index. The red and green regions correspond to the ideal locations of the active atoms for classes 2 and 14, respectively. Nonzero coefficients that belong to other classes are shown as blue dots. The joint sparsity result, Fig. 1(c), shows a clear row sparsity pattern, but many rows are mistakenly activated. As expected, the active atoms in Fig. 1(d), (e) and (g) demonstrate group sparsity patterns. Comparing GS (d) with SGS (e), we observe that most of the atoms within the groups are deactivated under SGS. The low-rank group prior (g) demonstrates a sparsity pattern similar to that of SGS. For the Laplacian sparsity (h), the similarity of the sparse coefficients that belong to the same class is clearly visible.
Table II and Fig. 2 show the performance of SRCs with different priors on the Indian Pine image. The spatial window size is chosen considering that this image consists mostly of large homogeneous regions. Among SRCs with different priors, the worst result occurs with the simple $\ell_1$ prior solved by ADMM. The joint sparsity prior gives a better result than the low-rank prior. This is due to the large homogeneous regions in this image, which favor the joint sparsity model. The highest OA is given by the Laplacian sparsity prior via FSS; this high performance is partly attributed to the accurate sparse recovery of the feature sign search method. Both SGS and LRG outperform GS. Among the ADMM-based methods, the low-rank group prior yields the smoothest result. The computational times of the various structured priors for the Indian Pine image are shown in Table V. Among the ADMM/SpaRSA-based methods, LRG, GS and SGS take roughly similar times (about 2600 to 2900 s) to process the image, while LR and LS require longer (about 4000 to 4400 s). LS via FSS is by far the most expensive method computationally (11628 s).
Results for the University of Pavia image are shown in Table IV. The window size for this image is chosen considering that many narrow regions are present. The group sparsity prior gives the highest OA among the priors optimized by ADMM. The low-rank prior gives a much better result than joint sparsity since this image contains many small homogeneous regions. The Laplacian sparsity prior via FSS gives the highest OA. However, the differences between the performance of the various structured priors are quite small.
IV Conclusion
This paper reviews five different structured sparse priors and proposes a low-rank group sparsity prior. Using these structured priors, the classification results of SRCs on HSI are generally improved when compared with the classical $\ell_1$ sparsity prior. The results confirm that the low-rank prior is a more flexible constraint than the joint sparsity prior, while the latter works better on large homogeneous regions. In our experiments, imposing the group structured prior on the dictionary gives higher overall accuracy than the $\ell_1$ prior. We have also observed that the performance is determined not only by the structured priors, but also by the corresponding optimization techniques.
References
[1]
[2] A. Plaza, J. Benediktsson, J. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, M. Marconcini, J. Tilton and G. Trianni, “Recent advances in techniques for hyperspectral image processing,” Remote Sens. Environ., vol. 113, no. S1, pp. S110-S122, Sept. 2009.
[3] G. Camps-Valls, L. Gómez-Chova, J. Muñoz-Marí, J. Vila-Francés and J. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93-97, Jan. 2006.
[4] L. Gómez-Chova, G. Camps-Valls, J. Muñoz-Marí and J. Calpe-Maravilla, “Semisupervised image classification with Laplacian support vector machines,” IEEE Geosci. Remote Sens. Lett., vol. 5, no. 3, pp. 336-340, Jul. 2008.
[5] J. Zhu, S. Rosset, T. Hastie and R. Tibshirani, “1-norm support vector machines,” NIPS, vol. 16, pp. 16-23, Dec. 2003.
[6] J. Wright, A. Yang, A. Ganesh, S. Sastry and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210-227, Feb. 2009.
[7] J. Wright, J. Mairal, G. Sapiro, T. S. Huang and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proc. IEEE, vol. 98, no. 6, pp. 1031-1044, Apr. 2010.
[8] Y. Chen, M. Nasrabadi and T. Tran, “Hyperspectral image classification using dictionary-based sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2287-2302, Oct. 2011.
[9] Q. Haq, L. Tao, F. Sun and S. Yang, “A fast and robust sparse approach for hyperspectral data classification using a few labeled samples,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp. 3973-3985, June 2012.
[10] R. Ji, Y. Gao, R. Hong, Q. Liu, D. Tao and X. Li, “Spectral-spatial constraint hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. PP, no. 99, pp. 1-13, June 2013.
[11] M. Iordache, J. Bioucas-Dias and A. Plaza, “Sparse unmixing of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2014-2039, June 2011.
[12] J. Tropp, A. Gilbert and M. Strauss, “Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit,” Signal Processing, vol. 54, no. 12, pp. 4634-4643, Dec. 2006.
[13] E. Berg and M. Friedlander, “Joint-sparse recovery from multiple measurements,” IEEE Trans. Information Theory, vol. 56, no. 5, pp. 2516-2527, Apr. 2010.
[14] S. Gao, I. Tsang and L. Chia, “Laplacian sparse coding, hypergraph Laplacian sparse coding, and applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 92-104, Jan. 2013.
[15] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 171-184, Jan. 2013.
[16] A. Rakotomamonjy, “Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms,” Signal Processing, vol. 91, no. 7, pp. 1505-1526, July 2011.
[17] S. Kim and E. Xing, “Tree-guided group lasso for multi-task regression with structured sparsity,” ICML, vol. 6, no. 3, pp. 1095-1117, June 2010.
[18] P. Sprechmann, I. Ramirez, G. Sapiro and Y. Eldar, “C-HiLasso: a collaborative hierarchical sparse modeling framework,” IEEE Trans. Signal Processing, vol. 59, no. 9, pp. 4183-4198, Oct. 2011.
[19] Y. Qian, M. Ye and J. Zhou, “Hyperspectral image classification based on structured sparse logistic regression and three-dimensional wavelet texture features,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 4, pp. 2276-2291, Apr. 2012.
[20] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” CVPR, pp. 2790-2797, June 2009.
[21] S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” FTML, vol. 3, no. 1, pp. 1-122, Jan. 2010.
[22] S. Wright, R. Nowak and M. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Trans. Signal Processing, vol. 57, no. 7, pp. 2479-2493, July 2009.