1 Introduction
Hyperspectral images (HSIs) contain abundant spectral information and high spatial resolution, and have been widely used in scene classification, object detection and environmental monitoring, among other applications Benediktsson et al. (2005); Iyer et al. (2017); Tu et al. (2019). In recent years, hyperspectral image classification has become a research hotspot and plays an important role in HSI analysis Lin et al. (2018). Multiple classification techniques have been attempted and have achieved good performance Wang et al. (2019). Sparse representation, in particular, has attracted growing attention. It offers high accuracy and fast processing, and it requires no statistics or assumptions on the sample distribution Donoho (2006). It has therefore become a powerful tool for signal processing and analysis, widely applied to fields such as data compression Wang and Celik (2017), image restoration Julien et al. (2007); Michael and Michal (2006); Dong et al. (2013)
, and face recognition
He et al. (2017); John et al. (2009); Gao et al. (2017). Recently, many classification models based on sparse representation have been proposed and have demonstrated superiority in hyperspectral image classification Wang et al. (2019). In 2011, Chen et al. proposed a region-based sparse representation for hyperspectral image classification, rooted in the assumption that every hyperspectral pixel belonging to one class can be represented by a common sub-dictionary consisting of training samples from the same class Yi et al. (2011). In their experiments, in order to achieve higher classification accuracy, the authors obtained relatively optimal values by testing different parameter settings (e.g., sparsity level, weighting factor and neighborhood size). In 2014, Zhang et al. proposed a method called nonlocal weighted joint sparse representation classification (NLWJSRC) to compensate for the joint sparsity model's (JSM) failure to assign different weights to different neighboring pixels around the central test pixel Zhang et al. (2014); Baron et al. (2012). NLWJSRC ensures that pixels similar to the central test pixel contribute more to the classification process, via larger weights, and vice versa, which improves hyperspectral image classification performance. However, the authors still manually set parameter values (e.g., sparsity level and neighborhood size) for classification accuracy in their experiments. In 2017, Fang et al. proposed a multiple-feature-based adaptive sparse representation (MFASR) method for hyperspectral image classification Fang et al. (2017). MFASR utilizes an adaptive sparse representation to effectively exploit the correlations among four features extracted from the original HSI. Yet the authors still needed to set different parameter values to achieve higher classification accuracy.
Therefore, for these sparse-representation-based hyperspectral image classification methods, the sparsity level deserves particular attention Yan et al. (2019); Wei et al. (2016). In a sparse representation classification model, a hyperspectral pixel can be represented over a dictionary whose atoms are training samples extracted from the different classes. An unknown pixel can be expressed as a sparse vector whose few nonzero entries correspond to the weights of the selected training samples. The sparse vector can be recovered by solving a sparsity-constrained optimization problem (an $\ell_0$-minimization problem), and the class label of the test hyperspectral pixel can be directly determined from the recovered sparse vector. Therefore, how to solve the sparsity optimization problem is the key to the sparse model Zhang et al. (2017). Many algorithms have been proposed and applied to this problem, including greedy algorithms Wang et al. (2019) and convex optimization algorithms Sayed et al. (2019). However, the classification results of the optimization algorithms depend on the regularization parameter $\lambda$, while the greedy algorithms, such as Orthogonal Matching Pursuit (OMP) Davenport and Wakin (2010) and Subspace Pursuit (SP) Dai and Milenkovic (2009), need an appropriate sparsity level $K$ set in advance. Even the improved OMP variants do not remove this requirement. Deanna Needell proposed Regularized Orthogonal Matching Pursuit (ROMP), which combines the speed and ease of implementation of the greedy methods with the strong guarantees of the convex programming methods Needell and Vershynin (2010). Meanwhile, pursuing efficiency in reconstructing sparse signals, Jian Wang et al. proposed generalized OMP (gOMP), which finishes in far fewer iterations than OMP Wang et al. (2011). But the parameter $K$ still has to be chosen beforehand. Likewise, optimization algorithms such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) Beck and Teboulle (2009), Bregman iteration Ye and Xie (2011) and the framework of Sparse Reconstruction by Separable Approximation (SpaRSA) Wright et al. (2009) all need the regularization parameter $\lambda$ set in advance. In practice these parameter values are set by hand, chosen by repeatedly running the algorithm and keeping whichever values produce better results; when a result misses the target requirement, the values are adjusted again.
Since we cannot test all possible values, the theoretically optimal parameter values cannot be obtained and only relatively good values can be selected. This manual setting, and the resulting lack of adaptability and automation in the solution process, limits the application of the sparse representation method.
To address the issue of parameter values in sparse-representation-based hyperspectral image classification, the Alternating Direction Method of Multipliers (ADMM) has been proposed; it is an efficient variable-splitting algorithm with a convergence guarantee. It considers the augmented Lagrangian function of a given sparsity model and splits the variables into subgroups, which can be alternately optimized by solving a few simple subproblems Boyd et al. (2010). Although ADMM is generally efficient, it is not easy to determine the optimal parameters (e.g., penalty parameters) influencing the classification accuracy and speed of the sparse model. Many applications have shown that penalty parameters chosen too large or too small can significantly affect the solution time or the classification result 198 (1983); Fukushima (1992); Spyridon Kontogiorgis (1998). Recently, Yang et al. proposed a Deep ADMM-Net for compressive sensing MRI, which largely resolved the parameter setting in that problem Yang et al. (Nov.2018). The authors first devised a network structure based on the ADMM algorithm; the related parameters are then updated end-to-end using the L-BFGS algorithm automatically rather than set in advance.
In this paper, in order to determine the optimal parameters of a sparse model, we propose the Adaptive Sparse Deep Network, a novel deep architecture. Our work includes two parts. First, we introduce the iterative procedures of the ADMM algorithm for optimizing a sparse model. The data flow graph is shown in Fig.1. It consists of multiple stages, each of which corresponds to one ADMM iteration. The proposed deep architecture is transparent rather than a black-box optimizer, unlike the fully connected or convolutional networks of deep learning Luo et al. (2018); Slavkovikj et al. (2015); Yuan et al. (2019). Then, similar to a Feedforward Neural Network (FNN), the Adaptive Sparse Deep Network is divided into three parts: the input layer, the hidden layer and the output layer. In the hidden layer there are three types of operations, each with its own learnable parameters. Finally, all parameters are updated by gradient descent during Back-Propagation. Our method yields an optimal sparse vector, from which the hyperspectral image classification results are obtained directly.
The remainder of this paper is structured as follows. Section 2 reviews the sparse representation classification and hyperspectral sparse model on ADMM algorithm. Section 3 proposes the Adaptive Sparse Deep Network for the hyperspectral image classification. The experimental results of the proposed deep architecture with two hyperspectral images are given in Section 4. Finally, Section 5 summarizes this paper and makes some closing remarks.
2 Hyperspectral sparse model based on the alternating direction methods of multipliers
For a hyperspectral pixel $y \in \mathbb{R}^{L}$, where $L$ is the number of spectral bands, $y$ denotes the spectral signature of a certain spatial location in the HSI. Given a dictionary $D = [D_1, D_2, \ldots, D_C]$, where $C$ is the number of classes and $D_i$ is a sub-dictionary composed of pixels selected from the $i$-th class of the HSI, a hyperspectral pixel can be represented by a linear combination of the atoms of $D$ according to sparse representation. As can be seen in Fig.2, $y$ can be represented as follows:
$y = D\alpha + n$   (1)
where $\alpha$ is the sparse vector for $y$ and $n$ is an error residual term. Here $\alpha$ can be obtained by solving the following constrained optimization problem, and the hyperspectral sparse model is as follows:
$\hat{\alpha} = \arg\min_{\alpha} \|y - D\alpha\|_2 \quad \text{s.t.} \quad \|\alpha\|_0 \leq K$   (2)
where $K$ is a given upper bound on the sparsity level, i.e., the maximum number of nonzero entries in $\alpha$. The greedy algorithms can be used to solve problem (2); however, the sparsity level parameter $K$ needs to be set in advance. A higher sparsity level leads to higher computational cost and can worsen classification performance, because it misleads dictionary atoms from the wrong classes into being selected.
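To make the role of $K$ concrete, here is a minimal OMP sketch in numpy; the dictionary D and pixel y are generic placeholders (not the paper's data), and the sparsity level K must be fixed before the loop runs:

```python
import numpy as np

def omp(D, y, K):
    """Orthogonal Matching Pursuit: the sparsity level K must be preset."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(K):
        # select the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit over all selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha
```

If K is chosen too large, atoms from the wrong classes enter the support, which is exactly the risk noted above.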
The optimization problem (2) can also be converted to a linear programming problem by replacing the $\ell_0$ norm with the $\ell_1$ norm. It is shown as follows:
$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|y - D\alpha\|_2 \leq \varepsilon$   (3)
By introducing the regularization parameter $\lambda$, the desired $\alpha$ is made as sparse as possible while the error is kept as small as possible. Problem (3) can also be written as:
$\hat{\alpha} = \arg\min_{\alpha} \frac{1}{2}\|y - D\alpha\|_2^2 + \lambda\|\alpha\|_1$   (4)
There are many optimization algorithms for solving problem (4), such as FISTA Beck and Teboulle (2009), the Bregman iterative algorithm Ye and Xie (2011), and SpaRSA Wright et al. (2009). In all of these methods, the regularization parameter $\lambda$ provides a tradeoff between fidelity to the measurements and sparsity, and it still needs to be set in advance based on human experience.
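To illustrate how $\lambda$ enters such solvers, here is a minimal numpy sketch of ISTA (the simple precursor of FISTA); lam is a hand-set input, and this function is our illustration, not the cited implementations:

```python
import numpy as np

def soft_threshold(v, theta):
    # proximal operator of the l1 norm
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(D, y, lam, n_iter=200):
    """Iterative shrinkage-thresholding for problem (4); lam is preset by hand."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ alpha - y)     # gradient of the data-fidelity term
        alpha = soft_threshold(alpha - grad / L, lam / L)
    return alpha
```

Every call requires a concrete lam; changing it changes how many coefficients survive the shrinkage, which is the manual tuning problem discussed above.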
To find the optimal solutions of problem (4), we turn to ADMM, a simple but powerful algorithm that is well suited to distributed convex optimization and takes the form of a decomposition-coordination procedure, in which the solutions to small local subproblems are coordinated to find a solution to a large global problem. ADMM can be viewed as an attempt to blend the benefits of dual decomposition and augmented Lagrangian methods for constrained optimization Boyd et al. (2010). Therefore, this method is suitable for solving problem (4).
By introducing an auxiliary variable $z$, problem (4) is equivalent to:
$\min_{\alpha, z} \frac{1}{2}\|y - D\alpha\|_2^2 + \lambda\|z\|_1 \quad \text{s.t.} \quad z = \alpha$   (5)
Its augmented Lagrangian function is:
$L_{\rho}(\alpha, z, \beta) = \frac{1}{2}\|y - D\alpha\|_2^2 + \lambda\|z\|_1 + \beta^{T}(\alpha - z) + \frac{\rho}{2}\|\alpha - z\|_2^2$   (6)
where $\beta$ is the Lagrangian multiplier and $\rho$ is the penalty parameter.
For the sake of simplicity, formula (6) can also be written in the following form:
$L_{\rho}(\alpha, z, u) = \frac{1}{2}\|y - D\alpha\|_2^2 + \lambda\|z\|_1 + \frac{\rho}{2}\|\alpha - z + u\|_2^2 - \frac{\rho}{2}\|u\|_2^2$   (7)
where we let $u = \beta / \rho$ be the scaled Lagrangian multiplier.
In the sparse model, the ADMM algorithm commonly needs to run for dozens of iterations to obtain an optimal sparse vector. Once $\hat{\alpha}$ is obtained, the class of the hyperspectral pixel $y$ can be directly determined. Comparing the reconstruction results of each class, we classify $y$ by assigning it to the class that minimizes the residual:
$\text{class}(y) = \arg\min_{i = 1, \ldots, C} \|y - D_i \hat{\alpha}_i\|_2$   (8)
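The decision rule above is straightforward to sketch; here sub_dicts and sub_alphas are hypothetical per-class sub-dictionaries and their recovered coefficient blocks:

```python
import numpy as np

def classify_by_residual(y, sub_dicts, sub_alphas):
    """Assign y to the class whose sub-dictionary reconstructs it best."""
    residuals = [np.linalg.norm(y - D_i @ a_i)
                 for D_i, a_i in zip(sub_dicts, sub_alphas)]
    return int(np.argmin(residuals))
```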
Experience on applications has shown that the number of iterations depends significantly on the penalty parameter $\rho$. If a fixed penalty parameter is chosen too small or too large, the solution time can increase significantly. Like the sparsity level and regularization parameter above, it is unknown and needs to be predetermined. After its value is set, the model is solved and the classification result is compared with the true labels; if the result falls short, the parameters are adjusted again, which makes the solution process non-adaptive and non-automatic.
In the following section, self-adaptive rules are applied to adjust the penalty parameter and the regularization parameter. We solve this problem with gradient descent via Back-Propagation: by updating the corresponding parameters through gradient computation, self-adaptive parameters are obtained. The details of a deep architecture based on the ADMM algorithm are introduced next.
3 The Adaptive Sparse Deep Network
Parameter setting mostly depends on human experience, so there is no guarantee that the obtained solution is globally optimal. Our work includes two parts. First, the Adaptive Sparse Deep Network (ASDN) is designed based on the ADMM algorithm. A deep data flow graph is shown in Fig.3; each ADMM update iteration corresponds to a stage in Fig.3. Second, we detail how the parameters are updated by gradient descent during Back-Propagation. Our method exhibits a clear network structure and explicit data conduction relationships. In ASDN, all parameters are updated self-adaptively rather than set manually.
3.1 A Data Flow Graph Based on the Order of Parameter Updates
Data transfer and feedback are the basis of deep networks. First, a data flow graph is built, and the order in which the variables are updated is the key to it. Here, problem (7) can be solved through the following three subproblems of the ADMM algorithm:
$\alpha^{(k+1)} = \arg\min_{\alpha} L_{\rho}(\alpha, z^{(k)}, u^{(k)}); \quad z^{(k+1)} = \arg\min_{z} L_{\rho}(\alpha^{(k+1)}, z, u^{(k)}); \quad u^{(k+1)} = u^{(k)} + \eta(\alpha^{(k+1)} - z^{(k+1)})$   (9)
Let $S_{\theta}(v) = \operatorname{sign}(v)\max(|v| - \theta, 0)$ be the soft-thresholding operator; then the three subproblems have the following solutions:
$\alpha^{(k+1)} = (D^{T}D + \rho I)^{-1}\left(D^{T}y + \rho(z^{(k)} - u^{(k)})\right); \quad z^{(k+1)} = S_{\lambda/\rho}\left(\alpha^{(k+1)} + u^{(k)}\right); \quad u^{(k+1)} = u^{(k)} + \eta\left(\alpha^{(k+1)} - z^{(k+1)}\right)$   (10)
where $\eta$ is a relaxation parameter (update rate) that can improve convergence in ADMM and $S_{\theta}(\cdot)$ is a nonlinear shrinkage function with threshold $\theta = \lambda/\rho$. Therefore, in the ADMM algorithm, the variables are updated in the following order: the sparse vector ($\alpha$), then the nonlinear vector ($z$) and finally the multiplier ($u$).
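These three updates can be sketched in a few lines of numpy, assuming soft thresholding as the shrinkage function; lam, rho and eta are fixed hand-set values here, which is precisely the manual tuning the proposed network is designed to remove:

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def admm_sparse(D, y, lam=0.1, rho=1.0, eta=1.0, n_iter=200):
    """ADMM for problem (4): alpha-update, shrinkage z-update, multiplier update."""
    n = D.shape[1]
    alpha, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    G = np.linalg.inv(D.T @ D + rho * np.eye(n))   # reused in every alpha-update
    for _ in range(n_iter):
        alpha = G @ (D.T @ y + rho * (z - u))       # sparse vector (alpha)
        z = soft_threshold(alpha + u, lam / rho)    # nonlinear vector (z)
        u = u + eta * (alpha - z)                   # multiplier (u)
    return z
```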
Based on the order of updating the variables, a deep data flow graph is devised which maps the iterations of the ADMM algorithm. As shown in Fig.3, this graph consists of nodes corresponding to the different operations in ADMM, and directed edges corresponding to the data flows between operations. Each update in (9) is considered one stage, so the $k$-th iteration of the ADMM algorithm corresponds to the $k$-th stage of the data flow graph. In each stage of the graph, there are three types of nodes mapped from the three types of operations in ADMM, i.e., the sparse vector operation ($\alpha$), the nonlinear transform operation ($z$) defined by the shrinkage function, and the multiplier update operation ($u$). Therefore, the whole data flow graph corresponds exactly to the update iterations of ADMM. Given undersampled data from an HSI, it flows over the graph and finally yields the optimal sparse vector. In the following subsections, the Adaptive Sparse Deep Network is proposed as a deep architecture based on this data flow graph.
3.2 The Architecture of the Adaptive Sparse Deep Network
In this subsection, we propose a deep architecture dubbed the Adaptive Sparse Deep Network. It mainly consists of three parts: the input layer, the hidden layer and the output layer. The hidden layer contains three types of operations, each with its own learnable parameters.
Sparsity operation: This operation obtains a sparse vector according to Eqn.(9) and Eqn.(10). Given $z^{(n-1)}$ and $u^{(n-1)}$, the output of this node in stage $n$ is defined as:
$\alpha^{(n)} = \left(D^{T}D + \rho^{(n)} I\right)^{-1}\left(D^{T}y + \rho^{(n)}(z^{(n-1)} - u^{(n-1)})\right)$   (11)
where $\rho^{(n)}$ are learnable parameters. In the first stage ($n = 1$), $z^{(0)}$ and $u^{(0)}$ are initialized to zeros; therefore $\alpha^{(1)} = (D^{T}D + \rho^{(1)} I)^{-1} D^{T} y$.
Nonlinear transform operation: This operation performs a nonlinear transform inspired by the ADMM algorithm for the sparse model in Eqn.(9) and Eqn.(10). Given $\alpha^{(n)}$ and $u^{(n-1)}$, the output of this node is defined as:
$z^{(n)} = S_{\theta^{(n)}}\left(\alpha^{(n)} + u^{(n-1)}\right)$   (12)
where $\theta^{(n)}$ are learnable parameters.
Multiplier update operation: This operation is defined by the ADMM algorithm in Eqn.(9) and Eqn.(10). The output of this node in stage $n$ is defined as:
$u^{(n)} = u^{(n-1)} + \eta^{(n)}\left(\alpha^{(n)} - z^{(n)}\right)$   (13)
where $\eta^{(n)}$ are learnable parameters.
In the forward network, each node corresponds to one type of operation. Each node receives values from its preceding nodes and passes its output to the following nodes. There is no feedback in the whole network during training; values propagate unidirectionally from the input layer to the output layer, which can be represented by a directed acyclic graph. The Adaptive Sparse Deep Network is shown in Fig.4. The undersampled data from the HSI flows from the input layer to the output layer in the order of circled numbers 1 to 10, followed by a final sparsity operation at circled number 11, and generates the optimal sparse vector. Then we obtain the classification of the HSI. Fig.4 illustrates the deep architecture described above.
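A hedged sketch of such a forward pass, with one ADMM-style stage per parameter triple; the names rhos, thetas and etas are our placeholders for the per-stage learnable parameters, not the paper's notation:

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def asdn_forward(D, y, rhos, thetas, etas):
    """Forward pass: each stage runs one ADMM-style iteration with its own parameters."""
    n = D.shape[1]
    alpha, z, u = np.zeros(n), np.zeros(n), np.zeros(n)   # z and u start at zero
    for rho, theta, eta in zip(rhos, thetas, etas):
        alpha = np.linalg.solve(D.T @ D + rho * np.eye(n),
                                D.T @ y + rho * (z - u))  # sparsity operation
        z = soft_threshold(alpha + u, theta)              # nonlinear transform
        u = u + eta * (alpha - z)                         # multiplier update
    return z
```

With constant parameters this reduces to plain ADMM; training lets each stage pick its own values instead.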
After a forward pass of the Adaptive Sparse Deep Network, errors remain in the results. We minimize the loss with gradient descent during Back-Propagation: by computing the loss function, the parameters of each node are updated, and the updated parameters are used to train the network again. Optimizing the parameters is expected to yield the optimal sparse vector, which in turn produces improved classification results.
First, the residual error of each class between the network output and the ground truth is defined as:
(14) 
Then the loss function can be defined as:
(15) 
where $\Theta$ denotes the network parameters. In the backward pass, we aim to learn the following parameters: $\rho^{(n)}$ in the sparsity operation, $\theta^{(n)}$ in the nonlinear transform operation and $\eta^{(n)}$ in the multiplier update operation. In the following subsection, we discuss how to compute the gradients of the loss function; the parameters can then be updated using Back-Propagation over the data flow graph.
3.3 Gradient Computation by Back-Propagation
In an FNN, the gradient propagates backward through each layer, and the gradient of each layer's parameters is computed along the way. Similarly, in our Adaptive Sparse Deep Network, the gradients are computed in inverse order, and the parameters are then learned by Back-Propagation. Fig.4 shows an example, where the gradient is computed backward from the operation with circled number 11 down to 2 successively. For each stage, Fig.5 shows the three types of nodes and the data flow over them. Each node has multiple inputs and outputs. We next introduce the details of the gradient computation for each node in a typical stage.
Multiplier update operation: As shown in Fig.5(a), this operation has three inputs: $u^{(n-1)}$, $\alpha^{(n)}$ and $z^{(n)}$. Its output $u^{(n)}$ is the input for computing $\alpha^{(n+1)}$, $z^{(n+1)}$ and $u^{(n+1)}$. Here the parameters $\eta^{(n)}$ can be updated. The gradients of the loss with respect to the parameters can be computed as:
(16) 
(17) 
We also compute the gradients of this operation's output with respect to its inputs: $u^{(n-1)}$, $\alpha^{(n)}$ and $z^{(n)}$.
Nonlinear transform update operation: As shown in Fig.5(b), this operation has two inputs, $\alpha^{(n)}$ and $u^{(n-1)}$, and its output $z^{(n)}$ is the input for computing $u^{(n)}$ in this stage and $\alpha^{(n+1)}$ in the next stage. Here the parameters $\theta^{(n)}$ can be updated. The gradients of the loss with respect to the parameters can be computed as:
(18) 
(19) 
We also compute the gradients of this operation's output with respect to its inputs: $\alpha^{(n)}$ and $u^{(n-1)}$.
Sparsity update operation: As shown in Fig.5(c), this operation has two inputs, $z^{(n-1)}$ and $u^{(n-1)}$, and its output $\alpha^{(n)}$ is the input for computing $z^{(n)}$ and $u^{(n)}$ in the same stage. Here the parameters $\rho^{(n)}$ can be updated. The gradients of the loss with respect to the parameters can be computed as:
(20) 
(21) 
The gradients of this operation's output with respect to its inputs, $z^{(n-1)}$ and $u^{(n-1)}$, are computed similarly.
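Analytic gradients like those above are commonly sanity-checked with finite differences; here is a minimal central-difference sketch, where loss_fn and params are placeholders for the network loss and any learnable parameter vector:

```python
import numpy as np

def numeric_grad(loss_fn, params, i, eps=1e-6):
    """Central-difference estimate of d loss / d params[i]."""
    p_plus, p_minus = params.copy(), params.copy()
    p_plus[i] += eps
    p_minus[i] -= eps
    return (loss_fn(p_plus) - loss_fn(p_minus)) / (2 * eps)
```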
Class  Name  Dictionary  Train  Test 

1  Asphalt  66  597  5968 
2  Meadows  186  932  17531 
3  Gravel  21  189  1889 
4  Trees  31  276  2757 
5  Painted metal sheets  13  269  1063 
6  Bare Soil  50  453  4526 
7  Bitumen  13  266  1051 
8  SelfBlocking Bricks  37  331  3314 
9  Shadows  9  189  749 
Total  426  3502  38848 
Class  Name  Dictionary  Train  Test 

1  Weeds1  20  181  1808 
2  Weeds2  37  335  3334 
3  Fallow  20  178  1778 
4  Fallowroughplow  14  125  1255 
5  Fallowsmooth  27  241  2410 
6  Stubble  40  356  3563 
7  Celery  36  322  3221 
8  Grapesuntrained  113  1014  10144 
9  Soilvinyarddevelop  62  558  5583 
10  Cornsenescedgreen weeds  33  295  2950 
11  Lettuceromaine4wk  11  96  961 
12  Lettuceromaine5wk  19  173  1735 
13  Lettuceromaine6wk  9  82  825 
14  Lettuceromaine7wk  11  96  963 
15  Vineyarduntrained  73  654  6541 
16  Vineyardverticaltrellis  18  163  1626 
Total  543  4869  48697 
4 Experimental Results and Analysis
In this section, four experiments were separately conducted on two real HSIs to validate the superiority of the proposed Adaptive Sparse Deep Network (ASDN).
The first hyperspectral image in our experiments is the University of Pavia, captured by the Reflective Optics System Imaging Spectrometer (ROSIS) optical sensor over an urban area surrounding the University of Pavia, Italy. The size of this image is 610 × 340 × 115, with a spatial resolution of 1.3 m per pixel and a spectral coverage ranging from 0.43 to 0.86 μm. In our experiment, the 12 very noisy channels were removed. The first principal component map of Pavia University and its false color image are shown in Fig.6(a) and Fig.6(b). For this data set with 9 classes of land cover, we randomly select 1% of the samples of each class for the dictionary, and the remainder was used for training and testing. The reference contents are shown in Table 1 and the ground-truth label map is shown in Fig.6(c).
The second hyperspectral image in our experiments is the Salinas image, acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over Salinas Valley, California. The size of this image is 512 × 217 × 224, with a spatial resolution of 3.7 m per pixel. Similar to the Pavia University image, the 20 water absorption bands (108–112, 154–167, and 224) were removed, and the remaining 204 bands were preserved for the following experiment. The first principal component map of Salinas and its false color image are shown in Fig.7(a) and Fig.7(b). For this data set with 16 classes of land cover, we randomly select 1% of the samples of each class for the dictionary, and the remainder was used for training and testing. The reference contents are shown in Table 2 and the ground-truth label map is shown in Fig.7(c).
In these experiments, the influence of different parameters on the classification results will be tested. The overall accuracy (OA), average accuracy (AA), and kappa coefficient (κ) are adopted to evaluate the quality of the classification results. All experiments are conducted on a 3.00 GHz computer with 64.0 GB RAM.
4.1 Comparisons of Different Sparsity Levels
In the first experiment, the proposed ASDN was compared with greedy algorithms, namely SP Dai and Milenkovic (2009), OMP Davenport and Wakin (2010), ROMP Needell and Vershynin (2010) and gOMP Wang et al. (2011). We verified that different sparsity levels have an impact on the classification results.
Firstly, we demonstrate the effect of the sparsity level $K$ on the classification results. The results are averaged over five runs at each $K$ to avoid any bias induced by random sampling. The classification accuracy plots on the entire test sets are shown in Fig.8 and Fig.9, with the sparsity level ranging from 1 to 10. In Fig.8 and Fig.9, the curves are volatile and unstable. At certain sparsity levels the OMP algorithm achieves higher classification accuracy, but we are still unsure whether such a sparsity level is optimal; the other greedy algorithms behave similarly. This is the drawback of manually setting parameters. Secondly, the classification results of our method and the greedy algorithms, including OA, AA and the kappa coefficient, are shown in Table 3 and Table 4. On Pavia University, the OA of ASDN is higher than that of SP, OMP, ROMP and gOMP by 3.87%, 6.47%, 4.78% and 6.84%, respectively. Similarly, on Salinas, the OA of ASDN is higher than that of SP, OMP, ROMP and gOMP by 1.13%, 2.09%, 0.73% and 1.95%, respectively. Although SP and ROMP performed best among the greedy algorithms on the two data sets respectively, our method outperforms all of them.
Class  SP  OMP  ROMP  gOMP  Adaptive Sparse Deep Network
1  77.22  70.05  69.75  67.82  76.35 
2  92.86  92.71  94.55  93.99  95.33 
3  58.81  54.33  55.49  55.39  65.11 
4  84.46  78.21  83.79  78.13  92.38 
5  98.82  99.45  99.17  99.70  98.93 
6  53.46  44.20  50.65  42.57  71.65 
7  69.88  72.04  69.02  72.44  67.25 
8  73.21  76.93  76.50  72.03  77.33 
9  85.45  82.00  75.88  80.58  68.9 
OA  81.15  78.55  80.24  78.18  85.02
AA  77.13  74.43  75.71  73.63  79.25
Kappa  74.69  71.06  73.37  70.50  80.07
Class  SP  OMP  ROMP  gOMP  Adaptive Sparse Deep Network
1  97.57  97.18  98.63  97.45  99.23 
2  98.46  98.32  98.94  98.54  97.58 
3  93.46  94.51  95.45  93.51  91.51 
4  98.72  97.80  99.11  98.02  97.69 
5  97.07  93.84  95.91  94.51  97.26 
6  99.51  99.80  99.72  99.83  98.54 
7  99.25  99.34  99.33  99.51  99.16 
8  75.56  73.82  78.02  74.93  85.76 
9  98.00  97.88  98.73  98.14  99.34 
10  86.78  85.57  89.52  86.59  79.80 
11  95.58  96.37  93.50  97.48  85.85 
12  99.97  99.63  99.11  99.53  98.62 
13  97.98  97.39  98.45  97.57  96.72 
14  92.04  90.68  93.17  87.35  96.05 
15  61.95  58.93  58.06  57.31  60.07 
16  92.28  94.31  94.61  95.36  88.31 
OA  87.53  86.57  87.93  86.71  88.66
AA  92.76  92.21  93.14  92.23  91.97
Kappa  86.12  85.05  86.55  85.19  87.34
4.2 Comparisons of Different Regularization Parameters
In the second experiment, the proposed ASDN was compared with Bregman iteration Ye and Xie (2011), FISTA Beck and Teboulle (2009), and SpaRSA Wright et al. (2009). These algorithms can solve problem (4), but their regularization parameters need to be set manually in advance. We tested whether different regularization parameters affect the classification results.
Different regularization parameters $\lambda$ are applied to problem (4). All results are averaged over five runs at each $\lambda$ to avoid any bias induced by random sampling. The classification plots are shown in Fig.10 and Fig.11. As $\lambda$ increases, the classification curves of Bregman iteration and FISTA steadily rise, but it is hard to determine the optimal regularization parameter. The curves of SpaRSA fluctuate, and the globally optimal solution remains unknown. For these algorithms, setting the regularization parameter in advance is inevitable. We then list the classification results of the proposed ASDN and the other algorithms in Table 5 and Table 6. In terms of OA, FISTA achieves the highest accuracy apart from the proposed ASDN; our method is higher than FISTA by 2.93% on Pavia University and 2.81% on Salinas. The proposed ASDN thus achieves the highest accuracy among all compared algorithms.
Class  Bregman  FISTA  SpaRSA  ADMM  Adaptive Sparse Deep Network
1  94.87  84.41  94.20  88.92  76.35 
2  96.90  98.64  91.33  92.13  95.33 
3  68.19  47.93  3.01  48.78  65.11 
4  89.75  78.55  80.22  81.93  92.38 
5  99.65  99.40  96.96  99.40  98.93 
6  47.03  36.28  28.23  52.08  71.65 
7  17.78  59.35  0.23  23.38  67.25 
8  32.69  78.09  22.11  45.30  77.33 
9  69.33  93.12  94.98  91.71  68.9 
OA  80.29  82.09  70.71  78.12  85.02
AA  68.47  75.09  56.82  69.29  79.25
Kappa  72.98  75.34  59.60  70.46  80.07
Class  Bregman  FISTA  SpaRSA  ADMM  Adaptive Sparse Deep Network
1  99.77  97.48  79.38  100.00  99.23 
2  95.97  98.48  99.32  18.55  97.58 
3  42.0  79.55  0.1  9.30  91.51 
4  98.99  98.55  95.07  99.06  97.69 
5  98.87  97.32  28.06  99.59  97.26 
6  99.78  99.29  99.16  99.64  98.54 
7  99.77  99.44  16.48  99.07  99.16 
8  91.98  83.25  96.50  85.98  85.76 
9  99.64  97.28  99.69  98.63  99.34 
10  92.51  76.52  33.65  45.33  79.80 
11  34.41  90.54  0  0  85.85 
12  98.29  99.69  8.91  30.78  98.62 
13  99.26  98.01  50.00  98.90  96.72 
14  92.19  90.46  81.11  88.20  96.05 
15  37.68  48.41  4.27  46.41  60.07 
16  97.33  88.81  65.72  82.49  88.31 
OA  85.34  85.85  61.05  72.09  88.66
AA  86.15  90.19  53.59  68.87  91.97
Kappa  83.56  84.18  55.45  68.71  87.34
4.3 Comparisons with ADMM
In the third experiment, the proposed ASDN was compared with ADMM under different penalty parameters and regularization parameters. First, we demonstrated the effect of the penalty parameter $\rho$ on the classification results; similarly, we tested whether different regularization parameters impact the results.
With $\lambda$ or $\rho$ fixed in turn, the classification plots are shown in Fig.12 and Fig.13, where one axis denotes the range of values for $\rho$, another the range of values for $\lambda$, and the vertical axis the accuracy on the test set. These classification surfaces fluctuate, which makes it difficult to find the global solution and determine the optimal parameters; for the ADMM algorithm, the related parameters still need to be set in advance. The classification results of the proposed ASDN and ADMM are listed in Table 5 and Table 6. Our method still outperforms the ADMM algorithm.
4.4 Comparisons of Different Classifiers
In the fourth experiment, some traditional classifiers, such as ELM Miche et al. (2010), KNN Zhang and Zhou (2007), SVM Shevade et al. (2000) and sparsity adaptive matching pursuit (SAMP) Do et al. (2009), were compared with our ASDN. The classification maps of the traditional classifiers are shown in Fig.14 and Fig.15, and the results are summarized in Table 7 and Table 8. The KNN model is difficult to train and interpret. The most innovative feature of SAMP is its capability of signal reconstruction without prior information on the sparsity. For the SVM, the one-against-one strategy is applied. Among these classifiers, SVM yields excellent overall performance, but it is still lower than the proposed ASDN, by 1.4% on Pavia University and 1.54% on Salinas. Therefore, our method is superior to these classifiers.
Class  ELM  KNN  SVM  SAMP  Adaptive Sparse Deep Network
1  69.38  70.84  87.25  71.66  76.35 
2  91.50  91.40  96.30  90.62  95.33 
3  49.07  53.60  49.48  55.92  65.11 
4  81.56  74.20  86.24  75.29  92.38 
5  95.70  99.37  98.06  98.57  98.93 
6  58.38  43.92  53.72  46.73  71.65 
7  40.03  75.02  38.36  71.58  67.25 
8  59.66  73.70  80.22  70.07  77.33 
9  89.01  91.12  90.59  90.93  68.9 
OA  75.18  77.76  83.62  77.62  85.02
AA  70.47  74.79  75.58  74.60  79.25
Kappa  66.57  70.47  77.79  69.94  80.07
Class  ELM  KNN  SVM  SAMP  Adaptive Sparse Deep Network
1  99.20  98.14  97.80  98.59  99.23 
2  99.65  98.24  98.60  97.38  97.58 
3  85.24  92.71  83.83  92.97  91.51 
4  93.84  98.97  98.23  97.28  97.69 
5  97.62  94.90  97.63  94.64  97.26 
6  99.65  99.58  99.59  99.27  98.54 
7  99.53  99.15  99.35  98.86  99.16 
8  79.63  68.47  88.16  69.55  85.76 
9  99.36  97.88  98.44  96.91  99.34 
10  92.25  88.09  79.68  85.76  79.80 
11  94.00  94.30  58.05  94.09  85.85 
12  96.43  99.77  99.31  99.95  98.62 
13  97.62  98.10  97.66  97.41  96.72 
14  90.42  87.44  91.43  91.64  96.05 
15  60.42  60.23  50.89  59.76  60.07 
16  95.32  89.60  89.64  88.26  88.31 
OA  87.35  85.56  87.12  85.37  88.66
AA  91.52  91.60  89.28  91.39  91.97
Kappa  87.01  83.94  85.59  83.73  87.34
5 Conclusions
To address the problem that different sparsity levels and regularization parameters greatly influence classification results in HSIs, this paper presents the Adaptive Sparse Deep Network (ASDN), a deep architecture based on a data flow graph. Data transfer and feedback are the basis of ASDN. Based on the ADMM algorithm, the deep data flow graph is built according to the order in which the variables of the sparse model are updated. The proposed ASDN consists of three parts: the input layer, the hidden layer and the output layer. In the forward network, the undersampled data from the HSI flows from the input layer to the output layer following this update order, generating the optimal sparse vector. Then, to minimize the loss, the parameters are updated by gradient computation in the Back-Propagation network. Therefore, in our method, the related parameters need not be set manually in advance, unlike the other algorithms (greedy and convex optimization algorithms) used to solve this sparse model. Experimental results indicate that our method outperforms several traditional classifiers and other sparse-model algorithms. In future work, we will attempt to extract new features from the HSIs for further improvements in experimental performance.
References
 Benediktsson et al. (2005) Benediktsson, J.A.; Palmason, J.A.; Sveinsson, J.R. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Transactions on Geoscience and Remote Sensing 2005, 43, 480–491.
 Iyer et al. (2017) Iyer, R.P.; Raveendran, A.; Bhuvana, S.K.T.; Kavitha, R. Hyperspectral image analysis techniques on remote sensing. Third International Conference on Sensing, 2017.
 Tu et al. (2019) Tu, B.; Zhang, X.; Kang, X.; Zhang, G.; Li, S. Density Peak-Based Noisy Label Detection for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing 2019, 57, 1573–1584.
 Lin et al. (2018) Lin, H.; Li, J.; Liu, C.; Li, S. Recent Advances on Spectral-Spatial Hyperspectral Image Classification: An Overview and New Guidelines. IEEE Transactions on Geoscience and Remote Sensing 2018, PP, 1–19.

 Wang et al. (2019) Wang, A.; Wang, Y.; Chen, Y. Hyperspectral image classification based on convolutional neural network and random forest. Remote Sensing Letters 2019, 10, 1086–1094.
 Donoho (2006) Donoho, D.L. Compressed sensing. IEEE Transactions on Information Theory 2006, 52, 1289–1306.
 Wang and Celik (2017) Wang, H.; Celik, T. Sparse Representation-Based Hyperspectral Data Processing: Lossy Compression. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2017, PP, 1–10.
 Julien et al. (2007) Julien, M.; Michael, E.; Guillermo, S. Sparse representation for color image restoration. IEEE Transactions on Image Processing 2007, 17, 53–69.
 Michael and Michal (2006) Michael, E.; Michal, A. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing 2006, 15, 3736–3745.
 Dong et al. (2013) Dong, W.; Zhang, L.; Shi, G.; Li, X. Nonlocally Centralized Sparse Representation for Image Restoration. IEEE Transactions on Image Processing 2013, 22, 1618–1628.
 He et al. (2017) He, Z.; Yi, S.; Cheung, Y.M.; You, X.; Tang, Y.Y. Robust Object Tracking via Key Patch Sparse Representation. IEEE Transactions on Cybernetics 2017, 47, 354–364.
 John et al. (2009) John, W.; Yang, A.Y.; Arvind, G.; S. Shankar, S.; Yi, M. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2009, 31, 210–227.
 Gao et al. (2017) Gao, Y.; Ma, J.; Yuille, A.L. Semi-Supervised Sparse Representation Based Classification for Face Recognition With Insufficient Labeled Samples. IEEE Transactions on Image Processing 2017, 26, 2545–2560.
 Wang et al. (2019) Wang, Q.; He, X.; Li, X. Locality and Structure Regularized Low Rank Representation for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing 2019, 57, 911–923.
 Yi et al. (2011) Yi, C.; Nasrabadi, N.M.; Tran, T.D. Hyperspectral Image Classification Using Dictionary-Based Sparse Representation. IEEE Transactions on Geoscience and Remote Sensing 2011, 49, 3973–3985.
 Zhang et al. (2014) Zhang, H.; Li, J.; Huang, Y.; Zhang, L. A Nonlocal Weighted Joint Sparse Representation Classification Method for Hyperspectral Imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2014, 7, 2056–2065.
 Baron et al. (2012) Baron, D.; Duarte, M.F.; Wakin, M.B.; Sarvotham, S.; Baraniuk, R.G. Distributed Compressive Sensing, 2012.
 Fang et al. (2017) Fang, L.; Wang, C.; Li, S.; Benediktsson, J.A. Hyperspectral Image Classification via Multiple-Feature-Based Adaptive Sparse Representation. IEEE Transactions on Instrumentation and Measurement 2017, 66, 1646–1657.
 Yan et al. (2019) Yan, J.; Chen, H.; Zhai, Y.; Liu, Y.; Liu, L. Region-division-based joint sparse representation classification for hyperspectral images. IET Image Processing 2019, 13, 1694–1704.
 Wei et al. (2016) Wei, F.; Li, S.; Fang, L.; Kang, X.; Benediktsson, J.A. Hyperspectral Image Classification via Shape-Adaptive Joint Sparse Representation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2016, 9, 1–1.
 Zhang et al. (2017) Zhang, Z.; Xu, Y.; Yang, J.; Li, X.; Zhang, D. A Survey of Sparse Representation: Algorithms and Applications. IEEE Access 2017, 3, 490–530.
 Wang et al. (2019) Wang, Y.; Zou, C.; Tang, Y.Y.; Li, L.; Shang, Z. Cauchy greedy algorithm for robust sparse recovery and multiclass classification. Signal Processing 2019, 164, 284–294.
 Sayed et al. (2019) Sayed, G.I.; Hassanien, A.E.; Azar, A.T. Feature selection via a novel chaotic crow search algorithm. Neural Computing and Applications 2019, 31, 171–188.
 Davenport and Wakin (2010) Davenport, M.A.; Wakin, M.B. Analysis of Orthogonal Matching Pursuit Using the Restricted Isometry Property. IEEE Transactions on Information Theory 2010, 56, 4395–4401.
 Dai and Milenkovic (2009) Dai, W.; Milenkovic, O. Subspace Pursuit for Compressive Sensing Signal Reconstruction. IEEE Transactions on Information Theory 2009, 55, 2230–2249.
 Needell and Vershynin (2010) Needell, D.; Vershynin, R. Signal Recovery From Incomplete and Inaccurate Measurements Via Regularized Orthogonal Matching Pursuit. IEEE Journal of Selected Topics in Signal Processing 2010, 4, 310–316.
 Wang et al. (2011) Wang, J.; Kwon, S.; Shim, B. Generalized Orthogonal Matching Pursuit. IEEE Transactions on Signal Processing 2011, 60, 6202–6216.
 Beck and Teboulle (2009) Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences 2009, 2, 183–202.
 Ye and Xie (2011) Ye, G.B.; Xie, X. Split Bregman method for large scale fused Lasso. Computational Statistics and Data Analysis 2011, 55, 1552–1569.
 Wright et al. (2009) Wright, S.J.; Nowak, R.D.; Figueiredo, M.A.T. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing 2009, 57, 2479–2493.

 Boyd et al. (2010) Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning 2010, 3, 1–122.
 Fortin and Glowinski (1983) Fortin, M.; Glowinski, R. Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems; Studies in Mathematics and Its Applications, Vol. 15; 1983.
 Fukushima (1992) Fukushima, M. Application of the alternating direction method of multipliers to separable convex programming problems. Computational Optimization and Applications 1992, 1, 93–111.
 Kontogiorgis and Meyer (1998) Kontogiorgis, S.; Meyer, R.R. A Variable-Penalty Alternating Directions Method for Convex Optimization. Mathematical Programming 1998, 83, 29–53.
 Yang et al. (2018) Yang, Y.; Sun, J.; Li, H.; Xu, Z. ADMM-CSNet: A Deep Learning Approach for Image Compressive Sensing. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018, PP, 1–1.
 Luo et al. (2018) Luo, Y.; Zou, J.; Yao, C.; Li, T.; Bai, G. HSI-CNN: A Novel Convolution Neural Network for Hyperspectral Image, 2018.
 Slavkovikj et al. (2015) Slavkovikj, V.; Verstockt, S.; Neve, W.D.; Hoecke, S.V.; Walle, R.V.D. Hyperspectral Image Classification with Convolutional Neural Networks, 2015.
 Yuan et al. (2019) Yuan, Q.; Zhang, Q.; Li, J.; Shen, H.; Zhang, L. Hyperspectral Image Denoising Employing a Spatial-Spectral Deep Residual Convolutional Neural Network. IEEE Transactions on Geoscience and Remote Sensing 2019, 57, 1205–1218.
 Miche et al. (2010) Miche, Y.; Sorjamaa, A.; Bas, P.; Jutten, C.; Lendasse, A. OP-ELM: optimally pruned extreme learning machine. IEEE Transactions on Neural Networks 2010, 21, 158–162.
 Zhang and Zhou (2007) Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition 2007, 40, 2038–2048.
 Shevade et al. (2000) Shevade, S.K.; Keerthi, S.S.; Bhattacharyya, C.; Murthy, K.R.K. Improvements to the SMO algorithm for SVM regression. IEEE Transactions on Neural Networks 2000, 11, 1188–1193.
 Do et al. (2009) Do, T.T.; Gan, L.; Nguyen, N.; Tran, T.D. Sparsity adaptive matching pursuit algorithm for practical compressed sensing. Conference on Signals, Systems and Computers, 2009.