Hyperspectral Image Classification Based on Adaptive Sparse Deep Network

10/21/2019 ∙ by Jingwen Yan, et al.

Sparse models are widely used in hyperspectral image classification. However, different choices of the sparsity level and the regularization parameter have a great influence on the classification results. In this paper, a novel adaptive sparse deep network based on a deep architecture is proposed, which can construct the optimal sparse representation and regularization parameters through the deep network. Firstly, a data flow graph is designed in which each stage represents one update iteration of the Alternating Direction Method of Multipliers (ADMM) algorithm. The forward network and the back-propagation network are then deduced, and all parameters are updated by gradient descent during back-propagation. On this basis, we propose the Adaptive Sparse Deep Network. Compared with several traditional classifiers and other algorithms for the sparse model, experimental results indicate that our method achieves a great improvement in HSI classification.

1 Introduction

Hyperspectral images (HSIs) contain abundant spectral information and high spatial resolution, and have been widely used for scene classification, object detection, and environmental monitoring, etc. Benediktsson et al. (2005); Iyer et al. (2017); Tu et al. (2019). In recent years, hyperspectral image classification has become a research hotspot and plays an important role in HSI analysis Lin et al. (2018). Multiple classification techniques have been attempted and have achieved good performance Wang et al. (2019).

In recent years, sparse representation has attracted more and more attention from scholars. It has the advantages of high accuracy and fast processing speed, and it does not need statistics or assumptions about the sample distribution Donoho (2006). Therefore, it has become a powerful tool for signal processing and analysis, widely applied in many fields, such as data compression Wang and Celik (2017), image restoration Julien et al. (2007); Michael and Michal (2006); Dong et al. (2013), and face recognition He et al. (2017); John et al. (2009); Gao et al. (2017). Recently, many classification models based on sparse representation have been proposed and have demonstrated superiority in hyperspectral image classification Wang et al. (2019). In 2011, Chen et al. proposed a region-based sparse representation for hyperspectral image classification, rooted in the assumption that every hyperspectral pixel belonging to one class can be represented by a common sub-dictionary consisting of training samples from the same class Yi et al. (2011). In their experiments, in order to achieve higher classification accuracy, the authors obtained relatively optimal values by testing different parameter settings (e.g. sparsity level, weighting factor and neighborhood size). In 2014, Zhang et al. proposed a method called non-local weighted joint sparse representation classification (NLW-JSRC) to compensate for the deficiency of using identical weights for different neighboring pixels around the central test pixel in the joint sparsity model (JSM) Zhang et al. (2014); Baron et al. (2012). The NLW-JSRC method aims to ensure that pixels which are similar to the central test pixel contribute more to the classification process, with a larger weight, and vice versa, which improves the hyperspectral image classification performance. However, the authors still manually set the parameter values (e.g. sparsity level and neighborhood size) to obtain good classification accuracy. In 2017, Fang et al. proposed a multiple-feature-based adaptive sparse representation (MFASR) method for hyperspectral image classification Fang et al. (2017). MFASR uses an adaptive sparse representation to effectively exploit the correlations among four features extracted from the original HSI. Even so, the authors still needed to set different parameter values to achieve higher classification accuracy. Therefore, for these sparse-representation-based hyperspectral image classification methods, the sparsity level deserves particular attention Yan et al. (2019); Wei et al. (2016).

In the sparse representation classification model, a hyperspectral pixel can be represented over a dictionary whose atoms are training samples extracted from the different classes. An unknown pixel can be expressed as a sparse vector whose few nonzero entries correspond to the weights of the selected training samples. The sparse vector can be recovered by solving a sparsity-constrained optimization problem (an $\ell_0$-minimization problem), and the class label of the test hyperspectral pixel can be directly determined from the recovered sparse vector. Therefore, how to solve the sparsity optimization problem is the key of the sparse model Zhang et al. (2017). Many algorithms have been proposed to solve the sparsity optimization problem, including greedy algorithms Wang et al. (2019) and optimization algorithms Sayed et al. (2019). However, the classification results based on these algorithms depend on two parameters: the regularization parameter $\lambda$ and the sparsity level $K$. Greedy algorithms, such as Orthogonal Matching Pursuit (OMP) Davenport and Wakin (2010) and Subspace Pursuit (SP) Dai and Milenkovic (2009), need an appropriate $K$ to be set up in advance, and the same holds for the improved OMP algorithms. Deanna Needell proposed Regularized Orthogonal Matching Pursuit (ROMP), which combines the speed and ease of implementation of greedy methods with the strong guarantees of convex programming methods Needell and Vershynin (2010). Meanwhile, to pursue efficiency in reconstructing sparse signals, Jian Wang et al. proposed generalized OMP (gOMP), which finishes in a much smaller number of iterations than OMP Wang et al. (2011). But the parameter $K$ still needs to be chosen in advance. Optimization algorithms, such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) Beck and Teboulle (2009), Bregman iteration Ye and Xie (2011) and the framework of Sparse Reconstruction by Separable Approximation (SpaRSA) Wright et al. (2009), all need the regularization parameter $\lambda$ to be set in advance when solving the optimization problem.

These parameter values need to be set manually, and they are selected by repeating the algorithm many times and keeping whichever values produce better results. When the result does not meet the target requirement, the parameter values are adjusted again. We cannot test all possible values and then choose the best one, so the theoretically optimal parameter values cannot be obtained and only relatively optimal values can be selected. The manual setting of parameter values and the resulting non-adaptability and non-automation of the solution process limit the application of the sparse representation method.

To address the issue of parameter values in sparse-representation-based hyperspectral image classification, the Alternating Direction Method of Multipliers (ADMM) has been adopted; it is an efficient variable splitting algorithm with a convergence guarantee. It considers the augmented Lagrangian function of a given sparsity model and splits the variables into subgroups, which can be alternately optimized by solving a few simple subproblems Boyd et al. (2010). Although ADMM is generally efficient, it is not easy to determine the optimal parameters (e.g. penalty parameters) that influence the classification accuracy and speed in the sparse model. Many applications have shown that penalty parameters chosen too large or too small can significantly affect the solution time or the classification result 198 (1983); Fukushima (1992); Spyridon Kontogiorgis (1998). Recently, Yang et al. proposed a Deep ADMM-Net for compressive sensing MRI, which largely resolved the parameter-setting problem Yang et al. (Nov.2018). The authors first devised a network structure based on the ADMM algorithm; the related parameters are then learned end-to-end using the L-BFGS algorithm automatically rather than set up in advance.

In this paper, in order to construct the optimal parameters of a sparse model, we propose the Adaptive Sparse Deep Network based on a novel deep architecture. Our work includes two parts. Firstly, we introduce the iterative procedure of the ADMM algorithm for optimizing a sparse model. The data flow graph is shown in Fig.1. It consists of multiple stages, each of which corresponds to one iteration of the ADMM algorithm. The proposed deep architecture is interpretable rather than a black-box optimizer, different from a fully connected network or convolutional network in deep learning Luo et al. (2018); Slavkovikj et al. (2015); Yuan et al. (2019). Then, similar to a Feedforward Neural Network (FNN), the Adaptive Sparse Deep Network is divided into three parts, namely the input layer, the hidden layer and the output layer. Meanwhile, the hidden layer has three types of operations, corresponding to different learnable parameters. Finally, all parameters can be updated by gradient descent in back-propagation. An optimal sparse vector can be obtained by our method, and the optimal hyperspectral image classification results can then be obtained directly.

The remainder of this paper is structured as follows. Section 2 reviews sparse representation classification and the hyperspectral sparse model based on the ADMM algorithm. Section 3 proposes the Adaptive Sparse Deep Network for hyperspectral image classification. The experimental results of the proposed deep architecture on two hyperspectral images are given in Section 4. Finally, Section 5 summarizes the paper and makes some closing remarks.

Figure 1: The whole data flow graph for the Adaptive Sparse Deep Network of hyperspectral sparse model.
Figure 2: The Sparse Representation

2 Hyperspectral sparse model based on the alternating direction method of multipliers

Consider a hyperspectral pixel $\mathbf{y} \in \mathbb{R}^{L}$, where $L$ is the number of spectral bands and $\mathbf{y}$ denotes the spectral vector at a certain spatial location in the HSI. Given a dictionary $\mathbf{D} = [\mathbf{D}_1, \mathbf{D}_2, \ldots, \mathbf{D}_C]$, where $C$ is the number of classes and $\mathbf{D}_c$ is a sub-dictionary composed of pixels selected from the $c$-th class of the HSI. According to sparse representation, a hyperspectral pixel $\mathbf{y}$ can be represented by a linear combination of the atoms of the dictionary $\mathbf{D}$. As can be seen in Fig.2, $\mathbf{y}$ can be represented as follows:

$\mathbf{y} = \mathbf{D}\mathbf{x} + \mathbf{e}$   (1)

where $\mathbf{x}$ is the sparse vector of $\mathbf{y}$ and $\mathbf{e}$ is an error residual term. Here $\mathbf{x}$ can be obtained by solving the following constrained optimization problem, and the hyperspectral sparse model is given as follows:

$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \left\| \mathbf{y} - \mathbf{D}\mathbf{x} \right\|_2 \quad \text{s.t.} \quad \left\| \mathbf{x} \right\|_0 \le K$   (2)

where $K$ is a given upper bound on the sparsity level, i.e., the upper bound on the number of nonzero entries of $\mathbf{x}$. Greedy algorithms can be used to solve problem (2); however, the sparsity level $K$ needs to be set up in advance. A higher sparsity level leads to higher computational cost and worse classification performance, because it misleads the solver into selecting dictionary atoms from the wrong classes.
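To make the role of the preset sparsity level concrete, below is a minimal OMP sketch for a single pixel. It is an illustration only, assuming a dictionary with (approximately) unit-norm columns; the function and variable names are ours. The key point is that $K$ enters as a hard iteration budget that must be fixed before the solver runs.

```python
import numpy as np

def omp(D, y, K):
    """Minimal Orthogonal Matching Pursuit sketch: the sparsity level K
    must be supplied by the user before the algorithm can run."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    coeffs = np.zeros(0)
    for _ in range(K):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # re-fit the coefficients on the selected atoms by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

# Usage: x_hat = omp(D, y, K=3). A different K gives a different solution,
# which is exactly the manual-tuning problem discussed above.
```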

The optimization problem (2) can also be converted to a linear programming problem by replacing the $\ell_0$ norm with the $\ell_1$ norm. It is shown as follows:

$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \left\| \mathbf{y} - \mathbf{D}\mathbf{x} \right\|_2 \quad \text{s.t.} \quad \left\| \mathbf{x} \right\|_1 \le \epsilon$   (3)

By adding the regularization parameter $\lambda$, the desired $\mathbf{x}$ is as sparse as possible and the error is as small as possible. Problem (3) can also be described as:

$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2}\left\| \mathbf{y} - \mathbf{D}\mathbf{x} \right\|_2^2 + \lambda \left\| \mathbf{x} \right\|_1$   (4)

There are many optimization algorithms for solving problem (4), such as FISTA Beck and Teboulle (2009), the Bregman iterative algorithm Ye and Xie (2011), and SpaRSA Wright et al. (2009). For all of these methods, the regularization parameter $\lambda$ provides a tradeoff between fidelity to the measurements and sparsity, and it still needs to be set up in advance based on human experience.
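As a concrete illustration of how $\lambda$ enters these solvers, here is a minimal ISTA-style sketch for problem (4) (FISTA adds a momentum step on top of the same update). It is a sketch under our own naming conventions, with the step size taken as the reciprocal of the Lipschitz constant of the data-fidelity gradient.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, y, lam, n_iter=200):
    """Minimal ISTA sketch for min_x 0.5*||y - D x||_2^2 + lam*||x||_1.
    The regularization parameter lam must be chosen before solving;
    FISTA accelerates the iteration but has the same dependence on lam."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)           # gradient of the data-fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```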

In order to find the optimal solution of problem (4), we adopt ADMM, a simple but powerful algorithm that is well suited to distributed convex optimization and takes the form of a decomposition-coordination procedure, in which the solutions to small local subproblems are coordinated to find a solution to a large global problem. ADMM can be viewed as an attempt to blend the benefits of dual decomposition and augmented Lagrangian methods for constrained optimization Boyd et al. (2010). Therefore, this method is suitable for solving problem (4) in this paper.

By introducing the auxiliary variable $\mathbf{z}$, problem (4) is equivalent to:

$\min_{\mathbf{x},\mathbf{z}} \frac{1}{2}\left\| \mathbf{y} - \mathbf{D}\mathbf{x} \right\|_2^2 + \lambda \left\| \mathbf{z} \right\|_1 \quad \text{s.t.} \quad \mathbf{x} = \mathbf{z}$   (5)

Its augmented Lagrangian function is:

$L_{\rho}(\mathbf{x},\mathbf{z},\mathbf{m}) = \frac{1}{2}\left\| \mathbf{y} - \mathbf{D}\mathbf{x} \right\|_2^2 + \lambda \left\| \mathbf{z} \right\|_1 + \mathbf{m}^{T}(\mathbf{x}-\mathbf{z}) + \frac{\rho}{2}\left\| \mathbf{x}-\mathbf{z} \right\|_2^2$   (6)

where $\mathbf{m}$ is the Lagrangian multiplier and $\rho$ is the penalty parameter.

For the sake of simplicity, formula (6) can also be written in the following scaled form:

$L_{\rho}(\mathbf{x},\mathbf{z},\mathbf{u}) = \frac{1}{2}\left\| \mathbf{y} - \mathbf{D}\mathbf{x} \right\|_2^2 + \lambda \left\| \mathbf{z} \right\|_1 + \frac{\rho}{2}\left\| \mathbf{x}-\mathbf{z}+\mathbf{u} \right\|_2^2 - \frac{\rho}{2}\left\| \mathbf{u} \right\|_2^2$   (7)

here, let $\mathbf{u} = \mathbf{m}/\rho$ denote the scaled multiplier.

In the sparse model, the ADMM algorithm commonly needs to run for dozens of iterations to obtain an optimal sparse vector $\hat{\mathbf{x}}$. Once $\hat{\mathbf{x}}$ is obtained, the class of the hyperspectral pixel $\mathbf{y}$ can be directly determined. Comparing the reconstruction results of each class, we classify $\mathbf{y}$ by assigning it to the class that minimizes the residual $r_c(\mathbf{y})$:

$\operatorname{class}(\mathbf{y}) = \arg\min_{c=1,\ldots,C} r_c(\mathbf{y}) = \arg\min_{c=1,\ldots,C} \left\| \mathbf{y} - \mathbf{D}_c \hat{\mathbf{x}}_c \right\|_2$   (8)

where $\hat{\mathbf{x}}_c$ denotes the entries of $\hat{\mathbf{x}}$ associated with the sub-dictionary $\mathbf{D}_c$.
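The residual rule of Eqn.(8) can be sketched as follows, assuming the sub-dictionaries are stored as a list and the recovered sparse vector is split in the same class order; names are illustrative only.

```python
import numpy as np

def classify_by_residual(D_per_class, y, x):
    """Assign y to the class whose sub-dictionary reconstructs it best,
    i.e. argmin_c ||y - D_c x_c||_2 as in Eqn.(8).
    D_per_class: list of sub-dictionaries D_c (L x n_c arrays);
    x: recovered sparse vector, concatenated in the same class order."""
    residuals = []
    start = 0
    for D_c in D_per_class:
        n_c = D_c.shape[1]
        x_c = x[start:start + n_c]            # coefficients belonging to class c
        residuals.append(np.linalg.norm(y - D_c @ x_c))
        start += n_c
    return int(np.argmin(residuals))          # 0-based index of the winning class
```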

Experience with applications has shown that the number of iterations depends significantly on the penalty parameter $\rho$. If the fixed penalty parameter is chosen too small or too large, the solution time can increase significantly. Just like the sparsity level $K$ or the regularization parameter $\lambda$ above, these parameters are unknown and need to be predetermined. After determining their values, the model is solved and the classification result is compared with the true labels. If the result does not meet the requirement, the parameters are adjusted again, which results in the non-adaptability and non-automation of the solution process.

In the following section, self-adaptive rules are applied for adjusting the penalty parameter and the regularization parameter. We use gradient descent with back-propagation to solve this problem: by updating the corresponding parameters through gradient computation, self-adaptive parameters can be obtained. In the next section, the details of the deep architecture based on the ADMM algorithm are introduced.

3 The Adaptive Sparse Deep Network

Parameter settings mostly depend on human experience, so it is uncertain whether the obtained solution is the global optimum. Our work includes two parts. Firstly, the Adaptive Sparse Deep Network (ASDN) is designed based on the ADMM algorithm. A deep data flow graph is shown in Fig.3; each update iteration of ADMM corresponds to a stage in Fig.3. Secondly, the details of updating the parameters by gradient descent in back-propagation are introduced. Our method has a clear network structure and explicit data conduction relationships. In ASDN, all parameters are updated self-adaptively rather than set manually.

3.1 A Data Flow Graph of Updating Order of Parameters

Data transfer and feedback are the basis of deep networks. Firstly, a data flow graph is built, in which the order of updating the parameters is the key. Here, problem (7) can be solved by the following three subproblems based on the ADMM algorithm:

$\begin{cases} \mathbf{x}^{(n)} = \arg\min_{\mathbf{x}} \frac{1}{2}\left\|\mathbf{y}-\mathbf{D}\mathbf{x}\right\|_2^2 + \frac{\rho}{2}\left\|\mathbf{x}-\mathbf{z}^{(n-1)}+\mathbf{u}^{(n-1)}\right\|_2^2 \\ \mathbf{z}^{(n)} = \arg\min_{\mathbf{z}} \lambda\left\|\mathbf{z}\right\|_1 + \frac{\rho}{2}\left\|\mathbf{x}^{(n)}-\mathbf{z}+\mathbf{u}^{(n-1)}\right\|_2^2 \\ \mathbf{u}^{(n)} = \mathbf{u}^{(n-1)} + \mathbf{x}^{(n)} - \mathbf{z}^{(n)} \end{cases}$   (9)

Let $S_{t}(v) = \operatorname{sign}(v)\max(|v|-t, 0)$ denote the elementwise soft-thresholding operator; then the three subproblems have the following solutions:

$\begin{cases} \mathbf{x}^{(n)} = \eta\,(\mathbf{D}^{T}\mathbf{D}+\rho\mathbf{I})^{-1}\left(\mathbf{D}^{T}\mathbf{y}+\rho(\mathbf{z}^{(n-1)}-\mathbf{u}^{(n-1)})\right) + (1-\eta)\,\mathbf{z}^{(n-1)} \\ \mathbf{z}^{(n)} = S_{\lambda/\rho}\!\left(\mathbf{x}^{(n)}+\mathbf{u}^{(n-1)}\right) \\ \mathbf{u}^{(n)} = \mathbf{u}^{(n-1)} + \tau\left(\mathbf{x}^{(n)}-\mathbf{z}^{(n)}\right) \end{cases}$   (10)

where $\eta$ is a relaxation parameter that can improve convergence of ADMM, $S_{\lambda/\rho}(\cdot)$ is the nonlinear shrinkage function and $\tau$ denotes an update rate. Therefore, in the ADMM algorithm, the parameters are updated in the following order: the sparse vector ($\mathbf{x}^{(n)}$), then the nonlinear vector ($\mathbf{z}^{(n)}$) and finally the multiplier ($\mathbf{u}^{(n)}$).
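A minimal numerical sketch of one such ADMM pass, written to match the update order above (sparse vector, then nonlinear shrinkage, then multiplier), is given below. Here lam, rho, eta and tau are all hand-set constants, which is precisely the limitation the proposed network is designed to remove; the function and variable names are ours.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_sparse(D, y, lam, rho, eta=1.0, tau=1.0, n_iter=50):
    """Plain (non-learned) ADMM iteration for the l1 sparse model
    min 0.5*||y - D x||_2^2 + lam*||z||_1  s.t.  x = z,
    written with the scaled multiplier u, relaxation eta and update rate tau."""
    n = D.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    DtD, Dty = D.T @ D, D.T @ y
    A = np.linalg.inv(DtD + rho * np.eye(n))                 # cached for the x-subproblem
    for _ in range(n_iter):
        x = eta * (A @ (Dty + rho * (z - u))) + (1 - eta) * z   # sparse vector x^(n)
        z = soft_threshold(x + u, lam / rho)                    # nonlinear vector z^(n)
        u = u + tau * (x - z)                                   # multiplier u^(n)
    return z

# lam and rho (and eta, tau) still have to be fixed by hand here.
```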

Based on this order of updating parameters, a deep data flow graph is devised which maps the iterations of the ADMM algorithm. As shown in Fig.3, this graph consists of nodes corresponding to the different operations in ADMM, and directed edges corresponding to the data flows between operations. Each update in (9) is considered as one stage; similarly, the $n$-th iteration of the ADMM algorithm corresponds to the $n$-th stage of the data flow graph. In each stage of the graph, there are three types of nodes mapped from the three types of operations in ADMM, i.e., the sparse vector operation ($\mathbf{x}^{(n)}$), the nonlinear transform operation ($\mathbf{z}^{(n)}$) defined by the shrinkage function, and the multiplier update operation ($\mathbf{u}^{(n)}$). Therefore, the whole data flow graph corresponds well to the update iterations of ADMM. Given under-sampled data from HSIs, it flows over the graph and finally yields the optimal sparse vector $\hat{\mathbf{x}}$. In the following subsections, the Adaptive Sparse Deep Network, a deep architecture based on this data flow graph, is proposed.

Figure 3: The data flow graph for the Adaptive Sparse Deep Network. The graph consists of three types of nodes: the sparse vector ($\mathbf{x}^{(n)}$), the nonlinear vector ($\mathbf{z}^{(n)}$) and the multiplier ($\mathbf{u}^{(n)}$).

3.2 The Architecture of the Adaptive Sparse Deep Network

In this subsection, we propose a deep architecture dubbed the Adaptive Sparse Deep Network. It mainly consists of three parts: the input layer, the hidden layer and the output layer. Meanwhile, the hidden layer has three types of operations corresponding to different learnable parameters.

Sparsity operation: This operation obtains a sparse vector according to Eqn.(9) and Eqn.(10). Given $\mathbf{z}^{(n-1)}$ and $\mathbf{u}^{(n-1)}$, the output of this node is defined as:

$\mathbf{x}^{(n)} = \eta^{(n)}\,(\mathbf{D}^{T}\mathbf{D}+\rho^{(n)}\mathbf{I})^{-1}\left(\mathbf{D}^{T}\mathbf{y}+\rho^{(n)}(\mathbf{z}^{(n-1)}-\mathbf{u}^{(n-1)})\right) + (1-\eta^{(n)})\,\mathbf{z}^{(n-1)}$   (11)

where $\rho^{(n)}$ and $\eta^{(n)}$ are learnable parameters. In the first stage ($n=1$), $\mathbf{z}^{(0)}$ and $\mathbf{u}^{(0)}$ are initialized to zeros; therefore $\mathbf{x}^{(1)} = \eta^{(1)}(\mathbf{D}^{T}\mathbf{D}+\rho^{(1)}\mathbf{I})^{-1}\mathbf{D}^{T}\mathbf{y}$.

Nonlinear transform operation: This operation performs the nonlinear transform inspired by the ADMM algorithm for the sparse model in Eqn.(9) and Eqn.(10). Given $\mathbf{x}^{(n)}$ and $\mathbf{u}^{(n-1)}$, the output of this node is defined as:

$\mathbf{z}^{(n)} = S_{\lambda^{(n)}/\rho^{(n)}}\!\left(\mathbf{x}^{(n)}+\mathbf{u}^{(n-1)}\right)$   (12)

where $\lambda^{(n)}$ and $\rho^{(n)}$ are learnable parameters.

Multiplier update operation: This operation is defined by the ADMM algorithm in Eqn.(9) and Eqn.(10). The output of this node in stage $n$ is defined as:

$\mathbf{u}^{(n)} = \mathbf{u}^{(n-1)} + \tau^{(n)}\left(\mathbf{x}^{(n)}-\mathbf{z}^{(n)}\right)$   (13)

where the update rate $\tau^{(n)}$ is a learnable parameter.

Figure 4: An example of Adaptive Sparse Deep Network. It mainly consists of three parts, namely: input layer, hidden layer and output layer.

In the forward network, each node belongs to a different operation. Each node receives values from the preceding nodes and produces output values for the following nodes. There is no feedback in the whole network during a forward pass; the values propagate unidirectionally from the input layer to the output layer, which can be represented by a directed acyclic graph. The Adaptive Sparse Deep Network is shown in Fig.4. The under-sampled data from HSIs flows from the input layer to the output layer in the order from circled number 1 to number 10, followed by a final sparse-vector node with circled number 11, which generates the optimal sparse vector. Then we can obtain the classification of the HSIs. Fig.4 illustrates the deep architecture described above.
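The forward pass can be summarized by the toy sketch below, in which the fixed ADMM constants become per-stage entries of a parameter list. The parameter set {rho, lam, eta, tau} per stage and the function names are our illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def asdn_forward(D, y, params):
    """Toy forward pass of an unrolled network in the spirit of ASDN:
    each stage repeats the three ADMM operations, but the scalars are now
    per-stage learnable values, e.g. params = [{'rho':1.0, 'lam':0.1,
    'eta':1.0, 'tau':1.0}, ...] with one dict per stage."""
    n = D.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    DtD, Dty = D.T @ D, D.T @ y
    for p in params:
        rho, lam, eta, tau = p['rho'], p['lam'], p['eta'], p['tau']
        x = eta * np.linalg.solve(DtD + rho * np.eye(n),
                                  Dty + rho * (z - u)) + (1 - eta) * z   # sparsity op
        z = soft_threshold(x + u, lam / rho)                             # nonlinear transform op
        u = u + tau * (x - z)                                            # multiplier update op
    return z                                # final sparse vector fed to Eqn.(8)
```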

After a forward pass of the Adaptive Sparse Deep Network, errors remain in the results. We can minimize the loss by using the gradient descent method in back-propagation. By computing the loss function, the parameters of each node can be updated, and we then use the latest parameters to train the network again. The optimization of the parameters is expected to yield the optimal sparse vector, which produces an improved classification result.

Firstly, the residual error of each class between the network output and the ground truth is defined as

$E_c(\mathbf{y};\Theta) = \left\| \mathbf{y} - \mathbf{D}_c \hat{\mathbf{x}}_c(\Theta) \right\|_2$   (14)

Then the loss function can be defined as:

$L(\Theta) = \sum_{i=1}^{N} E_{c_i}(\mathbf{y}_i;\Theta)$   (15)

where $c_i$ is the true class of the training pixel $\mathbf{y}_i$ and $\Theta = \{\rho^{(n)}, \eta^{(n)}, \lambda^{(n)}, \tau^{(n)}\}$ are the network parameters. In the backward pass, we aim to learn the following parameters: $\rho^{(n)}$ and $\eta^{(n)}$ in the sparsity operation, $\lambda^{(n)}$ in the nonlinear transform operation and $\tau^{(n)}$ in the multiplier update operation. In the following subsection, we discuss how to compute the gradients of the loss function; the parameters are then updated by using back-propagation over the data flow graph.
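A toy training step in the same spirit is sketched below. It reuses asdn_forward from the previous sketch, uses a reference sparse code as a stand-in for the residual-based loss of Eqns.(14)-(15), and replaces the paper's analytic back-propagated gradients with finite-difference approximations purely to keep the example short and dependency-free.

```python
import numpy as np

def asdn_loss(D, y, target, params):
    """Squared error between the network's sparse output and a reference
    sparse code `target` -- a simple stand-in for Eqns. (14)-(15)."""
    z = asdn_forward(D, y, params)
    return 0.5 * np.sum((z - target) ** 2)

def train_step(D, y, target, params, lr=1e-3, eps=1e-6):
    """One gradient-descent update of every per-stage parameter, with the
    gradients approximated by finite differences for brevity."""
    base = asdn_loss(D, y, target, params)
    grads = []
    for p in params:
        g = {}
        for key in ('rho', 'lam', 'eta', 'tau'):
            p[key] += eps                                     # perturb one scalar
            g[key] = (asdn_loss(D, y, target, params) - base) / eps
            p[key] -= eps                                     # restore it
        grads.append(g)
    for p, g in zip(params, grads):                           # gradient-descent step
        for key in g:
            p[key] -= lr * g[key]
    return params
```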

Figure 5: Three types of nodes and the data flow over them: (a) multiplier update operation, (b) nonlinear transform operation and (c) sparsity operation.

3.3 Gradient Computation by Back-Propagation

In an FNN, the gradient propagates backward through each layer, and the gradient of each layer's parameters is then calculated. Similarly, in our Adaptive Sparse Deep Network, the gradients are computed in reverse order, and the parameters can then be learned by back-propagation. Fig.4 shows an example, where the gradient can be computed backwards from the operation with circled number 11 to number 2 successively. For one stage, Fig.5 shows the three types of nodes and the data flow over them. Each node has multiple inputs and outputs. We next introduce the details of the gradient computation for each node in a typical stage.

Multiplier update operation: As shown in Fig.5(a), this operation has three inputs: $\mathbf{x}^{(n)}$, $\mathbf{z}^{(n)}$ and $\mathbf{u}^{(n-1)}$. Its output $\mathbf{u}^{(n)}$ is the input used to compute $\mathbf{x}^{(n+1)}$, $\mathbf{z}^{(n+1)}$ and $\mathbf{u}^{(n+1)}$. Here the parameter $\tau^{(n)}$ is updated. The gradients of the loss with respect to this node and its parameter can be computed as:

$\frac{\partial L}{\partial \mathbf{u}^{(n)}} = \frac{\partial L}{\partial \mathbf{x}^{(n+1)}}\frac{\partial \mathbf{x}^{(n+1)}}{\partial \mathbf{u}^{(n)}} + \frac{\partial L}{\partial \mathbf{z}^{(n+1)}}\frac{\partial \mathbf{z}^{(n+1)}}{\partial \mathbf{u}^{(n)}} + \frac{\partial L}{\partial \mathbf{u}^{(n+1)}}\frac{\partial \mathbf{u}^{(n+1)}}{\partial \mathbf{u}^{(n)}}$   (16)
$\frac{\partial L}{\partial \tau^{(n)}} = \frac{\partial L}{\partial \mathbf{u}^{(n)}}\frac{\partial \mathbf{u}^{(n)}}{\partial \tau^{(n)}}$   (17)

We also compute the gradients of the output of this operation with respect to its inputs: $\partial\mathbf{u}^{(n)}/\partial\mathbf{x}^{(n)}$, $\partial\mathbf{u}^{(n)}/\partial\mathbf{z}^{(n)}$ and $\partial\mathbf{u}^{(n)}/\partial\mathbf{u}^{(n-1)}$.

Nonlinear transform update operation: As shown in Fig.5(b), this operation has two inputs: $\mathbf{x}^{(n)}$ and $\mathbf{u}^{(n-1)}$, and its output $\mathbf{z}^{(n)}$ is the input for computing $\mathbf{u}^{(n)}$ and the next-stage $\mathbf{x}^{(n+1)}$. Here the parameter $\lambda^{(n)}$ is updated. The gradients of the loss with respect to this node and its parameter can be computed as:

$\frac{\partial L}{\partial \mathbf{z}^{(n)}} = \frac{\partial L}{\partial \mathbf{u}^{(n)}}\frac{\partial \mathbf{u}^{(n)}}{\partial \mathbf{z}^{(n)}} + \frac{\partial L}{\partial \mathbf{x}^{(n+1)}}\frac{\partial \mathbf{x}^{(n+1)}}{\partial \mathbf{z}^{(n)}}$   (18)
$\frac{\partial L}{\partial \lambda^{(n)}} = \frac{\partial L}{\partial \mathbf{z}^{(n)}}\frac{\partial \mathbf{z}^{(n)}}{\partial \lambda^{(n)}}$   (19)

We also compute the gradients of this operation's output with respect to its inputs: $\partial\mathbf{z}^{(n)}/\partial\mathbf{x}^{(n)}$ and $\partial\mathbf{z}^{(n)}/\partial\mathbf{u}^{(n-1)}$.

Sparsity update operation: As shown in Fig.5(c), this operation has two inputs: $\mathbf{z}^{(n-1)}$ and $\mathbf{u}^{(n-1)}$, and its output $\mathbf{x}^{(n)}$ is the input for computing $\mathbf{z}^{(n)}$ and $\mathbf{u}^{(n)}$ in the same stage. Here the parameters $\rho^{(n)}$ and $\eta^{(n)}$ are updated. The gradients of the loss with respect to this node and its parameters can be computed as:

$\frac{\partial L}{\partial \mathbf{x}^{(n)}} = \frac{\partial L}{\partial \mathbf{z}^{(n)}}\frac{\partial \mathbf{z}^{(n)}}{\partial \mathbf{x}^{(n)}} + \frac{\partial L}{\partial \mathbf{u}^{(n)}}\frac{\partial \mathbf{u}^{(n)}}{\partial \mathbf{x}^{(n)}}$   (20)
$\frac{\partial L}{\partial \rho^{(n)}} = \frac{\partial L}{\partial \mathbf{x}^{(n)}}\frac{\partial \mathbf{x}^{(n)}}{\partial \rho^{(n)}}, \qquad \frac{\partial L}{\partial \eta^{(n)}} = \frac{\partial L}{\partial \mathbf{x}^{(n)}}\frac{\partial \mathbf{x}^{(n)}}{\partial \eta^{(n)}}$   (21)

The gradients of this operation's output with respect to its inputs are $\partial\mathbf{x}^{(n)}/\partial\mathbf{z}^{(n-1)}$ and $\partial\mathbf{x}^{(n)}/\partial\mathbf{u}^{(n-1)}$.

Class Name Dictionary Train Test
1 Asphalt 66 597 5968
2 Meadows 186 932 17531
3 Gravel 21 189 1889
4 Trees 31 276 2757
5 Painted metal sheets 13 269 1063
6 Bare Soil 50 453 4526
7 Bitumen 13 266 1051
8 Self-Blocking Bricks 37 331 3314
9 Shadows 9 189 749
Total 426 3502 38848
Table 1: Nine classes in the ROSIS Urban Pavia University data and the training and test set for each class
Figure 6: Reference map of Pavia University. (a) first principal component map of Pavia University; (b) false color image; (c) label map of the ground truth
Class Name Dictionary Train Test
1 Weeds-1 20 181 1808
2 Weeds-2 37 335 3334
3 Fallow 20 178 1778
4 Fallow-rough-plow 14 125 1255
5 Fallow-smooth 27 241 2410
6 Stubble 40 356 3563
7 Celery 36 322 3221
8 Grapes-untrained 113 1014 10144
9 Soil-vinyard-develop 62 558 5583
10 Corn-senesced-green weeds 33 295 2950
11 Lettuce-romaine-4wk 11 96 961
12 Lettuce-romaine-5wk 19 173 1735
13 Lettuce-romaine-6wk 9 82 825
14 Lettuce-romaine-7wk 11 96 963
15 Vineyard-untrained 73 654 6541
16 Vineyard-vertical-trellis 18 163 1626
Total 543 4869 48697
Table 2: Sixteen classes in the AVIRIS Salinas data and the training and test set for each class
Figure 7: Reference map of Salinas. (a) first principal component map of Salinas; (b) false color image; (c) label map of the ground truth

4 Experimental Results and Analysis

In this section, four experiments were separately conducted on two real HSIs to validate the superiority of the proposed Adaptive Sparse Deep Network (ASDN).

The first hyperspectral image in our experiments is the University of Pavia, which was captured by the Reflective Optics System Imaging Spectrometer (ROSIS) optical sensor over an urban area surrounding the University of Pavia, Italy. The size of this image is 610 × 340 × 115, with a spatial resolution of 1.3 m per pixel and a spectral coverage ranging from 0.43 to 0.86 μm. In our experiment, the 12 very noisy channels were removed. The first principal component map of Pavia University and its false color image are shown in Fig.6(a) and Fig.6(b). For this data set with 9 classes of land cover, we randomly select 1% of the samples of each class for the dictionary, and the remainder is used for training and testing. The reference contents are shown in Table 1 and the label map of the ground truth is shown in Fig.6(c).

The second hyperspectral image in our experiments is Salinas, which was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over Salinas Valley, California. The size of this image is 512 × 217 × 224, with a spatial resolution of 3.7 m per pixel. Similar to the Pavia University image, the 20 water absorption bands (108-112, 154-167, and 224) were discarded, and the remaining 204 bands were preserved for the following experiments. The first principal component map of Salinas and its false color image are shown in Fig.7(a) and Fig.7(b). For this data set with 16 classes of land cover, we randomly select 1% of the samples of each class for the dictionary, and the remainder is used for training and testing. The reference contents are shown in Table 2 and the label map of the ground truth is shown in Fig.7(c).

In these experiments, the influence of different parameters on the classification results is tested. The overall accuracy (OA), average accuracy (AA), and kappa coefficient (Kappa) are adopted to evaluate the quality of the classification results. All the experiments are conducted on a 3.00 GHz computer with 64.0 GB of RAM.
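To make the evaluation protocol concrete, the sketch below shows one standard way of computing OA, AA and the kappa coefficient from integer label vectors; it illustrates the usual definitions and is not code from the paper.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA) and Cohen's kappa
    computed from a confusion matrix; labels are integer class ids."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total
    per_class = np.diag(cm) / cm.sum(axis=1)                  # recall of each class
    aa = per_class.mean()
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```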

4.1 Comparisons of Different Sparsity Levels

In the first experiment, the proposed ASDN was compared with greedy algorithms, namely SP Dai and Milenkovic (2009), OMP Davenport and Wakin (2010), ROMP Needell and Vershynin (2010) and gOMP Wang et al. (2011). We verified that different sparsity levels $K$ have an impact on the classification results.

Firstly, we demonstrate the effect of the sparsity level $K$ on the classification results. The results are averaged over five runs at each $K$ to avoid any bias induced by random sampling. The classification accuracy plots on the entire test sets are shown in Fig.8 and Fig.9, where the sparsity level $K$ ranges from 1 to 10. In Fig.8 and Fig.9, the curves are volatile and unstable. For example, the OMP algorithm achieves higher classification accuracy at a particular sparsity level, but we are still unsure whether that sparsity level is optimal; the other greedy algorithms behave similarly. This is the drawback of manually setting parameters. Secondly, the classification results of our method and of the greedy algorithms, including OA, AA and the kappa coefficient, are shown in Table 3 and Table 4. On Pavia University, the OA of ASDN is higher than that of SP, OMP, ROMP and gOMP by 3.87%, 6.47%, 4.78% and 6.84%, respectively. Similarly, on Salinas, the OA of ASDN is higher than that of SP, OMP, ROMP and gOMP by 1.13%, 2.09%, 0.73% and 1.95%, respectively. Although the SP and ROMP algorithms show relatively good performance on Pavia University and Salinas, respectively, our method outperforms all of these greedy algorithms.

Figure 8: The influence of different sparsity levels on the classification results for Pavia University. (a) OA; (b) AA; (c) Kappa
Figure 9: The influence of different sparsity levels on the classification results for Salinas. (a) OA; (b) AA; (c) Kappa
Class SP OMP ROMP gOMP Adaptive Sparse Deep Network
1 77.22 70.05 69.75 67.82 76.35
2 92.86 92.71 94.55 93.99 95.33
3 58.81 54.33 55.49 55.39 65.11
4 84.46 78.21 83.79 78.13 92.38
5 98.82 99.45 99.17 99.70 98.93
6 53.46 44.20 50.65 42.57 71.65
7 69.88 72.04 69.02 72.44 67.25
8 73.21 76.93 76.50 72.03 77.33
9 85.45 82.00 75.88 80.58 68.9
OA 81.15 78.55 80.24 78.18 85.02
AA 77.13 74.43 75.71 73.63 79.25
Kappa 74.69 71.06 73.37 70.50 80.07
Table 3: Classification accuracy (%) for Pavia University using different greedy algorithms
Class SP OMP ROMP gOMP Adaptive Sparse Deep Network
1 97.57 97.18 98.63 97.45 99.23
2 98.46 98.32 98.94 98.54 97.58
3 93.46 94.51 95.45 93.51 91.51
4 98.72 97.80 99.11 98.02 97.69
5 97.07 93.84 95.91 94.51 97.26
6 99.51 99.80 99.72 99.83 98.54
7 99.25 99.34 99.33 99.51 99.16
8 75.56 73.82 78.02 74.93 85.76
9 98.00 97.88 98.73 98.14 99.34
10 86.78 85.57 89.52 86.59 79.80
11 95.58 96.37 93.50 97.48 85.85
12 99.97 99.63 99.11 99.53 98.62
13 97.98 97.39 98.45 97.57 96.72
14 92.04 90.68 93.17 87.35 96.05
15 61.95 58.93 58.06 57.31 60.07
16 92.28 94.31 94.61 95.36 88.31
OA 87.53 86.57 87.93 86.71 88.66
AA 92.76 92.21 93.14 92.23 91.97
Kappa 86.12 85.05 86.55 85.19 87.34
Table 4: Classification accuracy (%) for Salinas using different greedy algorithms

4.2 Comparisons of Different Regularization Parameters

In the second experiment, the proposed ASDN was compared with Bregman iteration Ye and Xie (2011), FISTA Beck and Teboulle (2009), and SpaRSA Wright et al. (2009). These algorithms can be used to solve problem (4), but the corresponding regularization parameter $\lambda$ needs to be set manually in advance. We tested whether different regularization parameters affect the classification results.

Different regularization parameters are applied to problem (4). All results are averaged over five runs at each $\lambda$ to avoid any bias induced by random sampling. The classification plots are shown in Fig.10 and Fig.11. As $\lambda$ increases, the classification curves of Bregman and FISTA rise steadily, but it is hard to determine the optimal regularization parameter. The curves of SpaRSA fluctuate, and we remain unsure about the globally optimal solution. For all these algorithms, setting the regularization parameter in advance is inevitable. We then list the classification results of the proposed ASDN and the other algorithms in Table 5 and Table 6. In terms of OA, FISTA achieves the highest accuracy apart from the proposed ASDN; our method is higher than FISTA by 2.93% on Pavia University and by 2.81% on Salinas. The proposed ASDN thus achieves the highest accuracy among the compared algorithms.

Figure 10: The influence of different regularization parameters on the classification results for Pavia University. (a) OA; (b) AA; (c) Kappa
Figure 11: The influence of different regularization parameters on the classification results for Salinas. (a) OA; (b) AA; (c) Kappa
Class Bregman FISTA SpaRSA ADMM Adaptive Sparse Deep Network
1 94.87 84.41 94.20 88.92 76.35
2 96.90 98.64 91.33 92.13 95.33
3 68.19 47.93 3.01 48.78 65.11
4 89.75 78.55 80.22 81.93 92.38
5 99.65 99.40 96.96 99.40 98.93
6 47.03 36.28 28.23 52.08 71.65
7 17.78 59.35 0.23 23.38 67.25
8 32.69 78.09 22.11 45.30 77.33
9 69.33 93.12 94.98 91.71 68.9
OA 80.29 82.09 70.71 78.12 85.02
AA 68.47 75.09 56.82 69.29 79.25
Kappa 72.98 75.34 59.60 70.46 80.07
Table 5: Classification accuracy (%) for Pavia University using different optimization algorithms and the ADMM algorithm
Class Bregman FISTA SpaRSA ADMM Adaptive Sparse Deep Network
1 99.77 97.48 79.38 100.00 99.23
2 95.97 98.48 99.32 18.55 97.58
3 42.0 79.55 0.1 9.30 91.51
4 98.99 98.55 95.07 99.06 97.69
5 98.87 97.32 28.06 99.59 97.26
6 99.78 99.29 99.16 99.64 98.54
7 99.77 99.44 16.48 99.07 99.16
8 91.98 83.25 96.50 85.98 85.76
9 99.64 97.28 99.69 98.63 99.34
10 92.51 76.52 33.65 45.33 79.80
11 34.41 90.54 0 0 85.85
12 98.29 99.69 8.91 30.78 98.62
13 99.26 98.01 50.00 98.90 96.72
14 92.19 90.46 81.11 88.20 96.05
15 37.68 48.41 4.27 46.41 60.07
16 97.33 88.81 65.72 82.49 88.31
OA 85.34 85.85 61.05 72.09 88.66
AA 86.15 90.19 53.59 68.87 91.97
Kappa 83.56 84.18 55.45 68.71 87.34
Table 6: Classification accuracy (%) for Salinas using different optimization algorithms and the ADMM algorithm

4.3 Comparisons of ADMM

In the third experiment, the proposed ASDN was compared with ADMM under different penalty parameters and regularization parameters. Firstly, we demonstrate the effect of the penalty parameter $\rho$ on the classification results; similarly, we test whether different regularization parameters $\lambda$ have an impact on the results.

With $\lambda$ (or $\rho$) fixed, the classification plots are shown in Fig.12 and Fig.13, where the horizontal axis denotes the range of values of the penalty parameter $\rho$ in (a) and of the regularization parameter $\lambda$ in (b), and the vertical axis is the accuracy on the test set. These classification curves fluctuate, which makes it difficult to find the global solution and determine the optimal parameters. For the ADMM algorithm, we still need to set the related parameters in advance. The classification results of the proposed ASDN and ADMM are listed in Table 5 and Table 6. Our method still outperforms the ADMM algorithm.

Figure 12: Classification accuracy for Pavia University using the ADMM algorithm. (a) ADMM with different penalty parameters ρ; (b) ADMM with different regularization parameters λ
Figure 13: Classification accuracy for Salinas using the ADMM algorithm. (a) ADMM with different penalty parameters ρ; (b) ADMM with different regularization parameters λ

4.4 Comparisons of different classifiers

In the fourth experiment, some traditional classifiers, namely ELM Miche et al. (2010), KNN Zhang and Zhou (2007), SVM Shevade et al. (2000) and sparsity adaptive matching pursuit (SAMP) Do et al. (2009), were compared with our method, ASDN. The classification maps of these classifiers are shown in Fig.14 and Fig.15, and the results are summarized in Table 7 and Table 8. For KNN, it is difficult to train the model and interpret the results. The most innovative feature of SAMP is its capability of signal reconstruction without prior information about the sparsity. For SVM, the one-against-one strategy is applied. Among these classifiers, SVM yields the best overall performance, but its OA is still lower than that of the proposed ASDN, by 1.4% on Pavia University and 1.54% on Salinas. Therefore, our method is superior to these classifiers.

Figure 14: Classification maps for Pavia University. (a) ELM; (b) KNN; (c) SVM; (d) SAMP; (e) Adaptive Sparse Deep Network
Figure 15: Classification maps for Salinas. (a) ELM; (b) KNN; (c) SVM; (d) SAMP; (e) Adaptive Sparse Deep Network
Class ELM KNN SVM SAMP Adaptive Sparse Deep Network
1 69.38 70.84 87.25 71.66 76.35
2 91.50 91.40 96.30 90.62 95.33
3 49.07 53.60 49.48 55.92 65.11
4 81.56 74.20 86.24 75.29 92.38
5 95.70 99.37 98.06 98.57 98.93
6 58.38 43.92 53.72 46.73 71.65
7 40.03 75.02 38.36 71.58 67.25
8 59.66 73.70 80.22 70.07 77.33
9 89.01 91.12 90.59 90.93 68.9
OA 75.18 77.76 83.62 77.62 85.02
AA 70.47 74.79 75.58 74.60 79.25
Kappa 66.57 70.47 77.79 69.94 80.07
Table 7: Classification accuracy (%) for Pavia University using different classifiers
Class ELM KNN SVM SAMP Adaptive Sparse Deep Network
1 99.20 98.14 97.80 98.59 99.23
2 99.65 98.24 98.60 97.38 97.58
3 85.24 92.71 83.83 92.97 91.51
4 93.84 98.97 98.23 97.28 97.69
5 97.62 94.90 97.63 94.64 97.26
6 99.65 99.58 99.59 99.27 98.54
7 99.53 99.15 99.35 98.86 99.16
8 79.63 68.47 88.16 69.55 85.76
9 99.36 97.88 98.44 96.91 99.34
10 92.25 88.09 79.68 85.76 79.80
11 94.00 94.30 58.05 94.09 85.85
12 96.43 99.77 99.31 99.95 98.62
13 97.62 98.10 97.66 97.41 96.72
14 90.42 87.44 91.43 91.64 96.05
15 60.42 60.23 50.89 59.76 60.07
16 95.32 89.60 89.64 88.26 88.31
OA 87.35 85.56 87.12 85.37 88.66
AA 91.52 91.60 89.28 91.39 91.97
Kappa 87.01 83.94 85.59 83.73 87.34
Table 8: Classification accuracy (%) for Salinas using different classifiers

5 Conclusions

To solve the problem that different choices of the sparsity level and the regularization parameter have a great influence on the classification results for HSIs, the Adaptive Sparse Deep Network (ASDN), a deep architecture based on a data flow graph, is presented in this paper. Data transfer and feedback are the basis of ASDN. Based on the ADMM algorithm, the deep data flow graph is built according to the order in which the parameters of the sparse model are updated. The proposed ASDN consists of three parts: the input layer, the hidden layer and the output layer. In the forward network, the under-sampled data from HSIs flows from the input layer to the output layer in the order of parameter updates and generates the optimal sparse vector. Then, to minimize the loss, the parameters are updated again through gradient computation in the back-propagation network. Therefore, in our method, the related parameters do not need to be set manually in advance, which distinguishes it from the other algorithms (greedy algorithms and optimization algorithms) used to solve this sparse model. Experimental results indicate that our method outperforms several traditional classifiers and other algorithms for the sparse model. In future work, we will attempt to extract new features from the HSIs for further improvements in experimental performance.

References

  • Benediktsson et al. (2005) Benediktsson, J.A.; Palmason, J.A.; Sveinsson, J.R. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Transactions on Geoscience & Remote Sensing 2005, 43, 480–491.
  • Iyer et al. (2017) Iyer, R.P.; Raveendran, A.; Bhuvana, S.K.T.; Kavitha, R. Hyperspectral image analysis techniques on remote sensing. Third International Conference on Sensing, 2017.
  • Tu et al. (2019) Tu, B.; Zhang, X.; Kang, X.; Zhang, G.; Li, S. Density Peak-Based Noisy Label Detection for Hyperspectral Image Classification. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 2019, 57, 1573–1584.
  • Lin et al. (2018) Lin, H.; Li, J.; Liu, C.; Li, S. Recent Advances on Spectral-Spatial Hyperspectral Image Classification: An Overview and New Guidelines. IEEE Transactions on Geoscience & Remote Sensing 2018, PP, 1–19.
  • Wang et al. (2019) Wang, A.; Wang, Y.; Chen, Y. Hyperspectral image classification based on convolutional neural network and random forest. Remote Sensing Letters 2019, 10, 1086–1094.
  • Donoho (2006) Donoho, D.L. Compressed sensing. IEEE Transactions on Information Theory 2006, 52, 1289–1306.
  • Wang and Celik (2017) Wang, H.; Celik, T. Sparse Representation-Based Hyperspectral Data Processing: Lossy Compression. IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 2017, PP, 1–10.
  • Julien et al. (2007) Julien, M.; Michael, E.; Guillermo, S. Sparse representation for color image restoration. IEEE Transactions on Image Processing 2007, 17, 53–69.
  • Michael and Michal (2006) Michael, E.; Michal, A. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Tip 2006, 15, 3736–3745.
  • Dong et al. (2013) Dong, W.; Zhang, L.; Shi, G.; Li, X. Nonlocally Centralized Sparse Representation for Image Restoration. IEEE TRANSACTIONS ON IMAGE PROCESSING 2013, 22, 1618–1628.
  • He et al. (2017) He, Z.; Yi, S.; Cheung, Y.M.; You, X.; Tang, Y.Y. Robust Object Tracking via Key Patch Sparse Representation. IEEE TRANSACTIONS ON CYBERNETICS 2017, 47, 354–364.
  • John et al. (2009) John, W.; Yang, A.Y.; Arvind, G.; S Shankar, S.; Yi, M. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis & Machine Intelligence 2009, 31, 210–227.
  • Gao et al. (2017) Gao, Y.; Ma, J.; Yuille, A.L. Semi-Supervised Sparse Representation Based Classification for Face Recognition With Insufficient Labeled Samples. IEEE TRANSACTIONS ON IMAGE PROCESSING 2017, 26, 2545–2560.
  • Wang et al. (2019) Wang, Q.; He, X.; Li, X. Locality and Structure Regularized Low Rank Representation for Hyperspectral Image Classification. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 2019, 57, 911–923.
  • Yi et al. (2011) Yi, C.; Nasrabadi, N.M.; Tran, T.D. Hyperspectral Image Classification Using Dictionary-Based Sparse Representation. IEEE Transactions on Geoscience & Remote Sensing 2011, 49, 3973–3985.
  • Zhang et al. (2014) Zhang, H.; Li, J.; Huang, Y.; Zhang, L. A Nonlocal Weighted Joint Sparse Representation Classification Method for Hyperspectral Imagery. IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 2014, 7, 2056–2065.
  • Baron et al. (2012) Baron, D.; Duarte, M.F.; Wakin, M.B.; Sarvotham, S.; Baraniuk, R.G. Distributed Compressive Sensing 2012.
  • Fang et al. (2017) Fang, L.; Wang, C.; Li, S.; Benediktsson, J.A. Hyperspectral Image Classification via Multiple-Feature-Based Adaptive Sparse Representation. IEEE Transactions on Instrumentation & Measurement 2017, 66, 1646–1657.
  • Yan et al. (2019) Yan, J.; Chen, H.; Zhai, Y.; Liu, Y.; Liu, L. Region-division-based joint sparse representation classification for hyperspectral images. IET Image Processing 2019, 13, 1694–1704.
  • Wei et al. (2016) Wei, F.; Li, S.; Fang, L.; Kang, X.; Benediktsson, J.A. Hyperspectral Image Classification Via Shape-Adaptive Joint Sparse Representation. IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 2016, 9, 1–1.
  • Zhang et al. (2017) Zhang, Z.; Xu, Y.; Yang, J.; Li, X.; Zhang, D. A Survey of Sparse Representation: Algorithms and Applications. IEEE Access 2017, 3, 490–530.
  • Wang et al. (2019) Wang, Y.; Zou, C.; Tang, Y.Y.; Li, L.; Shang, Z. Cauchy greedy algorithm for robust sparse recovery and multiclass classification. SIGNAL PROCESSING 2019, 164, 284–294.
  • Sayed et al. (2019) Sayed, G.I.; Hassanien, A.E.; Azar, A.T. Feature selection via a novel chaotic crow search algorithm. NEURAL COMPUTING & APPLICATIONS 2019, 31, 171–188.
  • Davenport and Wakin (2010) Davenport, M.A.; Wakin, M.B. Analysis of Orthogonal Matching Pursuit Using the Restricted Isometry Property. IEEE Transactions on Information Theory 2010, 56, 4395–4401.
  • Dai and Milenkovic (2009) Dai, W.; Milenkovic, O. Subspace Pursuit for Compressive Sensing Signal Reconstruction. IEEE Transactions on Information Theory 2009, 55, 2230–2249.
  • Needell and Vershynin (2010) Needell, D.; Vershynin, R. Signal Recovery From Incomplete and Inaccurate Measurements Via Regularized Orthogonal Matching Pursuit. IEEE Journal of Selected Topics in Signal Processing 2010, 4, 310–316.
  • Wang et al. (2011) Wang, J.; Kwon, S.; Shim, B. Generalized Orthogonal Matching Pursuit. IEEE Transactions on Signal Processing 2011, 60, 6202–6216.
  • Beck and Teboulle (2009) Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. Siam J Imaging Sciences 2009, 2, 183–202.
  • Ye and Xie (2011) Ye, G.B.; Xie, X. Split Bregman method for large scale fused Lasso. Computational Statistics & Data Analysis 2011, 55, 1552–1569.
  • Wright et al. (2009) Wright, S.J.; Nowak, R.D.; Figueiredo, M.A.T. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing 2009, 57, 2479–2493.
  • Boyd et al. (2010) Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning 2010, 3, 1–122.
  • 198 (1983) Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems (Studies in Mathematics and Its Applications, Volume 15), Chapter 1: Augmented Lagrangian Methods in Quadratic Programming; 1983.
  • Fukushima (1992) Fukushima, M. Application of the alternating direction of multipliers to separable convex programming problems. Computational Optimization & Applications 1992, 1, 93–111.
  • Spyridon Kontogiorgis (1998) Kontogiorgis, S.; Meyer, R.R. A Variable-Penalty Alternating Directions Method for Convex Optimization. Mathematical Programming 1998, 83, 29–53.
  • Yang et al. (Nov.2018) Yang, Y.; Sun, J.; Li, H.; Xu, Z. ADMM-CSNet: A Deep Learning Approach for Image Compressive Sensing. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018, PP, 1–1.
  • Luo et al. (2018) Luo, Y.; Zou, J.; Yao, C.; Li, T.; Bai, G. HSI-CNN: A Novel Convolution Neural Network for Hyperspectral Image 2018.
  • Slavkovikj et al. (2015) Slavkovikj, V.; Verstockt, S.; Neve, W.D.; Hoecke, S.V.; Walle, R.V.D. Hyperspectral Image Classification with Convolutional Neural Networks. 2015.
  • Yuan et al. (2019) Yuan, Q.; Zhang, Q.; Li, J.; Shen, H.; Zhang, L. Hyperspectral Image Denoising Employing a Spatial-Spectral Deep Residual Convolutional Neural Network. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 2019, 57, 1205–1218.
  • Miche et al. (2010) Miche, Y.; Sorjamaa, A.; Bas, P.; Jutten, C.; Lendasse, A. OP-ELM: optimally pruned extreme learning machine. IEEE Transactions on Neural Networks 2010, 21, 158–162.
  • Zhang and Zhou (2007) Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition 2007, 40, 2038–2048.
  • Shevade et al. (2000) Shevade, S.K.; Keerthi, S.S.; Bhattacharyya, C.; Murthy, K.R.K. Improvements to the SMO algorithm for SVM regression. IEEE Transactions on Neural Networks 2000, 11, 1188–1193.
  • Do et al. (2009) Do, T.T.; Gan, L.; Nguyen, N.; Tran, T.D. Sparsity adaptive matching pursuit algorithm for practical compressed sensing. Conference on Signals, Systems & Computers, 2009.