Pixel DAG-Recurrent Neural Network for Spectral-Spatial Hyperspectral Image Classification

06/09/2019 ∙ by Xiufang Li, et al. ∙ 0

Exploiting rich spatial and spectral features helps improve the classification accuracy of hyperspectral images (HSIs). In this paper, based on the mechanism of the population receptive field (pRF) in the human visual cortex, we further utilize the spatial correlation of pixels in images and propose the pixel directed acyclic graph recurrent neural network (Pixel DAG-RNN) to extract and apply spectral-spatial features for HSI classification. In our model, an undirected cyclic graph (UCG) represents the connectivity of pixels in an image patch, and four DAGs are used to approximate the spatial relationship of the UCG. To avoid overfitting, weight sharing and dropout are adopted. Experiments on three benchmark data sets verify the higher classification performance of our model.




1 Introduction

HSIs usually provide abundant spectral and spatial information about ground targets. Their interpretation, such as classification, has therefore been widely used in geological surveys, vegetation research and so on. However, making sufficient and efficient use of spectral and spatial information in HSI classification is challenging, for example because of the Hughes phenomenon [1]. A large body of research has addressed HSI classification, from traditional methods such as independent component analysis (ICA) [2] to the popular deep convolutional neural network (DCNN) [3].

Because of the growing spectral dimension and the nonlinearity of the spectral space, some traditional methods are not appropriate for classifying HSIs. Subsequently, ensemble learning methods were proposed for HSI classification, such as [4]. However, most ensemble learning methods consider only spectral rather than spectral-spatial information. Then, some sparsity-based algorithms, for instance [5], were used to extract spatial-spectral features. These methods are incapable of capturing robust and abstract features in complex and varied environments.

Recently, deep neural networks (DNNs) have attracted more attention because of their ability to extract high-level abstract features and their achievements in computer vision and natural language processing, such as image classification [6] and speech recognition [7]. Y. Chen et al. [8] proposed a 3-D CNN, which was the first DCNN to extract the spectral and spatial features of HSIs simultaneously, and this method clearly improves classification performance. L. Mou et al. [9] treat hyperspectral pixels as sequential data and apply a recurrent neural network (RNN) to classify HSIs. These methods extract robust spatial-spectral information for HSI classification with multilayer convolutional neural networks. The convolutions used to extract pixel features rely on the idea of the receptive field. However, this operation extracts only the feature of each pixel and ignores the correlation between adjacent pixels, which is significant for recognizing the category under the pRF mechanism. Specifically, the perception ability of the pRF in the human visual cortex [10] is related to the focus of vision and the surrounding scene. The influence of the surrounding scene on the central target decreases as the distance increases. Here, we use a UCG and DAGs to describe this relation. Given the ability of RNNs to handle sequence data, we propose a Pixel DAG-RNN model for HSI classification.

This new method has three advantages: (1) Our model further applies the pRF mechanism of the human visual system to identify the pixel category. Besides the pixel feature, we also consider the spatial correlation of adjacent pixels. (2) We use UCGs and DAGs to represent the correlation of pixels and an RNN to exploit spatial sequence features for HSI classification. This model makes full use of both spectral information and the spatial correlations of pixels. (3) To prevent the overfitting caused by a limited number of training samples, weight sharing and dropout are used.

2 Pixel DAG-Recurrent Neural Network

2.1 Motivation

Figure 1: (a) Schematic diagram of the human cerebral cortex, where hV4 is marked by a red rectangle. (b) The spatial array of pRFs in hV4; circles of different sizes denote pRFs. (c) An image patch whose center pixel represents the target unit. (d) Diagram of correlation versus distance.

In the human visual system, the surrounding scene also affects the recognition of the central target, and these effects obey a visual mechanism. In the pRF mechanism of the human visual system, illustrated in Fig. 1 (a), attention is focused most intensively on the central target, and attention to the surrounding scene becomes blurred and sparse with increasing distance, as shown in Fig. 1 (b). Based on this mechanism, we apply the same principle to identifying pixel categories: the importance of a surrounding layer of pixels to the central pixel decreases as the distance increases. Fig. 1 (c) shows this decreasing importance by lighter colors. Fig. 1 (d) plots how the importance of the surrounding pixel layers to the central pixel varies with distance. These rules demonstrate the importance of spatial structure sequentiality in understanding images. In addition, RNNs are well suited to processing sequence data. Therefore, on the basis of the pRF mechanism of the human visual system, we use UCGs and DAGs to connect pixels and design Pixel DAG-RNN to extract spectral-spatial features for HSI classification.

2.2 Pixel DAG-RNN

Figure 2: Architecture of Pixel DAG-RNN for classification. The leftmost is a sample of 8-neighborhood DAG in the southeast direction.

As shown in Fig. 2, a directed acyclic graph (DAG) consists of a vertex set and an arc set. Each vertex is indexed by its row and column in the image patch, and each arc points from one vertex to another. From the DAG we obtain a series of contextual dependencies among the vertices and feed them to a recurrent neural network. A hidden layer with the same structure as the DAG is then generated. The value of the hidden layer at a vertex depends on the local input at that vertex and on the hidden representations of its predecessors. Because of this structure, the hidden layer must be calculated sequentially. The hidden layer is computed from the local input and the hidden representations of the vertex's direct predecessors in the DAG, through connection weights, biases and a nonlinear activation function; the output is computed from the hidden layer through a second nonlinear activation function. This is an autoregressive model, in which each vertex is conditioned on the vertices that precede it in the DAG.

The recurrent weights are shared in order to avoid overfitting. When calculating the hidden layer, we start at the DAG's source vertex and calculate each subsequent vertex according to the structure of the DAG, until the last vertex is reached. The hidden state of the last vertex therefore includes the information of all the DAG's vertices. A nonlinear function is applied to it to obtain the final output. The loss is obtained by applying a loss function to the final output and the true label.
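As a hedged sketch of the recurrence described in this subsection, the block below writes it in the standard DAG-RNN form, in which the hidden states of the direct predecessors are summed before the recurrent weights are applied. The summation and every symbol here ($x$, $h$, $o$, $U$, $W$, $V$, $b$, $c$, $f$, $g$, $\mathcal{P}_{\mathcal{G}}$, $v_T$, $\ell$, $y$) are assumptions chosen for illustration, not notation confirmed by the text.

```latex
\begin{aligned}
\hat{h}^{(v_i)} &= \sum_{v_j \in \mathcal{P}_{\mathcal{G}}(v_i)} h^{(v_j)}
  && \text{(aggregate the direct predecessors of } v_i\text{)} \\
h^{(v_i)} &= f\!\left(U x^{(v_i)} + W \hat{h}^{(v_i)} + b\right)
  && \text{(hidden state from local input and predecessors)} \\
o &= g\!\left(V h^{(v_T)} + c\right)
  && \text{(output from the last vertex } v_T\text{)} \\
L &= \ell(o, y)
  && \text{(loss against the true label } y\text{)}
\end{aligned}
```

Under this reading, the same $U$, $W$ and $b$ are reused at every vertex, which is the weight sharing mentioned above.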

2.3 Pixel DAG-RNN for HSI Classification

Figure 3: Architecture of Pixel DAG-RNN for HSI classification

In HSI classification, each pixel, which carries hundreds of spectral values, is assigned to an object class. To exploit neighborhood information, we use a UCG to represent the spatial relationship of the image patch and then apply four different DAGs to approximate the topology of the UCG. Based on the DAG's definition of the spatial structural sequentiality of pixels, we apply the DAG-RNN model to classify HSIs so as to make full use of this sequentiality, as shown in Fig. 3. The spectral-spatial features are extracted by the Pixel DAG-RNNs, and the four feature vectors at the end vertex are then concatenated into a final vector. Two fully connected layers and a softmax are used for the final classification.
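The classification head described above (concatenate the four end-vertex feature vectors, then two fully connected layers and a softmax) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the ReLU activation, the width of the first fully connected layer (`fc_width = 64`) and the random parameters are assumptions, while the 128-dimensional features and the 13 classes are taken from the experimental settings reported for Indian Pines and KSC.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(direction_features, params):
    """Map four end-vertex feature vectors to class probabilities."""
    x = np.concatenate(direction_features)                  # final feature vector
    h1 = np.maximum(0.0, params["W1"] @ x + params["b1"])   # first FC layer (ReLU is assumed)
    logits = params["W2"] @ h1 + params["b2"]               # second FC layer
    return softmax(logits)

rng = np.random.default_rng(0)
hidden, fc_width, n_classes = 128, 64, 13   # fc_width is a hypothetical choice
params = {
    "W1": 0.01 * rng.standard_normal((fc_width, 4 * hidden)),
    "b1": np.zeros(fc_width),
    "W2": 0.01 * rng.standard_normal((n_classes, fc_width)),
    "b2": np.zeros(n_classes),
}
# Stand-ins for the features produced by the four directional DAG-RNNs.
features = [rng.standard_normal(hidden) for _ in range(4)]
probs = classify(features, params)
predicted_class = int(np.argmax(probs))
```

The concatenated vector has four times the DAG-RNN feature dimension, so the first layer's weight matrix is sized accordingly.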

Fig. 4 illustrates in detail how DAGs approximate the UCG. Suppose we take an image patch around the target pixel, which is denoted by the black unit. An 8-neighborhood UCG represents the spatial relationship of the pixels. Because the UCG contains loops, it does not yield a fixed sequence that can be fed to RNNs. To make full use of the semantic contextual dependencies of the image patch, we represent the UCG by a combination of small DAGs whose height and width equal the memory length. Four 8-neighborhood DAGs are used, each taking the target unit as its end vertex. Their four directions are southeast, southwest, northeast, and northwest. Therefore, we can route through the patch in order and use the information of any pixel in it. The order of calculation is row by row and pixel by pixel within every row, as suggested by [11]. Pixel DAG-RNN is applied to each DAG to generate its hidden layer, so as to combine the local feature with a broader view of contextual awareness. In each direction, the hidden layer at a vertex is computed from the local input and the hidden layers of the vertex's direct predecessors in that direction, through connection weights and biases. Here, the weights and biases are shared across all vertices within a direction.

Figure 4: Schematic diagram of image patch decomposition. The black unit denotes target pixel.

Our model gathers features sequentially from the edge of the patch to its center. Because the parameters are applied repeatedly during the forward pass, the information of distant neighbors is gradually weakened. This is consistent with the principle that the importance of a surrounding pixel to the central pixel decreases as the distance increases.
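The decomposition and the row-by-row recurrence can be sketched in NumPy as below. Several points are assumptions made for illustration rather than details given in the text: the patch side is taken to be 2m - 1 for memory length m, so that four m × m sub-patches share the target pixel; the predecessors in a southeast DAG are taken to be the west, north and northwest neighbors; predecessor states are summed; and tanh is used as the activation. All function names are hypothetical.

```python
import numpy as np

DIRECTIONS = ("southeast", "southwest", "northeast", "northwest")

def southeast_predecessors(r, c, neighborhood=8):
    """Direct predecessors of vertex (r, c) in a southeast DAG (assumed arc set)."""
    preds = [(r, c - 1), (r - 1, c)]       # west and north neighbors
    if neighborhood == 8:
        preds.append((r - 1, c - 1))       # diagonal (northwest) neighbor
    return [(i, j) for i, j in preds if i >= 0 and j >= 0]

def dag_rnn_southeast(x, U, W, b, neighborhood=8):
    """Scan an (m, m, bands) sub-patch row by row; return the end-vertex state."""
    m, hidden = x.shape[0], W.shape[0]
    h = np.zeros((m, m, hidden))
    for r in range(m):
        for c in range(m):
            preds = southeast_predecessors(r, c, neighborhood)
            h_pred = sum((h[i, j] for i, j in preds), np.zeros(hidden))
            h[r, c] = np.tanh(U @ x[r, c] + W @ h_pred + b)   # U, W, b shared by all vertices
    return h[m - 1, m - 1]                 # bottom-right vertex is the target pixel

def four_direction_features(patch, weights, neighborhood=8):
    """Split a (2m-1, 2m-1, bands) patch into four sub-patches ending at its center."""
    m = (patch.shape[0] + 1) // 2
    sub = {
        "southeast": patch[:m, :m],                     # top-left block
        "southwest": patch[:m, m - 1:][:, ::-1],        # top-right block, mirrored
        "northeast": patch[m - 1:, :m][::-1],           # bottom-left block, flipped
        "northwest": patch[m - 1:, m - 1:][::-1, ::-1], # bottom-right block, both
    }
    return [dag_rnn_southeast(sub[d], *weights[d], neighborhood) for d in DIRECTIONS]

rng = np.random.default_rng(0)
m, bands, hidden = 4, 10, 8                # m = 4 matches the memory length of Fig. 4
patch = rng.standard_normal((2 * m - 1, 2 * m - 1, bands))
weights = {
    d: (0.3 * rng.standard_normal((hidden, bands)),
        0.3 * rng.standard_normal((hidden, hidden)),
        np.zeros(hidden))
    for d in DIRECTIONS
}
features = four_direction_features(patch, weights)
```

Flipping each sub-patch so that the target pixel sits at the bottom-right lets a single southeast scan serve all four directions, while each direction keeps its own weights. Passing `neighborhood=4` drops the diagonal arcs, which corresponds to the 4-neighborhood variant.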

3 Experimental Results and Analysis

In this experiment, we use three benchmark data sets: the Indian Pines data, the University of Pavia data, and the Kennedy Space Center (KSC) data. Their numbers of usable bands are 200, 115 and 176, respectively. In addition, to verify the effectiveness of our method, we choose RBF-SVM, SOMP and 3-D CNN for comparison and report the overall accuracy (OA), average accuracy (AA), and Kappa coefficient in the form of mean ± standard deviation to measure the performance of our model.
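For reference, the three measures can be computed from a confusion matrix as in the sketch below. It uses the usual definitions (OA as the fraction of correctly classified samples, AA as the mean of per-class recalls, Kappa as Cohen's kappa); the function name and the toy labels are illustrative only.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Return (OA, AA, Kappa) for integer class labels in [0, n_classes)."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)          # rows: true class, columns: predicted
    total = cm.sum()
    oa = np.trace(cm) / total                   # overall accuracy
    per_class = np.diag(cm) / cm.sum(axis=1)    # assumes every class occurs in y_true
    aa = per_class.mean()                       # average accuracy
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)  # agreement beyond chance
    return oa, aa, kappa

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])
oa, aa, kappa = classification_metrics(y_true, y_pred, n_classes=2)
```

To report a measure as mean ± standard deviation, the same function would be applied to each repeated run and the results summarized with `np.mean` and `np.std`.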

3.1 Parameters Setting and Experiment Results

For all experiments, the numbers of labeled training and test samples are the same as in the 3-D CNN experiments [8]. The parameters for the Indian Pines data are as follows. SOMP uses a square window. 3-D CNN is configured by its window size and convolution kernel size. For Pixel DAG-RNN, we use 8-neighborhood information and a DAG-RNN with 128 dimensions to extract features; the learning rate is 0.005 and dropout is 0.4. For the University of Pavia data, SOMP and 3-D CNN each use their own window size, and Pixel DAG-RNN uses four DAGs to represent the image patch. For the KSC data, the parameters of 3-D CNN are the same as for the University of Pavia data. In Pixel DAG-RNN, an 8-neighborhood UCG represents the contextual dependencies of the patch, and four DAGs are combined to approximate the topology of the UCG. Besides the 8-neighborhood UCG, we also use a 4-neighborhood UCG to represent an image patch. The overall results of the different methods on the three data sets are listed in Table 1.

Table 1 shows that 8-P-DAG-RNN obtains the best classification performance among all methods in OA, AA and Kappa. Compared with the other four models, which apply spatial-spectral features, RBF-SVM obtains lower classification accuracy because it extracts features only along the spectral dimension. On the University of Pavia and KSC data, 3-D CNN obtains slightly higher OA, AA and Kappa than SOMP. On the Indian Pines data, however, SOMP and 3-D CNN perform nearly the same, because this data set has more categories with fewer training samples, which is an adverse condition for deep neural networks. Because 4-P-DAG-RNN does not consider the diagonal connections between units in the UCG, it loses some classification accuracy compared with 8-P-DAG-RNN. On the whole, our model 8-P-DAG-RNN has the best classification performance because it makes further use of the spatial contextual dependency.

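| Data set | Metric | RBF-SVM | SOMP | 3-D CNN | 4-P-DAG-RNN | 8-P-DAG-RNN |
|---|---|---|---|---|---|---|
| Indian Pines | OA | 82.20 ± 0.39 | 95.10 ± 0.39 | 94.93 ± 0.65 | 95.13 ± 0.76 | 96.42 ± 0.24 |
| Indian Pines | AA | 87.95 ± 0.86 | 93.71 ± 1.33 | 95.37 ± 1.04 | 95.18 ± 1.23 | 96.58 ± 0.31 |
| Indian Pines | Kappa | 0.863 ± 0.005 | 0.943 ± 0.005 | 0.941 ± 0.008 | 0.944 ± 0.009 | 0.959 ± 0.003 |
| University of Pavia | OA | 89.42 ± 0.40 | 97.34 ± 0.54 | 98.61 ± 0.57 | 98.75 ± 0.29 | 99.29 ± 1.75 |
| University of Pavia | AA | 89.62 ± 0.24 | 95.68 ± 0.54 | 98.47 ± 0.40 | 98.28 ± 0.36 | 99.07 ± 0.28 |
| University of Pavia | Kappa | 0.859 ± 0.005 | 0.964 ± 0.007 | 0.981 ± 0.008 | 0.983 ± 0.004 | 0.990 ± 0.002 |
| KSC | OA | 89.04 ± 1.35 | 92.65 ± 1.43 | 94.29 ± 0.90 | 96.00 ± 0.52 | 97.45 ± 0.72 |
| KSC | AA | 85.61 ± 1.62 | 92.34 ± 1.71 | 92.71 ± 1.35 | 93.13 ± 0.73 | 95.75 ± 1.17 |
| KSC | Kappa | 0.878 ± 0.015 | 0.918 ± 0.016 | 0.936 ± 0.010 | 0.956 ± 0.006 | 0.972 ± 0.008 |

Table 1: OA (%), AA (%) and Kappa (mean ± standard deviation) of the different methods on the three data sets.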

To further show the contribution of our model to each class, Table 2 lists the classification accuracies of all categories on the KSC data. Table 2 shows that 8-P-DAG-RNN obtains the highest accuracy in 8 of the 13 categories: "Scrub", "Slash pine", "Graminoid marsh", "Spartina marsh", "Cattail marsh", "Salt marsh", "Mud flats" and "Water". The other five classes obtain the second or third highest accuracy because they have fewer training samples. In addition, we analyse the influence of memory length on classification accuracy. As mentioned in Section 2.3, an image patch can be decomposed into four small image patches, and the size of a small image patch is called the memory length; for example, the memory length in Fig. 4 is 4. Through experiments with different memory lengths (m = {5, 6, 7, 8}) on the three data sets, we find that the University of Pavia data set reaches its best performance (OA = 99.29%) when the memory length is 6, and the best memory length for the Indian Pines and KSC data sets is 7.

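| Class | RBF-SVM | SOMP | 3-D CNN | 4-P-DAG-RNN | 8-P-DAG-RNN |
|---|---|---|---|---|---|
| Scrub | 92.37 ± 2.92 | 98.13 ± 1.31 | 94.48 ± 2.32 | 97.75 ± 1.36 | 98.27 ± 1.31 |
| Willow swamp | 85.39 ± 4.08 | 95.34 ± 4.88 | 84.08 ± 10.77 | 90.61 ± 5.64 | 95.28 ± 3.80 |
| CP hammock | 90.58 ± 1.91 | 97.84 ± 2.14 | 83.30 ± 5.94 | 93.82 ± 5.11 | 97.34 ± 1.34 |
| Slash pine | 74.19 ± 5.71 | 85.20 ± 4.43 | 83.67 ± 7.62 | 85.32 ± 6.74 | 89.63 ± 4.61 |
| Oak/Broadleaf | 70.36 ± 6.08 | 92.33 ± 6.22 | 92.86 ± 3.65 | 73.13 ± 7.93 | 85.09 ± 8.29 |
| Hardwood | 55.58 ± 8.61 | 91.53 ± 4.29 | 89.87 ± 4.79 | 84.02 ± 4.99 | 88.18 ± 4.81 |
| Swamp | 94.34 ± 3.95 | 100.00 ± 0.00 | 99.09 ± 1.52 | 92.11 ± 6.14 | 95.30 ± 4.24 |
| Graminoid marsh | 75.09 ± 7.41 | 78.76 ± 5.84 | 93.55 ± 4.70 | 97.24 ± 2.01 | 97.73 ± 1.92 |
| Spartina marsh | 98.27 ± 1.35 | 94.63 ± 3.08 | 91.91 ± 4.66 | 99.83 ± 0.26 | 99.95 ± 0.10 |
| Cattail marsh | 95.31 ± 2.75 | 95.11 ± 2.40 | 96.39 ± 6.34 | 99.54 ± 0.47 | 99.69 ± 0.40 |
| Salt marsh | 95.66 ± 2.23 | 99.18 ± 0.52 | 97.48 ± 2.57 | 99.15 ± 0.31 | 99.50 ± 4.12 |
| Mud flats | 87.68 ± 5.11 | 72.43 ± 10.27 | 98.59 ± 2.42 | 98.11 ± 1.92 | 98.78 ± 1.23 |
| Water | 98.16 ± 0.50 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 |

Table 2: Classification accuracy (%) for every class on the KSC data set.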

4 Conclusion

In this paper, we use Pixel DAG-RNN to extract spectral-spatial features for HSI classification. It effectively exploits the spatial correlation of pixels by representing them with a UCG and then combining four directed acyclic graphs (DAGs) to approximate the UCG's topology. In addition, the model uses the strength of RNNs in extracting and using sequence data in its network architecture. The superiority of Pixel DAG-RNN has been verified by experiments on three benchmark HSI data sets. Further, weight sharing and dropout are used to prevent overfitting. Future work will be devoted to using pixel information more efficiently.


  • [1] Antonio Plaza, Jon Atli Benediktsson, Joseph W Boardman, Jason Brazile, Lorenzo Bruzzone, Gustavo Camps-Valls, Jocelyn Chanussot, Mathieu Fauvel, Paolo Gamba, Anthony Gualtieri, et al., “Recent advances in techniques for hyperspectral image processing,” Remote sensing of environment, vol. 113, pp. S110–S122, 2009.
  • [2] Alberto Villa, Jón Atli Benediktsson, Jocelyn Chanussot, and Christian Jutten, “Hyperspectral image classification with independent component discriminant analysis,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4865–4876, 2011.
  • [3] Jun Yue, Wenzhi Zhao, Shanjun Mao, and Hui Liu, “Spectral–spatial classification of hyperspectral images using deep convolutional neural networks,” Remote Sensing Letters, vol. 6, no. 6, pp. 468–477, 2015.
  • [4] Björn Waske, Sebastian van der Linden, Jón Atli Benediktsson, Andreas Rabe, and Patrick Hostert, “Sensitivity of support vector machines to random feature selection in classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, pp. 2880–2889, 2010.
  • [5] Jianing Wang, Licheng Jiao, Hongying Liu, Shuyuan Yang, and Liu Fang, “Hyperspectral image classification by spatial–spectral derivative-aided kernel joint sparse representation,” IEEE Journal of Selected Topics in Applied Earth Observations Remote Sensing, vol. 8, no. 6, pp. 2485–2500, 2015.
  • [6] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [7] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, speech and signal processing (icassp), 2013 ieee international conference on. IEEE, 2013, pp. 6645–6649.
  • [8] Yushi Chen, Hanlu Jiang, Chunyang Li, Xiuping Jia, and Pedram Ghamisi, “Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 6232–6251, 2016.
  • [9] Lichao Mou, Pedram Ghamisi, and Xiao Xiang Zhu, “Deep recurrent neural networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3639–3655, 2017.
  • [10] Brian A Wandell and Jonathan Winawer, “Computational neuroimaging and population receptive fields,” Trends in cognitive sciences, vol. 19, no. 6, pp. 349–357, 2015.
  • [11] Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al., “Conditional image generation with pixelcnn decoders,” in Advances in Neural Information Processing Systems, 2016, pp. 4790–4798.