I Introduction
Hyperspectral image (HSI) classification has received considerable attention in recent years across a variety of applications using neural-network-based techniques. A hyperspectral image contains several hundred contiguous narrow spectral bands spanning the visible-to-infrared region of the electromagnetic spectrum. Such high spectral resolution is expected to enable finer classification, since each pixel carries a distinct spectral signature. However, the large number of spectral dimensions creates the curse of dimensionality. In addition, the following issues make the classification of HSIs challenging: 1) limited training examples, and 2) large spatial variability of the spectral signature. Contiguous spectral bands often contain redundant information, which leads to the Hughes phenomenon [1]: classification accuracy drops when there is an imbalance between the large number of spectral channels and the scarcity of training examples. Conventionally, dimension-reduction techniques are used to extract better spectral features. For instance, Independent Component Discriminant Analysis (ICDA) [2] tries to find statistically independent components using ICA, under the assumption that at most one component has a Gaussian distribution. ICA uses higher-order statistics to compute uncorrelated components, whereas PCA [3] uses only the covariance matrix. Nonlinear techniques, such as quadratic discriminant analysis [4] and kernel-based methods [5], are also employed to handle nonlinearity in HSIs. However, features extracted in the reduced-dimensional space may not be optimal for classification. The HSI classification task is further complicated by two facts: i) the spectral signatures of objects belonging to the same class may differ, and ii) the spectral signatures of objects belonging to different classes may be identical. Therefore, spectral components alone may not provide sufficient features for classification. Recent studies show that incorporating spatial context along with the spectral components improves classification considerably. There are two ways of exploiting spatial and spectral information for classification. The first approach processes spatial and spectral information separately and combines them at the decision level [35, 19]. The second uses joint spectral-spatial features [34, 36, 33, 37]. In this paper, we adopt the second strategy to classify hyperspectral images with higher accuracy than state-of-the-art techniques.
In the literature, 1D [33], 2D [34], and 3D [20] CNN-based architectures are well known for HSI classification, and hybrids of different CNN-type architectures have also been employed [21]. 1D-CNNs use a pixel-pair strategy [33] that combines pairs of training samples; such a pair reveals the neighborhood of the observed pixel, but it cannot exploit the full power of spatial information and completely ignores the neighborhood profile. In general, 2D and 3D CNN-based approaches are more suitable in such a scenario. There are also many other architectures, e.g., Deep Belief Networks [22, 23, 24, 25, 26] and others [27, 28, 29, 30, 31, 32], that provide efficient solutions to the hyperspectral image classification problem. In the present context, we are more interested in scrutinizing various CNN architectures for the current problem. A few core components are available for building any CNN architecture, e.g., convolution, pooling, batch-normalization [44], and activation layers. In practice, there are various ways of applying the convolution mechanism; popular ones include pointwise convolution, group convolution, and depthwise separable convolution [43]. Similarly, there are variations of the pooling mechanism, such as adaptive pooling [45]. Recently, many mid-level components have been developed, e.g., the inception module, which comprises multi-scale convolutions. Mid-level components are combined sequentially to build large networks such as VGG [39] and GoogLeNet [40]. Additionally, skip connections [41] have proved to be a successful way of building very deep networks that cope with the vanishing-gradient problem. Hyperspectral image classification remains an interesting and challenging problem in which the effectiveness of the various core components of CNNs, and their arrangement, needs to be studied.
In this paper, we present a CNN architecture that performs three major tasks in a processing pipeline: 1) band reduction, 2) feature extraction, and 3) classification. The first processing block uses pointwise 3D convolution layers. For feature extraction, we use a multi-scale convolution module. We propose two architectures for feature extraction, which lead to two different CNN architectures. In the first architecture, we use an inception module with four parallel convolution structures for feature extraction. In the second, we use similar multi-scale convolutions in an inception-like structure but with a different arrangement; this second architecture extracts finer contextual information compared to the first one. We feed the extracted features to a fully connected layer to form the high-level representation. We train our networks in an end-to-end manner by minimizing the cross-entropy loss using the Adam optimizer. Our proposed architectures give state-of-the-art performance without any data augmentation on three benchmark HSI classification datasets. Besides this new architecture, we propose a way to incorporate spatial information along with the spectral information: it not only covers the neighborhood profile for a given window but also observes the change of neighborhood by shifting the current window. This proves especially beneficial at boundary locations compared to a still window. The contributions of this paper can be summarized as follows:

A novel technique to incorporate spatial information with the spectral information has been proposed. The design aims to improve classification accuracy at the boundary regions of each class.

A novel end-to-end shallow and wide neural network has been proposed, which is a hybridization of a 3D CNN with a 2D CNN. This hybrid structure provides a solution for appropriately using spectral information and extracting finer features. Also, we show two different arrangements of similar multi-scale convolutional layers to extract distinctive features.
Section II-A gives a detailed description of the proposed classification framework, including the technique for the inclusion of spatial information. Performance and comparisons between the proposed networks and current state-of-the-art methods are presented in Section III. The paper is concluded in Section IV.
II Proposed Classification Framework
The proposed classification framework shown in Fig 1 mainly consists of three tasks: i) organizing a target-pixel-orientation model using the available training samples, ii) constructing a CNN architecture to extract uncorrelated spectral information, and iii) learning the spatial correlation with neighboring pixels.
II-A Target-Pixel-Orientation Model for Training Samples
Consider a hyperspectral data set with $B$ spectral bands. We have $N$ labeled samples, denoted $\{x_i\}_{i=1}^{N}$, in a $B$-dimensional feature space, with class labels $y_i \in \{1, \dots, C\}$, where $C$ is the number of classes. Let $N_c$ be the number of available labeled samples in the $c$-th class, so that $\sum_{c=1}^{C} N_c = N$. We propose a Target-Pixel-Orientation (TPO) scheme. In this scheme, we consider a $w \times w$ window whose center pixel is the target pixel. We select eight neighbors of the target pixel by simply shifting the window into eight different directions in a clockwise manner. Fig 2 shows one example of how we prepare the eight neighbors of a target pixel with a $w \times w$ window. The target pixel is shown as a blue box, surrounded by the window drawn with a red border. The first sub-image in Fig 2 depicts the window when the target pixel occupies its center position. The other eight sub-images are the neighbors of the first sub-image, numbered 1 to 8. We consider each of the nine windows as one view of the target pixel. For simplicity of illustration, we have described TPO with one spectral channel; in the proposed system, we consider all $B$ spectral channels. Therefore, the input to the model is a 4-dimensional tensor. We perform the following operation to form the input to our models.
$$ x_i = f\big(\{\, p_b^{v} : b = 1, \dots, B; \; v = 1, \dots, 9 \,\}\big) \qquad (1) $$
where $f$ is a function responsible for stacking the channels, and $p_b^{v}$ represents the $w \times w$ patch of the $b$-th spectral channel in the $v$-th view; $v = 1, \dots, 9$ indexes the nine views of the TPO scheme. We have converted the $N$ labeled samples to $\{x_i\}$ such that each $x_i$ has dimension $9 \times B \times w \times w$.
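As a concrete sketch, the nine-view construction can be written in NumPy as follows. The window size `w`, the reflect-padding at image borders, and the exact clockwise shift order are assumptions of this sketch, not details fixed by the paper.

```python
import numpy as np

def tpo_views(cube, r, c, w=5):
    """Extract the nine TPO views of a target pixel.

    cube : (H, W, B) hyperspectral image
    r, c : target-pixel coordinates
    w    : odd spatial window size (w = 5 is an assumed example)

    Returns a (9, B, w, w) array: the centred window plus the eight
    windows obtained by shifting it one pixel in each direction.
    """
    h = w // 2
    # reflect-pad so windows near the image border stay valid
    padded = np.pad(cube, ((h + 1, h + 1), (h + 1, h + 1), (0, 0)),
                    mode="reflect")
    rr, cc = r + h + 1, c + h + 1            # target position in padded image
    shifts = [(0, 0),                         # view 0: target at the centre
              (-1, 0), (-1, 1), (0, 1), (1, 1),
              (1, 0), (1, -1), (0, -1), (-1, -1)]  # eight clockwise shifts
    views = []
    for dr, dc in shifts:
        win = padded[rr + dr - h: rr + dr + h + 1,
                     cc + dc - h: cc + dc + h + 1, :]
        views.append(win.transpose(2, 0, 1))  # (B, w, w)
    return np.stack(views)                     # (9, B, w, w)
```

Stacking `tpo_views` over all labeled pixels yields the 4-dimensional input tensor of Eq. 1.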
II-A1 Advantage of TPO for class boundaries
We observe that the patch of a pixel at the boundary region of a class looks very different from the patches of pixels in the non-boundary area. In general, non-boundary pixels are surrounded by pixels belonging to the same class. In this scenario, TPO provides more than one view of the target pixel at the boundary region. We illustrate this with a two-class situation in Fig 3. The patch of a target pixel near the boundary contains only pixels of the same class (blue), whereas the patch of a target pixel at the border includes pixels of two classes (blue and red). If we consider only the single patch centered on the target pixel, we may fail to classify border pixels. In this scenario, TPO brings different views of the patches for a single target pixel at the boundary. We show the TPO of a target pixel at the border and near the border in Fig 4 and Fig 5, respectively. In the given situation, for the border pixel there is at least one view in which every pixel belongs to the blue class, while the remaining views are similar to the views of a pixel near the boundary.

II-B Network Architecture
The framework of the HSI classification is shown in Fig 1. It consists of three main blocks, namely, band-reduction, feature extraction, and classification. TPO extracts samples from the given dataset as described in Section II-A. The label of each sample is that of the pixel located at the center of the first view among the nine views (discussed in Sec II-A).
II-B1 Band Reduction
This block contains three consecutive “BasicConv3d” layers. A “BasicConv3d” layer applies a 3D batch-normalization layer and a rectified linear unit (ReLU) sequentially after a 3D pointwise convolution layer. The parameters of the 3D convolution layer are the input channels, output channels, and kernel size. The kernel sizes were adjusted experimentally for the different datasets; hence, we use the notation p, q, and r for the kernel sizes in Fig 6. Assume the spectral dimensionality is 103 and the spatial size is $w \times w$. The first 3D convolutional layer (C1) filters the prepared data with nine kernels, producing a feature map. As we use pointwise 3D convolution, the spatial size of the sample does not change, but the number of spectral channels changes based on the value of p, which is 8 in this example. The number of spectral channels in the convolved sample can be computed as
$$ B_{\mathrm{out}} = \frac{B_{\mathrm{in}} - k + 2P}{S} + 1 \qquad (2) $$
where $B_{\mathrm{out}}$ is the resulting number of spectral channels (96 in this case), and $k$, $P$, and $S$ represent the kernel size, padding, and stride. For the above example, $k = 8$, $P = 0$, and $S = 1$ hold; therefore, we get $(103 - 8)/1 + 1 = 96$ channels in the convolved sample. The second layer (C2) combines the features obtained in the C1 layer with nine kernels of size $q = 16$, resulting in an 81-channel feature map, and the third layer (C3) combines the features obtained in the C2 layer with nine kernels of size $r = 32$, resulting in a 50-channel feature map. We have thus reduced the number of bands from 103 to 50. We then reshape the data into 3 dimensions by stacking the nine views of each of the 50 spectral channels, leading to a $450 \times w \times w$ sized sample, and feed the reshaped output of the band-reduction block to the feature extraction layer.

II-B2 Feature Extraction
We have taken a tiny patch as the input sample. Our assumption is that a shallow but wider network, i.e., a “multi-scale filter bank”, extracts more appropriate features from small patches. Hence, we consider an inception-module-like structure for feature extraction, used in two different ways to form two separate networks. Fig 7 and Fig 8 depict the feature extraction modules of TPO-CNN1 and TPO-CNN2. Each “BasicConv2d” layer in the figures contains a 2D batch-normalization layer and a rectified linear unit (ReLU) sequentially after a 2D convolution layer. The parameters of the 2D convolution layer are the input channels, output channels, and kernel size; each rectangular block of “BasicConv2d” in the diagram shows these parameters, with the kernel size of the convolution layer and the number of input channels. Similarly, each block of “AvgPool2d” shows the kernel size and the stride value of the average pooling layer. TPO-CNN1 uses a multi-scale filter bank that locally convolves the input sample with four parallel blocks with different filter sizes in the convolution layers. Each parallel block consists of one or more “BasicConv2d” layers and a pooling layer. The $3 \times 3$ and $5 \times 5$ filters are used to exploit local spatial correlations of the input sample, while the $1 \times 1$ filters are used to address correlations among the nine views and their respective spectral information. The outputs of the TPO-CNN1 feature extraction layer are combined at a concatenation layer to form a joint view-spatio-spectral feature map used as input to the subsequent layers.
However, since the sizes of the feature maps from the four convolutional branches differ, we pad the input features with zeros so that the output feature maps of the parallel blocks match in size. For example, we pad the input with 0, 1, and 2 zeros for the $1 \times 1$, $3 \times 3$, and $5 \times 5$ filters, respectively. In TPO-CNN1, we use one adaptive average pooling [45] layer sequentially after the concatenation layer. In TPO-CNN2, by contrast, we split the inception architecture of TPO-CNN1 into three small inception layers, each with two parallel convolutional layers. Each concatenation layer is followed by an adaptive average pooling layer. Finally, we concatenate all the pooled information.
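A minimal PyTorch sketch of such a multi-scale filter bank is given below. The branch width of 32 channels, the pooled branch, and the input size are illustrative assumptions; only the $1 \times 1$/$3 \times 3$/$5 \times 5$ kernels with 0/1/2 zero-padding and the trailing adaptive average pooling follow the description above.

```python
import torch
import torch.nn as nn

class BasicConv2d(nn.Module):
    """Conv2d -> BatchNorm2d -> ReLU, as used throughout the paper."""
    def __init__(self, c_in, c_out, k, **kw):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, bias=False, **kw)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return torch.relu(self.bn(self.conv(x)))

class MultiScaleBank(nn.Module):
    """Inception-like multi-scale filter bank (TPO-CNN1-style sketch).

    Inputs are zero-padded (0, 1, 2) via the conv `padding` argument so
    all branches return maps of equal spatial size before concatenation.
    The per-branch width of 32 channels is an assumed value.
    """
    def __init__(self, c_in, c_branch=32):
        super().__init__()
        self.b1 = BasicConv2d(c_in, c_branch, 1)             # 1x1: view/spectral mixing
        self.b3 = BasicConv2d(c_in, c_branch, 3, padding=1)  # 3x3: local spatial context
        self.b5 = BasicConv2d(c_in, c_branch, 5, padding=2)  # 5x5: wider spatial context
        self.bp = nn.Sequential(nn.AvgPool2d(3, stride=1, padding=1),
                                BasicConv2d(c_in, c_branch, 1))  # pooled branch
        self.pool = nn.AdaptiveAvgPool2d(1)                  # fixed-size output

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
        return self.pool(y).flatten(1)                       # (N, 4 * c_branch)
```

With the reshaped band-reduction output (9 views of 50 bands, i.e., 450 channels), `MultiScaleBank(450)` maps an $N \times 450 \times w \times w$ batch to an $N \times 128$ feature vector.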
II-B3 Classification
The outputs of the feature extraction block are flattened and fed to a fully connected layer whose number of output channels is the number of classes. The fully connected layer is followed by a 1D batch-normalization layer and a softmax activation function. In general, the classification layer can be defined as
$$ \mathbf{y} = \mathrm{softmax}\big(\mathrm{BN}(\mathbf{W}\mathbf{x} + \mathbf{b})\big) \qquad (3) $$
where $\mathbf{x}$ is the input of the fully connected layer, and $\mathbf{W}$ and $\mathbf{b}$ are the weights and bias of the fully connected layer, respectively. $\mathrm{BN}(\cdot)$ is the 1D batch-normalization layer. $\mathbf{y}$ is the $C$-dimensional vector whose $c$-th entry represents the probability that a sample belongs to the $c$-th class.

II-C Learning the Proposed Network
We have trained the proposed networks by minimizing the cross-entropy loss function. Let $y_i$ represent the ground truth for the $i$-th training sample present in a batch $\mathcal{B}$, and let $\hat{y}_{i,c}$ denote the conditional probability distribution of the model, i.e., the model predicts that the $i$-th training sample belongs to the $c$-th class with probability $\hat{y}_{i,c}$. The cross-entropy loss function is given by
$$ L = -\frac{1}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c} \qquad (4) $$
In our dataset, the ground truth is represented as a one-hot encoded vector, i.e., each $y_i$ is a $C$-dimensional vector, where $C$ represents the number of classes. If the class label of the $i$-th sample is $c^{*}$, then $y_{i,c^{*}} = 1$ and $y_{i,c} = 0$ for every $c \neq c^{*}$. To train the model, the Adam optimizer is used with a batch size of 512 samples and a weight decay of 0.0001. We initially set the base learning rate to 0.0001. All the layers are initialized from a uniform distribution.
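The training setup described above can be sketched as follows. The feature dimension and the toy batch are placeholders; the optimizer settings (Adam, learning rate 0.0001, weight decay 0.0001, batch size 512) and the fully connected + 1D batch-norm head of Eq. 3 follow the text. Note that `nn.CrossEntropyLoss` applies the (log-)softmax internally, so the model only emits the $\mathrm{BN}(\mathbf{W}\mathbf{x} + \mathbf{b})$ logits.

```python
import torch
import torch.nn as nn

FEAT_DIM, N_CLASSES = 128, 9   # illustrative assumptions

# Classification block of Eq. (3): FC -> 1-D BatchNorm (softmax is in the loss).
model = nn.Sequential(nn.Linear(FEAT_DIM, N_CLASSES),
                      nn.BatchNorm1d(N_CLASSES))
optim = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()   # cross-entropy loss of Eq. (4)

x = torch.randn(512, FEAT_DIM)             # one batch of extracted features
y = torch.randint(0, N_CLASSES, (512,))    # integer class labels

for _ in range(5):                         # a few end-to-end steps
    optim.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optim.step()
```

In the actual networks, the feature extraction and band-reduction blocks sit in front of this head and are trained jointly by the same loop.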
III Experimental Results
                         U. Pavia    Indian Pines   Salinas
Sensor                   ROSIS       AVIRIS         AVIRIS
Wavelength range (μm)    0.43-0.86   0.4-2.5        0.4-2.5
Spatial resolution       1.3 m       20 m           3.7 m
No. of bands             103         220            224
No. of classes           9           16             16
III-A Datasets
The performance of HSI classification is observed by experimenting with three popular datasets: the Pavia University scene (U.P) (Fig 9), the Indian Pines (I.P) (Fig 10), and the Salinas (S) dataset (Fig 11). Table I contains a brief description of the datasets. We have discarded the water-absorption bands in Indian Pines. Also, we have rejected the classes in the Indian Pines dataset that have fewer than 400 samples. We have selected 200 labeled pixels from each class to prepare a training set for each of the three HSI datasets; the rest of the labeled samples constitute the test set. As different spectral channels have different ranges, we normalize them using the function defined in Eq. 5, where $x$ denotes the pixel value of a given spectral channel and $\mu$ and $\sigma$ denote the mean and standard deviation of the dataset.
$$ \hat{x} = \frac{x - \mu}{\sigma} \qquad (5) $$
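Eq. 5 can be implemented as below; computing $\mu$ and $\sigma$ per spectral band, rather than one global pair for the whole dataset, is an assumption of this sketch.

```python
import numpy as np

def standardize(cube):
    """Normalization of Eq. (5): subtract the mean, divide by the
    standard deviation.  Statistics are computed per band here
    (an assumption; the paper states mean/std of the dataset)."""
    mu = cube.mean(axis=(0, 1), keepdims=True)
    sigma = cube.std(axis=(0, 1), keepdims=True)
    return (cube - mu) / sigma
```

After this step, every band of the cube has zero mean and unit variance.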
III-B Quantitative Metrics
We evaluate the performance of the proposed architecture quantitatively in terms of the following metrics.
III-B1 Overall Accuracy (OA)
Overall Accuracy is computed on the test samples using the following formula, where $C$ is the number of classes considered for a given HSI dataset and $n_{ij}$ denotes the number of test samples of class $i$ classified as class $j$:
$$ \mathrm{OA} = \frac{\sum_{i=1}^{C} n_{ii}}{\sum_{i=1}^{C} \sum_{j=1}^{C} n_{ij}} \qquad (6) $$
III-B2 Average Accuracy (AA)
Average Accuracy is the mean of the per-class accuracies on the test samples, using the same notation:
$$ \mathrm{AA} = \frac{1}{C} \sum_{i=1}^{C} \frac{n_{ii}}{\sum_{j=1}^{C} n_{ij}} \qquad (7) $$
III-B3 Kappa ($\kappa$) Score
The $\kappa$ score [18] is a statistical measure of the agreement between two classifiers, each of which classifies $N$ samples into $C$ mutually exclusive classes. The $\kappa$ score is given by the following equation:
$$ \kappa = \frac{p_o - p_e}{1 - p_e} \qquad (8) $$
where $p_o$ is the relative observed agreement between the classifiers, and $p_e$ is the hypothetical probability of chance agreement. $\kappa = 1$ indicates complete agreement between the two classifiers, while $\kappa \le 0$ indicates no agreement at all.
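Given a confusion matrix over the test samples, the three metrics can be computed as follows (a sketch; the row = ground truth, column = prediction orientation is a convention chosen here):

```python
import numpy as np

def metrics(conf):
    """OA, AA, and kappa from a C x C confusion matrix
    (rows = ground truth, columns = prediction)."""
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    oa = np.trace(conf) / n                          # Eq. (6)
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))   # Eq. (7)
    p_o = oa                                         # observed agreement
    p_e = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)                  # Eq. (8)
    return oa, aa, kappa
```

A perfect (diagonal) confusion matrix yields OA = AA = $\kappa$ = 1, while a classifier at chance level yields $\kappa \approx 0$ even when OA is well above zero.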
III-C Implementation Platform
The network is implemented in PyTorch (https://pytorch.org/), a popular deep learning library written in Python. We have trained our models on a machine with a GeForce GTX 1080 Ti GPU.

III-D Comparison with Other Methods
The key features of our proposed methods are: 1) use of spatial features together with spectral ones, 2) band reduction using several consecutive 3D CNNs, and 3) feature extraction with a multi-scale convolutional network. We have chosen six state-of-the-art methods, namely: 1) CNN-PPF [33], 2) DR-CNN [34], 3) 2S-Fusion, 4) BASS [37], 5) DPP-ML [36], and 6) S-CNN+SVM [38]. Every comparable method exploits spatial features along with the spectral ones. CNN-PPF uses a pixel-pair strategy to increase the number of training samples and feeds them into a deep network with 1D convolutional layers. DR-CNN exploits diverse region-based 2D patches from the image to produce more discriminative features. In contrast, 2S-Fusion processes spatial and spectral information separately and fuses them using adaptive class-specific weights. BASS-Net extracts band-specific spatial-spectral features. In DPP-ML, convolutional neural networks with multi-scale convolution are used to extract deep multi-scale features from the HSI. SVM-based methods are common in traditional hyperspectral image classification; in S-CNN+SVM, a Siamese convolutional neural network extracts spectral-spatial features of the HSI and feeds them to an SVM classifier. In general, the performance of deep-learning-based algorithms supersedes that of traditional techniques (e.g., kNN, SVM, ELM). We have compared the performance of the proposed techniques with the best results reported for each of these state-of-the-art techniques. For S-CNN+SVM and 2S-Fusion, performance on the Salinas dataset is not reported. To maintain consistency in the results, we ran our algorithm with the classes and the number of samples per class used in 2S-Fusion, DR-CNN, and DPP-ML for Indian Pines.

Class  #Samples  Accuracy (%) per method (compared methods, incl. BASS, and the proposed TPO-CNNs)
Asphalt  200  97.42  97.71  100  97.47  98.43  99.38  99.78  100  
Meadows  200  95.76  97.93  98.12  99.92  99.45  99.59  99.88  99.99  
Gravel  200  94.05  94.95  99.12  83.80  99.14  97.33  99.21  100  
Trees  200  97.52  97.80  99.40  98.98  99.50  99.31  99.41  99.93  
Painted metal sheets  200  100  100  99.18  100  100  100  100  100  
Bare Soil  200  99.13  96.60  99.10  97.75  100  99.99  99.75  100  
Bitumen  200  96.19  98.14  98.50  77.44  99.70  99.85  100  100  
Self-Blocking Bricks  200  93.62  95.46  99.91  96.65  99.55  99.02  99.77  100  
Shadows  200  99.60  100  100  99.65  100  100  100  100  
OA  96.48  99.68  97.50  99.56  99.46  99.72  99.78  99.99 
Class  #Samples  Accuracy (%) per method (compared methods, incl. BASS, and the proposed TPO-CNNs)
Asphalt  200  92.99  96.09  98.25  100  100  
Meadows  200  96.66  98.25  99.64  99.92  99.75  
Gravel  200  95.58  100  97.10  100  99.68  
Trees  200  100  99.24  99.86  99.73  99.82  
Sheets  200  100  100  100  100  100  
Bare soil  200  96.24  94.82  98.87  100  100  
Bitumen  200  87.80  94.41  98.57  99.74  100  
Bricks  200  98.98  97.46  100  100  100  
Shadows  200  99.81  99.90  100  100  99.72  
OA  94.34  96.77  99.04  99.89  99.84 
Class  Accuracy (%) per method (compared methods and the proposed TPO-CNNs)
Alfalfa  -  -  -  -  100  100  100  
Corn-notill  98.20  99.03  100  100  95.35  100  100  
Corn-mintill  99.79  99.74  99.51  99.67  98.75  100  100  
Corn  -  -  -  -  100  100  100  
Grass-pasture  100  100  100  100  100  100  100  
Grass-trees  -  -  -  -  99.32  100  100  
Grass-pasture-mowed  -  -  -  -  100  100  100  
Hay-windrowed  100  100  98.84  98.85  100  100  100  
Oats  -  -  -  -  100  100  100  
Soybean-notill  99.78  99.61  100  100  100  100  100  
Soybean-mintill  96.69  97.80  100  99.91  98.03  100  100  
Soybean-clean  99.86  100  100  100  100  100  100  
Wheat  -  -  -  -  97.87  100  100  
Woods  99.99  100  100  100  99.62  100  100  
Buildings-Grass-Trees-Drives  -  -  -  -  98.53  100  100  
Stone-Steel-Towers  -  -  -  -  100  100  100  
OA  98.54  99.08  99.54  99.55  98.65  100  100
Class  Accuracy (%) per method (compared methods, incl. BASS, and the proposed TPO-CNNs)
Brocoli green weeds 1  100  100  100  100  100  100  
Brocoli green weeds 2  99.88  99.97  100  100  100  100  
Fallow  99.60  100  99.98  100  100  99.72  
Fallow rough plow  99.49  99.66  99.89  99.25  100  100  
Fallow smooth  98.34  99.59  99.83  99.44  99.84  99.88  
Stubble  99.97  100  100  100  100  100  
Celery  100  99.91  99.96  99.87  100  100  
Grapes untrained  88.68  90.11  94.14  95.36  94.30  98.17  
Soil vinyard develop  98.33  99.73  99.99  100  99.75  100  
Corn senesced green weeds  98.60  97.46  99.20  98.85  94.02  99.35  
Lettuce romaine 4wk  99.54  99.08  99.99  99.77  100  100  
Lettuce romaine 5wk  100  100  100  100  100  100  
Lettuce romaine 6wk  99.44  99.44  100  99.86  100  100  
Lettuce romaine 7wk  98.96  100  100  99.77  100  100  
Vinyard untrained  83.53  83.94  95.52  90.50  95.08  94.03  
Vinyard vertical trellis  99.31  99.38  99.72  98.94  100  100  
OA  94.80  95.36  98.33  97.51  97.98  98.72
III-E Results and Discussion
The performance of the proposed TPO-CNN1 and TPO-CNN2 on the test samples is compared with the aforementioned deep-learning-based classifiers in Tables II, III, IV, and V. We have considered a fixed spatial window for generating the outcomes of our algorithms. Our models supersede the other methods on every dataset. TPO-CNN2 produces better results than TPO-CNN1 on the University of Pavia and Salinas datasets, whereas their performances are comparable on Indian Pines. The results signify that the arrangement of multi-scale convolutions in TPO-CNN2 extracts more useful features for classification than that of TPO-CNN1.
We show the thematic maps generated from the classification of the three HSI scenes using our networks in Figure 12. In order to check the consistency of our networks, we repeated the experiments 10 times with different training sets. Table VI shows the mean and standard deviation of OA, AA, and κ over these 10 experiments for each dataset.
Datasets          U.P            I.P            S
TPO-CNN           1      2       1      2       1      2
OA   Mean       99.76  99.90   99.67  99.65   98.10  98.65
     Std-dev     0.24   0.06    0.14   0.17    0.37   0.28
AA   Mean       99.80  99.94   99.82  99.78   97.87  98.49
     Std-dev     0.20   0.06    0.07   0.11    0.42   0.30
κ    Mean       99.44  99.86   99.61  99.58   99.29  99.44
     Std-dev     0.32   0.08    0.17   0.20    0.18   0.14
Patch size        TPO-CNN1               TPO-CNN2
(increasing)    U.P    I.P    S        U.P    I.P    S
smallest       98.66  97.59  95.55    99.23  97.78  95.67
middle         99.67  99.27  97.20    99.68  99.49  97.78
largest        99.84  99.71  93.70    99.94  99.89  99.14
III-F Comparison of Different Hyperparameter Settings
There are two hyperparameters that have a direct effect on the accuracy of the classification task: 1) the spatial size of the input patch, and 2) the number of spectral channels obtained from the band-reduction block. Figures 12(a) and 12(b) depict the test accuracies on the 3 HSI datasets for different choices of input patch size on the same randomly selected training samples. We observe that, with increasing patch size, the accuracies on Indian Pines and Salinas increase for both networks. However, for Salinas the accuracy drops at the largest patch size in TPO-CNN1. This behavior again supports the fact that the arrangement of multi-scale convolutions in TPO-CNN2 is superior to that of TPO-CNN1 with respect to feature extraction. Table VIII shows the adjusted kernel parameters p, q, and r (refer to Section II-B1) used for the 3 HSI datasets. We vary these kernel values to obtain a different number of bands and observe the impact on classification accuracy. We did not observe any monotonically increasing or decreasing behavior in overall classification accuracy when changing the number of bands. Figures 12(c) and 12(d) depict the change in overall accuracy (OA) for the varying band counts.
     U.P   I.P   S
p      8    32   32
q     16    57   61
r     32    64   64
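Using Eq. 2 with stride 1 and no padding, the kernels in Table VIII can be checked to reduce every dataset to 50 bands. The 200-band Indian Pines input follows from discarding the water-absorption bands; assuming the commonly used 204-band Salinas cube (water-absorption bands likewise removed) gives the same final count.

```python
def bands_out(b_in, k, pad=0, stride=1):
    """Spectral size after one pointwise 3-D convolution, Eq. (2)."""
    return (b_in - k + 2 * pad) // stride + 1

def reduce_bands(b_in, p, q, r):
    """Chain the three band-reduction layers C1 -> C2 -> C3."""
    return bands_out(bands_out(bands_out(b_in, p), q), r)

# U. Pavia: 103 -> 96 -> 81 -> 50 with (p, q, r) = (8, 16, 32)
# Indian Pines: 200 -> 50 with (32, 57, 64)
# Salinas (assumed 204 bands): 204 -> 50 with (32, 61, 64)
```

This shows why the kernel sizes in Table VIII differ per dataset: they are tuned so that each input cube is reduced to the same 50-band representation.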
III-G Classification Performance with a Decreasing Number of Training Samples
In this section, the influence of decreasing the number of training samples on the classification accuracy is studied on the University of Pavia, Indian Pines, and Salinas datasets. We present the experimental results with the setup described above and the same spatial window. Here, a fixed number N of samples per class is selected from the labeled pixels. To showcase the effect of a decreasing number of training samples on the classification accuracy, we have chosen several values of N, e.g., 150, 100, and 50. Our proposed architecture can still beat most of the comparable methods with 150 training samples per class. Table IX reassures that the feature extraction architecture of TPO-CNN2 yields more useful features for U. Pavia and Indian Pines than TPO-CNN1, even with a small number of training samples. However, we observe a small deviation from this trend on the Salinas dataset with 100 training samples per class.
#samples          150           100            50
TPO-CNN          1    2        1    2        1    2
U.P   OA
      AA
      κ
I.P   OA
      AA
      κ
S     OA
      AA
      κ
III-H Analysis of the TPO Strategy
In order to judge how the TPO strategy described in Section II-A affects the performance of the classifier, we compare against classification using a single view in which the target pixel is at the center of the given window. Table X supports the fact that the TPO strategy has a direct effect on the classification accuracies. We observe that OA increases by 6.85% and 8.36% in the TPO-CNN1 model for U. Pavia and Indian Pines (while it decreases by 1.13% for Salinas), and increases by 3.09%, 15.11%, and 2.75% in the TPO-CNN2 model for U. Pavia, Indian Pines, and Salinas, respectively. In brief, the TPO scheme improves results compared to the single view on U. Pavia and Indian Pines for both models. However, we observe the opposite behavior for Salinas with TPO-CNN1. This suggests the consistent behavior of TPO-CNN2 and a positive impact of the TPO strategy on that model for all three datasets.
View   Stat   U.P    I.P    S
TPO-CNN1
9      OA    98.71  96.91  91.13
       AA    98.68  98.51  95.02
       κ     98.26  96.38  90.06
1      OA    91.86  88.55  92.26
       AA    92.08  93.31  96.37
       κ     89.03  86.39  91.34
TPO-CNN2
9      OA    99.01  97.34  94.88
       AA    99.18  98.60  97.97
       κ     98.66  96.81  94.27
1      OA    95.92  82.83  92.13
       AA    95.68  88.18  96.58
       κ     94.56  79.48  91.20
IV Conclusion
In this paper, a hybrid 3D-2D CNN-based network architecture is proposed for HSI classification. We also propose a strategy, target-pixel orientation (TPO), to incorporate the spatial and spectral information of an HSI. In general, classification accuracy degrades due to misclassification at boundary regions; our approach addresses this limitation by using the orientations of the target-pixel view. Our architectural design exploits pointwise 3D convolutions for band reduction, while we adopt a multi-scale 2D inception-like architecture for feature extraction. We have tested a more granular arrangement of multi-scale convolutions in the inception-like architecture in TPO-CNN2 and find that it provides better results than TPO-CNN1. The experimental results on real hyperspectral images demonstrate the positive impact of including the TPO strategy. Moreover, the proposed work improves classification accuracy compared to state-of-the-art methods even with a smaller number of training samples (for example, 150 samples per class). All the experimental results suggest that the arrangement of multi-scale convolutions in TPO-CNN2 provides more useful features than that of TPO-CNN1.
References
 [1] G. Hughes, “On the mean accuracy of statistical pattern recognizers,” in IEEE Transactions on Information Theory, vol. 14, no. 1, pp. 55-63, 1968.
 [2] A. Villa, J. A. Benediktsson, J. Chanussot and C. Jutten, “Hyperspectral Image Classification With Independent Component Discriminant Analysis,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 12, pp. 4865-4876, Dec. 2011.
 [3] G. Licciardi, P. R. Marpu, J. Chanussot and J. A. Benediktsson, “Linear Versus Nonlinear PCA for the Classification of Hyperspectral Data Based on the Extended Morphological Profiles,” in IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 3, pp. 447-451, 2012.
 [4] J. Li et al., “Multiple Feature Learning for Hyperspectral Image Classification,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1592-1606, 2015.
 [5] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 6, pp. 1351-1362, 2005.
 [6] W. Li, S. Prasad, J. E. Fowler and L. M. Bruce, “LocalityPreserving Dimensionality Reduction and Classification for Hyperspectral Image Analysis,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 4, pp. 11851198, 2012.
 [7] X. Wang, Y. Kong, Y. Gao and Y. Cheng, “Dimensionality Reduction for Hyperspectral Data Based on Pairwise Constraint Discriminative Analysis and Nonnegative Sparse Divergence,” in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 4, pp. 15521562, 2017.
 [8] S. Chen and D. Zhang, “Semisupervised Dimensionality Reduction With Pairwise Constraints for Hyperspectral Image Classification,” in IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 2, pp. 369373, 2011.
 [9] W. Zhao and S. Du, “Spectral–Spatial Feature Extraction for Hyperspectral Image Classification: A Dimension Reduction and Deep Learning Approach,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 45444554, 2016.
 [10] F. A. Mianji and Y. Zhang, “Robust Hyperspectral Classification Using Relevance Vector Machine,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 21002112, 2011.
 [11] A. Samat, P. Du, S. Liu, J. Li and L. Cheng, “ : Ensemble Extreme Learning Machines for Hyperspectral Image Classification,” in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 4, pp. 10601069, 2014.
 [12] W. Li, C. Chen, H. Su and Q. Du, “Local Binary Patterns and Extreme Learning Machine for Hyperspectral Imagery Classification,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 7, pp. 36813693, 2015.
 [13] T. Lu, S. Li, L. Fang, L. Bruzzone and J. A. Benediktsson, “SettoSet DistanceBased Spectral–Spatial Classification of Hyperspectral Images,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 71227134, 2016.
 [14] G.-B. Huang, Q.-Y. Zhu and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
 [15] J. Gui, Z. Sun, W. Jia, R. Hu, Y. Lei and S. Ji, "Discriminant sparse neighborhood preserving embedding for face recognition," Pattern Recognition, vol. 45, no. 8, pp. 2884–2893, 2012.
 [16] D. Lunga, S. Prasad, M. M. Crawford and O. Ersoy, "Manifold-Learning-Based Feature Extraction for Classification of Hyperspectral Data: A Review of Advances in Manifold Learning," in IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 55–66, 2014.
 [17] C. Szegedy, S. Ioffe and V. Vanhoucke, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning," in CoRR, vol. abs/1602.07261, arXiv, 2016.
 [18] J. Cohen, "A Coefficient of Agreement for Nominal Scales," Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46, 1960.
 [19] S. Jia, X. Zhang and Q. Li, "Spectral–Spatial Hyperspectral Image Classification Using Regularized Low-Rank Representation and Sparse Representation-Based Graph Cuts," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 2473–2484, 2015.
 [20] H. Zhang, Y. Li, Y. Jiang, P. Wang, Q. Shen and C. Shen, "Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning," in IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 8, pp. 5813–5828, 2019.
 [21] S. K. Roy, G. Krishna, S. R. Dubey and B. B. Chaudhuri, "HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification," in IEEE Geoscience and Remote Sensing Letters, pp. 1–5, 2019.
 [22] Y. Chen, X. Zhao and X. Jia, "Spectral–Spatial Classification of Hyperspectral Data Based on Deep Belief Network," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 2381–2392, June 2015.
 [23] T. Li, J. Zhang and Y. Zhang, "Classification of hyperspectral image based on deep belief networks," IEEE International Conference on Image Processing (ICIP), pp. 5132–5136, 2014.
 [24] P. Zhong, Z. Gong, S. Li and C. Schönlieb, "Learning to Diversify Deep Belief Networks for Hyperspectral Image Classification," in IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 6, pp. 3516–3530, 2017.
 [25] P. Zhong, Z. Gong and C. Schönlieb, "A DBN-CRF for spectral-spatial classification of hyperspectral data," 23rd International Conference on Pattern Recognition (ICPR), pp. 1219–1224, 2016.
 [26] Y. Chen, X. Zhao and X. Jia, "Spectral–Spatial Classification of Hyperspectral Data Based on Deep Belief Network," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 2381–2392, 2015.
 [27] J. Feng, L. Liu, X. Zhang, R. Wang and H. Liu, "Hyperspectral image classification based on stacked marginal discriminative autoencoder," IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, pp. 3668–3671, 2017.
 [28] Y. Sun, J. Li, W. Wang, A. Plaza and Z. Chen, "Active learning based autoencoder for hyperspectral imagery classification," IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 469–472, 2016.
 [29] J. E. Ball and P. Wei, "Deep Learning Hyperspectral Image Classification using Multiple Class-Based Denoising Autoencoders, Mixed Pixel Training Augmentation, and Morphological Operations," IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 6903–6906, 2018.
 [30] J. Feng, L. Liu, X. Cao, L. Jiao, T. Sun and X. Zhang, "Marginal Stacked Autoencoder With Adaptively Spatial Regularization for Hyperspectral Image Classification," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 9, pp. 3297–3311, 2018.
 [31] S. Zhou, Z. Xue and P. Du, "Semisupervised Stacked Autoencoder With Cotraining for Hyperspectral Image Classification," in IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 6, pp. 3813–3826, June 2019.
 [32] C. Tao, H. Pan, Y. Li and Z. Zou, "Unsupervised Spectral–Spatial Feature Learning With Stacked Sparse Autoencoder for Hyperspectral Imagery Classification," in IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 12, pp. 2438–2442, 2015.
 [33] W. Li, G. Wu, F. Zhang and Q. Du, "Hyperspectral Image Classification Using Deep Pixel-Pair Features," in IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 844–853, Feb. 2017.
 [34] M. Zhang, W. Li and Q. Du, "Diverse Region-Based CNN for Hyperspectral Image Classification," in IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 2623–2634, June 2018.
 [35] S. Hao, W. Wang, Y. Ye, T. Nie and L. Bruzzone, "Two-Stream Deep Architecture for Hyperspectral Image Classification," in IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 4, pp. 2349–2361, April 2018.
 [36] Z. Gong, P. Zhong, Y. Yu, W. Hu and S. Li, "A CNN With Multiscale Convolution and Diversified Metric for Hyperspectral Image Classification," in IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 6, pp. 3599–3618, June 2019.
 [37] A. Santara et al., "BASS Net: Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification," in IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 9, pp. 5293–5301, 2017.
 [38] B. Liu, X. Yu, P. Zhang, A. Yu, Q. Fu and X. Wei, "Supervised Deep Feature Extraction for Hyperspectral Image Classification," in IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 4, pp. 1909–1921, April 2018.
 [39] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in CoRR, vol. abs/1409.1556, arXiv, 2014.
 [40] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, "Going Deeper with Convolutions," in CoRR, vol. abs/1409.4842, arXiv, 2014.
 [41] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," in CoRR, vol. abs/1512.03385, arXiv, 2015.
 [42]
 [43] N. Ma, X. Zhang, H.-T. Zheng and J. Sun, "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design," in Computer Vision – ECCV, pp. 122–138, 2018.
 [44] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in CoRR, vol. abs/1502.03167, arXiv, 2015.
 [45] B. McFee, J. Salamon and J. P. Bello, "Adaptive Pooling Operators for Weakly Labeled Sound Event Detection," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 11, pp. 2180–2193, 2018.