Pixel-Wise PolSAR Image Classification via a Novel Complex-Valued Deep Fully Convolutional Network

09/29/2019 ∙ by Yice Cao, et al. ∙ Xidian University

Although complex-valued (CV) neural networks have shown better classification results than their real-valued (RV) counterparts for polarimetric synthetic aperture radar (PolSAR) classification, the extension of pixel-level RV networks to the complex domain has not yet been thoroughly examined. This paper presents a novel complex-valued deep fully convolutional neural network (CV-FCN) designed for PolSAR image classification. Specifically, CV-FCN uses PolSAR CV data that includes the phase information and utilizes a deep FCN architecture that performs pixel-level labeling. It integrates the feature extraction module and the classification module in a unified framework. Technically, to suit the particularity of PolSAR data, a dedicated complex-valued weight initialization scheme is defined to initialize CV-FCN. It considers the distribution of polarization data so that CV-FCN can be trained from scratch in an efficient and fast manner. CV-FCN employs a complex downsampling-then-upsampling scheme to extract dense features. To enrich discriminative information, multi-level CV features that retain more polarization information are extracted via the complex downsampling scheme. Then, a complex upsampling scheme is proposed to predict dense CV labeling. It employs complex max-unpooling layers to capture more spatial information for better robustness to speckle noise. In addition, to achieve faster convergence and obtain more precise classification results, a novel average cross-entropy loss function is derived for CV-FCN optimization. Experiments on real PolSAR datasets demonstrate that CV-FCN achieves better classification performance than other state-of-the-art methods.


I Introduction

Polarimetric synthetic aperture radar (PolSAR) images have received a lot of attention as they can provide more comprehensive and abundant information than SAR images [1]. In the process of PolSAR image analysis and interpretation, PolSAR image classification is arguably rather typical and important. Until now, numerous traditional schemes have been developed for PolSAR image classification, such as Wishart classifiers [2, 3, 4], target decompositions (TDs) [5, 6, 7, 8], and random fields (RFs) [9, 10, 11]. However, these traditional methods extract features that are mostly low-level and hand-crafted, and they involve a considerable amount of manual trial and error [12]. Besides, hand-engineered features such as TD features rely heavily on complex analysis of PolSAR data. Meanwhile, the selection of descriptive feature sets is a burden in terms of computation time.

With the rapid development of learning algorithms, several machine learning tools do perform feature learning (or at least feature optimization), such as support vector machines (SVMs) [13, 14] and random forest (RF) [12]. However, they are still shallow models that rely on a large number of input features and may not be robust to nonlinear data [15]. Recently, deep learning (DL) has achieved remarkable results in the remote sensing community [16, 17, 18]. Compared with the aforementioned conventional methods, DL techniques can automatically learn discriminative features and perform advanced tasks through multiple neural layers in an end-to-end manner, thereby reducing manual error and achieving promising results [19]. In recent years, several DL-based algorithms have significantly improved the performance of PolSAR image classification, such as the sparse autoencoder (SAE) [20], deep belief network (DBN) [21], convolutional neural network (CNN) [22, 23], deep fully convolutional neural network (FCN) [24, 25, 26], and so on.

Notably, most studies on DL methods for PolSAR classification tasks predominantly focus on the case of real-valued neural networks (RV-NNs). In RV-NNs, inputs, weights, and outputs are all modeled as real-valued (RV) numbers. This means that projections are required to convert the PolSAR complex-valued (CV) data to RV features as RV-NN input. Although RV-NNs have demonstrated excellent performance in PolSAR image classification tasks, RV features raise a couple of problems. Firstly, it is unclear which projection yields the best performance for a particular PolSAR image. Although descriptive feature sets generated by multiple projections have achieved remarkable results, a larger feature set increases computing time and memory consumption, and may even cause data redundancy problems [12]. Secondly, projection sometimes means a loss of valuable information, especially the phase information, which may lead to unsatisfactory results. In fact, the phase of multichannel SAR data can provide useful information for the interpretation of SAR images. Especially for PolSAR systems, phase differences between polarizations have received significant attention for a couple of decades [27, 28, 29, 30].

In view of the aforementioned problems, some researchers have begun to investigate networks tailored to the CV data of PolSAR images rather than requiring any projection to classify PolSAR images. Hansch et al. [31] first proposed complex-valued MLPs (CV-MLPs) for land use classification in PolSAR images. Shang et al. [32] suggested a complex-valued feedforward neural network in the Poincare sphere parameter space. Moreover, an improved quaternion neural network [33] and a quaternion autoencoder [34] have been proposed for PolSAR land classification. Recently, a complex-valued CNN (CV-CNN) specifically designed for PolSAR image classification was proposed by Zhang et al. [35], where the authors derived a complex backpropagation algorithm based on stochastic gradient descent for CV-CNN training.

Although CV-NNs have achieved remarkable breakthroughs for PolSAR image classification, they still face several challenges. Firstly, relatively deep network architectures have not received considerable attention in the complex domain. The structures of the above CV-NNs are relatively simple, with limited feature extraction layers. This limits the learned representations and may risk sub-optimal classification results. Secondly, these networks fail to take sufficient spatial information into account to effectively reduce the impact of speckle on classification results. Due to the inherent existence of speckle in PolSAR images, pixel-based classification accuracy is easily affected and may even lead to incorrect results. In this case, those CV-NNs are ineffective at explicitly distinguishing complex classes, since only the local context provided by small image patches is considered. Thirdly, it is necessary to construct a CV-NN for direct pixel-wise labeling to predict quickly and effectively. Image classification is actually a dense (pixel-level) problem that aims at assigning a label to each pixel in the input image. However, existing CV-NNs usually assign an entire input image patch to a single category. This produces a large amount of redundant processing and leads to seriously repetitive computation.

In response to the above challenges, this paper explores a complex-valued deep FCN architecture, which is an extension of FCN to the complex domain. FCN was first proposed in [36] and is an excellent pixel-level classifier for semantic labeling. Typically, FCN outputs a 2-dimensional (2D) spatial image and can preserve certain spatial context information for accurate labeling results. Recently, FCNs have demonstrated remarkable classification ability in the remote sensing community [37, 38]. However, to utilize FCN in the complex domain (i.e., CV-FCN) for PolSAR image classification, some tricky problems need to be tackled. Firstly, a CV-FCN tailored to PolSAR data requires a proper scheme for complex-valued weight initialization. Generally, FCNs are pre-trained on VGG-16 [39], whose parameters are first trained on optical images and are all real-valued. However, those parameters are not appropriate for initializing the CV weights of CV-FCN and are ineffective for PolSAR images since they cannot preserve polarimetric phase information. A proper complex-valued weight initialization scheme can not only effectively initialize CV weights but also has the potential to reduce the risk of vanishing or exploding gradients, thereby enabling rapid training and improving network performance. Secondly, the layers in the upsampling scheme of CV-FCN must be constructed in the complex domain. Although some works have extended certain layers to the complex domain [31, 35, 40], upsampling layers have not yet been thoroughly examined in this domain. Finally, in the training process of CV-FCN, it is necessary to select a loss function for CV predicted labeling, with the aim of achieving faster convergence during CV-FCN optimization and higher classification accuracy. Thus, how to design a reasonable loss function in the complex domain that is suitable for PolSAR image classification needs to be solved.

In view of the above limitations, we present a novel complex-valued deep fully convolutional network (CV-FCN) for the classification of PolSAR imagery. The proposed deep CV-FCN adopts a complex downsampling-then-upsampling scheme to achieve pixel-wise classification results. To this end, this paper focuses on four aspects: 1) complex-valued weight initialization for faster PolSAR feature learning; 2) multi-level CV feature extraction to enrich discriminative information; 3) recovery of more spatial information for stronger speckle noise immunity; and 4) an average cross-entropy loss function for more precise labeling results. Specifically, the CV weights of CV-FCN are first initialized by a new complex-valued weight initialization scheme. This scheme explicitly reflects the statistical characteristics of PolSAR data and is therefore very effective for faster training. Then, CV features of different levels that retain more polarization information are extracted via the complex downsampling section. Those CV features have a powerful discriminative capacity for various classes. Subsequently, the complex upsampling section upsamples the low-resolution CV feature maps and generates dense labeling. Notably, to retain more spatial information, complex max-unpooling layers are used in the upsampling section. These layers recover spatial information via the max locations maps to reduce the effect of speckle on labeling results as well as improve boundary delineation. In addition, to promote CV-FCN training more effectively, an average cross-entropy loss function is employed to update the CV-FCN parameters. The loss function performs cross-entropy operations on the real and imaginary parts of the CV predicted labeling, respectively. In this way, the phase information is also taken into account during parameter updating, resulting in more precise classification of PolSAR images. Extensive experimental results clearly reflect the effectiveness of CV-FCN for the classification of PolSAR imagery.

Fig. 1: The deep CV-FCN framework for PolSAR image classification, which includes two modules: the feature extraction module (in the upper part) and the classification module (in the lower part). 'Complex Conv' denotes the complex convolution layer, 'Complex BN' denotes the complex batch normalization layer, and 'ℂReLU' denotes the complex-valued ReLU activation. The purple dotted arrows denote the transfer of the max locations maps.

In summary, the major contributions of this paper can be highlighted as follows:

  1. The CV-FCN structure is proposed for PolSAR image classification, whose weights, biases, inputs, and outputs are all modeled as complex values. CV-FCN directly utilizes PolSAR CV data as input without any data projection, in which case it can extract multi-level and more robust CV features that retain more polarization information and have a powerful discriminative capacity for various categories.

  2. A new complex-valued weight initialization scheme is employed to initialize the CV-FCN parameters and conduct CV-FCN training from scratch. It allows CV-FCN to mine polarimetric features after relatively little tuning, making CV-FCN training faster and saving computation time.

  3. A complex upsampling scheme for CV-FCN is proposed to capture more spatial information through max-unpooling layers. This scheme not only eliminates the need to learn the upsampling, simplifying optimization, but also recovers more spatial information via the max locations maps to reduce the impact of speckle. Thus, smoother and more coherent classification results can be achieved.

  4. A new average cross-entropy loss function in the complex domain is employed for CV-FCN optimization. It takes the phase information into account during parameter updating through an average cross-entropy operation on the CV predicted labels. Therefore, the new loss function makes CV-FCN optimization more precise while boosting labeling accuracy.

The remainder of this paper is organized as follows. Section II formulates the detailed theory of the CV-FCN classification method. In Section III, we conduct experiments on real benchmark PolSAR images and give detailed comparisons and analyses. Finally, the conclusion and future work are discussed in Section IV.

II Proposed CV-FCN for Classification of PolSAR Imagery

In this work, a deep CV-FCN is proposed to perform PolSAR image classification. The CV-FCN method integrates the feature extraction module and the classification module in a unified framework. Thus, the features extracted by a CV-FCN trained on PolSAR data are better able to distinguish various categories in PolSAR classification tasks. In the following, we first give the framework of the deep CV-FCN classification method in Section II-A. Then, since learning more discriminative features quickly and accurately requires training a CV-FCN suited to PolSAR images, we highlight four critical components of CV-FCN training in Sections II-B, II-C, and II-D: CV weight initialization, deep and multi-level CV feature extraction, recovery of more spatial information, and a loss function for more precise optimization. Finally, the CV-FCN classification algorithm is summarized in Section II-E.

II-A Framework of the Deep CV-FCN Classification Method

The framework of the CV-FCN classification method is shown in Fig. 1; it is composed of two separate modules: the feature extraction module and the classification module. In the feature extraction module, CV-FCN is trained to exploit the discriminative information. Then, the trained CV-FCN is used to classify PolSAR images in the classification module.

The data patches set and the corresponding label patches set are first prepared as input to CV-FCN before training. The two sets are generated from the PolSAR data set and the corresponding ground truth mask, respectively. Let the CV PolSAR dataset be $\mathbf{X} \in \mathbb{C}^{H \times W \times B}$, where $H$ and $W$ are the height and width of the spatial dimensions respectively, $B$ is the number of complex bands, and $\mathbb{C}$ is the complex domain. The corresponding ground truth mask is denoted as $\mathbf{Y} \in \mathbb{R}^{H \times W}$. The set of all data patches cropped from the given data set is denoted as $\{\mathbf{x}_i\}_{i=1}^{N}$, and the corresponding label patches set is $\{\mathbf{y}_i\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{C}^{h \times w \times B}$ and $\mathbf{y}_i \in \mathbb{R}^{h \times w}$ ($i = 1, 2, \dots, N$) represent one data patch and the corresponding label patch, respectively. Here $N$ is the total number of patches, and $h$ and $w$ are the patch sizes in the spatial dimension.

In the feature extraction module, the CV-FCN is trained. A novel complex-valued weight initialization scheme is first adopted to initialize CV-FCN. Then, a certain percentage of patches from the set are randomly chosen as the training data patches for the network. These data patches are forward-propagated through the complex downsampling section of CV-FCN [marked by red dotted boxes in Fig. 1] to extract multi-level CV feature maps. Those low-resolution feature maps are then upsampled by the complex upsampling section [marked by blue dotted boxes in Fig. 1] to generate predicted label patches. Subsequently, the error between the predicted label patches and the corresponding label patches is calculated according to a novel loss function, and the CV parameters in CV-FCN are updated iteratively. The update iteration terminates when the error value no longer changes substantially.

In the classification module, we feed the entire PolSAR dataset to the trained network. The label of every pixel in the PolSAR image is predicted based on the output of the last complex softmax layer. Notably, compared with a CNN model, which predicts a single label for the center of each image patch, the CV-FCN model can predict all pixels in the entire image at one time. This enables pixel-level labeling and decreases the computation time during prediction.

| Section | Block | Module type | Dimension | Stride | Pad |
|---|---|---|---|---|---|
| Downsampling Section | B1 | Complex Convolution | 3×3×12×12 | 1 | 1 |
| | | Complex Max-Pooling | 2×2 | 2 | 0 |
| | B2 | Complex Convolution | 3×3×12×24 | 1 | 1 |
| | | Complex Max-Pooling | 2×2 | 2 | 0 |
| | B3 | Complex Convolution | 3×3×24×48 | 1 | 1 |
| | | Complex Max-Pooling | 2×2 | 2 | 0 |
| | B4 | Complex Convolution | 3×3×48×96 | 1 | 1 |
| | | Complex Max-Pooling | 2×2 | 2 | 0 |
| | B5 | Complex Convolution | 3×3×96×192 | 1 | 1 |
| | | Complex Max-Pooling | 2×2 | 2 | 0 |
| | B6 | Complex Convolution | 1×1×192×192 | 1 | 1 |
| Upsampling Section | B7 | Complex Up-Pooling | 2×2 | 1 | 0 |
| | | Complex Convolution | 3×3×192×96 | 1 | 1 |
| | B8 | Complex Up-Pooling | 2×2 | 1 | 0 |
| | | Complex Convolution | 3×3×96×48 | 1 | 1 |
| | B9 | Complex Up-Pooling | 2×2 | 1 | 0 |
| | | Complex Convolution | 3×3×48×24 | 1 | 1 |
| | B10 | Complex Up-Pooling | 2×2 | 1 | 0 |
| | | Complex Convolution | 3×3×24×12 | 1 | 1 |
| | B11 | Complex Up-Pooling | 2×2 | 1 | 0 |
| | | Complex Convolution | 3×3×12×2C | 1 | 1 |
| | | Complex Softmax | | | |

TABLE I: Detailed Configuration of the CV-FCN. C Denotes the Total Number of Classes. The Complex BN Layers and ℂReLU Layers are Omitted for Brevity.
Fig. 2: A simple illustration of the complex max-pooling operator and the complex max-unpooling operator, where the green box and the black box show the structures of the complex max-pooling operator and the complex max-unpooling operator, respectively.

II-B New Complex-valued Weight Initialization Scheme Using Polarization Data Distribution for Faster Feature Learning

Once the CV-FCN architecture for the PolSAR image classification task has been built, the weight initialization problem arises when training the network. Generally, deep ConvNets can be updated from pre-trained weights generated by the transfer learning technique. However, those weights are all real-valued and only reflect the backscattering intensities, at the cost of losing the polarimetric phase [41]. Here, based on the distribution of polarization data, a new complex-valued (CV) weight initialization scheme is employed for faster network learning.

For RV networks that process PolSAR images, the learned weights, commonly known as kernels, can characterize scattering patterns well, particularly in high-level layers [22]. In [35], the initialization scheme simply initializes the real and imaginary parts of a CV weight separately with a uniform distribution. Fortunately, for a reciprocal medium, a complex scattering vector can be modeled by a multivariate complex Gaussian distribution, where each individual complex scattering coefficient is assumed to follow a complex Gaussian distribution [1]. Thus, we utilize this distribution to initialize the complex weights in CV-FCN.

Suppose that a CV weight is denoted as $W = \Re(W) + i\,\Im(W)$, where the real component $\Re(W)$ and the imaginary component $\Im(W)$ are both identically Gaussian distributed with zero mean and variance $\sigma^2$. Here, the initialization criterion proposed by He et al. [42] is used to calculate the variance of $W$, i.e., $\mathrm{Var}(W) = 2/n_{\mathrm{in}}$, where $n_{\mathrm{in}}$ is the number of input units, since this criterion provides the current best practice when the activation function is ReLU.

Notably, the CV weight can also be denoted as $W = |W|e^{i\theta}$, where the magnitude $|W|$ follows the Rayleigh distribution. The expectation and the variance are given by

$$\mathbb{E}[|W|] = \sigma_R \sqrt{\pi/2}, \tag{1}$$
$$\mathrm{Var}(|W|) = \frac{4-\pi}{2}\,\sigma_R^2, \tag{2}$$

where $\sigma_R$ is the single parameter of the Rayleigh distribution. In addition, the variance $\mathrm{Var}(W)$ and the second moment $\mathbb{E}[|W|^2]$ can be expressed as

$$\mathrm{Var}(W) = \mathbb{E}[|W|^2] - (\mathbb{E}[W])^2, \tag{3}$$
$$\mathbb{E}[|W|^2] = \mathrm{Var}(|W|) + (\mathbb{E}[|W|])^2. \tag{4}$$

According to the initialization rules of [40], in the case of $W$ symmetrically distributed around 0, $\mathbb{E}[W] = 0$. Thus, $\mathrm{Var}(W)$ can be formulated as

$$\mathrm{Var}(W) = \mathbb{E}[|W|^2]. \tag{5}$$

Taking Equation (1) and Equation (2) into account, $\mathrm{Var}(W)$ is calculated as

$$\mathrm{Var}(W) = 2\sigma_R^2. \tag{6}$$

According to He's initialization criterion and Equation (6), the single parameter of the Rayleigh distribution can be computed as $\sigma_R = 1/\sqrt{n_{\mathrm{in}}}$. At this point, the Rayleigh distribution can be used to initialize the magnitude $|W|$. In addition, the phase $\theta$ is initialized using the uniform distribution between $-\pi$ and $\pi$. This completes the initialization of the complex weight.
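To make the scheme concrete, the following is a minimal NumPy sketch of this initialization (the function and variable names are our own, not the paper's): the magnitude is drawn from a Rayleigh distribution with $\sigma_R = 1/\sqrt{n_{\mathrm{in}}}$ and the phase from a uniform distribution over $[-\pi, \pi)$.

```python
import numpy as np

def complex_rayleigh_init(shape, n_in, rng=None):
    """Sketch of the proposed CV weight initialization (hypothetical name).

    Magnitudes follow a Rayleigh distribution with sigma = 1/sqrt(n_in),
    so that Var(W) = 2*sigma^2 = 2/n_in matches He's criterion (Eq. (6));
    phases are uniform on [-pi, pi).
    """
    rng = rng or np.random.default_rng()
    sigma = 1.0 / np.sqrt(n_in)                        # Rayleigh parameter
    magnitude = rng.rayleigh(scale=sigma, size=shape)  # |W|
    phase = rng.uniform(low=-np.pi, high=np.pi, size=shape)  # theta
    return magnitude * np.exp(1j * phase)              # W = |W| e^{i theta}

# Example: a 3x3 complex kernel bank with 12 input and 12 output channels.
w = complex_rayleigh_init((3, 3, 12, 12), n_in=3 * 3 * 12)
print(w.dtype, w.shape)  # complex128 (3, 3, 12, 12)
```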

It is worth noting that our initialization scheme is quite different from random initialization of both the real and imaginary parts of a CV weight [35]. The most notable advantage of the new initialization scheme is that it explicitly reflects the statistical characteristics of the training data, which makes it possible to learn a CV network suitable for PolSAR images after a small amount of fine-tuning. Intuitively, the network exhibits some of the same properties as the data to be learned from the beginning, which provides a prior rather than random initial information. Thus, it increases the potential for learning the special properties of PolSAR datasets and is highly effective for faster training.

II-C Deep CV-FCN for Dense Feature Extraction

In the forward propagation of the CV-FCN training procedure, dense features are extracted through the complex downsampling-then-upsampling scheme. The detailed configuration of CV-FCN is shown in Table I. The complex downsampling section first extracts effective multi-level CV features through the downsampling blocks (i.e., B1-B5 in Fig. 1). Then, the complex upsampling section recovers more spatial information in a simple manner and produces dense labeling through a series of upsampling blocks (i.e., B7-B11 in Fig. 1). In particular, fully skip connections between the complex downsampling section and the complex upsampling section fuse shallow, fine features and deep, coarse features to preserve sufficient detailed information for the distinction of complex classes.

II-C1 Multi-level Complex-valued Feature Extraction via the Complex Downsampling Section

The complex downsampling section, consisting of downsampling blocks, extracts 2-D CV features of different levels. In CV-FCN, five downsampling blocks are employed to extract more abstract and extensive features. Each of them contains four layers: a complex convolution layer, a complex batch normalization layer, a complex activation layer, and a complex max-pooling layer. Among these layers, the main feature extraction work is performed by the complex convolution layer. Compared with the real convolution layer, it extracts CV features that retain more polarization and discriminative information through the complex convolution operation.

In the $l$th complex convolution layer, complex filters $\mathbf{W}^l$ and complex biases $\mathbf{b}^l$ are given, where $I$ is the number of input channels and $O$ is the number of output channels. The complex feature maps output by the complex convolution layer are computed by

$$\mathbf{F}^l = \mathbf{W}^l \circledast \mathbf{F}^{l-1} + \mathbf{b}^l, \tag{7}$$

where $\mathbf{F}^{l-1}$ is the given input complex feature map and $\circledast$ is the convolution operation in the complex domain. The matrix notation of the $o$th output complex feature map is given by

$$\begin{bmatrix} \Re(\mathbf{F}^l_o) \\ \Im(\mathbf{F}^l_o) \end{bmatrix} = \begin{bmatrix} \Re(\mathbf{W}^l_o) & -\Im(\mathbf{W}^l_o) \\ \Im(\mathbf{W}^l_o) & \Re(\mathbf{W}^l_o) \end{bmatrix} \ast \begin{bmatrix} \Re(\mathbf{F}^{l-1}) \\ \Im(\mathbf{F}^{l-1}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{b}^l_o) \\ \Im(\mathbf{b}^l_o) \end{bmatrix}, \tag{8}$$

where $\Re(\cdot)$ and $\Im(\cdot)$ are respectively the real part and the imaginary part, and $\ast$ is the convolution operation in the real domain. Thus the $o$th output complex feature map can be represented as

$$\mathbf{F}^l_o = \Re(\mathbf{F}^l_o) + i\,\Im(\mathbf{F}^l_o). \tag{9}$$
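The decomposition in Equation (8) amounts to four real convolutions. The snippet below is a single-channel SciPy illustration (hypothetical names, not the paper's implementation) that verifies it against a direct complex convolution.

```python
import numpy as np
from scipy.signal import convolve2d

def complex_conv2d(x, w):
    """One complex 2-D convolution built from four real convolutions, as
    in Eq. (8): Re(F) = Re(W)*Re(X) - Im(W)*Im(X),
                Im(F) = Re(W)*Im(X) + Im(W)*Re(X)."""
    real = (convolve2d(x.real, w.real, mode="same")
            - convolve2d(x.imag, w.imag, mode="same"))
    imag = (convolve2d(x.imag, w.real, mode="same")
            + convolve2d(x.real, w.imag, mode="same"))
    return real + 1j * imag  # Eq. (9)

# Toy check against a direct complex convolution.
x = np.random.randn(8, 8) + 1j * np.random.randn(8, 8)
w = np.random.randn(3, 3) + 1j * np.random.randn(3, 3)
assert np.allclose(complex_conv2d(x, w), convolve2d(x, w, mode="same"))
```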

The complex batch normalization (BN) layer [40] is performed for normalization after the complex convolution, which holds great potential to relieve networks from overfitting. For the non-linear transformation of CV features, we find that the complex-valued ReLU (ℂReLU) as the complex activation provides good results. The ℂReLU is defined as

$$\mathbb{C}\mathrm{ReLU}(z) = \mathrm{ReLU}(\Re(z)) + i\,\mathrm{ReLU}(\Im(z)), \tag{10}$$

where $\mathrm{ReLU}(x) = \max(0, x)$. Then the output of the $l$th complex nonlinear layer can be given as

$$\mathbf{F}^l = \mathbb{C}\mathrm{ReLU}(\mathbf{W}^l \circledast \mathbf{F}^{l-1} + \mathbf{b}^l). \tag{11}$$

Furthermore, the complex max-pooling layer [35] is adopted to generalize features to a higher level. In this way, the features are more robust and CV-FCN can converge well. After the five downsampling blocks, block 6 (B6 in Fig. 1), which includes a complex convolution layer with 1×1 kernels and a complex batch normalization layer, densifies its sparse input and extracts complex convolution features.

II-C2 Using the Complex Upsampling Section for More Spatial Information Recovery and Stronger Speckle Noise Immunity

After the complex downsampling section extracts multi-level CV features, a complex upsampling section is implemented to upsample those CV feature maps. Specifically, new complex max-unpooling layers are employed in the complex upsampling section. The reason is two-fold. On the one hand, compared with the complex deconvolution layer, which is another upsampling operation, the complex max-unpooling layer reduces the number of trainable parameters and mitigates the information loss caused by complex pooling operations. On the other hand, owing to the inherent existence of speckle in PolSAR images, obtaining smooth labeling results is not easy. This issue can be addressed by the complex max-unpooling layer, which recovers more spatial information via the max locations maps [represented by purple dotted arrows in Fig. 1]. Spatial information is a critical indicator for classifying confusing categories, as it captures wider visual cues and strengthens immunity to speckle noise.

To be more intuitive, Fig. 2 illustrates an example of the complex max-unpooling operation. The green and black boxes show simple structures of the complex max-pooling and complex max-unpooling, respectively. As shown in the green box, the amplitude feature map is formed from the real and imaginary feature maps, where the red dotted box represents a 2×2 pooling window with a stride of 2. In the amplitude feature map, four maximum amplitude values are chosen by the corresponding pooling windows, marked by orange, blue, green, and yellow, respectively. They form the pooled map. At the same time, the locations of those maxima are recorded in a set of switch variables, visualized as the so-called max locations map. On the other hand, within the black box, the real and imaginary input maps are upsampled using this max locations map, producing the real and imaginary unpooled maps. These unpooled maps are sparse, with white regions holding values of 0. This ensures that the resolution of the output is higher than the resolution of its input.
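The operation of Fig. 2 can be sketched as follows (a minimal single-map NumPy illustration with hypothetical names, assuming 2×2 windows with stride 2): pooling keeps the complex value with the largest amplitude in each window and records its location, and unpooling writes values back to the recorded locations, leaving zeros elsewhere.

```python
import numpy as np

def complex_max_pool(x):
    """2x2/stride-2 complex max-pooling (green box in Fig. 2). The winner
    of each window has the largest amplitude |x|; its in-window location
    is recorded as the max locations map."""
    H, W = x.shape
    windows = (x.reshape(H // 2, 2, W // 2, 2)
                 .transpose(0, 2, 1, 3)
                 .reshape(H // 2, W // 2, 4))
    locs = np.abs(windows).argmax(axis=-1)                 # max locations map
    pooled = np.take_along_axis(windows, locs[..., None], axis=-1)[..., 0]
    return pooled, locs

def complex_max_unpool(x, locs):
    """Complex max-unpooling (black box in Fig. 2): each value is written
    back to its recorded location; every other position stays zero, so
    the unpooled maps are sparse."""
    h, w = x.shape
    out = np.zeros((h, w, 4), dtype=x.dtype)
    np.put_along_axis(out, locs[..., None], x[..., None], axis=-1)
    return (out.reshape(h, w, 2, 2)
               .transpose(0, 2, 1, 3)
               .reshape(2 * h, 2 * w))

x = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
pooled, locs = complex_max_pool(x)
restored = complex_max_unpool(pooled, locs)  # sparse, maxima at original spots
```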

In particular, we employ fully skip connections, which fuse multi-level features to preserve sufficient discriminative information for the classification of complex classes. Finally, the complex output layer with the complex softmax function is used to calculate the prediction probability map. Thus, the output of CV-FCN can be formulated as

$$\hat{\mathbf{Y}} = \mathrm{softmax}(\Re(\mathbf{O})) + i\,\mathrm{softmax}(\Im(\mathbf{O})), \tag{12}$$

where $\mathbf{O}$ is the input of the complex output layer and $\mathrm{softmax}(\cdot)$ is the softmax function in the real domain. In this layer, the output feature maps have the same size as the data cube fed into CV-FCN. This enables pixel-to-pixel training. After the complex downsampling section and the complex upsampling section, the complex forward propagation process of the training phase is completed.

II-D Average Cross-entropy Loss Function for Precise CV-FCN Optimization

To promote CV-FCN training more effectively and achieve more precise results, a novel loss function is used as the learning objective to iteratively update the CV-FCN parameters $\theta$ during backpropagation, where $\theta$ includes the weights $\mathbf{W}$ and the biases $\mathbf{b}$. Usually, for multi-class classification tasks, the cross-entropy loss function performs well for updating parameters. Compared with the quadratic cost function, it increases the training speed and promotes the training of NNs more effectively. Thus, a novel average cross-entropy loss function, based on the definition of the popular cross-entropy loss function, is employed for the CV predicted labels in PolSAR classification tasks. Formally, the average cross-entropy (ACE) loss function is defined as

$$\mathcal{L}_{\mathrm{ACE}} = -\frac{1}{2}\sum_{j}\sum_{c=1}^{C}\Big[\Re(t_{j,c})\ln \Re(\hat{y}_{j,c}) + \Im(t_{j,c})\ln \Im(\hat{y}_{j,c})\Big], \tag{13}$$

where $\hat{y}_{j,c}$ indicates the output data cube of the last complex softmax layer at pixel $j$ and class $c$, and $C$ is the total number of classes. $\mathbf{T} = \{t_{j,c}\}$ is the sparse representation of the true label patch, converted by one-hot encoding. Notably, the non-zero positions within $\mathbf{T}$ are $1+i$ instead of $1$. This means that we also take the phase information into account during parameter updating. As a result, the updated CV-FCN works effectively, leading to more precise classification results for PolSAR images.
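A minimal NumPy sketch of Equations (12) and (13) is given below (hypothetical names, not the paper's implementation; the $1+i$ one-hot encoding follows the description above).

```python
import numpy as np

def complex_softmax(o):
    """Eq. (12), sketched: the real-domain softmax applied separately to
    the real and imaginary parts of the output maps o (shape: ..., C)."""
    def softmax(a):
        e = np.exp(a - a.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    return softmax(o.real) + 1j * softmax(o.imag)

def ace_loss(y_pred, labels, C, eps=1e-12):
    """Eq. (13), sketched: one-hot targets encoded as 1+1j at the true
    class; cross-entropy on the real and imaginary maps, then averaged."""
    t = np.eye(C)[labels] * (1 + 1j)                  # complex one-hot targets
    ce_real = -(t.real * np.log(y_pred.real + eps)).sum(axis=-1)
    ce_imag = -(t.imag * np.log(y_pred.imag + eps)).sum(axis=-1)
    return 0.5 * (ce_real + ce_imag).mean()

# Example on a 4x4 patch with C = 3 classes.
o = np.random.randn(4, 4, 3) + 1j * np.random.randn(4, 4, 3)
labels = np.random.randint(0, 3, size=(4, 4))
print(ace_loss(complex_softmax(o), labels, C=3))
```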

The parameters can be updated iteratively from the gradient of $\mathcal{L}_{\mathrm{ACE}}$ and the learning rate $\eta$ according to

$$\mathbf{W} \leftarrow \mathbf{W} - \eta\,\frac{\partial \mathcal{L}_{\mathrm{ACE}}}{\partial \mathbf{W}}, \tag{14}$$
$$\mathbf{b} \leftarrow \mathbf{b} - \eta\,\frac{\partial \mathcal{L}_{\mathrm{ACE}}}{\partial \mathbf{b}}. \tag{15}$$

To calculate (14) and (15), the key point is computing the partial derivatives. Note that $\mathcal{L}_{\mathrm{ACE}}$ is a real-valued loss function; it can be back-propagated through CV-FCN according to the generalized complex chain rule in [31]. Thus, the partial derivatives can be calculated as follows:

$$\frac{\partial \mathcal{L}_{\mathrm{ACE}}}{\partial \mathbf{W}} = \frac{\partial \mathcal{L}_{\mathrm{ACE}}}{\partial \Re(\mathbf{W})} + i\,\frac{\partial \mathcal{L}_{\mathrm{ACE}}}{\partial \Im(\mathbf{W})}, \tag{16}$$
$$\frac{\partial \mathcal{L}_{\mathrm{ACE}}}{\partial \mathbf{b}} = \frac{\partial \mathcal{L}_{\mathrm{ACE}}}{\partial \Re(\mathbf{b})} + i\,\frac{\partial \mathcal{L}_{\mathrm{ACE}}}{\partial \Im(\mathbf{b})}. \tag{17}$$

When the value of the loss function no longer decreases, the parameter update is suspended and the training phase is completed. The trained network is then used to predict the entire PolSAR image in the classification phase.
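To make the update rule concrete, the following toy sketch (our own illustration, using numerical derivatives rather than backpropagation) assembles the complex gradient of a real-valued loss from its two real partial derivatives, as in Equations (16) and (17), and applies the descent step of Equation (14).

```python
import numpy as np

def complex_grad(loss, w, eps=1e-7):
    """Gradient of a real-valued loss w.r.t. a complex parameter, built
    from the two real partials (estimated here by finite differences)."""
    d_re = (loss(w + eps) - loss(w - eps)) / (2 * eps)            # dL/dRe(w)
    d_im = (loss(w + 1j * eps) - loss(w - 1j * eps)) / (2 * eps)  # dL/dIm(w)
    return d_re + 1j * d_im                                       # Eq. (16)

# Toy example: L(w) = |w - (2+3j)|^2 is real-valued; gradient descent on
# the complex parameter w follows the update rule of Eqs. (14)-(15).
loss = lambda w: abs(w - (2 + 3j)) ** 2
w, lr = 0.0 + 0.0j, 0.1
for _ in range(100):
    w -= lr * complex_grad(loss, w)  # Eq. (14)
print(w)  # converges toward 2+3j
```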

Input: PolSAR dataset $\mathbf{X}$, learning rate $\eta$, batch size, momentum parameter.

Output: Dense label map $\hat{\mathbf{Y}}$.

1:  Construct the data patches set and the label patches set using $\mathbf{X}$;
2:  Initialize the CV-FCN parameters by the scheme of Section II-B;
3:  Choose the entire training set from the data patches set and the label patches set;
4:  Repeat:
5:  Forward-pass the complex downsampling section to obtain multi-level feature maps (Section II-C1);
6:  Call the complex upsampling section to recover more spatial information (Section II-C2);
7:  Calculate the loss function of Section II-D;
8:  Update the parameters by Equations (14) and (15);
9:  Until: the termination criterion is met.
10:  Classify the entire PolSAR image by forward-passing the trained network to obtain $\hat{\mathbf{Y}}$.
11:  End
Algorithm 1 CV-FCN Classification Algorithm for PolSAR Imagery

II-E CV-FCN PolSAR Classification Algorithm

To be more intuitive, the proposed CV-FCN PolSAR classification algorithm is summarized in Algorithm 1. Specifically, we first construct the entire training set for CV-FCN and employ the new complex-valued weight initialization scheme to initialize the network. Then, we train CV-FCN by iteratively updating the CV-FCN parameters using the average cross-entropy loss function. Finally, the entire PolSAR image is classified using the trained network.

III Experimental Analysis and Evaluation

In this section, the experimental datasets and the evaluation metrics are first presented. Then, the input data vector and the experimental settings for CV-FCN training are listed. Moreover, the effectiveness of several strategies in CV-FCN is analyzed in detail through a series of dedicated experiments. Finally, comparisons with other classification methods on three PolSAR datasets are presented to demonstrate the superiority of the proposed CV-FCN.

III-A Experimental Datasets Description

We use three benchmark PolSAR datasets for experiments. Details about these datasets are listed as follows.

Fig. 3: Flevoland Benchmark PolSAR image and related ground truth categorization information. (a) The PauliRGB image. (b) The ground truth categorization map. (c) Color code of different classes.
Fig. 4: San Francisco PolSAR image and related ground truth categorization information. (a) The PauliRGB image. (b) The ground truth categorization map. (c) Color code of different classes.
Fig. 5: Oberpfaffenhofen PolSAR image and related ground truth categorization information. (a) The PauliRGB image. (b) The ground truth categorization map. (c) Color code of different classes.

III-A1 Flevoland Benchmark Dataset

Fig. 3(a) shows the PauliRGB image of the Flevoland Benchmark data, which was acquired by NASA/JPL AIRSAR in 1991. The size of the image is 1020×1024. The ground-truth class labels and the corresponding color codes are shown in Fig. 3(b) and Fig. 3(c), respectively. There are 14 classes in the image: potato, fruit, oats, beet, barley, onions, wheat, beans, peas, maize, flax, rapeseed, grass, and lucerne.

III-A2 San Francisco Dataset

This AIRSAR full PolSAR image provides good coverage of four targets: water, vegetation, low-density urban, and high-density urban. The original data has a dimension of 900×1024 pixels with a spatial resolution of 10 m, as shown in Fig. 4(a). The ground-truth class labels and color codes are shown in Fig. 4(b) and Fig. 4(c).

III-A3 Oberpfaffenhofen Dataset

This dataset is ESAR data of the Oberpfaffenhofen area in Germany, provided by the German Aerospace Center, with a size of 1300×1200 pixels. The Pauli-RGB image, the ground-truth class labels, and the color codes are respectively shown in Fig. 5(a)-(c). There are three classes in the image: built-up areas, woodland, and open areas.

III-B Evaluation Metrics

With the hand-marked ground-truth images, the overall accuracy (OA), average accuracy (AA), and Kappa coefficient (κ) are used as the evaluation measures for classification performance. OA is the ratio of the number of correctly labeled pixels to the total number of test pixels; AA is defined as the average of the individual class accuracies; and Kappa, which discounts successful classification obtained by chance, gives a good representation of the overall performance of a classifier. The larger the values of the three criteria, the better the classification performance.
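For reference, the three metrics can be computed from a confusion matrix as in the following NumPy sketch (hypothetical function name).

```python
import numpy as np

def classification_metrics(conf):
    """OA, AA, and Kappa from a CxC confusion matrix `conf`, where
    conf[i, j] counts test pixels of true class i labeled as class j."""
    n = conf.sum()
    oa = np.trace(conf) / n                               # overall accuracy
    aa = (np.diag(conf) / conf.sum(axis=1)).mean()        # average accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

conf = np.array([[90, 5, 5], [10, 85, 5], [2, 3, 95]])
print(classification_metrics(conf))
```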

III-C Preparing for Classifier Model Training

III-C1 Complex-valued Input Vector for CV-FCN

Before training CV-FCN, the CV input vector needs to be determined. CV-FCN works directly on the PolSAR CV data without any data projection from the complex to the real domain. Since the coherency matrix or covariance matrix completely describes the distributed target [1], PolSAR data is usually presented in these formats. The polarimetric coherency matrix is calculated as

$$\mathbf{T} = \frac{1}{L}\sum_{l=1}^{L}\mathbf{k}_l\mathbf{k}_l^{H}, \tag{18}$$

where the superscript $H$ denotes the complex conjugate transpose, $L$ is the number of looks, and $\mathbf{k}_l$ denotes the $l$th scattering vector in the multi-look processing window.

The coherency matrix $\mathbf{T}$ is a Hermitian positive semidefinite matrix, which implies that the main diagonal elements are RV and the other CV elements are conjugate symmetric about the main diagonal. Therefore, the six elements of the upper triangular matrix of $\mathbf{T}$ can fully represent the PolSAR data [1]. We utilize these six elements to construct the CV input vector for CV-FCN, which is represented by

$$\mathbf{x}_{\mathrm{CV}} = \left[T_{11},\, T_{12},\, T_{13},\, T_{22},\, T_{23},\, T_{33}\right]. \tag{19}$$

Here, the imaginary parts of the diagonal elements $T_{11}$, $T_{22}$, and $T_{33}$ in the CV input feature vector are all expanded with a value of 0. On the other hand, compared with the CV input feature vector with phase information, the RV input feature vector without phase information can be represented by

(20)
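As an illustration of Equation (19), the following NumPy sketch (hypothetical name, not the paper's code) builds the six-channel CV input from the per-pixel coherency matrices.

```python
import numpy as np

def cv_input_vector(T):
    """Build the 6-channel complex input of Eq. (19) from the 3x3
    coherency matrices T (shape H x W x 3 x 3). The diagonal elements
    T11, T22, T33 are real, so their imaginary parts are simply zero
    in the complex array."""
    return np.stack([T[..., 0, 0], T[..., 0, 1], T[..., 0, 2],
                     T[..., 1, 1], T[..., 1, 2], T[..., 2, 2]],
                    axis=-1).astype(np.complex64)
```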

III-C2 Parameter Settings

Relevant parameters must be set before CV-FCN training. For PolSAR image classification, several works have discussed the sampling rate and the parameter settings of NN structures [23, 24, 25] in detail. Hence, we do not discuss them again and choose them through experiments.

In this paper, the sliding window operation in [43, 44] is used to generate the data patches set from the experimental images and the corresponding label patches set from the ground truth images. We choose 128 as the default sliding window size and 40 as the default stride for all experimental datasets, which is a trade-off between classification performance and computational burden. Additionally, to mitigate overfitting, the data augmentation strategy of [26] was carried out by vertically and horizontally flipping all patches. All these patches form the input data of the proposed CV-FCN, with 90% for training and 10% for validation. It is worth noting that only the labeled pixels in each label patch are considered when updating the parameters of the network during training [25].
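A minimal sketch of this patch generation and flip augmentation (NumPy, hypothetical name, not the paper's implementation) is given below.

```python
import numpy as np

def sliding_window_patches(x, y, size=128, stride=40):
    """Crop data/label patches with a sliding window, then augment each
    patch with a vertical and a horizontal flip."""
    data, labels = [], []
    H, W = y.shape
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            px = x[i:i + size, j:j + size]
            py = y[i:i + size, j:j + size]
            for flip in (lambda a: a, np.flipud, np.fliplr):  # augmentation
                data.append(flip(px))
                labels.append(flip(py))
    return np.array(data), np.array(labels)
```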

Moreover, Adam with momentum 0.9 is used to update the CV-FCN parameters. The learning rate $\eta$ is 0.0001. The mini-batch size is empirically set to 30. The number of training epochs is 200, by which the objective function converges. Additionally, dropout regularization is adopted to reduce overfitting. In this paper, all non-deep methods are run on Matlab R2014b, and the DL-based methods are implemented in the Keras framework with TensorFlow as the back end. The machine used for the experiments is a Lenovo Y720 cube gaming PC with an Intel Core i7-7700 CPU, an Nvidia GeForce GTX 1080 GPU, and 16 GB RAM under the Ubuntu 18.04 LTS operating system. To make the comparisons as fair as possible, we take the average of 10 runs as the final result.

III-D CV-FCN Model Analysis and Discussions

To evaluate several aspects of the CV-FCN model, two ablation experiments and two comparison experiments are conducted as follows. Notably, the most distinctive aspect of the proposed CV-FCN is the complex-valued upsampling scheme, in which the fully skip connections and the max locations maps are the two important strategies. Therefore, two ablation experiments are designed for comparison and evaluation. Specifically, the impact of fully skip connections in the CV-FCN structure is investigated first. Then, the effect of the max locations maps on classification performance is evaluated. Additionally, a comparison experiment on the new weight initialization scheme is conducted. Finally, the effectiveness of different loss functions for precise classification is compared. The two ablation experiments are conducted on all three datasets, while the third and fourth experiments are conducted on the Flevoland Benchmark dataset and the Oberpfaffenhofen dataset, respectively. For the Flevoland Benchmark dataset, 5% of the labeled pixels per class are randomly chosen for training, and the rest for testing. For the San Francisco image and the Oberpfaffenhofen image, about 1% of the labeled reference pixels are randomly selected for training, and the rest are used to evaluate experimental performance.

| Dataset | Method | OA | AA | κ |
|---|---|---|---|---|
| Flevoland | NSCV-FCN | 98.73% | 94.97% | 0.9851 |
| | CV-FCN | 99.72% | 98.14% | 0.9967 |
| San Francisco | NSCV-FCN | 98.37% | 98.12% | 0.9774 |
| | CV-FCN | 99.69% | 99.65% | 0.9957 |
| Oberpfaffenhofen | NSCV-FCN | 93.36% | 90.51% | 0.8831 |
| | CV-FCN | 97.26% | 96.38% | 0.9531 |

TABLE II: Overall Accuracy, Average Accuracy (%), and Kappa Coefficient of the NSCV-FCN Method and the CV-FCN Method

| Dataset | Method | OA | AA | κ |
|---|---|---|---|---|
| Flevoland | NLCV-FCN | 99.29% | 96.41% | 0.9915 |
| | CV-FCN | 99.72% | 98.14% | 0.9967 |
| San Francisco | NLCV-FCN | 98.97% | 98.84% | 0.9857 |
| | CV-FCN | 99.69% | 99.65% | 0.9957 |
| Oberpfaffenhofen | NLCV-FCN | 95.80% | 94.31% | 0.9273 |
| | CV-FCN | 97.26% | 96.38% | 0.9531 |

TABLE III: Overall Accuracy, Average Accuracy (%), and Kappa Coefficient of the NLCV-FCN Method and the CV-FCN Method

III-D1 Ablation Experiment 1 - Impact of Fully Skip Connections

Fully skip connections are an important part of CV-FCN because they enable the network to retain more detail. The core idea is to superimpose feature maps of different levels to improve the final classification. To evaluate the effect of fully skip connections on classification accuracy, we construct a CV-FCN structure without skip connections, denoted NSCV-FCN. Table II contains the evaluation indices for classification.

As illustrated in Table II, CV-FCN outperforms NSCV-FCN, which reveals that fully skip connections are useful for improving classification accuracy. Compared with NSCV-FCN, the proposed CV-FCN increases the accuracy by 0.99% in OA, 3.17% in AA, and 0.0116 in the Kappa coefficient on the Flevoland dataset. Moreover, on the San Francisco dataset, CV-FCN achieves accuracy increments of 1.32% in OA, 1.53% in AA, and 0.0183 in Kappa. In particular, on the Oberpfaffenhofen dataset, CV-FCN increases the accuracy significantly, by 3.9% in OA, 5.87% in AA, and 0.07 in Kappa. This superiority can be attributed to the fact that fully skip connections fuse features of different levels to preserve more discriminative information for PolSAR image classification.

III-D2 Ablation Experiment 2 - Impact of Max Locations Maps

The most prominent trait of the complex upsampling section is that the max locations maps are utilized to perform nonlinear upsampling of the feature maps, which benefits more precise reconstruction of the output. To examine the effect of the max locations maps, we construct a CV-FCN structure wherein the complex upsampling layers upsample feature maps without the guidance of the max locations maps. We refer to this network as NLCV-FCN. The experimental results on all datasets are shown in Table III.

As illustrated in Table III, CV-FCN outperforms NLCV-FCN on all three datasets. Specifically, CV-FCN achieves accuracy increments of 0.43% in OA, 1.73% in AA, and 0.0052 in Kappa on the Flevoland dataset; 0.72% in OA, 0.81% in AA, and 0.01 in Kappa on the San Francisco dataset; and 1.46% in OA, 2.07% in AA, and 0.0258 in Kappa on the Oberpfaffenhofen dataset. These results suggest that the max locations maps benefit classification accuracy, since they can retrieve more sufficient spatial information.

Fig. 6: The validation curves of different weight initialization schemes on the Flevoland Benchmark dataset. The red line denotes the proposed complex-valued weight initialization scheme, and the blue line denotes the old initialization scheme.
| Epochs | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | 110 | 120 | 130 | 140 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CWI-1 OA | 68.51% | 84.58% | 89.05% | 94.99% | 96.08% | 97.56% | 97.78% | 98.54% | 99.03% | 99.12% | 98.85% | 99.25% | 98.40% | 97.60% |
| CWI-1 AA | 28.55% | 48.55% | 62.30% | 75.13% | 79.90% | 88.89% | 89.31% | 96.34% | 96.56% | 96.66% | 95.82% | 98.02% | 96.30% | 95.14% |
| CWI-1 κ | 0.6163 | 0.8174 | 0.8709 | 0.9410 | 0.9539 | 0.9713 | 0.9738 | 0.9828 | 0.9885 | 0.9897 | 0.9865 | 0.9912 | 0.9812 | 0.9717 |
| CWI-2 OA | 91.78% | 94.89% | 97.79% | 99.09% | 98.93% | 99.36% | 98.07% | 98.81% | 98.03% | 98.77% | 98.79% | 97.70% | 95.17% | 94.11% |
| CWI-2 AA | 67.31% | 75.51% | 90.08% | 96.59% | 96.99% | 98.08% | 95.06% | 96.26% | 95.64% | 97.50% | 97.13% | 94.11% | 94.95% | 73.15% |
| CWI-2 κ | 0.9029 | 0.9398 | 0.9740 | 0.9893 | 0.9874 | 0.9925 | 0.9773 | 0.9865 | 0.9768 | 0.9855 | 0.9857 | 0.9729 | 0.9434 | 0.9305 |

TABLE IV: Overall and average accuracies (%) and Kappa coefficient of different complex weight initialization methods on the Flevoland Benchmark PolSAR image.
Fig. 7: The overall accuracy curves of training and validation for different loss functions on the Oberpfaffenhofen dataset. Red lines denote the proposed loss function, blue lines denote the CMSE loss function, and green lines denote the CMAE loss function. (a) The overall accuracy of training; (b) The overall accuracy of validation.
Fig. 8: Classification results of the Oberpfaffenhofen area with different loss functions. (a) CMSE loss function; (b) CMAE loss function; (c) Proposed ACE loss function.

III-D3 Comparison Experiment 1 - Complex Weight Initialization

To evaluate the impact of complex-valued weight initialization, which is critical for CV-FCN learning, we conduct a comparison experiment on the Flevoland Benchmark dataset. We utilize the complex weight initialization in [35] as the old CV initialization scheme for comparison, which initializes the real and imaginary parts of a CV weight separately with a uniform distribution [35]. Fig. 6 illustrates the difference in the validation curves of one representative experiment, where CWI-2 denotes the proposed weight initialization and CWI-1 denotes the compared (old) weight initialization. Furthermore, we also report the evaluation indices of the two initialization schemes as a function of the epoch. Specifically, we first train CV-FCN for 10 epochs and then update the classification results every 10 epochs. Table IV contains a comparison of the results.

As shown in Fig. 6, both the old and the proposed initialization lead to convergence, but the proposed initialization trains CV-FCN faster and reaches the optimal value earlier. As illustrated in Table IV, the proposed initialization achieves its best results when the training epoch is around 60, while the old initialization needs around 120 epochs. These results validate that the proposed initialization not only facilitates faster learning but also improves the classification performance of CV-FCN. This may be attributed to the ability of the proposed initialization to reduce the risk of vanishing or exploding gradients, which is of great significance for training deep networks. Additionally, these phenomena, in part, illustrate that the proposed initialization scheme is suitable for CV-FCN on the given PolSAR image classification task.

III-D4 Comparison Experiment 2 - Loss Function

We carry out a comparison experiment on the Oberpfaffenhofen dataset to evaluate the effectiveness of the average cross-entropy loss function. The complex-valued mean square error (MSE) in [35] and the complex-valued mean absolute error (MAE) are utilized as compared loss functions, denoted by CMSE and CMAE, respectively. The CMSE and the CMAE can be respectively expressed as

$$\mathcal{L}_{\mathrm{CMSE}} = \frac{1}{2}\sum_{j}\sum_{c=1}^{C}\Big[\big(\Re(t_{j,c}) - \Re(\hat{y}_{j,c})\big)^2 + \big(\Im(t_{j,c}) - \Im(\hat{y}_{j,c})\big)^2\Big], \tag{21}$$

$$\mathcal{L}_{\mathrm{CMAE}} = \frac{1}{2}\sum_{j}\sum_{c=1}^{C}\Big[\big|\Re(t_{j,c}) - \Re(\hat{y}_{j,c})\big| + \big|\Im(t_{j,c}) - \Im(\hat{y}_{j,c})\big|\Big]. \tag{22}$$

The overall accuracy curves of training and validation for the different loss functions are illustrated in Fig. 7(a) and Fig. 7(b), respectively. Moreover, typical classification maps resulting from the different loss functions are shown in Fig. 8.

As seen from Fig. 7, the proposed complex loss function, denoted by ACE, and the CMAE converge faster than the CMSE. The training and validation accuracies of ACE and CMAE remain relatively stable after 120 epochs, while CMSE does not achieve similar stability until 250 epochs. Additionally, the best validation accuracy of the proposed loss function is higher than that of CMAE. As shown in Fig. 8, the classification map with CMSE is smoother than those with CMAE and ACE; this finding is potentially explained by the CMSE loss mitigating the effects of speckle noise. However, the boundary delineation between different categories is ambiguous because the map is too smooth. Although the classification map with CMAE contains clear structural information, it has more misclassified points since it is affected by speckle noise. Notably, the proposed loss function achieves correct boundary localization as well as better robustness to speckle noise. These phenomena partially establish the effectiveness of the proposed loss function.

| Class | SVM | Wishart | MRF | RV-MLP | RV-SCNN | RV-DCNN | RV-FCN | CV-MLP | CV-SCNN | CV-DCNN | CV-FCN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Potato | 86.29% | 79.27% | 99.26% | 99.01% | 99.86% | 99.90% | 99.89% | 99.14% | 99.63% | 99.69% | 99.98% |
| Fruit | 89.43% | 82.24% | 96.78% | 99.98% | 98.87% | 98.19% | 99.05% | 88.54% | 97.37% | 96.53% | 99.97% |
| Oats | 87.23% | 88.59% | 96.63% | 83.14% | 55.42% | 99.97% | 66.94% | 95.34% | 100% | 100% | 77.57% |
| Beet | 68.49% | 67.71% | 82.25% | 76.91% | 84.87% | 83.25% | 99.28% | 80.82% | 85.52% | 90.42% | 99.97% |
| Barley | 76.37% | 76.40% | 93.47% | 99.91% | 100% | 99.99% | 99.67% | 99.65% | 99.76% | 100% | 100% |
| Onions | 32.11% | 33.66% | 32.68% | 56.81% | 65.36% | 89.08% | 69.89% | 50.80% | 83.50% | 89.13% | 99.24% |
| Wheat | 84.05% | 86.90% | 97.27% | 99.92% | 99.97% | 99.99% | 99.96% | 99.86% | 99.99% | 99.97% | 100% |
| Beans | 92.42% | 88.82% | 99.54% | 81.33% | 82.57% | 88.69% | 90.51% | 71.16% | 91.65% | 97.79% | 98.79% |
| Peas | 93.47% | 94.81% | 99.95% | 100% | 99.96% | 100% | 99.82% | 100% | 100% | 100% | 99.90% |
| Maize | 62.64% | 61.09% | 14.19% | 98.91% | 93.23% | 87.74% | 91.00% | 99.53% | 90.12% | 98.15% | 98.85% |
| Flax | 97.19% | 94.88% | 99.51% | 61.78% | 56.41% | 69.04% | 99.79% | 87.77% | 64.96% | 96.65% | 99.97% |
| Rapeseed | 87.45% | 66.63% | 94.76% | 100% | 100% | 100% | 99.94% | 100% | 99.98% | 100% | 99.99% |
| Grass | 66.63% | 65.70% | 74.22% | 88.13% | 98.54% | 99.12% | 93.53% | 83.11% | 98.35% | 100% | 99.87% |
| Lucerne | 79.98% | 82.72% | 89.19% | 98.95% | 99.93% | 100% | 99.36% | 98.17% | 99.99% | 100% | 99.97% |
| OA | 81.67% | 76.44% | 92.58% | 95.34% | 96.09% | 97.18% | 98.63% | 96.21% | 97.06% | 98.75% | 99.72% |
| AA | 78.84% | 76.39% | 83.55% | 88.91% | 88.21% | 93.92% | 93.47% | 90.35% | 93.63% | 97.74% | 98.14% |
| κ | 0.7880 | 0.7287 | 0.9131 | 0.9450 | 0.9538 | 0.9668 | 0.9837 | 0.9554 | 0.9654 | 0.9853 | 0.9967 |

TABLE V: Individual Class, Overall, and Average Accuracies (%) and Kappa Coefficient of All Competing Methods on the Flevoland Benchmark PolSAR Image
| Network | Module type | Dimension | Stride | Pad |
|---|---|---|---|---|
| RV-DCNN | Convolution | 3×3×9×9 | 1 | 1 |
| | Max-Pooling | 2×2 | 2 | 0 |
| | Convolution | 3×3×9×18 | 1 | 1 |
| | Max-Pooling | 2×2 | 2 | 0 |
| | Convolution | 3×3×18×36 | 1 | 1 |
| | Max-Pooling | 2×2 | 2 | 0 |
| | Convolution | 3×3×36×72 | 1 | 1 |
| | Max-Pooling | 2×2 | 2 | 0 |
| | Convolution | 3×3×72×144 | 1 | 1 |
| | Max-Pooling | 2×2 | 2 | 0 |
| | Fully Connection | 144×144 | 1 | |
| | Fully Connection | 144×C | 1 | |
| | Softmax | | | |
| CV-DCNN | Complex Convolution | 3×3×12×12 | 1 | 1 |
| | Complex Max-Pooling | 2×2 | 2 | 0 |
| | Complex Convolution | 3×3×12×24 | 1 | 1 |
| | Complex Max-Pooling | 2×2 | 2 | 0 |
| | Complex Convolution | 3×3×24×48 | 1 | 1 |
| | Complex Max-Pooling | 2×2 | 2 | 0 |
| | Complex Convolution | 3×3×48×96 | 1 | 1 |
| | Complex Max-Pooling | 2×2 | 2 | 0 |
| | Complex Convolution | 3×3×96×192 | 1 | 1 |
| | Complex Max-Pooling | 2×2 | 2 | 0 |
| | Complex Fully Connection | 192×192 | 1 | |
| | Complex Fully Connection | 192×2C | 1 | |
| | Complex Softmax | | | |

TABLE VI: Detailed Configuration of the RV-DCNN and the CV-DCNN. C Denotes the Total Number of Classes. The ReLU Layers in RV-DCNN, and the Complex BN Layers and ℂReLU Layers in CV-DCNN, are Omitted for Brevity

III-E Comparing Models

We demonstrate the effectiveness of the proposed method by comparison with some state-of-the-art methods, including SVM [13], the Wishart classifier [3], the Markov random field (MRF) [9], MLP [20], CVNN [31], CNN [22], CV-CNN [35], and FCN [43]. The structure of CV-FCN has already been introduced in Section II. The specific settings of the comparison methods are briefly described as follows.

  • Non-deep methods: The non-deep methods include SVM [13], the Wishart classifier [3], and MRF [9]. They all adopt the input feature vector shown in Equation (20). For the SVM-based method, the radial basis function (RBF) kernel is chosen as advised by [13]. For MRF, the parameters are set according to the original publication [9].

  • RV-FCN: To compare with CV-FCN, we operate the RV-FCN [43] with the same architecture as CV-FCN. Since the dimension of the input patch for RV-FCN is 9 while that for CV-FCN is 6, for a fair comparison we adjust the parameter settings in RV-FCN to have the same degrees of freedom (DoF) as CV-FCN. We mainly adjust the number of kernels in every convolutional layer of RV-FCN, which is about 0.8 times that in CV-FCN.

  • RV-MLP/CV-MLP: Referring to [20], we choose a three-layer CV-MLP network, which consists of 128 neurons in the first complex hidden layer and 256 in the second complex hidden layer. The last layer is a softmax classifier that predicts the probability distribution. To have the same DoF, the RV-MLP has 96 neurons in the first hidden layer and 180 in the second hidden layer. For the MLPs, we choose a 32×32 neighborhood of each pixel as the patch fed into the networks to consider more contextual information.

  • RV-SCNN/CV-SCNN: We use RV-SCNN and CV-SCNN to represent the networks in [22] and [35], respectively. Following RV-SCNN in [22], the architecture of CV-SCNN is adjusted to contain the input layer, two convolution layers interleaved with two pooling layers, two fully connected layers, and the softmax layer. For the SCNNs, a 32×32 neighborhood of each pixel is employed as the patch for training.

  • RV-DCNN/CV-DCNN: The downsampling section of FCN is transformed from a CNN structure. Therefore, for a fair comparison between FCN and CNN, we construct a new CNN structure, denoted DCNN, according to the CV-FCN structure. Table VI reports the detailed configuration of RV-DCNN and CV-DCNN. Compared with the CNNs in [35], the DCNNs contain more convolutional layers. For the DCNNs, we use the same operation as for the SCNNs to generate training patches.

III-F Classification Performance Evaluation

To evaluate the effectiveness of CV-FCN, comparisons with the above models on three PolSAR datasets are presented as follows.

Fig. 9: Classification results of the Flevoland Benchmark area data with different methods. (a) Ground truth. (b) SVM. (c) Wishart. (d) MRF. (e) RV-MLP. (f) RV-SCNN. (g) RV-DCNN. (h) RV-FCN. (i) CV-MLP. (j) CV-SCNN. (k) CV-DCNN. (l) CV-FCN.
Fig. 10: Classification results of the San Francisco area data with different methods. (a) Ground truth. (b) SVM. (c) Wishart. (d) MRF. (e) RV-MLP. (f) RV-SCNN. (g) RV-DCNN. (h) RV-FCN. (i) CV-MLP. (j) CV-SCNN. (k) CV-DCNN. (l) CV-FCN.

III-F1 Flevoland Benchmark Dataset Result

For this dataset, we randomly choose 5% of the available labeled samples per class for training. The classification maps obtained by all methods are shown in Fig. 9, and the accuracies are reported in Table V.

As shown in Fig. 9(b) and Fig. 9(c), the classification maps obtained by SVM and Wishart are seriously affected by speckle noise since they only consider polarimetric information. Compared with Fig. 9(b) and Fig. 9(c), the classification map from MRF shown in Fig. 9(d) is much clearer, with significantly fewer misclassified pixels. The reason is that MRF can embed spatial smoothness information into the classification stage. Fig. 9(e)-(l) demonstrate the classification results of all DL-based methods, where Fig. 9(e)-(h) are the results of different RV-NNs and Fig. 9(i)-(l) give the results of different CV-NNs. It can be seen that all DL-based methods outperform the non-deep methods, which indicates that the learned features have stronger discriminative ability than traditional features.

Comparing the RV-NNs, RV-FCN performs best for the classification of the flax class [marked by white ovals in Fig. 9(e)-(h)]. In addition, among the CV-NNs, CV-FCN has the highest classification accuracy on the beet class [marked by yellow ovals in Fig. 9(i)-(l)], and the whole class label map of CV-FCN is much clearer than the others. These two results indicate that the proposed FCN architecture is advantageous for PolSAR classification compared with other network structures, especially CNNs.

Moreover, comparing the RV-NNs and CV-NNs directly, we can observe that CV-NNs perform better than their RV counterparts. For example, Fig. 9(h) and Fig. 9(l) are the classification results of RV-FCN and CV-FCN, respectively. The confusion between the oats class and the beet class is severe in Fig. 9(h) but does not appear in Fig. 9(l) [marked by sky-blue rectangles]. This confirms the effectiveness of complex-valued features with phase information for the classification of PolSAR imagery. From the overall effects depicted in Fig. 9, the classification map of CV-FCN is noticeably closer to the ground truth map.

| Class | SVM | Wishart | MRF | RV-MLP | RV-SCNN | RV-DCNN | RV-FCN | CV-MLP | CV-SCNN | CV-DCNN | CV-FCN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Class 1 | 99.99% | 96.47% | 99.37% | 99.05% | 97.78% | 93.35% | 99.62% | 99.12% | 99.68% | 97.59% | 99.98% |
| Class 2 | 87.49% | 83.23% | 89.09% | 91.65% | 84.77% | 97.10% | 98.91% | 93.28% | 89.68% | 98.69% | 99.38% |
| Class 3 | 77.51% | 56.45% | 82.85% | 90.51% | 96.88% | 95.16% | 96.68% | 92.34% | 96.50% | 95.75% | 99.54% |
| Class 4 | 77.66% | 64.54% | 82.16% | 96.91% | 98.09% | 99.81% | 98.79% | 96.65% | 97.62% | 99.45% | 99.72% |
| OA | 88.46% | 79.13% | 90.53% | 94.96% | 94.23% | 95.74% | 98.61% | 95.77% | 96.26% | 97.74% | 99.69% |
| AA | 85.66% | 75.17% | 88.37% | 94.53% | 93.96% | 96.36% | 98.49% | 95.35% | 95.87% | 97.87% | 99.65% |
| κ | 0.8394 | 0.7128 | 0.8683 | 0.9303 | 0.9203 | 0.9416 | 0.9807 | 0.9415 | 0.9483 | 0.9687 | 0.9957 |

TABLE VII: Individual Class, Overall, and Average Accuracies (%) and Kappa Coefficient of All Competing Methods on the San Francisco PolSAR Image

The evaluation indices of all methods are listed in Table V. As shown in Table V, MRF and all DL-based methods achieve OA exceeding 90%. All CV-NN methods achieve better performance than their RV counterparts on all evaluation metrics; the largest changes are in the AA values. Furthermore, CV-FCN outperforms the other compared methods on the three quantitative criteria. Although CV-FCN attains only a 0.97% improvement in OA over CV-DCNN, all classes besides oats show comparable or higher accuracy, which is consistent with the results shown in Fig. 9. In summary, from Fig. 9 and Table V, CV-FCN achieves the best performance on the Flevoland Benchmark dataset and has a powerful ability to distinguish different terrain categories.

Fig. 11: Classification results of the Oberpfaffenhofen area data with different methods. (a) Ground truth. (b) SVM. (c) Wishart. (d) MRF. (e) RV-MLP. (f) RV-SCNN. (g) RV-DCNN. (h) RV-FCN. (i) CV-MLP. (j) CV-SCNN. (k) CV-DCNN. (l) CV-FCN.
Method              OA        AA        Kappa
Non-deep
  SVM               82.36%    76.10%    0.6927
  Wishart           80.90%    74.11%    0.6671
  MRF               83.70%    77.83%    0.7156
DL-based
  RV-MLP            89.36%    86.27%    0.8186
  RV-SCNN           93.35%    94.16%    0.8889
  RV-DCNN           94.75%    93.42%    0.9097
  RV-FCN            95.51%    94.38%    0.9227
  CV-MLP            90.02%    86.59%    0.8279
  CV-SCNN           93.52%    93.46%    0.8909
  CV-DCNN           95.76%    94.87%    0.9274
  CV-FCN            97.26%    96.38%    0.9531
TABLE VIII: Overall and Average Accuracies (%) and Kappa Coefficient of All Competing Methods on the Oberpfaffenhofen PolSAR Image
Fig. 12: Classification accuracies of Oberpfaffenhofen area classes with different methods.

III-F2 San Francisco Dataset Result

For the San Francisco dataset, we randomly choose 1% of the labeled pixels per class for training and use the remaining pixels for testing. The classification results obtained by all methods are shown in Fig. 10, and Table VII reports their evaluation metrics.
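As an aside, the per-class sampling described above can be sketched as follows; the `labels` array, the zero-means-unlabeled convention, and the ratio handling are assumptions for illustration rather than the paper's exact protocol.

```python
# A minimal sketch of per-class training/test splitting: for each
# labeled class, 1% of its pixels are drawn at random for training
# and the rest are kept for testing. `labels` is an H x W ground-
# truth map with 0 marking unlabeled pixels (assumed convention).
import numpy as np

def split_per_class(labels: np.ndarray, ratio: float = 0.01, seed: int = 0):
    rng = np.random.default_rng(seed)
    train_mask = np.zeros_like(labels, dtype=bool)
    for c in np.unique(labels):
        if c == 0:                                # skip unlabeled pixels
            continue
        idx = np.flatnonzero(labels == c)         # flat indices of class c
        n_train = max(1, int(round(ratio * idx.size)))
        picked = rng.choice(idx, size=n_train, replace=False)
        train_mask.flat[picked] = True
    test_mask = (labels != 0) & ~train_mask
    return train_mask, test_mask
```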

Fig. 10(b) and Fig. 10(c) give the classification results of the SVM and Wishart classifiers. Vegetation, low-density urban, and high-density urban areas are severely mixed, and there are many isolated pixels in the images. Fig. 10(d) shows the classification result obtained from MRF, where the confusion between low-density urban and high-density urban is less severe. In addition, misclassification occurs much less frequently than with the previous two methods, since MRF considers spatial information and thus obtains a smoother classification map. Nevertheless, due to their limited discriminative features, non-deep methods have difficulty distinguishing complex backscatters, especially in vegetation and urban areas.

Fig. 10(e)-(l) show the classification results of the DL-based methods. From Fig. 10(e)-(h), it is worth noting that RV-FCN outperforms the other three methods, with much clearer boundaries between categories. In addition, from Fig. 10(i)-(l), CV-FCN yields the best visual effect among the CV-NNs. These comparisons demonstrate the effectiveness of the proposed FCN structure, which captures more discriminative features and effectively incorporates more spatial information. Furthermore, CV-FCN achieves the best overall performance in Fig. 10, which illustrates that both the FCN structure and the phase information contribute to improving classification accuracy.

As Table VII shows, CV-FCN achieves the highest classification accuracy. Its OA is about 2% and 3% higher than those of CV-DCNN and CV-SCNN, respectively, which indicates that the proposed FCN structure is well suited to PolSAR data. In addition, the results of CV-FCN are slightly better than those of RV-FCN, confirming that phase information plays an important role in improving classification accuracy. Furthermore, CV-FCN yields the highest scores on all evaluation metrics, which is consistent with the results in Fig. 10.

III-F3 Oberpfaffenhofen Dataset Result

For the Oberpfaffenhofen dataset, we likewise choose 1% of the pixels with ground-truth class labels for training. Fig. 11 shows the visual classification results. The overall evaluation indices are given in Table VIII, and Fig. 12 shows the per-class classification accuracies of the different methods.

From Table VIII, CV-FCN achieves the best performance on all metrics. The accuracies of the non-deep methods are poor, with OAs all below 85%. This might be a result of the limited labeled pixels available as prior information and of insufficiently discriminative features. It can also be seen that the CV-NNs perform better than their RV counterparts, although this superiority is not prominent: in terms of OA, CV-MLP, CV-SCNN, CV-DCNN, and CV-FCN are only 0.66%, 0.17%, 1.01%, and 1.75% higher than RV-MLP, RV-SCNN, RV-DCNN, and RV-FCN, respectively.

Fig. 11(e)-(h) and Fig. 11(i)-(l) show the classification results of the RV-NNs and CV-NNs, respectively. As shown in Fig. 11(e)-(h), the classification result of RV-FCN is much clearer than those of the other three methods, especially inside the purple boxes, where it is noticeably closer to the ground truth map. The same holds in the comparison among the CV-NNs. Comparing all results in Fig. 11, the classification map of CV-FCN is the closest to the ground truth map for this dataset.

As shown in Fig. 12, the non-deep methods have poor ability to distinguish built-up areas from woodland. This can also be observed in Fig. 11(b)-(d), where misclassification is severe across the whole image and all classification maps contain many isolated pixels. In contrast, the accuracies of the DCNNs and FCNs on woodland and open areas are all over 95%, which illustrates the discriminative feature-learning ability of deep networks. In addition, CV-FCN achieves higher accuracies than the other methods on all categories, demonstrating its effectiveness in extracting more discriminative features. Overall, these analyses show that CV-FCN exhibits better contextual consistency and extracts more discriminative features for PolSAR image classification.

As the above comparisons demonstrate, the classification performance of CV-FCN exceeds that of the other methods on all PolSAR datasets. On the one hand, CV-FCN improves classification accuracy effectively compared to its RV counterpart (i.e., RV-FCN); the same conclusion holds for the other network structures, which confirms the value of complex-valued features containing phase information. On the other hand, compared with CNN structures, CV-FCN performs more coherent labeling and shows better robustness to speckle noise, yielding smooth classification maps with precise localization. This demonstrates the effectiveness of the CV-FCN architecture in exploiting more spatial information and extracting more discriminative features for PolSAR image classification.
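The spatial-information argument can be made concrete with a toy sketch of complex max-pooling that records the location of the largest-magnitude response in each window, paired with a max-unpooling that writes the pooled values back to those locations. This is only a simplified NumPy illustration under assumed non-overlapping 2x2 windows on a single channel, not the paper's implementation:

```python
# Sketch (not the paper's code) of complex max-pooling with recorded
# locations and the matching max-unpooling, for non-overlapping 2x2
# windows on a single-channel complex feature map.
import numpy as np

def complex_max_pool(x: np.ndarray):
    H, W = x.shape
    h, w = H // 2, W // 2
    # Group pixels into (h, w, 4) windows: flat index k = 2*row_offset + col_offset
    win = x[:h*2, :w*2].reshape(h, 2, w, 2).transpose(0, 2, 1, 3).reshape(h, w, 4)
    loc = np.abs(win).argmax(axis=-1)                 # max-magnitude index per window
    pooled = np.take_along_axis(win, loc[..., None], axis=-1)[..., 0]
    return pooled, loc

def complex_max_unpool(pooled: np.ndarray, loc: np.ndarray, shape):
    h, w = pooled.shape
    out = np.zeros(shape, dtype=pooled.dtype)
    rows = 2 * np.arange(h)[:, None] + loc // 2       # recover row within window
    cols = 2 * np.arange(w)[None, :] + loc % 2        # recover column within window
    out[rows, cols] = pooled                          # write values back in place
    return out

x = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
p, loc = complex_max_pool(x)
y = complex_max_unpool(p, loc, x.shape)               # sparse map, maxima restored in place
```

Restoring activations to their original positions in this way preserves fine spatial structure through the upsampling path, which is the intuition behind the speckle robustness noted above.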

IV Conclusion

In this paper, a novel complex-valued (CV) pixel-level model called CV-FCN has been proposed for PolSAR image classification, and it obtains better performance than non-deep methods and other DL-based methods. The model integrates the feature extraction module and the classification module in a unified framework. To learn meaningful features faster, a new complex-valued weight initialization scheme is proposed to initialize CV-FCN; it facilitates faster learning and benefits CV-FCN performance. Multi-level, robust CV features that retain more discriminative information are then extracted via CV-FCN. In particular, a new complex upsampling scheme is proposed to output CV predicted labeling; it recovers rich spatial information through max-location maps to alleviate the effect of speckle noise. Furthermore, a novel average cross-entropy loss function is presented for more precise CV-FCN optimization. The proposed CV-FCN model enables pixel-to-pixel classification directly on PolSAR CV data without any data projection. Moreover, it automatically learns higher-level feature representations and fuses multi-level features for accurate category identification. Experimental results on real benchmark PolSAR images show that CV-FCN achieves comparable or better results than the competing models.
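As a closing illustration, one plausible reading of an "average cross-entropy" for complex-valued outputs is to softmax-normalize the real and imaginary parts of the complex logits separately and average the two cross-entropies. The sketch below implements that reading and should be taken as an assumption; the paper's exact formulation may differ.

```python
# Hedged sketch of an "average cross-entropy" loss for complex logits:
# cross-entropy is computed on the real part and on the imaginary part
# separately, then averaged. One plausible construction, not
# necessarily the paper's exact loss.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def average_cross_entropy(logits: np.ndarray, labels: np.ndarray):
    """logits: complex array of shape (N, K); labels: int array of shape (N,)."""
    ce = 0.0
    for part in (logits.real, logits.imag):
        p = softmax(part)
        ce += -np.log(p[np.arange(labels.size), labels] + 1e-12).mean()
    return ce / 2.0
```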

In the future, this work may be continued along the following lines: 1) Some experiments demonstrated the effectiveness of the new complex-valued weight initialization scheme for PolSAR image classification, but stronger evidence and some visualization are still needed to establish its superiority and to observe the differences it makes. 2) Given the limited availability of PolSAR datasets and high-quality training data, training very deep complex-valued networks for PolSAR classification remains challenging and often incurs the risk of overfitting and model collapse. Moreover, data augmentation strategies designed for natural images are generally unsuited to enlarging PolSAR training sets because of the difference in imaging mechanisms. An effective data augmentation strategy for PolSAR data is therefore urgently needed.
