Deep Learning Meets SAR

06/17/2020 ∙ by Xiao Xiang Zhu, et al. ∙ DLR

Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data. Although deep learning has been introduced in SAR data processing, and despite successful first attempts, its huge potential remains locked. For example, to the best of the authors' knowledge, there is no single example of deep learning in SAR that has been developed up to operational processing of big data or integrated into the production chain of any satellite mission. In this paper, we provide an introduction to the most relevant deep learning models and concepts, point out possible pitfalls by analyzing special characteristics of SAR data, review the state of the art of deep learning applied to SAR in depth, summarize available benchmarks, and recommend some important future research directions. With this effort, we hope to stimulate more research in this interesting yet under-exploited research field.




I Motivation

In recent years, deep learning [90] has been developed at a dramatic pace, achieving great success in many fields. Unlike conventional algorithms, deep learning-based methods commonly employ hierarchical architectures, such as deep neural networks, to extract feature representations of raw data for numerous tasks. For instance, convolutional neural networks (CNNs) are capable of learning low- and high-level features from raw images with stacks of convolutional and pooling layers, and then applying the extracted features to various computer vision tasks, such as large-scale image recognition [147], object detection [197], and semantic segmentation [58].

Inspired by numerous successful applications in the computer vision community, the use of deep learning in remote sensing is now attracting wide attention [202]. As first attempts in SAR, deep learning-based methods have been adopted for a variety of tasks, including terrain surface classification [119], object detection [21], parameter inversion [164], despeckling [167], specific applications in InSAR [8], and SAR-optical data fusion [75].

For terrain surface classification from SAR and Polarimetric SAR (PolSAR) images, effective feature extraction is essential. Conventionally, these features are extracted based on expert domain knowledge and are usually applicable to only a small number of cases and data sets. Deep learning feature extraction has, however, proved to overcome both of these issues to some degree [119]. For SAR target detection, conventional approaches mainly rely on template matching, where specific templates are created manually [78] to classify different categories, or on traditional machine learning approaches, such as Support Vector Machines (SVMs) [195, 16]; in contrast, modern deep learning algorithms aim at applying deep CNNs to extract discriminative features automatically for target recognition [21]. For parameter inversion, deep learning models are employed to learn the latent mapping function from SAR images to estimated parameters, e.g., sea ice concentration [164]. Regarding despeckling, conventional methods often rely on hand-designed filters and may mistakenly eliminate sharp features when denoising. Furthermore, the development of joint analysis of SAR and optical images has been motivated by the capacity to extract features from both types of images. For applications in InSAR, only a few studies have been carried out, such as the work described in [8]. However, these algorithms neglect the special characteristics of phase and simply use an out-of-the-box deep learning-based model.

Despite the first successes, and unlike the evaluation of optical data, the huge potential of deep learning in SAR and InSAR remains locked. For example, to the best knowledge of the authors, there is no single example of deep learning in SAR that has been developed up to operational processing of big data or integrated into the production chain of any satellite mission. This paper aims at stimulating more research in this interesting yet under-exploited research field.

In the remainder of this paper, Section II first introduces the most commonly used deep learning models in remote sensing. Section III describes the specific characteristics of SAR data that have to be taken into account to exploit the full potential of SAR combined with deep learning. Section IV details recent advances in the utilization of deep learning for the different SAR applications outlined earlier in this section. Section V reviews the existing benchmark data sets for different applications of SAR and their limitations. Finally, Section VI summarizes current research and gives an overview of promising future directions.

II Introduction to Relevant Deep Learning Models and Concepts

Fig. 1: A Selection of relevant deep learning models. Sources of the images: VGG [43], ResNet [15], U-Net [118], LSTM [174], RNN [42], VAE [161], GAN [145], CGNN [205], RGNN [69], and DeepRL [206].

In this section, we briefly review relevant deep learning algorithms that were originally proposed for visual data processing and are widely used in state-of-the-art research on deep learning in SAR. In addition, we mention the latest developments in deep learning, which are not yet widely applied to SAR but may help create the next generation of its algorithms. Fig. 1 gives an overview of the deep learning models we discuss in this section.

Before discussing deep learning algorithms, we would like to stress that the importance of high-quality benchmark datasets in deep learning research cannot be overstated. Especially in supervised learning, the knowledge that can be learned by the model is bounded by the information present in the training dataset. For example, the MNIST [92] dataset played a key role in Yann LeCun’s seminal paper about convolutional neural networks and gradient-based learning [91]. Similarly, there would be no AlexNet [87], the network that kick-started the current deep learning renaissance, without the ImageNet [32] dataset, which contains over 14 million images and 22,000 classes. ImageNet has been such an important part of deep learning research that, more than 10 years after its publication, it is still used as a standard benchmark to evaluate the performance of CNNs for image classification.

II-A Deep Learning Models

The main principle of deep learning models is to encode input data into effective feature representations for target tasks. To exemplify how a deep learning framework works, we take the autoencoder as an example: it first maps input data to a latent representation via a trainable nonlinear mapping and then reconstructs the input through a reverse mapping. The reconstruction error is usually defined as the Euclidean distance between inputs and reconstructed inputs. The parameters of autoencoders are optimized by gradient descent-based optimizers, such as stochastic gradient descent (SGD), RMSProp [156], and ADAM [84], using the gradients computed during the backpropagation step.
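The autoencoder principle described above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the tiny 2-1-2 layout, the tanh encoder, the hand-picked starting weights, the learning rate, and all function names are assumptions made purely for illustration, and the squared Euclidean distance is used as the reconstruction error for convenience.

```python
import math

def initial_params():
    """Hand-picked (hypothetical) starting weights so the run is reproducible."""
    return [0.3, 0.1], 0.0, [0.2, 0.1], [0.0, 0.0]   # w1, b1, w2, b2

def forward(x, w1, b1, w2, b2):
    """Encode a 2-D input into one latent value, then decode it back to 2-D."""
    z = math.tanh(w1[0] * x[0] + w1[1] * x[1] + b1)   # nonlinear encoder
    x_hat = [w2[0] * z + b2[0], w2[1] * z + b2[1]]    # linear decoder
    return z, x_hat

def reconstruction_error(x, x_hat):
    """Squared Euclidean distance between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def mean_error(data, params):
    return sum(reconstruction_error(x, forward(x, *params)[1]) for x in data) / len(data)

def train(data, params, epochs=200, lr=0.05):
    """Plain SGD: one backpropagation pass and one update per sample."""
    w1, b1, w2, b2 = params
    w1, w2, b2 = list(w1), list(w2), list(b2)
    for _ in range(epochs):
        for x in data:
            z, x_hat = forward(x, w1, b1, w2, b2)
            g_out = [2.0 * (x_hat[i] - x[i]) for i in range(2)]  # dE/dx_hat
            g_z = g_out[0] * w2[0] + g_out[1] * w2[1]            # dE/dz
            g_a = g_z * (1.0 - z * z)                            # through tanh
            for i in range(2):
                w2[i] -= lr * g_out[i] * z
                b2[i] -= lr * g_out[i]
                w1[i] -= lr * g_a * x[i]
            b1 -= lr * g_a
    return w1, b1, w2, b2

# Toy data: 2-D points on a line, so one latent value can describe them.
data = [[t, 2.0 * t] for t in [(i - 10) / 20.0 for i in range(21)]]
before = mean_error(data, initial_params())
after = mean_error(data, train(data, initial_params()))
print(f"mean reconstruction error: {before:.4f} -> {after:.4f}")
```

Because the toy points lie on a line, a single latent value is enough to reconstruct them, which is why the error drops after training even with a one-dimensional bottleneck.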

II-A1 Convolutional Neural Networks (CNN)

With the success of AlexNet in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2012), where it scored a top-5 test error of 15.3% compared to 26.2% of the second best, CNNs have attracted worldwide attention and are now used for many image understanding tasks, such as image classification, object detection, and semantic segmentation. AlexNet consists of five convolutional layers, three max-pooling layers, and three fully-connected layers. One of the key innovations of AlexNet was the use of GPUs, which made it possible to train such large networks with huge datasets without using supercomputers. In just two years, VGGNet [147] overtook AlexNet in performance by achieving a 6.8% top-5 test error in ILSVRC-2014; the main difference was that it used only 3x3-sized convolutional kernels, which enabled it to have a larger number of channels and in turn capture more diverse features.

ResNet [62], U-Net [131], and DenseNet [71] were the next major CNN architectures. The main feature of all these architectures was the idea of connecting not only neighboring layers but any two layers in the network by using skip connections. This helped reduce the loss of information across networks, mitigated the problem of vanishing gradients, and allowed the design of deeper networks. U-Net is one of the most commonly used image segmentation networks. It has an autoencoder-based architecture in which skip connections concatenate features from the first layer to the last, from the second to the second last, and so on; in this way, fine-grained information from the initial layers reaches the end layers. U-Net was initially proposed for medical image segmentation, where data labeling is a big problem. The authors used heavy data augmentation techniques on input data, making it possible to learn from only a few hundred annotated samples. In ResNet, skip connections were used within individual blocks and not across the whole network. Since its initial proposal, it has seen many architectural tweaks, and even after 4-5 years its variants are always among the top scorers on ImageNet. In DenseNet, all the layers were attached to all preceding layers, reducing the size of the network, albeit at the cost of memory usage. For more detailed explanations of different CNN models, interested readers are referred to [66].
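The two flavours of skip connection mentioned above can be contrasted in a few lines. This is only a sketch on plain feature lists: `layer` is a hypothetical stand-in for a learned transformation, and the function names are ours rather than those of any cited architecture.

```python
def layer(x, weight=0.5):
    """Stand-in for a learned transformation (hypothetical: scales each feature)."""
    return [weight * v for v in x]

def residual_block(x, f=layer):
    """ResNet-style skip: add the block input to the block output."""
    return [a + b for a, b in zip(x, f(x))]

def concat_skip(early, late):
    """U-Net/DenseNet-style skip: append early features to later ones."""
    return late + early

x = [1.0, -2.0, 4.0]
res = residual_block(x)          # same length as x, input is added back
cat = concat_skip(x, layer(x))   # twice as long, early features kept verbatim
```

The additive form keeps the feature size fixed, whereas the concatenating form grows it, which is one way to see why dense concatenation costs memory.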

II-A2 Recurrent Neural Networks (RNN)

Besides CNNs, RNNs [121] are another major class of deep networks. Their main building blocks are recurrent units, which take the current input and the output of the previous state as input. They provide state-of-the-art results for processing data of variable length, such as text and time series data. Their weights can be replaced with convolutional kernels for visual processing tasks such as image captioning and predicting future frames/points in visual time-series data. Long short-term memory (LSTM) [64] is one of the most popular RNN architectures: its cells can store values from any past instances while not being severely affected by the problem of vanishing gradients.
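As a rough illustration of a recurrent unit, the sketch below implements a plain scalar (Elman-style) cell, not an LSTM. The weights `w_x`, `w_h`, and `b` are arbitrary assumed values, and the function names are ours.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent unit: combine the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, h0=0.0, **weights):
    """Apply the same unit along a sequence of any length; return all states."""
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, **weights)
        states.append(h)
    return states

short = run_rnn([1.0, 0.0])                  # two time steps
long = run_rnn([1.0, 0.0, 0.0, 0.0, 0.0])    # five time steps, same weights
```

The same weights handle both sequence lengths. In the longer run, the trace of the first input shrinks at every step, which hints at why plain recurrent units struggle with long-range dependencies and why gated cells such as LSTM were introduced.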

II-A3 GANs

Proposed by Ian Goodfellow et al. [56], GANs are among the most popular and exciting inventions in the field of deep learning. Based on game-theoretic principles, they consist of two networks called a generator and a discriminator. The generator’s objective is to learn a latent space, through which it can generate samples from the same distribution as the training data, while the discriminator tries to learn to distinguish if a sample is from the generator or training data. This very simple mechanism is responsible for most cutting-edge algorithms of various applications, e.g., generating artificial photo-realistic images/videos, super-resolution, and text to image synthesis.

II-B Supervised, Unsupervised and Reinforcement Learning

II-B1 Supervised Learning

Most popular deep learning models fall under the category of supervised deep learning, i.e., they need labelled datasets to learn the objective functions. One of the big challenges of supervised learning is generalization, i.e., how well a trained model performs on test data. It is therefore vital that the training data represent the true distribution of the data, so that the model can handle unseen data. If the model fits the training data well but fails on test data, this is called overfitting. The deep learning literature offers several techniques to avoid it, e.g., Dropout [150].

II-B2 Unsupervised Learning

Unsupervised learning refers to the class of algorithms where the training data do not contain labels. For instance, in classical data analysis, principal component analysis (PCA) [122] can be used to reduce the data dimension, followed by a clustering algorithm to group similar data points. In deep learning, generative models such as autoencoders, variational autoencoders (VAEs) [85], and Generative Adversarial Networks (GANs) [56] are some of the popular techniques that can be used for unsupervised learning. Their primary goal is to generate output data from the same distribution as the input data. Autoencoders consist of an encoder part, which finds a compressed latent representation of the input, and a decoder part, which decodes that representation back to the original input. VAEs take autoencoders to the next level by learning the whole distribution instead of just a single representation at the end of the encoder part, which in turn can be used by the decoder to generate the whole distribution of outputs. The trick to learning this distribution is to also learn the variance along with the mean of the latent representation at the encoder-decoder meeting point and to add a KL-divergence-based loss term to the standard reconstruction loss function of the autoencoders.
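The VAE loss described above can be sketched as follows. The sketch assumes a diagonal Gaussian latent distribution compared against a standard normal prior, a log-variance parameterization, and a squared-error reconstruction term; these are common choices but are assumptions here, as are all function names.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), using the learned mean and variance."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction term (squared error, an assumed choice) plus the KL term."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_to_standard_normal(mu, log_var)
```

The KL term is zero only when the encoder outputs zero mean and unit variance, so it pulls the latent distribution toward the prior while the reconstruction term pulls it toward encoding the input.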

II-B3 Deep Reinforcement Learning (DeepRL)

Reinforcement Learning (RL) tries to mimic human learning behavior, i.e., taking actions and then adjusting them for the future according to feedback from the environment. For example, young children learn to repeat or not repeat their actions based on the reaction of their parents. The RL model consists of an environment with states, actions to transition between those states, and a reward system for ending up in different states. The objective of the algorithm is to learn the best actions for given states using a feedback reward system. In classical RL algorithms, function approximators are used to calculate the probability of different actions in different states. DeepRL uses different types of neural networks to create these functions [107][101]. Recently, DeepRL has received particular attention and popularity due to the success of Google DeepMind’s AlphaGo [146], which defeated the Go board game world champion. This task was considered impossible for computers until just a few years ago.

II-C Relevant Deep Learning Concepts

II-C1 Automatic Machine Learning (AutoML)

Deep networks have many hyperparameters to choose from, for example, the number of layers, kernel sizes, type of optimizer, skip connections, and the like. There are billions of possible combinations of these parameters, and given the high cost in computation, time, and energy, it is hard to find the best performing network even from among a few hundred candidates. In the case of deep learning, the objective of AutoML is mainly to find the most efficient and best performing deep network for a given dataset and task. The first major attempt in this field was by Zoph et al. [207], who used DeepRL to find the optimum CNN for image classification. In their system, an RNN creates CNN architectures and, based on their classification results, proposes changes to them. This process continues to loop until the optimum architecture is found. This algorithm was able to find networks competitive with the state of the art but required more than 800 GPUs, which was unrealistic for practical application. Recently, there have been many new developments in the AutoML field, which have made it possible to perform such tasks in more intelligent and efficient ways. More details about the field of network architecture search can be found in [41].

II-C2 Geometric Deep Learning – Graph Neural Networks (GNNs)

Apart from well-structured image data, there is a large amount of unstructured data in real life, e.g., knowledge graphs and social networks, that cannot be directly processed by a deep CNN. Usually, these data are represented in the form of graphs, where each node represents an entity and edges delineate their mutual relations. To learn from unstructured data, geometric deep learning has been attracting increasing attention, and one of the most commonly used architectures is the GNN, which has also proven successful in dealing with structured data. Specifically, using the terminology of graphs, nodes of a graph can be regarded as feature descriptions of entities, and their edges are established by measuring their relations or distances and are encoded in an adjacency matrix. Once a graph is constructed, messages can be propagated among the nodes by simply performing matrix multiplication. Building on this, [86] proposed Graph Convolutional Networks (GCNs), characterized by the use of graph convolutions, and [45] accelerated the process. Moreover, recurrent units in Recurrent Graph Neural Networks (RGNNs) [70] [144] have also proven successful in learning from graphs.
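Message propagation by matrix multiplication can be sketched on a three-node path graph. The sketch multiplies the node features by a row-normalized adjacency matrix with self-loops, i.e., each node takes the mean over itself and its neighbours. This normalization is a simplifying assumption and not necessarily the one used in [86]; the learned weight matrix and nonlinearity of a real GNN layer are omitted, and the function name is ours.

```python
def propagate(adjacency, features):
    """One propagation step: each node averages its own and its neighbours' features."""
    n = len(adjacency)
    out = []
    for i in range(n):
        # Row i of (A + I): neighbours of node i plus a self-loop.
        weights = [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        degree = sum(weights)
        row = [sum(weights[j] * features[j][k] for j in range(n)) / degree
               for k in range(len(features[0]))]
        out.append(row)
    return out

# Path graph 0 - 1 - 2; only node 0 starts with a non-zero feature.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H0 = [[1.0], [0.0], [0.0]]
H1 = propagate(A, H0)   # information reaches node 1
H2 = propagate(A, H1)   # information reaches node 2
```

After one step the feature of node 0 has reached its direct neighbour only; a second step carries it to node 2, which is how stacking propagation layers widens each node's receptive field.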

III Possible Pitfalls

To develop tailored deep learning architectures and prepare suitable training datasets for SAR or InSAR tasks, it is important to understand that SAR data is different from optical remote sensing data, not to mention images downloaded from the internet. In this section, we discuss the special characteristics (and possible pitfalls) encountered while applying deep learning to SAR.

What makes SAR data and SAR data processing by neural networks unique? SAR data are substantially different from optical imagery in many respects. These are a few points to be considered when transferring CNN experience and expertise from optical to SAR data:

  • Dynamic Range. Depending on their spatial resolution, the dynamic range of SAR images can be up to 90 dB (TerraSAR-X high resolution spotlight data with a resolution of about 1 m). Moreover, the distribution is extremely asymmetric, with the majority of pixels in the low amplitude range (distributed scatterers) and a long tail representing bright discrete scatterers, in particular in urban areas. Standard CNNs are not able to handle such dynamic ranges and, hence, most approaches feature dynamic compression as a preprocessing step. In [154], the authors first take only amplitude values from 0 to 255 and then subtract mean values of each image. In [75, 111], normalization is performed as a pre-processing step, which compresses the dynamic range significantly.

  • Signal Statistics. In order to retrieve features from SAR (amplitude or intensity) images the speckle statistics must be considered. Speckle is a multiplicative, rather than an additive, phenomenon. This has consequences: While the optimum estimator of radar brightness of a homogeneous image patch under speckle is a simple moving averaging operation (i.e., a convolution, like in the additive noise case), other optimum detectors of edges and low-level features under additive Gaussian noise may no longer be optimum in the case of SAR. A popular example is Touzi’s CFAR edge detector [157] for SAR images, which uses the ratio of two spatial averages over adjacent windows. This operation cannot be emulated by the first layer of a standard CNN.

    Some studies use a logarithmic mapping of the SAR images prior to feeding them into a CNN [23, 167]. This turns speckle into an additive random variable and, as a side effect, reduces dynamic range. But still, a single convolutional layer can only emulate approximations to optimum SAR feature estimators. It could be valuable to supplement the original log-SAR image by a few lowpass filtered and logarithmized versions as input to the CNN. Another approach is to apply some sophisticated speckle reduction filter before entering the CNN, e.g., non-local averaging [143, 201, 33].

  • Imaging Geometry. The SAR image coordinates range and azimuth are not arbitrary coordinates like East and North or x and y, but rather reflect the peculiarities of the image generation process. Layover always occurs at the near-range side of an object and shadow always at the far-range side. This means that data augmentation by rotation of SAR images would lead to nonsense imagery that would never be generated by a SAR.

  • The Complex Nature of SAR Data.

    The most valuable information of SAR data lies in its phase. This applies for SAR image formation, which takes place in the complex signal domain, as well as for polarimetric, interferometric (InSAR), and tomographic SAR data processing. This means that the entire CNN must be able to handle complex numbers. For the convolution operation this is trivial. The nonlinear activation function and the loss function, however, require thorough consideration. Depending on whether the activation function acts on the real and imaginary parts of the signal independently, or only on its magnitude, and where bias is added, phase will be distorted to different degrees.

    If we use polarimetric SAR data for land cover or target classification, a nonlinear processing of the phase is even desirable, because the phase between different polarimetric channels has physical meaning and, hence, contributes to the classification process.

    In SAR interferometry and tomography, however, the absolute phase has no meaning, i.e., the CNN must be invariant to an arbitrary phase offset. Assume some interferometric input signal to a CNN and the output signal with phase


    Any constant phase offset does not change the meaning of the interferogram. Hence, we require an invariance that we refer to as ”phase linearity” (valid at least in the expectation):


    This linearity is violated, for example, if the activation function is applied to real and imaginary parts separately, or if a bias is added to the complex numbers.

    Another point to consider in regression-type InSAR CNN processing (e.g., for noise reduction) is the loss function. If the quantity of interest is not the complex number itself, but its phase, the loss function must be able to handle the cyclic nature of phases. It may also be advantageous that the loss function is independent, at least to a certain degree, of the signal magnitude to relieve the CNN from modelling the magnitude. A loss function that meets these requirements is, for example,


    where is the reference signal.

    Some authors use magnitude and phase, rather than real and imaginary parts, as input to the CNN. This approach is not invariant to phase offset, either. The interpretation of a phase function as a real-valued function forces the CNN to disregard the sharp discontinuities at the ±π-transitions, whose positions are inconsequential. A standard CNN would pounce on these, interpreting them as edges.

  • Simulation-based Training and Validation Data? The prevailing lack of ground truth for regression-type tasks, like speckle reduction or InSAR denoising, might tempt us to use simulated SAR data for training and validation of neural networks. However, this bears the risk that our networks will learn models that are far too simplified. Unlike in the optical imaging field, where highly realistic scenes can be simulated, e.g., by PC games, the simulation of SAR data is more a scientific topic without the power of commercial companies and a huge market. SAR simulators focus on specific scenarios, e.g., vegetation (only distributed scatterers considered) or persistent (point) scatterers. The most advanced simulators are probably the ones for computing radar backscatter signatures of single military objects, like vessels. To our knowledge, though, there is no simulator available that can, e.g., generate realistic interferometric data of rugged terrain with layover, spatially varying coherence, and diverse scattering mechanisms. Often simplified scattering assumptions are made, e.g., that speckle is multiplicative. Even this is not true; pure Gaussian scattering can only be found for quite homogeneous surfaces and low resolution SARs. As soon as the resolution increases, the chances for a few dominating scatterers in a resolution cell increase as well, and the statistics become substantially different from those of fully developed speckle.
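Returning to the item on the complex nature of SAR data, the phase-offset requirement can be checked numerically. One way to state it, in notation assumed here rather than taken from the paper, is that a processing step f should satisfy f(x·e^{jφ0}) = f(x)·e^{jφ0} for any constant offset φ0. The sketch below tests three hypothetical building blocks on a single complex sample and also shows one possible phase-only loss. The thresholded magnitude activation, the bias value, and the loss are illustrative assumptions, not the authors' definitions.

```python
import cmath
import math

def split_relu(z):
    """ReLU applied to the real and imaginary parts separately."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))

def magnitude_relu(z, threshold=0.5):
    """Shrink the magnitude by a (hypothetical) threshold and keep the phase."""
    mag = abs(z)
    if mag <= threshold:
        return 0j
    return z * (mag - threshold) / mag

def add_bias(z, bias=0.3):
    """Add a constant (hypothetical) bias to the complex value."""
    return z + bias

def equivariance_error(f, z, phi0):
    """|f(z e^{j phi0}) - f(z) e^{j phi0}|: zero if the offset passes through f unchanged."""
    rot = cmath.exp(1j * phi0)
    return abs(f(z * rot) - f(z) * rot)

def phase_loss(pred, ref):
    """Absolute wrapped phase difference; ignores both magnitudes (nonzero inputs)."""
    return abs(cmath.phase(pred * ref.conjugate()))

z = complex(1.0, 1.0)
for name, f in [("split ReLU", split_relu), ("bias", add_bias),
                ("magnitude ReLU", magnitude_relu)]:
    print(name, equivariance_error(f, z, math.pi / 2))
```

In this toy check, the split activation and the added bias leave a clearly non-zero error, while the magnitude-only activation passes the offset through up to rounding. The loss wraps a phase difference of 6 rad to about 0.28 rad and does not change when the prediction is rescaled.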

IV Recent Advances in Deep Learning Applied to SAR

In this section, we provide an in-depth review of deep learning methods applied to SAR data from six perspectives: terrain surface classification, object detection, parameter inversion, despeckling, SAR Interferometry (InSAR), and SAR-optical data fusion. For each application, notable developments are presented in chronological order, and their advantages and disadvantages are reported. Finally, each subsection is concluded with a brief summary.

IV-A Terrain Surface Classification

As an important direction of SAR applications, terrain surface classification using PolSAR images is rapidly advancing with the help of deep learning. Regarding feature extraction, most conventional methods rely on exploring physical scattering properties [109] and texture information [60] in SAR images. However, these features are mainly human-designed based on specific problems and characteristics of data sources. Compared to conventional methods, deep learning is superior in terrain surface classification due to its capability of automatically learning discriminative features. Moreover, deep learning approaches, such as CNNs, can effectively extract not only polarimetric characteristics but also spatial patterns of PolSAR images [119]. Some of the most notable deep learning techniques for PolSAR image classification are reviewed in the following.

Xie et al. [177] first applied deep learning to terrain surface classification using PolSAR images. They employed a stacked autoencoder (SAE) to automatically learn deep features from PolSAR data and then fed them to a softmax classifier. Remarkable improvements in both classification accuracy and visual effect proved that this method can effectively learn a comprehensive feature representation for classification purposes.

Fig. 2: Classification maps obtained from a TerraSAR-X image of a small area in Norway [53]. Subfigures (a)-(f) depict the results of classification using SVM (accuracy = 78.42%), sparse representation classifier (SRC) (accuracy = 85.61%), random forest (accuracy = 82.20%) [160], SAE (accuracy = 87.26%) [177], DCAE (accuracy = 94.57%) [52], and contractive AE (accuracy = 88.74%). Subfigures (g)-(i) show the combination of DSCNN with SVM (accuracy = 96.98%), with SRC (accuracy = 92.51%) [68], and with random forest (accuracy = 96.87%). Subfigures (j) and (k) represent the classification results of DSCNN (accuracy = 97.09%) and DSCNN followed by spatial regularization (accuracy = 97.53%), which achieve higher accuracy than the other methods.

Instead of simply applying SAE, Geng et al. [52] proposed a deep convolutional autoencoder (DCAE) for automatically extracting features and performing classification. The first layer of DCAE is a hand-crafted convolutional layer, where filters are pre-defined, such as gray-level co-occurrence matrices and Gabor filters. The second layer of DCAE performs a scale transformation, which integrates correlated neighbor pixels to reduce speckle. Following these two hand-crafted layers, a trained SAE, which is similar to [177], is attached for learning more abstract features. Tested on high-resolution single-polarization TerraSAR-X images, the method achieved remarkable classification accuracy.

Based on DCAE, Geng et al. [53] proposed a framework, called deep supervised and contractive neural network (DSCNN), for SAR image classification, which introduces histogram of oriented gradient (HOG) descriptors. In addition, a supervised penalty is designed to capture relevant information between features and labels, and a contractive restriction, which can enhance local invariance, is employed in the following trainable autoencoder layers. An example of applying DSCNN to TerraSAR-X data from a small area in Norway is shown in Fig. 2. Compared to the other algorithms, DSCNN produces a more accurate and less noisy classification map.

Fig. 3: The architecture of the dual-branch deep convolution neural network (Dual-CNN) for PolSAR image classification, proposed in [48].

In addition to the aforementioned methods, many studies integrate SAE models with conventional classification algorithms for terrain surface classification. Hou et al. [67] proposed an SAE combined with superpixels for PolSAR image classification. Multiple layers of the SAE are trained on a pixel-by-pixel basis. Superpixels are formed based on Pauli-decomposed pseudo-color images. Outputs of the SAE are used as features in the final step of k-nearest neighbor clustering of superpixels. Zhang et al. [190] applied a stacked sparse AE to PolSAR image classification by taking into account local spatial information. Qin et al. [125] applied adaptive boosting of RBMs to PolSAR image classification. Zhao et al. [196] proposed a discriminant DBN (DisDBN) for SAR image classification, in which discriminant features are learned by combining ensemble learning with a deep belief network in an unsupervised manner. Moreover, taking into account that most current deep learning methods aim at exploiting features either from polarization information or from spatial information of PolSAR images, Gao et al. [48] proposed a dual-branch CNN to learn features from both perspectives for terrain surface classification. This method is built on two feature extraction channels: one to extract polarization features from the 6-channel real matrix, and the other to extract spatial features of a Pauli decomposition. Next, the extracted features are combined using two parallel fully connected layers and finally fed to a softmax layer for classification. The detailed architecture of this network is illustrated in Fig. 3.

Different variations of CNNs have been used for terrain surface classification as well. In [199], Zhou et al. first extracted a 6-channel covariance matrix and then fed it to a trainable CNN for PolSAR image classification. Wang et al. [169] proposed a fully convolutional network integrated with sparse and low-rank subspace representations for classifying PolSAR images. Chen et al. [19] improved CNN performances by incorporating expert knowledge of target scattering mechanism interpretation and polarimetric feature mining. In a more recent work [61], He et al. proposed the combination of features learned from nonlinear manifold embedding and applying a fully convolutional network (FCN) on input PolSAR images; the final classification was carried out in an ensemble approach by SVM. In [36], the authors focused on the computational efficiency of deep learning methods, proposing the use of lightweight 3D CNNs. They showed that classification accuracy comparable to other CNN methods was achievable while significantly reducing the number of learned parameters and therefore gaining computational efficiency.

Apart from these single-image classification schemes using CNN, the use of time series of SAR images for crop classification has been shown in [114, 155]. The authors of both papers experimented with using Recurrent Neural Network (RNN)-based architectures to exploit the temporal dependency of multi-temporal SAR images to improve classification accuracy. A unique approach for tackling PolSAR classification was recently proposed in [37], where for the first time the authors utilized an AutoML technique to find the optimum CNN architecture for each dataset. The approach takes into account the complex nature of PolSAR images, is cost effective, and achieves high classification accuracy [37].

Most of the aforementioned methods rely primarily on preprocessing or transforming raw complex-valued data into features in the real domain and then inputting them in a common CNN, which constrains the possibility of directly learning features from raw data. To tackle this problem, Zhang et al. [192] proposed a novel complex-valued CNN (CV-CNN) specifically designed to process complex values in PolSAR data, i.e., the off-diagonal elements of a coherency or covariance matrix. The CV-CNN not only takes complex numbers as input but also employs complex weights and complex operations throughout different layers. A complex-valued backpropagation algorithm is also developed for CV-CNN training. Other notable complex-valued deep learning approaches for classification using PolSAR images can be found in [113, 97, 178].
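To make the idea of complex weights and complex operations concrete, the sketch below shows a generic one-dimensional complex-valued sliding dot product (the operation CNN layers call convolution) and its decomposition into four real-valued ones. It is an illustration only and not the implementation of [192]; the function names and the toy signal are assumptions.

```python
def correlate(x, w):
    """Valid-mode 1-D sliding dot product, as used in CNN 'convolution' layers."""
    n = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n)]

def complex_correlate_via_real(x, w):
    """Same result built from four real-valued operations:
    Re(y) = xr*wr - xi*wi,  Im(y) = xr*wi + xi*wr  (with * the sliding product)."""
    xr, xi = [v.real for v in x], [v.imag for v in x]
    wr, wi = [v.real for v in w], [v.imag for v in w]
    rr, ii = correlate(xr, wr), correlate(xi, wi)
    ri, ir = correlate(xr, wi), correlate(xi, wr)
    return [complex(a - b, c + d) for a, b, c, d in zip(rr, ii, ri, ir)]

x = [1 + 2j, 0 - 1j, 3 + 0j, -2 + 1j]   # toy complex-valued signal
w = [0.5 + 0.5j, 1 - 1j]                # toy complex-valued kernel
direct = correlate(x, w)                # native complex arithmetic
via_real = complex_correlate_via_real(x, w)
```

Both routes give the same output, so a complex-valued layer can be assembled from real-valued building blocks as long as the real and imaginary parts are coupled in this way and not processed as independent channels.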

Although not completely related to terrain surface classification, it is also worth mentioning that the combination of SAR and PolSAR images with feed-forward neural networks has been extensively used for sea ice classification. This topic is not treated any further in this section and the interested reader is referred to [128, 130, 129, 148, 186] for more information. Similar to the polarimetric signature, InSAR coherence provides information about physical scattering properties. In [104], interferometric volume decorrelation is used as a feature for forest/non-forest mapping together with radar backscatter and incidence angle. The authors used bistatic TanDEM-X data, where temporal decorrelation can be neglected. They compared different architectures and concluded that CNNs outperform random forest and that U-Net proved best for this segmentation task.

To summarize, it is apparent that deep learning-based SAR and PolSAR classification algorithms have advanced considerably in the past few years. Although at first the focus was on low-rank representation learning using SAE [177] and its modifications [52], later research addressed a multitude of issues relevant to SAR imagery, such as taking into account speckle [52, 53], preserving spatial structures [48], and handling the complex-valued nature of the data [192, 113, 97]. It can also be seen that the scarcity of labeled data has driven researchers to use semi-supervised learning algorithms [178]. Finally, AutoML, an important field of machine learning that had not been exploited extensively by the remote sensing community, has found its application in PolSAR image classification [37].

Fig. 4: The flowchart of the multi-aspect-aware bi-directional approach for SAR ATR proposed in [188].

IV-B Object Detection

Although various characteristics distinguish SAR images from optical RGB images, the SAR object detection problem is still analogous to optical image classification and segmentation in the sense that feature extraction from raw data is always the prior and crucial step. Hence, given success in the optical domain, there is no doubt that deep learning is one of the most promising ways to develop the state-of-the-art SAR object detection algorithms.

The majority of earlier works on SAR object detection using deep learning consists of taking successful deep learning methods for optical object detection and applying them with minor tweaks to military vehicle detection (MSTAR dataset; see subsection V-C) or ship detection on custom datasets. Even small-sized networks are easily able to achieve more than 90% test accuracy on most of these tasks.

The first attempt in military vehicle detection can be found in [21], where Chen et al. used an unsupervised sparse autoencoder to generate convolution kernels from random patches of a given input for a single-layer CNN, which generated features to train a softmax classifier for classifying military targets in the MSTAR dataset [83]. The experiments in [21] showed great potential for applying CNNs to SAR target recognition. With this discovery, Chen et al. [20] proposed A-ConvNets, a simple 5-layer CNN that was able to achieve state-of-the-art accuracy of about 99% on the MSTAR dataset.

Following this trend, more and more authors applied CNNs to the MSTAR dataset [110, 35, 38]. Morgan [110] successfully applied a modestly sized 3-layer CNN to MSTAR and, building upon it, Wilmanski et al. [175] investigated the effects of initialization and optimizer selection on final results. Ding et al. [35] investigated the capabilities of a CNN model combined with domain-specific data augmentation techniques (e.g., pose synthesis and speckle adding) in SAR object detection. Furthermore, Du et al. [38] proposed a displacement- and rotation-insensitive CNN, and claimed that data augmentation on training samples is necessary and critical in the pre-processing stage.

On the same dataset, instead of treating CNN as an end-to-end model, Wagner [163] and similarly Gao [50] integrated CNN and SVM, by first using a CNN to extract features, and then feeding them to an SVM for final prediction. Specifically, Gao et al. [49] added class separation information to the cross-entropy cost function as a regularization term, which they showed explicitly facilitates intra-class compactness and separability, in turn improving the quality of extracted features. More recently, Furukawa [47] proposed VersNet, an encoder-decoder style segmentation network, to not only identify but also localize multiple objects in an input SAR image. Moreover, Zhang et al. [188] proposed an approach based on multi-aspect image sequences as a pre-processing step. They take into account backscattering signals from different viewing geometries, followed by feature extraction using Gabor filters and dimensionality reduction, and finally feed the results to a bidirectional LSTM model for joint recognition of targets. The flowchart of this SAR ATR framework is illustrated in Fig. 4.

Besides vehicle detection, ship detection is another frequently tackled SAR object detection task. Early studies on applying deep learning models to ship detection [25, 140, 13, 116, 99] mainly consist of two stages: first cropping patches from the whole SAR image and then identifying whether cropped patches belong to target objects using a CNN. Because of fixed patch sizes, these methods were not robust enough to cater for variations in ship geometry, like size and shape. This problem was overcome by using region-based CNNs [55, 127], with creative use of skip connections and feature fusion techniques in later literature. For example, Li et al. [96] fused features of the last three convolution layers before feeding them to a region proposal network (RPN). Kang et al. [80] proposed a contextual region-based network that fuses features from different levels. Meanwhile, to make the most use of features of different resolution, Jiao et al. [79] densely connected each layer to its subsequent layers and fed features from all layers to separate RPNs to generate proposals; in the end the best proposal was chosen based on an intersection-over-union score.
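The intersection-over-union (IoU) score mentioned above can be sketched as follows for axis-aligned boxes given as (x1, y1, x2, y2). How [79] applies the score when selecting among proposals is not detailed here, so the `best_proposal` helper, which ranks proposals against a single reference box, is a hypothetical illustration only.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def best_proposal(proposals, reference):
    """Hypothetical helper: pick the proposal that overlaps the reference most."""
    return max(proposals, key=lambda p: iou(p, reference))


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
score = iou((0, 0, 2, 2), (1, 1, 3, 3))
```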

In more recent works on SAR object detection, scientists have tried to explore many other interesting ideas to complement current works. Dechesne et al. [27] proposed a multitask network that simultaneously learned to detect, classify, and estimate the length of ships. Mullissa et al. [112] showed that CNNs can be trained directly on complex-valued SAR data; Kazemi et al. [81] performed object classification using an RNN-based architecture directly on the received SAR signal instead of processed SAR images; and Rostami et al. [133] and Huang et al. [73] explored knowledge transfer or transfer learning from other domains to the SAR domain for SAR object detection.

Fig. 5: Very high resolution TerraSAR-X image of Berlin (left), and the predicted building mask [141] (right).

Perhaps one of the more interesting recent works in this application area is building detection by Shahzad et al. [141]. They tackle the problem of Very High Resolution (VHR) SAR building detection using an FCN [100] architecture for feature extraction, followed by CRF-RNN [198], which helps give similar weights to neighboring pixels. This architecture produced building segmentation masks with up to 93% accuracy. An example of the detected buildings can be seen in Fig. 5, where the left subfigure is the amplitude of the input TerraSAR-X image of Berlin, and the right subfigure is the predicted building mask. Another major contribution of that paper addresses the lack of training data by introducing an automatic annotation technique, which annotates the TomoSAR data using OpenStreetMap (OSM) data.

In summary, deep learning faces challenges on two fronts when applied to SAR object detection tasks. The first is the challenge of identifying characteristics of SAR imagery like imaging geometry, size of objects, and speckle noise. The second and bigger challenge is the lack of good quality standardized datasets. As we observed, the most popular dataset, MSTAR, is too easy for deep nets, and for ship detection the majority of authors created their own datasets, which makes it very hard to judge the quality of the proposed algorithms and even harder to compare different algorithms.

IV-C Parameter Inversion

Parameter inversion from SAR images is a challenging field in SAR applications. As one important branch, ice concentration estimation is now attracting great attention due to its importance to ice monitoring and climate research [126]. Since there are complex interactions between SAR signals and sea ice [34], empirical algorithms face difficulties with interpreting SAR images for accurate ice concentration estimation.

Wang et al. [164] resorted to a CNN for generating ice concentration maps from dual polarized SAR images. Their method takes image patches of the intensity-scaled dual band SAR images as inputs, and outputs ice concentration directly. In [165, 166], Wang et al. employed various CNN models to estimate ice concentration from SAR images during the melt season. Labels are produced by ice experts via visual interpretation. The algorithm was tested on dual-pol RadarSat-2 data. Since the problem considered is the regression of a continuous value, mean squared error is selected as the loss function. Experimental results demonstrate that CNNs can offer a more accurate result than comparative operational products.
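Since ice concentration is treated as a continuous regression target, the mean squared error loss and its gradient with respect to the predictions can be written in a few lines. This is a generic sketch, not the training code of [165, 166]; the concentration values below are invented for illustration.

```python
def mse_loss(predicted, target):
    """Mean squared error between predicted and reference values."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def mse_grad(predicted, target):
    """Gradient of the MSE loss with respect to each prediction."""
    n = len(predicted)
    return [2.0 * (p - t) / n for p, t in zip(predicted, target)]


# Made-up ice concentrations in [0, 1] for two image patches.
network_output = [0.5, 1.0]
expert_label = [0.0, 1.0]
loss = mse_loss(network_output, expert_label)
grad = mse_grad(network_output, expert_label)
```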

Fig. 6: The Architecture of CNN for SAR image despeckling [23].
Fig. 7: The comparison of speckle reduction among SAR-BM3D [120], SAR-CNN [23], and CNN-NLM applied to a small strip of COSMO-SkyMed data over Caserta, Italy, where the reference clean image has been obtained by temporal multi-looking applied to a stack of SAR images [26].

In a different application, Song et al. used a deep CNN, including five pairs of convolutional and max pooling layers followed by two fully connected layers for inverting rough surface parameters from SAR images [149]. The training of the network was based on simulated data solely due to the scarcity of real training data. The method was able to invert the desired parameters with a reasonable accuracy and the authors showed that training a CNN for parameter inversion purposes could be done quite efficiently. Furthermore, Zhao et al. [193] designed a complex-valued CNN to directly learn physical scattering signatures from PolSAR images. The authors have notably proposed a framework to automatically generate labeled data, which led to an unsupervised learning algorithm for the aforementioned parameter inversion.

On the whole, deep learning-based parameter estimation for SAR applications has not yet been fully exploited. Unfortunately, most of the focus of the remote sensing community has been devoted to classical problems, which overlap with computer vision tasks such as classification, object detection, segmentation, and denoising. We hope that in the future more studies will be carried out to employ deep learning methods for geophysical and other parameter inversion tasks using SAR data.

IV-D Despeckling

Speckle, caused by the coherent interaction among scattered signals from sub-resolution objects, often makes processing and interpretation of SAR images difficult. Therefore, despeckling is a crucial procedure before applying SAR images to various tasks. Conventional methods aim at removing speckle either spatially, where local spatial filters, such as the Lee filter [93], Kuan filter [88], and Frost filter [45], are employed, or using wavelet-based methods [176, 9, 4]. For a full overview of these techniques, the reader is referred to [10]. In the past decade, patch-based methods for speckle reduction have gained high popularity due to their ability to preserve spatial features while not sacrificing image resolution [159]. Deledalle et al. [29] proposed one of the first nonlocal patch-based methods applied to speckle reduction by taking into account the statistical properties of speckle combined with the original nonlocal image denoising algorithm introduced in [17]. A vast number of variations of the nonlocal method for SAR despeckling have been proposed, with the most notable ones included in [179, 30]. However, on the one hand, manual selection of appropriate parameters for conventional algorithms is not easy and is sensitive to reference images. On the other hand, it is difficult to achieve a balance between preserving distinct image features and removing artifacts with empirical despeckling methods. To overcome these limitations, methods based on deep learning have been developed.

Inspired by the success of image denoising using a residual learning network architecture in the computer vision community [189], Chierchia et al. [23] first introduced a residual learning CNN for SAR image despeckling by presenting a 17-layer CNN for learning to subtract speckle components from noisy images. Considering that speckle noise is assumed to be multiplicative, the homomorphic approach with coupled log- and exp-transformations is performed before and after feeding images to the network. In this case, multiplicative speckle noise is transformed into an additive form and can be recovered by residual learning, where log-speckle noise is regarded as the residual. As shown in Fig. 6, an input log-noisy image is mapped identically to a fusion layer via a shortcut connection, and then added element-wise to the learned residual image to produce a log-clean image. Afterwards, denoised images can be obtained by an exp-transformation. Wang et al. [167] proposed a CNN, called ID-CNN, for image despeckling, which can directly learn denoised images via a component-wise division-residual layer with skip connections. In other words, homomorphic processing is not introduced for transforming multiplicative noise into additive noise, and at a final stage the noisy image is divided by the learned noise to yield the clean image.
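The difference between the two residual formulations can be illustrated with a small sketch in which the trained network is replaced by a stand-in callable. The function names and the "oracle" predictors are assumptions for illustration; in [23] and [167] the speckle estimate is produced by a trained CNN. Pixel values must be strictly positive for the log-transformation.

```python
import math


def homomorphic_despeckle(noisy, predict_log_speckle):
    """Log-transform, subtract the predicted log-speckle residual, exp-transform."""
    log_noisy = [math.log(v) for v in noisy]
    log_speckle = predict_log_speckle(log_noisy)  # stand-in for the CNN
    log_clean = [y - n for y, n in zip(log_noisy, log_speckle)]
    return [math.exp(v) for v in log_clean]


def division_residual_despeckle(noisy, predict_speckle):
    """Divide the noisy image by the predicted speckle, with no log/exp step."""
    speckle = predict_speckle(noisy)  # stand-in for the CNN
    return [y / n for y, n in zip(noisy, speckle)]


# Toy example: noisy = clean * speckle, with an oracle that knows the speckle.
clean = [1.0, 4.0]
speckle = [2.0, 0.5]
noisy = [c * s for c, s in zip(clean, speckle)]

via_log = homomorphic_despeckle(noisy, lambda _: [math.log(s) for s in speckle])
via_div = division_residual_despeckle(noisy, lambda _: speckle)
```

With a perfect speckle estimate both routes recover the clean values; they differ in the domain in which the network has to learn the residual.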

As a step forward with respect to the two aforementioned residual-based learning methods, Zhang et al. [191] employed a dilated residual network, SAR-DRN, instead of simply stacking convolutional layers. Unlike [23] and similar to [167], SAR-DRN is trained in an end-to-end fashion using a combination of dilated convolutions and skip connections with a residual learning structure, which indicates that prior knowledge such as a noise description model is not required in the workflow.
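One motivation for dilated convolutions is that they enlarge the receptive field without adding parameters. For a stack of stride-1 convolutions with kernel size k and dilation rates d_i, the receptive field is 1 + sum((k - 1) * d_i). The sketch below evaluates this formula; the dilation rates are hypothetical and are not taken from SAR-DRN [191].

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, per axis) of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Three plain 3x3 layers versus three dilated 3x3 layers (hypothetical rates).
plain = receptive_field(3, [1, 1, 1])
dilated = receptive_field(3, [1, 2, 4])
```

With the same number of 3x3 layers, and hence the same number of weights, the dilated stack covers 15 pixels per axis instead of 7.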

In [185], Yue et al. proposed a novel deep neural network architecture specifically designed for SAR despeckling. It uses a convolutional neural network to extract image features and reconstruct a discrete RCS probability density function (PDF). It is trained with a hybrid loss function that measures the distance between the actual SAR image intensity PDF and the estimated one, which is derived from the convolution of the reconstructed RCS PDF with the prior speckle PDF. Experimental results demonstrated that the proposed despeckling neural network can achieve performance comparable to non-learning state-of-the-art methods. In [154], the problem of despeckling was tackled using a time series of images. Using a stack of images for despeckling is not unique to deep learning-based methods, as has been recently demonstrated in [12] as well. In [154], the authors utilized a multi-layer perceptron with several hidden layers to learn non-linear intensity characteristics of training image patches. This approach has shown promising results and reported performance comparable to state-of-the-art despeckling algorithms.

Again using single images instead of time series, in [89] the authors proposed a deep encoder–decoder CNN architecture with focus on feature preservation, which is a weakness of CNNs. They modified U-Net (Ronneberger et al., 2015) in order to accommodate speckle statistical features. Another notable CNN approach was introduced in [26], where the authors used a nonlocal structure, while the weights for pixel-wise similarity measures were assigned using a CNN. The results of this approach, called CNN-NLM, are reported in Fig. 7, where the superiority of the method with respect to both feature preservation and speckle reduction is clearly observed.

From the deep learning-based despeckling methods reviewed in this subsection, it can be observed that most methods employ CNN-based architectures with single images of the scene for training; they either output the clean image in an end-to-end fashion or propose residual-based techniques to learn the underlying noise model. With the availability of large archives of time series thanks to the Sentinel-1 mission, an interesting direction is to exploit the temporal correlation of speckle characteristics for despeckling applications. Another problem in supervised deep learning-based despeckling techniques is the lack of ground truth data. In many studies, the training data set is built by corrupting optical images by multiplicative noise. This is far from realistic for despeckling applied to real SAR data. Therefore, despeckling in an unsupervised manner would be highly desirable and worth attention.
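The common practice of building training pairs by corrupting clean images with multiplicative noise can be sketched as follows. The sketch assumes the standard model of fully developed speckle, in which L-look intensity speckle follows a gamma distribution with shape L and unit mean; the function names are illustrative, and real SAR data deviate from this idealized model, which is exactly the limitation noted above.

```python
import random


def add_speckle(clean_intensity, looks, rng):
    """Multiply each intensity by unit-mean gamma noise with shape `looks`."""
    return [v * rng.gammavariate(looks, 1.0 / looks) for v in clean_intensity]


def equivalent_number_of_looks(samples):
    """Estimate ENL = mean^2 / variance over a homogeneous area."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean * mean / var


rng = random.Random(0)       # fixed seed for reproducibility
clean = [10.0] * 20000       # homogeneous "scene" with constant intensity
noisy = add_speckle(clean, looks=4, rng=rng)
enl = equivalent_number_of_looks(noisy)
```

On a homogeneous area the estimated equivalent number of looks should be close to the number of looks used for simulation, which gives a quick sanity check of the noise model.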


IV-E InSAR

Interferometric SAR (InSAR) is one of the most important SAR techniques, and is widely used in reconstructing the topography of the Earth’s surface, i.e., digital elevation model (DEM) generation [187, 2, 109], and detecting topographical displacements, e.g., monitoring volcanic eruptions [102, 134, 158], earthquakes [103, 123], land subsidence [1], and urban areas using time series methods [203, 54, 108].

The principle of InSAR is to first measure the interferometric phase between signals received by two antennas located at different positions and then extract topographic information from the obtained interferogram by unwrapping and converting the absolute phase to height. However, an actual interferogram often suffers from a large number of singular points, which originate from interference distortion and noise in radar measurements. These points result in unwrapping errors and consequently low quality DEMs. To tackle this problem, Ichikawa and Hirose [77] applied a complex-valued neural network, CVNN, in the spectral domain to restore singular points. With the help of the Complex Markov Random Field (CMRF) filter [181], they aimed at learning ideal relationships between the spectrum of neighboring pixels and that of center pixels via a one-hidden-layer CVNN. Notably, center pixels of each training sample are supposed to be ideal points, which indicates that singular points are not fed to the network during the training procedure. Similarly, Oyama and Hirose [117] restored singular points with a CVNN in the spectral domain.
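The steps described above can be sketched on toy data: forming the wrapped interferometric phase from two complex signals, unwrapping it along one dimension, and flagging a singular point as a nonzero sum of wrapped phase differences around a 2x2 pixel loop (commonly called a residue). The 1-D integration scheme and all names are illustrative assumptions; operational unwrapping is two-dimensional and considerably more involved.

```python
import cmath
import math


def wrap(phase):
    """Wrap a phase value into the interval [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))


def interferogram_phase(s1, s2):
    """Wrapped phase of s1 * conj(s2) for two co-registered complex signals."""
    return [cmath.phase(a * b.conjugate()) for a, b in zip(s1, s2)]


def unwrap_1d(wrapped):
    """Integrate wrapped phase differences (valid if true steps are below pi)."""
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        out.append(out[-1] + wrap(cur - prev))
    return out


def residue(p00, p01, p11, p10):
    """Charge of a 2x2 loop: nonzero means a singular point lies inside it."""
    total = (wrap(p01 - p00) + wrap(p11 - p01)
             + wrap(p10 - p11) + wrap(p00 - p10))
    return round(total / (2 * math.pi))


# Toy phase ramp with steps of 0.5 rad; the second signal is a constant reference.
true_phase = [0.5 * k for k in range(20)]
s1 = [cmath.exp(1j * p) for p in true_phase]
s2 = [1 + 0j] * len(s1)
unwrapped = unwrap_1d(interferogram_phase(s1, s2))
```

In this noise-free example the unwrapped phase reproduces the ramp; a residue inside the image would make the result depend on the integration path, which is why singular points degrade the resulting DEM.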

Fig. 8: The workflow of volcano deformation detection proposed in [162]. The CNN is trained on simulated data and is later used to detect phase gradients and a decorrelation mask from input wrapped interferograms to locate ground deformation caused by volcanoes.

Related to topography extraction, Costante et al. [24] proposed a fully convolutional encoder-decoder architecture for estimating DEMs from single-pass image acquisitions. It is demonstrated that this model is capable of extracting high-level features from input radar images using an encoder section and then reconstructing a full resolution DEM via a decoder section. Moreover, the network can potentially solve the layover phenomenon in one single-look SAR image with contextual features.

In addition to reconstructing DEMs, Schwegmann et al. [139] presented a CNN-based technique to detect subsidence deformations from interferograms. They employed a 9-layer network to extract salient information in interferograms and displacement maps for discriminating deformation targets from deformation-like targets. Furthermore, Anantrasirichai et al. [8, 5, 6] used a pre-trained CNN to automatically detect volcanic ground deformation from InSAR images. They divided each image into patches, and relabeled them with binary labels, i.e., background and volcano, and finally fed them to the network to predict volcano deformation. They further improved their method to be able to detect slow-moving volcanoes by using a time series of interferograms in [7]. In another study related to automatic volcanic deformation detection, Valade et al. [162] designed and trained a CNN from scratch to learn a decorrelation mask from input wrapped interferograms, which then was used to detect volcanic ground deformation. The flowchart of this approach can be seen in Fig. 8. The training in both of the aforementioned works [7, 162] was based on simulated data. Another geophysically motivated example of using deep learning on InSAR data, which was actually proposed earlier than the above-mentioned CNN-based studies, was seen in [28, 124, 152], where the authors used simple feed-forward shallow neural networks for seismic event characterization and automatic seismic source parameter inversion by exploiting the power of neural networks in solving non-linear problems.

Fig. 9: Randomly selected patches obtained from the testing phase of the network for SAR-optical image patch correspondence detection proposed in [111].

In summary, it can be concluded that the use of deep learning methods in InSAR is still at a very early stage. Although deep learning has been used in different applications combined with InSAR, the full potential of interferograms has not yet been exploited, except in the pioneering work of Hirose [63]. Many applications treat interferograms or deformation maps obtained from interferograms as images similar to RGB or gray-scale ones, and therefore the complex nature of interferograms has remained unnoticed. Apart from this issue, as with the SAR despeckling problem using deep learning, the lack of ground truth data for either detection or image restoration problems is a motivation to focus on developing semi-supervised and unsupervised algorithms that combine deep learning and InSAR.

IV-F SAR-Optical Data Fusion

The fusion of SAR and optical images can provide complementary information about targets. However, considering the two different sensing modalities, prior identification and co-registration of corresponding images are challenging [138], but compulsory for joint applications of SAR and optical images. For the purpose of identifying and matching SAR and optical images, many current methods resort to deep learning, given its powerful capabilities of extracting effective features from complex images.

In [111], the authors proposed a CNN for identifying corresponding image patches of very high resolution (VHR) optical and SAR imagery of complex urban scenes. Their network consists of two streams: one designed for extracting features from optical images, the other responsible for learning features from SAR images. Next the extracted features are fused via a concatenation layer for further binary prediction of their correspondence. A selection of True Positives, False Positives, False Negatives, and True Negatives of SAR-optical image patches from [111] can be seen in Fig. 9. Similarly, Hughes et al. [75] proposed a pseudo-Siamese CNN for learning a multi-sensor correspondence predictor for SAR and optical image patches. Notably, both networks in [111, 75] are trained and validated on the SARptical dataset [170, 172], which is specifically built for joint analysis of VHR SAR and optical images in dense urban areas.

In [168], the authors proposed a deep learning framework that can learn an end-to-end mapping between image patch pairs and their matching labels. An image pair is first transformed into two 1-D vectors and then concatenated to build a large 1-D vector as the input of the network. Then hidden layers are stacked for learning the mapping between input vectors and output binary labels, which indicate their correspondence.

For the purpose of matching SAR and optical images, Merkle et al. [106] presented a CNN that comprises a feature extraction stage (Siamese network) and a similarity measure stage (dot product layer). Specifically, features of input optical and SAR images are extracted via two separate 9-layer branches and then fed to a dot product layer for predicting the shift of the optical image within the large SAR reference patch. Experimental results indicate that this deep learning-based method outperforms state-of-the-art matching approaches [153, 31]. Furthermore, Abulkhanov et al. [3] successfully trained a neural network to build feature point descriptors to identify corresponding patches among SAR and optical images and match the detected descriptors using the RANSAC algorithm [44].
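The role of a dot product layer in shift prediction can be illustrated by sliding a small feature map over a larger one and taking the position with the highest dot product. In [106] both inputs are feature maps produced by learned network branches; in this sketch raw toy arrays stand in for those features, and the function names are hypothetical.

```python
def dot_product_scores(search, template):
    """Dot product of the template with every same-sized window of the search map."""
    th, tw = len(template), len(template[0])
    scores = []
    for r in range(len(search) - th + 1):
        row = []
        for c in range(len(search[0]) - tw + 1):
            row.append(sum(template[i][j] * search[r + i][c + j]
                           for i in range(th) for j in range(tw)))
        scores.append(row)
    return scores


def best_shift(scores):
    """Row and column offset of the highest score."""
    _, r, c = max((v, r, c)
                  for r, row in enumerate(scores)
                  for c, v in enumerate(row))
    return r, c


# Toy "SAR" feature map containing the toy "optical" template at offset (1, 2).
template = [[1, 2], [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]
shift = best_shift(dot_product_scores(search, template))
```

An unnormalized dot product favors windows with large values, which is one reason such a layer is applied to learned features rather than to raw intensities.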

In contrast to training a model to identify corresponding image patches, Merkle et al. [105] first employed a conditional generative adversarial network (cGAN) to generate artificial SAR-like images from optical images, then matched them with real SAR images. The authors demonstrate that the matching accuracy and precision are both improved with the proposed strategy. Inspired by their study, more researchers resorted to using GANs for the purpose of SAR-optical image matching (see [76, 46] for a review).

With respect to applications of SAR and optical image matching, Yao et al. [183] aimed at applying SAR and optical images to semantic segmentation with deep neural networks. They collected corresponding optical patches from Google Earth according to TerraSAR-X patches and built ground truths using data from OpenStreetMap. Then SAR and optical images were separately fed to different CNNs to predict semantic labels (building, natural, land use, and water). Although their experimental results did not outperform the state of the art at the time [11], likely because of network design or training strategy, they deduced that introducing advanced models and simultaneously using both data sources can greatly improve the performance of semantic segmentation. Another application mentioned in [137] demonstrated that standard fusion techniques for SAR and optical images require data from both sources, which indicates that it is still not easy to interpret SAR images without the support of optical images. To address this issue, Schmitt et al. [137] proposed an automatic colorization network, composed of a VAE and a mixture density network (MDN) [14], to predict artificially colored SAR images (i.e., Sentinel-1 images). These images are shown to disclose more information to the human interpreter than the original SAR data.

In [57], the authors tackled the problem of cloud removal from optical imagery. They introduced a cGAN architecture to fuse SAR and cloud-corrupted multi-spectral data for generating cloud- and haze-free multi-spectral optical data. Experiments proved the effectiveness of the proposed network for removing clouds from multi-spectral data with auxiliary SAR data. Extending previous multi-modal networks for cloud removal, [40] proposed a cycle-consistent GAN architecture [200] that utilizes an image forward-backward translation consistency loss. Cloud-covered optical information is reconstructed via SAR data fusion, while changes to cloud-free areas are minimized through use of the cycle consistency loss. The cycle-consistent architecture allows training without pixel-wise correspondences between cloudy input and cloud-free target optical imagery, relaxing requirements on the training data set.
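A forward-backward translation consistency loss can be sketched as the distance between an input and its reconstruction after passing through both translation directions. The sketch uses a mean absolute (L1) distance and toy callables in place of the two generators; the exact loss and networks used in [40] are not specified here, so these choices are assumptions.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x, forward, backward):
    """Penalize the difference between x and backward(forward(x))."""
    return l1_distance(x, backward(forward(x)))


# Toy "generators": the backward mapping either inverts the forward one or not.
x = [1.0, 2.0, 3.0]
forward = lambda v: [2.0 * t for t in v]
exact_inverse = lambda v: [t / 2.0 for t in v]
biased_inverse = lambda v: [t / 2.0 + 0.1 for t in v]

loss_exact = cycle_consistency_loss(x, forward, exact_inverse)
loss_biased = cycle_consistency_loss(x, forward, biased_inverse)
```

Because the loss compares an image only with its own reconstruction, it does not require a pixel-wise aligned target image, which is what relaxes the training data requirements.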

In summary, it can be seen that the utilization of deep learning methods for SAR-optical data fusion has been a hot topic in the remote sensing community. Although a handful of data sets consisting of corresponding optical and SAR image patches are available for different terrain types and applications, one of the biggest problems in this task is still the scarcity of high quality training data. Semi-supervised methods, as proposed in [74], seem to be a viable option to tackle the problem. A great challenge in SAR-optical image matching is the extreme difference in viewing geometries of the two sensors. For this it is important to exploit auxiliary 3D data in order to assist the training data generation.

Fig. 10: Samples of the OpenSARUrban [194]. Six classes are shown from the top to the bottom: dense and low-rise residential buildings, general residential area, high-rise buildings, villas, industrial storage area, and vegetation.

V Existing Benchmark Datasets and their limitations

In order to train and evaluate deep learning models, large datasets are indispensable. Unlike RGB images in the computer vision community, which can be easily collected and interpreted, SAR images are much more difficult to annotate due to their complex properties. Our research shows that big SAR datasets created for the primary purpose of deep learning research are nearly non-existent in the community. In recent years, only a few SAR datasets have been made public for training and assessing deep learning models. In the following, we categorize those datasets according to their best suited deep learning problem and focus on openly accessible and well-curated large datasets.

In particular, we consider the following categories of deep learning problems in SAR.

  • Image classification: each pixel or patch in one image is classified into a single label. This is often the case in typical land use land cover classification problems.

  • Scene classification: similar to image classification, one image or patch is classified into a single label. However, one scene is usually much larger than an image patch. Hence, it requires a different network architecture.

  • Semantic segmentation: one image or patch is segmented to a classification map of the same dimension. Training of such neural networks also requires densely annotated training data.

  • Object detection: similar to scene classification. However, detection often requires the estimation of the object location.

  • Registration/matching: provide binary classification (matched or unmatched), or estimate the translation between two image patches. This type of task requires matching pairs of two different image patches as training data.

| Name | Description | Suitable tasks | Related work |
| --- | --- | --- | --- |
| So2Sat LCZ42 [204], TensorFlow API | 400,673 pairs of corresponding Sentinel-1 dual-pol image patch, Sentinel-2 multispectral image patch, and manually labeled local climate zones classes over 42 urban agglomerations (plus 10 additional smaller areas) across the globe. It is the first EO dataset that provides a quantitative measure of the label uncertainty, achieved by having a group of domain experts cast 10 independent votes on 19 cities in the dataset. | image classification, data fusion, quantification of uncertainties | |
| OpenSARUrban [194] | 33,358 Sentinel-1 dual-pol image patches covering 21 major cities in China, labeled with 10 classes of urban scenes. | image classification | |
| SEN12MS [135] | 180,748 corresponding image triplets containing Sentinel-1 dual-pol SAR data, Sentinel-2 multi-spectral imagery, and MODIS-derived land cover maps, covering all inhabited continents during all meteorological seasons. | image classification, semantic segmentation, data fusion | |
| MSAW [142] | Quad-pol X-band SAR imagery from Capella Space with 0.5 m spatial resolution, which covers 120 km² in the area of Rotterdam, the Netherlands. A total of 48,000 unique building footprints are labeled with associated height information curated from the 3D Basisregistratie Adressen en Gebouwen (3DBAG) dataset. | semantic segmentation | |
| PolSF, Data, Label [98] | The dataset includes PolSAR images of San Francisco from five different sensors. Each image was densely labeled to five or six classes, such as mountain, water, high-density urban, low-density urban, vegetation, developed, and bare soil. | image classification, semantic segmentation, data fusion | |
| MSTAR [132] | 17,658 X-band very high resolution SAR image chips (patches) of 10 classes of different vehicles plus one class of simple geometric shaped target. SAR images of pure clutter are also included in the dataset. | object detection, scene classification | [20] [35] [51] |
| OpenSARShip 2.0 [95] | 34,528 Sentinel-1 SAR image chips of ships with the ship geometric information, the ship type, and the corresponding automatic identification system (AIS) information. | object detection, scene classification | |
| SAR-Ship-Dataset [173] | 43,819 Gaofen-3 or Sentinel-1 image chips of different ships. Each image chip has a dimension of 256 by 256 pixels in range and azimuth. | object detection, scene classification | |
| SARptical [171] | 10,108 coregistered pairs of TerraSAR-X very high resolution spotlight image patch and UltraCAM aerial RGB image patch in Berlin, Germany. The coregistration is defined by the matching of the 3D position of the center of the image pair. | image matching | [75, 172] |
| SEN1-2 [136] | 282,384 pairs of corresponding Sentinel-1 single polarization intensity, and Sentinel-2 RGB image patches, collected across the globe. The patches are of dimension 256 by 256 pixels. | image matching, data fusion | |
TABLE I: Summary of available open SAR datasets

V-A Image/Scene Classification

  • So2Sat LCZ42 [204]: So2Sat LCZ42 follows the local climate zones (LCZs) classification scheme. The dataset comprises 400,673 pairs of dual-pol Sentinel-1 and multi-spectral Sentinel-2 image patches from 42 urban agglomerations, plus 10 additional smaller areas, across five continents. The image patches are hand-labelled into one of the 17 LCZ classes [151]. The Sentinel-1 image patches in this dataset contain both the geocoded single look complex image, as well as a despeckled Lee filtered variant. In particular, it is the first Earth observation dataset that provides a quantitative measure of the label uncertainty, achieved by letting a group of domain experts cast 10 independent votes on 19 cities in the dataset. The dataset therefore can be considered a large-scale data fusion and classification benchmark dataset for cutting-edge machine learning methodological developments, such as automatic topology learning, data fusion, and quantification of uncertainties.

  • OpenSARUrban [194]: OpenSARUrban consists of 33,358 patches of Sentinel-1 dual-pol images covering 21 major cities in China. The dataset was manually annotated according to a hierarchical classification scheme, with 10 classes of urban scenes at its finest level. Each image patch has a dimension of 100 by 100 pixels with a pixel spacing of 10 m (Sentinel-1 GRD product). This dataset can support deep learning studies of urban target characterization, and content-based SAR image queries. Fig. 10 shows some samples from the OpenSARUrban dataset.
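The expert votes described for So2Sat LCZ42 suggest a simple way to attach an uncertainty score to each label. The sketch below is only an illustration under stated assumptions: it takes a hypothetical list of 10 class indices for one patch and reports the majority label, the agreement rate, and a normalized vote entropy. The function name `vote_statistics` and the choice of entropy as the measure are assumptions made here, not necessarily the measure distributed with the dataset.

```python
import numpy as np

NUM_CLASSES = 17  # number of LCZ classes, as stated in the text


def vote_statistics(votes, num_classes=NUM_CLASSES):
    """Summarize the expert votes cast for a single patch.

    votes: sequence of integer class indices in [0, num_classes).
    Returns (majority_label, agreement, normalized_entropy), where
    normalized_entropy is 0 for unanimous votes and 1 for votes spread
    uniformly over all classes.
    """
    votes = np.asarray(votes, dtype=int)
    counts = np.bincount(votes, minlength=num_classes)
    probs = counts / counts.sum()
    majority_label = int(np.argmax(counts))      # ties resolve to the lowest index
    agreement = float(probs[majority_label])     # share of votes for the majority
    nonzero = probs[probs > 0]
    entropy = float(np.sum(nonzero * np.log(1.0 / nonzero)))
    return majority_label, agreement, entropy / float(np.log(num_classes))


# Hypothetical example: 7 experts vote class 4, 3 experts vote class 5.
label, agreement, uncertainty = vote_statistics([4] * 7 + [5] * 3)
```

A score of this kind could, for instance, be used to down-weight ambiguous patches during training or to check whether a model's predictive uncertainty tracks human disagreement.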

V-B Semantic Segmentation/Classification

  • SEN12MS [135]: SEN12MS builds on its predecessor, SEN1-2 [136]. It consists of 180,662 triplets of dual-pol Sentinel-1 image patches, multi-spectral Sentinel-2 image patches, and MODIS land cover maps. The patches are georeferenced with a ground sampling distance of 10 m. Each image patch has a dimension of 256 by 256 pixels. We expect this dataset to support the community in developing sophisticated deep learning-based approaches for common tasks such as scene classification or semantic segmentation for land cover mapping.

  • MSAW [142]: The multi-sensor all-weather mapping (MSAW) dataset includes high-resolution SAR data covering 120 km² in the area of Rotterdam, the Netherlands. The quad-polarized X-band SAR imagery from Capella Space with 0.5 m spatial resolution was used for the SpaceNet 6 Challenge. A total of 48,000 unique building footprints have been labeled, together with the associated building heights.

  • PolSF [98]: This dataset consists of PolSAR images of San Francisco from eight different sensors, including AIRSAR, ALOS-1, ALOS-2, RADARSAT-2, SENTINEL-1A, SENTINEL-1B, GAOFEN-3, and RISAT (data compiled by E. Pottier of IETR). Five of the eight images were densely labeled to five or six land use land cover classes in [98]. These densely annotated images correspond to roughly 3,000 training patches of 128 by 128 pixels. Although the data volume is relatively low for deep learning research, this dataset is, to the best of our knowledge, the only annotated multi-sensor PolSAR dataset. Therefore, we suggest that the creators of this dataset increase the number of annotated images to widen its potential use.

V-C Object Detection

  • MSTAR [132]: The Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset is one of the earliest datasets for SAR target recognition. The dataset consists of a total of 17,658 X-band SAR image chips (patches) of 10 classes of vehicles plus one class of simple geometric shaped targets. The collected SAR image patches have a resolution of one foot in range and azimuth. In addition, 100 SAR images of clutter are provided in the dataset.

    In our opinion, the number of image patches in this dataset is relatively low for deep learning models, especially considering the number of classes. In addition, this dataset represents a rather ideal and unrealistic scenario: vehicles in the dataset are centered in the patch, and the clutter is quite homogeneous without disturbing signals. However, considering the scarcity of such datasets, MSTAR is a valuable source for target recognition.

  • OpenSARShip 2.0 [95]: This dataset was built based on its previous version, OpenSARShip [72]. It contains 34,528 Sentinel-1 SAR image patches of different ships with automatic identification system (AIS) information. For each SAR image patch, the creators manually extracted the ship length, width, and direction, as well as its type by verifying this data on the Marine Traffic website [95]. Among all the patches, about one-third is extracted from Sentinel-1 GRD products, and the other two-thirds are from Sentinel-1 SLC products. OpenSARShip 2.0 is one of the handful of SAR datasets suitable for object detection.

  • SAR-Ship-Dataset [173]: This dataset was created using 102 Gaofen-3 and 108 Sentinel-1 images. It consists of 43,819 ship chips of 256 pixels in both range and azimuth. These ships mainly have distinct scales and backgrounds. Therefore, this dataset can be employed for developing multi-scale object detection models.

  • FUSAR-Ship [180]: This dataset was created using space-time matched-up datasets of Gaofen-3 SAR images and ship AIS messages. It consists of over 5000 ship chips with corresponding ship information extracted from AIS messages, which can be used to trace back to each unique ship of any particular chip.

V-D Registration/Matching

  • SARptical [171, 172]: The SARptical dataset was designed for interpreting VHR spaceborne SAR images of dense urban areas. This dataset consists of 10,108 pairs of corresponding very high resolution SAR and optical image patches, whose locations are precisely coregistered in 3D. They are extracted from TerraSAR-X VHR spotlight images with resolution better than 1 m and UltraCAM aerial optical images of 20 cm pixel spacing, respectively. Unlike low and medium resolution images, high resolution SAR and optical images in dense urban areas have very distinct geometries. Therefore, in the SARptical dataset, the center points of each image pair are matched in 3D space via sophisticated 3D reconstruction and matching algorithms. The UTM coordinates of the center pixel of each pair are also made publicly available in the dataset. This dataset contributes to applications such as multi-modal data classification and SAR-optical image co-registration. However, we believe more training samples are required for learning complicated SAR-to-optical image-to-image mappings.

  • SEN1-2 [136]: The SEN1-2 dataset consists of 282,384 pairs of corresponding Sentinel-1 single polarization intensity and Sentinel-2 RGB image patches, collected from across the globe and throughout all meteorological seasons. The patches are of dimension 256 by 256 pixels. Their distribution over the four seasons is roughly even. SEN1-2 is the first large open dataset of this kind. We believe it will support further developments in the field of deep learning for remote sensing as well as multi-sensor data fusion, such as SAR image colorization, and SAR-optical image matching.
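Matching is commonly posed as a binary classification between matched and unmatched patch pairs, so a coregistered dataset such as SARptical or SEN1-2 only supplies the positive examples directly. The following is a minimal sketch, under stated assumptions, of how negatives could be generated by pairing each SAR patch with a different optical patch. The function `make_matching_pairs` and its index-based interface are hypothetical and not part of either dataset's tooling. The sketch also ignores that neighbouring patches in a real dataset may overlap spatially, in which case a "negative" could still show part of the same scene.

```python
import numpy as np


def make_matching_pairs(num_pairs, rng):
    """Build index triples (sar_idx, opt_idx, label) for a matching task.

    Pair i of the coregistered dataset is (SAR patch i, optical patch i).
    Positives keep the coregistered partner (label 1); negatives combine
    each SAR patch with a different optical patch (label 0).
    Requires num_pairs >= 2.
    """
    sar_idx = np.arange(num_pairs)
    # A cyclic shift by a non-zero offset guarantees opt_idx != sar_idx.
    offset = int(rng.integers(1, num_pairs))
    neg_opt_idx = (sar_idx + offset) % num_pairs

    sar_all = np.concatenate([sar_idx, sar_idx])
    opt_all = np.concatenate([sar_idx, neg_opt_idx])
    labels = np.concatenate([np.ones(num_pairs, dtype=int),
                             np.zeros(num_pairs, dtype=int)])

    order = rng.permutation(2 * num_pairs)  # shuffle positives and negatives
    return sar_all[order], opt_all[order], labels[order]


rng = np.random.default_rng(0)
# 10,108 is the number of SARptical pairs quoted in the text.
sar_i, opt_i, y = make_matching_pairs(10108, rng)
```

The result is a balanced set with one negative per positive; other ratios or harder negatives (for example, patches from nearby locations) are design choices left open here.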

V-E Other Datasets

  • Sample PolSAR images from ESA: For example, the Flevoland PolSAR Dataset. Several works make use of this dataset for agricultural land use land cover classification. The authors of [184, 65, 182] have manually labeled the dataset according to different classification schemes.

  • SAR Image Land Cover Datasets [39]: This dataset is not publicly available. Please contact the creator.

  • Airbus Ship Detection Challenge:

VI Conclusion and Future Trends

This paper reviews the current state of the art of an important and under-exploited research field: deep learning in SAR. Relevant deep learning models are introduced, and their applications in six fields (terrain surface classification, object detection, parameter inversion, despeckling, InSAR, and SAR-optical data fusion) are analyzed in depth. Existing benchmark datasets and their limitations are discussed. In summary, despite early successes, full exploitation of deep learning in SAR is mostly limited by 1) the lack of large and representative benchmark datasets and 2) the lack of tailored deep learning models that fully account for SAR signal characteristics.

Looking forward, the years ahead will be exciting. Next generation spaceborne SAR missions will simultaneously provide high resolution and global coverage, which will enable novel applications such as monitoring the dynamic Earth. To retrieve geo-parameters from these data, the development of new analytics methods is warranted. Deep learning is among the most promising methods. To fully unlock its potential in SAR/InSAR applications in this big SAR data era, there are several promising future directions:

  • Large and Representative Benchmark Datasets: As summarized in this article, there are only a handful of SAR benchmarks, in particular when excluding multi-modal ones. For instance, in SAR target detection, methods are mainly tested on a single benchmark dataset, MSTAR, where only several thousand target samples in total (several hundred for each class) are provided for training. With respect to InSAR, due to the lack of ground truth, datasets are extremely scarce or nearly nonexistent. Large and representative expert-annotated benchmark datasets are in high demand in the SAR community, and deserve more attention.

  • Unsupervised Deep Learning: To bypass the shortage of annotated data in SAR, unsupervised deep learning is a promising direction. These algorithms derive insights directly from the data itself and perform feature learning, representation learning, or clustering, the results of which can be further used for data-driven analytics. Autoencoders and their extensions, such as variational autoencoders (VAEs) and deep embedded clustering algorithms, are popular choices. In despeckling, the high complexity of SAR images and the lack of ground truth make it infeasible to produce appropriate benchmarks from real data. Noise2Noise [94] is an elegant example of unsupervised denoising, in which the authors learn to denoise without clean data. However, a pleasing visual appearance of the results is not sufficient: preserving details is a must for SAR applications.

  • Interferometric Data Processing: Since deep learning methods were initially applied to perception tasks in computer vision, many methods resort to transforming SAR images, e.g., PolSAR images, into RGB-like images in advance, or focus only on intensities. In other words, the most essential component of a SAR measurement, the phase information, is not appropriately considered. Although CV-CNNs are capable of learning phase information and show great potential in processing CV-SAR images, only a few such attempts have been made [192]. Extending CNNs to the complex domain, while preserving the precious phase information, would enable networks to learn features directly from raw data and would open up a wide range of SAR/InSAR applications.

  • Quantification of Uncertainties: Generally speaking, geo-parameter estimates without uncertainty measures are considered invalid in remote sensing. Appropriately trained deep learning models can achieve highly accurate predictions, yet they fail to quantify the uncertainty of these predictions. Here, giving a statement about the predictive uncertainty, while considering both aleatoric uncertainty and epistemic uncertainty, is of crucial importance. The Bayesian deep learning community has developed a model-agnostic and easy-to-implement methodology to estimate both data and model uncertainty within deep learning models [82], which awaits exploration by the SAR community.

  • Large Scale Nonlinear Optimization Problems: The development of inversion algorithms should keep pace with data growth. Fast solvers are needed for many advanced parameter inversion models, which often involve non-convex, nonlinear, and complex-valued optimization problems, such as compressive-sensing-based tomographic inversion, or low rank complex tensor decomposition for InSAR time series data analysis. In some cases, the iterations of the optimization algorithms perform computations similar to those of layers in neural networks, that is, a linear step followed by a non-linear activation (see, for example, the iteratively reweighted least-squares approach). It is thus meaningful to replace the computationally expensive optimization algorithms with unrolled deep architectures that could be trained from simulated data.


  • Cognitive Sensors: Radars, and SARs in particular, are very complex and versatile imaging machines. A variety of modes (stripmap, spotlight, ScanSAR, TOPS, etc.), swath widths, incidence angles, and polarizations can be programmed in near real-time. Cognitive radars go a giant step further: they adapt their operational modes autonomously to the environment to be imaged through an intelligent interplay of transmit waveforms, adaptive signal processing on the receiver side, and learning. Cognitive SARs are still in their conceptual and experimental phase and are often justified by the stunning capabilities of the echo-location system of bats. In his early pioneering article [59], Haykin defines three ingredients of a cognitive radar: “1) intelligent signal processing, which builds on learning through interactions of the radar with the surrounding environment; 2) feedback from the receiver to the transmitter, which is a facilitator of intelligence; and 3) preservation of the information content of radar returns, which is realized by the Bayesian approach to target detection through tracking.” Such a SAR could, e.g., perform a low resolution, yet wide swath, surveillance of a coastal area and, in a first step, detect objects of interest, like ships, in real-time. Based on these detections, the transmit waveform can be modified so as to zoom into the region of interest, allow a close-up look at the object, and possibly classify or even identify it. Reinforcement (online) learning is part of the concept, as are fast and reliable detectors or classifiers (trained offline), e.g., based on deep learning. All this is edge computing: the learning algorithms have to perform in real-time and with the limited compute resources onboard the satellite or airplane.
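The observation in the optimization bullet above, that a solver iteration can look like a network layer (a linear step followed by a non-linear activation), can be made concrete with a small sketch. The example uses ISTA for complex-valued sparse recovery rather than the iteratively reweighted least-squares approach mentioned in the text, simply because its layer structure is easy to read. All names (`soft_threshold`, `unrolled_ista`), the random sensing matrix, and the parameter values are illustrative assumptions. The weights here are fixed from the sensing matrix and not trained; a learned unrolled architecture would treat `W`, `B`, and `theta` as trainable parameters fitted on simulated data.

```python
import numpy as np


def soft_threshold(z, tau):
    """Complex soft-thresholding: shrink the magnitude, keep the phase."""
    mag = np.abs(z)
    scale = np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)
    return scale * z


def unrolled_ista(y, A, lam, num_layers):
    """Run num_layers ISTA iterations written as network layers.

    Each layer computes x <- soft_threshold(W @ x + B @ y, theta), i.e. a
    linear step followed by a pointwise non-linearity. W, B and theta are
    derived from A here (assumption: no training is performed).
    """
    lipschitz = np.linalg.norm(A, 2) ** 2   # squared largest singular value
    B = A.conj().T / lipschitz
    W = np.eye(A.shape[1]) - B @ A
    theta = lam / lipschitz
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(num_layers):
        x = soft_threshold(W @ x + B @ y, theta)
    return x


# Synthetic, noise-free toy problem (all values are illustrative assumptions).
rng = np.random.default_rng(0)
m, n = 40, 100
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
x_true = np.zeros(n, dtype=complex)
x_true[[5, 37, 80]] = [1.0 + 0.5j, -0.8j, 0.6]
y = A @ x_true

x_hat = unrolled_ista(y, A, lam=0.01, num_layers=500)
top3 = np.sort(np.argsort(np.abs(x_hat))[-3:])  # indices of the 3 largest entries
```

With fixed weights, many layers are needed to converge. The motivation for training such an architecture is that learned weights and thresholds may reach comparable accuracy with far fewer layers, which is what would make it attractive as a fast solver.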

Last but not least, technological advances in deep learning for remote sensing will only be possible if experts in remote sensing and machine learning work closely together. This is particularly true when it comes to SAR. Thus, we encourage more joint initiatives working collaboratively toward deep-learning-powered, explainable, and reproducible big SAR data analytics.


  • [1] V. B. H. (Gini) Ketelaar (2009) Satellite radar interferometry. Remote Sensing and Digital Image Processing, Vol. 14, Springer Netherlands. Cited by: §IV-E.
  • [2] R. Abdelfattah and J. Nicolas (2002) Topographic SAR interferometry formulation for high-precision DEM generation. IEEE Transactions on Geoscience and Remote Sensing 40 (11), pp. 2415–2426. Cited by: §IV-E.
  • [3] D. Abulkhanov, I. Konovalenko, D. Nikolaev, A. Savchik, E. Shvets, and D. Sidorchuk (2018) Neural network-based feature point descriptors for registration of optical and SAR images. In International Conference on Machine Vision (ICMV), Cited by: §IV-F.
  • [4] A. Achim, P. Tsakalides, and A. Bezerianos (2003) SAR image denoising via Bayesian wavelet shrinkage based on heavy-tailed modeling. IEEE Transactions on Geoscience and Remote Sensing 41 (8), pp. 1773–1784. Cited by: §IV-D.
  • [5] N. Anantrasirichai, F. Albino, P. Hill, D. Bull, and J. Biggs (2018) Detecting volcano deformation in InSAR using deep learning. arXiv:1803.00380. Cited by: §IV-E.
  • [6] N. Anantrasirichai, J. Biggs, F. Albino, and D. Bull (2019-09) A deep learning approach to detecting volcano deformation from satellite imagery using synthetic datasets. Remote Sensing of Environment 230, pp. 111179. External Links: ISSN 00344257 Cited by: §IV-E.
  • [7] N. Anantrasirichai, J. Biggs, F. Albino, and D. Bull (2019-11-16) The application of convolutional neural networks to detect slow, sustained deformation in InSAR time series. Geophysical Research Letters 46 (21), pp. 11850–11858. External Links: ISSN 0094-8276, 1944-8007 Cited by: §IV-E.
  • [8] N. Anantrasirichai, J. Biggs, F. Albino, P. Hill, and D. Bull (2018-08-29) Application of machine learning to classification of volcanic deformation in routinely generated InSAR data. Journal of Geophysical Research: Solid Earth. External Links: ISSN 21699313 Cited by: §I, §I, §IV-E.
  • [9] F. Argenti and L. Alparone (2002) Speckle removal from SAR images in the undecimated wavelet domain. IEEE Transactions on Geoscience and Remote Sensing 40 (11), pp. 2363–2374. Cited by: §IV-D.
  • [10] F. Argenti, A. Lapini, T. Bianchi, and L. Alparone (2013-09) A tutorial on speckle reduction in synthetic aperture radar images. IEEE Geoscience and Remote Sensing Magazine 1 (3), pp. 6–35. External Links: ISSN 2168-6831 Cited by: §IV-D.
  • [11] N. Audebert, B. Le Saux, and S. Lefèvre (2017) Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks. In Computer Vision – ACCV 2016, S. Lai, V. Lepetit, K. Nishino, and Y. Sato (Eds.), Lecture Notes in Computer Science, Vol. 10111, pp. 180–196. External Links: ISBN 978-3-319-54180-8, 978-3-319-54181-5 Cited by: §IV-F.
  • [12] G. Baier, W. He, and N. Yokoya (2020) Robust nonlocal low-rank SAR time series despeckling considering speckle correlation by total variation regularization. IEEE Transactions on Geoscience and Remote Sensing, pp. 1–13. External Links: ISSN 0196-2892, 1558-0644 Cited by: §IV-D.
  • [13] C. Bentes, A. Frost, D. Velotto, and B. Tings (2016) Ship-iceberg discrimination with convolutional neural networks in high resolution SAR images. In European Conference on Synthetic Aperture Radar (EUSAR), Cited by: §IV-B.
  • [14] C. Bishop (1994) Mixture density networks. Technical report Citeseer. Cited by: §IV-F.
  • [15] C. Bourez. Deep learning course. [Accessed May 27, 2020] Cited by: Fig. 1.
  • [16] M. Bryant and F. Garber (1999) SVM classifier applied to the MSTAR public data set. In Algorithms for Synthetic Aperture Radar Imagery, Cited by: §I.
  • [17] A. Buades, B. Coll, and J.-M. Morel (2005) A non-local algorithm for image denoising. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2, pp. 60–65. External Links: ISBN 978-0-7695-2372-9 Cited by: §IV-D.
  • [18] Y. Cao, Y. Wu, P. Zhang, W. Liang, and M. Li (2019) Pixel-wise polsar image classification via a novel complex-valued deep fully convolutional network. Remote Sensing 11 (22), pp. 2653. Cited by: TABLE I.
  • [19] S. Chen and C. Tao (2018) PolSAR image classification using polarimetric-feature-driven deep convolutional neural network. IEEE Geoscience and Remote Sensing Letters 15 (4), pp. 627–631. Cited by: §IV-A.
  • [20] S. Chen, H. Wang, F. Xu, and Y. Jin (2016) Target classification using the deep convolutional networks for SAR images. IEEE Transactions on Geoscience and Remote Sensing 54 (8), pp. 4806–4817. Cited by: §IV-B, TABLE I.
  • [21] S. Chen and H. Wang (2014) SAR target recognition based on deep learning. In International Conference on Data Science and Advanced Analytics (DSAA), Cited by: §I, §I, §IV-B.
  • [22] X. Chen, J. Liu, Z. Wang, and W. Yin (2018) Theoretical linear convergence of unfolded ista and its practical weights and thresholds. External Links: 1808.10038 Cited by: 5th item.
  • [23] G. Chierchia, D. Cozzolino, G. Poggi, and L. Verdoliva (2017) SAR image despeckling through convolutional neural networks. arXiv:1704.00275. Cited by: 2nd item, Fig. 6, Fig. 7, §IV-D, §IV-D.
  • [24] G. Costante, T. Ciarfuglia, and F. Biondi (2018) Towards monocular digital elevation model (DEM) estimation by convolutional neural networks-application on synthetic aperture radar images. arXiv:1803.05387. Cited by: §IV-E.
  • [25] D. Cozzolino, G. Di Martino, G. Poggi, and L. Verdoliva (2017) A fully convolutional neural network for low-complexity single-stage ship detection in Sentinel-1 SAR images. In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Cited by: §IV-B.
  • [26] D. Cozzolino, L. Verdoliva, G. Scarpa, and G. Poggi (2020-03-20) Nonlocal CNN SAR image despeckling. Remote Sensing 12 (6), pp. 1006. External Links: ISSN 2072-4292 Cited by: Fig. 7, §IV-D.
  • [27] C. Dechesne, S. Lefèvre, R. Vadaine, G. Hajduch, and R. Fablet (2019) Multi-task deep learning from sentinel-1 sar: ship detection, classification and length estimation. In Conference on Big Data from Space, Cited by: §IV-B.
  • [28] F. Del Frate, M. Picchiani, G. Schiavon, and S. Stramondo (2010-10-07) Neural networks and SAR interferometry for the characterization of seismic events. In Proc. SPIE, C. Notarnicola (Ed.), pp. 78290J. Cited by: §IV-E.
  • [29] C.-A. Deledalle, L. Denis, and F. Tupin (2009-12) Iterative weighted maximum likelihood denoising with probabilistic patch-based weights. IEEE Transactions on Image Processing 18 (12), pp. 2661–2672. External Links: ISSN 1057-7149, 1941-0042 Cited by: §IV-D.
  • [30] C. Deledalle, L. Denis, F. Tupin, A. Reigber, and M. Jager (2015-04) NL-SAR: a unified nonlocal framework for resolution-preserving (pol)(in)SAR denoising. IEEE Transactions on Geoscience and Remote Sensing 53 (4), pp. 2021–2038. External Links: ISSN 0196-2892, 1558-0644 Cited by: §IV-D.
  • [31] F. Dellinger, J. Delon, Y. Gousseau, J. Michel, and F. Tupin (2015-01) SAR-SIFT: a SIFT-like algorithm for SAR images. IEEE Transactions on Geoscience and Remote Sensing 53 (1), pp. 453–466. External Links: ISSN 0196-2892, 1558-0644 Cited by: §IV-F.
  • [32] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §II.
  • [33] L. Denis, C. Deledalle, and F. Tupin (2019-07) From patches to deep learning: combining self-similarity and neural networks for sar image despeckling. In IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 5113–5116. External Links: ISBN 978-1-5386-9154-0 Cited by: 2nd item.
  • [34] W. Dierking (2013) Sea ice monitoring by synthetic aperture radar. Oceanography 26 (2), pp. 100–111. Cited by: §IV-C.
  • [35] J. Ding, B. Chen, H. Liu, and M. Huang (2016) Convolutional neural network with data augmentation for SAR target recognition. IEEE Geoscience and Remote Sensing Letters 13 (3), pp. 364–368. Cited by: §IV-B, TABLE I.
  • [36] H. Dong, L. Zhang, and B. Zou (2020-01-26) PolSAR image classification with lightweight 3d convolutional networks. Remote Sensing 12 (3), pp. 396. External Links: ISSN 2072-4292 Cited by: §IV-A.
  • [37] H. Dong, B. Zou, L. Zhang, and S. Zhang (2020) Automatic design of CNNs via differentiable neural architecture search for PolSAR image classification. IEEE Transactions on Geoscience and Remote Sensing, pp. 1–14. External Links: ISSN 0196-2892, 1558-0644 Cited by: §IV-A, §IV-A.
  • [38] K. Du, Y. Deng, R. Wang, T. Zhao, and N. Li (2016) SAR ATR based on displacement-and rotation-insensitive CNN. Remote Sensing Letters 7 (9), pp. 895–904. Cited by: §IV-B.
  • [39] C. O. Dumitru, G. Schwarz, and M. Datcu (2018-05) SAR Image Land Cover Datasets for Classification Benchmarking of Temporal Changes. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11 (5), pp. 1571–1592. External Links: ISSN 2151-1535 Cited by: 2nd item.
  • [40] P. Ebel, M. Schmitt, and X. Zhu (2020) Cloud removal in unpaired sentinel-2 imagery using cycle-consistent gan and sar-optical data fusion. IGARSS 2020 IEEE International Geoscience and Remote Sensing Symposium. Cited by: §IV-F.
  • [41] T. Elsken, J. H. Metzen, and F. Hutter (2018) Neural architecture search: a survey. arXiv preprint arXiv:1808.05377. Cited by: §II-C1.
  • [42] W. Feng, N. Guan, Y. Li, X. Zhang, and Z. Luo (2017-05) Audio visual speech recognition with multimodal recurrent neural networks. In 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, pp. 681–688. External Links: ISBN 978-1-5090-6182-2 Cited by: Fig. 1.
  • [43] M. Ferguson, R. Ak, Y. T. Lee, and K. H. Law (2017-12) Automatic localization of casting defects with convolutional neural networks. In 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, pp. 1726–1735. External Links: ISBN 978-1-5386-2715-0 Cited by: Fig. 1.
  • [44] M. A. Fischler and R. C. Bolles (1981-06-01) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24 (6), pp. 381–395. External Links: ISSN 00010782 Cited by: §IV-F.
  • [45] V. Frost, J. Stiles, K. Shanmugan, and J. Holtzman (1982) A model for radar images and its application to adaptive digital filtering of multiplicative noise. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-4 (2), pp. 157–166. Cited by: §IV-D.
  • [46] M. Fuentes Reyes, S. Auer, N. Merkle, C. Henry, and M. Schmitt (2019-09-03) SAR-to-optical image translation based on conditional generative adversarial networks—optimization, opportunities and limits. Remote Sensing 11 (17), pp. 2067. External Links: ISSN 2072-4292 Cited by: §IV-F.
  • [47] H. Furukawa (2018) Deep learning for end-to-end automatic target recognition from synthetic aperture radar imagery. arXiv:1801.08558. Cited by: §IV-B.
  • [48] F. Gao, T. Huang, J. Wang, J. Sun, A. Hussain, and E. Yang (2017) Dual-branch deep convolution neural network for polarimetric SAR image classification. Applied Sciences 7 (5), pp. 447. Cited by: Fig. 3, §IV-A, §IV-A.
  • [49] F. Gao, T. Huang, J. Wang, J. Sun, E. Yang, and A. Hussain (2017) Combining deep convolutional neural network and svm to sar image target recognition. In IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Cited by: §IV-B.
  • [50] F. Gao, T. Huang, J. Sun, J. Wang, A. Hussain, and E. Yang (2019) A new algorithm for sar image target recognition based on an improved deep convolutional neural network. Cognitive Computation 11 (6), pp. 809–824. Cited by: §IV-B.
  • [51] F. Gao, Y. Yang, J. Wang, J. Sun, E. Yang, and H. Zhou (2018) A deep convolutional generative adversarial networks (dcgans)-based semi-supervised method for object recognition in synthetic aperture radar (sar) images. Remote Sensing 10 (6), pp. 846. Cited by: TABLE I.
  • [52] J. Geng, J. Fan, H. Wang, X. Ma, B. Li, and F. Chen (2015) High-resolution SAR image classification via deep convolutional autoencoders. IEEE Geoscience and Remote Sensing Letters 12 (11), pp. 2351–2355. Cited by: Fig. 2, §IV-A, §IV-A.
  • [53] J. Geng, H. Wang, J. Fan, and X. Ma (2017) Deep supervised and contractive neural network for SAR image classification. IEEE Transactions on Geoscience and Remote Sensing 55 (4), pp. 2442–2459. Cited by: Fig. 2, §IV-A, §IV-A.
  • [54] S. Gernhardt and R. Bamler (2012-09) Deformation monitoring of single buildings using meter-resolution SAR data in PSI. ISPRS Journal of Photogrammetry and Remote Sensing 73, pp. 68–79. External Links: ISSN 09242716 Cited by: §IV-E.
  • [55] R. Girshick (2015) Fast R-CNN. arXiv:1504.08083. Cited by: §IV-B.
  • [56] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §II-A3, §II-B2.
  • [57] C. Grohnfeld, M. Schmitt, and X. X. Zhu (2018) A conditional generative adversarial network to fuse SAR and multispectral optical data for cloud removal from Sentinel-2 images. In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Cited by: §IV-F.
  • [58] Y. Guo, Y. Liu, T. Georgiou, and M. S. Lew (2018-06) A review of semantic segmentation using deep neural networks. International Journal of Multimedia Information Retrieval 7 (2), pp. 87–93. External Links: ISSN 2192-6611, 2192-662X Cited by: §I.
  • [59] S. Haykin (2006) Cognitive radar: a way of the future. IEEE Signal Processing Magazine 23 (1), pp. 30–40. Cited by: 6th item.
  • [60] C. He, S. Li, Z. Liao, and M. Liao (2013) Texture classification of PolSAR data based on sparse coding of wavelet polarization textons. IEEE Transactions on Geoscience and Remote Sensing 51 (8), pp. 4576–4590. Cited by: §IV-A.
  • [61] C. He, M. Tu, D. Xiong, and M. Liao (2020-02-17) Nonlinear manifold learning integrated with fully convolutional networks for PolSAR image classification. Remote Sensing 12 (4), pp. 655. External Links: ISSN 2072-4292 Cited by: §IV-A.
  • [62] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II-A1.
  • [63] A. Hirose (2012) Complex-valued neural networks. Studies in Computational Intelligence, Vol. 400, Springer Berlin Heidelberg. Cited by: §IV-E.
  • [64] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §II-A2.
  • [65] D. Hoekman and M. Vissers (2003) A new polarimetric classification approach evaluated for agricultural crops. IEEE Transactions on Geoscience and Remote Sensing 41 (12), pp. 2881–2889. Cited by: 1st item.
  • [66] T. Hoeser and C. Kuenzer (2020) Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part I: Evolution and Recent Trends. Remote Sensing 12 (10), pp. 1667. Cited by: §II-A1.
  • [67] B. Hou, H. Kou, and L. Jiao (2016) Classification of polarimetric SAR images using multilayer autoencoders and superpixels. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9 (7), pp. 3072–3081. Cited by: §IV-A.
  • [68] B. Hou, B. Ren, G. Ju, H. Li, L. Jiao, and J. Zhao (2016-01) SAR image classification via hierarchical sparse representation and multisize patch features. IEEE Geoscience and Remote Sensing Letters 13 (1), pp. 33–37. External Links: ISSN 1545-598X, 1558-0571 Cited by: Fig. 2.
  • [69] B. Huang and K. M. Carley (2019-08) Residual or Gate? Towards Deeper Graph Neural Networks for Inductive Graph Representation Learning. arXiv:1904.08035. Cited by: Fig. 1.
  • [70] B. Huang and K. M. Carley (2019) Residual or gate? Towards deeper graph neural networks for inductive graph representation learning. arXiv preprint arXiv. Cited by: §II-C2.
  • [71] G. Huang, Z. Liu, K. Weinberger, and L. Maaten (2017) Densely connected convolutional networks. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II-A1.
  • [72] L. Huang, B. Liu, B. Li, W. Guo, W. Yu, Z. Zhang, and W. Yu (2018-01) OpenSARShip: A Dataset Dedicated to Sentinel-1 Ship Interpretation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11 (1), pp. 195–208. External Links: ISSN 2151-1535, Document Cited by: 2nd item, TABLE I.
  • [73] Z. Huang, Z. Pan, and B. Lei (2019) What, where, and how to transfer in SAR target recognition based on deep CNNs. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §IV-B.
  • [74] L. H. Hughes and M. Schmitt (2019-09-16) A semi-supervised approach to SAR-optical image matching. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-2/W7, pp. 71–78. External Links: ISSN 2194-9050 Cited by: §IV-F.
  • [75] L. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. X. Zhu (2018) Identifying corresponding patches in SAR and optical images with a pseudo-siamese CNN. IEEE Geoscience and Remote Sensing Letters 15 (5), pp. 784–788. Cited by: §I, 1st item, §IV-F, TABLE I.
  • [76] L. H. Hughes, N. Merkle, T. Burgmann, S. Auer, and M. Schmitt (2019-07) Deep learning for SAR-optical image matching. In IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 4877–4880. External Links: ISBN 978-1-5386-9154-0 Cited by: §IV-F.
  • [77] K. Ichikawa and A. Hirose (2017) Singular unit restoration in InSAR using complex-valued neural networks in the spectral domain. IEEE Transactions on Geoscience and Remote Sensing 55 (3), pp. 1717–1723. Cited by: §IV-E.
  • [78] K. Ikeuchi, T. Shakunaga, M.D. Wheeler, and T. Yamazaki (1996) Invariant histograms and deformable template matching for SAR target recognition. In Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 100–105. External Links: ISBN 978-0-8186-7259-0 Cited by: §I.
  • [79] J. Jiao, Y. Zhang, H. Sun, X. Yang, X. Gao, W. Hong, K. Fu, and X. Sun (2018) A densely connected end-to-end neural network for multiscale and multiscene SAR ship detection. IEEE Access 6, pp. 20881–20892. Cited by: §IV-B.
  • [80] M. Kang, K. Ji, X. Leng, and Z. Lin (2017) Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection. Remote Sensing 9 (8), pp. 860. Cited by: §IV-B.
  • [81] S. Kazemi, B. Yonel, and B. Yazici (2019) Deep learning for direct automatic target recognition from SAR data. In 2019 IEEE Radar Conference (RadarConf), pp. 1–6. Cited by: §IV-B.
  • [82] A. Kendall and Y. Gal (2017) What uncertainties do we need in Bayesian deep learning for computer vision?. External Links: 1703.04977 Cited by: 4th item.
  • [83] E. Keydel, S. Lee, and J. Moore (1996) MSTAR extended operating conditions: a tutorial. In Algorithms for Synthetic Aperture Radar Imagery III, Cited by: §IV-B.
  • [84] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §II-A.
  • [85] D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §II-B2.
  • [86] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §II-C2.
  • [87] A. Krizhevsky, I. Sutskever, and G. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, Cited by: §II.
  • [88] D. Kuan, A. Sawchuk, T. Strand, and P. Chavel (1985) Adaptive noise smoothing filter for images with signal-dependent noise. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-7 (2), pp. 165–177. Cited by: §IV-D.
  • [89] F. Lattari, B. Gonzalez Leon, F. Asaro, A. Rucci, C. Prati, and M. Matteucci (2019-06-28) Deep learning for SAR image despeckling. Remote Sensing 11 (13), pp. 1532. External Links: ISSN 2072-4292 Cited by: §IV-D.
  • [90] Y. LeCun, Y. Bengio, and G. Hinton (2015-05) Deep learning. Nature 521 (7553), pp. 436–444. External Links: ISSN 0028-0836, 1476-4687 Cited by: §I.
  • [91] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §II.
  • [92] Y. LeCun, C. Cortes, and C. Burges (2010) MNIST handwritten digit database. IEEE. Cited by: §II.
  • [93] J. Lee (1980) Digital image enhancement and noise filtering by use of local statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-2 (2), pp. 165–168. Cited by: §IV-D.
  • [94] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila (2018) Noise2Noise: learning image restoration without clean data. External Links: 1803.04189 Cited by: 2nd item.
  • [95] B. Li, B. Liu, L. Huang, W. Guo, Z. Zhang, and W. Yu (2017-11) OpenSARShip 2.0: A large-volume dataset for deeper interpretation of ship targets in Sentinel-1 imagery. In 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, pp. 1–5. External Links: ISBN 978-1-5386-4519-2, Link, Document Cited by: 2nd item, TABLE I.
  • [96] J. Li, C. Qu, and J. Shao (2017) Ship detection in SAR images based on an improved faster R-CNN. In SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Cited by: §IV-B.
  • [97] L. Li, L. Ma, L. Jiao, F. Liu, Q. Sun, and J. Zhao (2019-12) Complex contourlet-CNN for polarimetric SAR image classification. Pattern Recognition, pp. 107110. External Links: ISSN 00313203 Cited by: §IV-A, §IV-A.
  • [98] X. Liu, L. Jiao, and F. Liu (2019) PolSF: PolSAR image dataset on San Francisco. arXiv preprint arXiv:1912.07259. Cited by: 3rd item, TABLE I.
  • [99] Y. Liu, M. Zhang, P. Xu, and Z. Guo (2017) SAR ship detection using sea-land segmentation-based convolutional neural network. In International Workshop on Remote Sensing with Intelligent Processing (RSIP), Cited by: §IV-B.
  • [100] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §IV-B.
  • [101] H. Mao, M. Alizadeh, I. Menache, and S. Kandula (2016) Resource management with deep reinforcement learning. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks, pp. 50–56. Cited by: §II-B3.
  • [102] D. Massonnet, P. Briole, and A. Arnaud (1995) Deflation of mount Etna monitored by spaceborne radar interferometry. Nature 375 (6532), pp. 567. Cited by: §IV-E.
  • [103] D. Massonnet, M. Rossi, C. Carmona, F. Adragna, G. Peltzer, K. Feigl, and T. Rabaute (1993) The displacement field of the Landers earthquake mapped by radar interferometry. Nature 364 (6433), pp. 138. Cited by: §IV-E.
  • [104] A. Mazza, F. Sica, P. Rizzoli, and G. Scarpa (2019-01) TanDEM-X Forest Mapping Using Convolutional Neural Networks. Remote Sensing 11 (24), pp. 2980. External Links: Link, Document Cited by: §IV-A.
  • [105] N. Merkle, S. Auer, R. Müller, and P. Reinartz (2018) Exploring the potential of conditional adversarial networks for optical and SAR image matching. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, pp. 1–10. Cited by: §IV-F.
  • [106] N. Merkle, W. Luo, S. Auer, R. Müller, and R. Urtasun (2017) Exploiting deep matching and SAR data for the geo-localization accuracy improvement of optical satellite images. Remote Sensing 9 (6), pp. 586. Cited by: §IV-F.
  • [107] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. (2015) Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533. Cited by: §II-B3.
  • [108] S. Montazeri, X. X. Zhu, M. Eineder, and R. Bamler (2016-12) Three-dimensional deformation monitoring of urban infrastructure by tomographic SAR using multitrack TerraSAR-X data stacks. IEEE Transactions on Geoscience and Remote Sensing 54 (12), pp. 6868–6878. External Links: ISSN 0196-2892, 1558-0644 Cited by: §IV-E.
  • [109] A. Moreira, P. Prats-Iraola, M. Younis, G. Krieger, I. Hajnsek, and K. P. Papathanassiou (2013-03) A tutorial on synthetic aperture radar. IEEE Geoscience and Remote Sensing Magazine 1 (1), pp. 6–43. External Links: ISSN 2168-6831 Cited by: §IV-A, §IV-E.
  • [110] D. Morgan (2015) Deep convolutional neural networks for ATR from SAR imagery. In Algorithms for Synthetic Aperture Radar Imagery, Cited by: §IV-B.
  • [111] L. Mou, M. Schmitt, Y. Wang, and X. X. Zhu (2017) A CNN for the identification of corresponding patches in SAR and optical imagery of urban scenes. In Urban Remote Sensing Event (JURSE), Cited by: 1st item, Fig. 9, §IV-F.
  • [112] A. G. Mullissa, C. Persello, and A. Stein (2019) PolSARNet: a deep fully convolutional network for polarimetric SAR image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Cited by: §IV-B.
  • [113] A. G. Mullissa, C. Persello, and A. Stein (2019-12) PolSARNet: a deep fully convolutional network for polarimetric SAR image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (12), pp. 5300–5309. External Links: ISSN 1939-1404, 2151-1535 Cited by: §IV-A, §IV-A.
  • [114] E. Ndikumana, D. Ho Tong Minh, N. Baghdadi, D. Courault, and L. Hossard (2018-08-03) Deep recurrent neural network for agricultural classification using multitemporal SAR Sentinel-1 for Camargue, France. Remote Sensing 10 (8), pp. 1217. External Links: ISSN 2072-4292 Cited by: §IV-A.
  • [115] M. Neumann, A. S. Pinto, X. Zhai, and N. Houlsby (2019-11) In-domain representation learning for remote sensing. arXiv:1911.06721 [cs]. External Links: Link Cited by: TABLE I.
  • [116] N. Ødegaard, A. Knapskog, C. Cochin, and J. Louvigne (2016) Classification of ships using real and simulated data in a convolutional neural network. In IEEE Radar Conference (RadarConf), Cited by: §IV-B.
  • [117] K. Oyama and A. Hirose (2018) Adaptive phase-singular-unit restoration with entire-spectrum-processing complex-valued neural networks in interferometric SAR. Electronics Letters 54 (1), pp. 43–44. Cited by: §IV-E.
  • [118] S. Panchal Cityscape image segmentation with TensorFlow 2.0. Note: [Accessed May 27, 2020] External Links: Link Cited by: Fig. 1.
  • [119] H. Parikh, S. Patel, and V. Patel (2020-01-02) Classification of SAR and PolSAR images using deep learning: a review. International Journal of Image and Data Fusion 11 (1), pp. 1–32. External Links: ISSN 1947-9832, 1947-9824 Cited by: §I, §I, §IV-A.
  • [120] S. Parrilli, M. Poderico, C. V. Angelino, and L. Verdoliva (2012-02) A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage. IEEE Transactions on Geoscience and Remote Sensing 50 (2), pp. 606–616. External Links: ISSN 0196-2892, 1558-0644 Cited by: Fig. 7.
  • [121] B. A. Pearlmutter (1989) Learning state space trajectories in recurrent neural networks. Neural Computation 1 (2), pp. 263–269. Cited by: §II-A2.
  • [122] K. Pearson (1901) LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (11), pp. 559–572. Cited by: §II-B2.
  • [123] G. Peltzer and P. Rosen (1995) Surface displacement of the 17 May 1993 Eureka valley, California, earthquake observed by SAR interferometry. Science 268 (5215), pp. 1333–1336. Cited by: §IV-E.
  • [124] M. Picchiani, F. Del Frate, G. Schiavon, S. Stramondo, M. Chini, and C. Bignami (2011-10-06) Neural networks for automatic seismic source analysis from DInSAR data. In Proc. SPIE, pp. 81790K. Cited by: §IV-E.
  • [125] F. Qin, J. Guo, and W. Sun (2017) Object-oriented ensemble classification for polarimetric SAR imagery using restricted Boltzmann machines. Remote Sensing Letters 8 (3), pp. 204–213. Cited by: §IV-A.
  • [126] F. RADAR and J. FALKINGHAM Global satellite observation requirements for floating ice. External Links: Link Cited by: §IV-C.
  • [127] S. Ren, K. He, R. Girshick, and J. Sun (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6), pp. 1137–1149. Cited by: §IV-B.
  • [128] R. Ressel, A. Frost, and S. Lehner (2015-07) A neural network-based classification for sea ice types on X-band SAR images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8 (7), pp. 3672–3680. External Links: ISSN 1939-1404, 2151-1535 Cited by: §IV-A.
  • [129] R. Ressel, S. Singha, S. Lehner, A. Rosel, and G. Spreen (2016-07) Investigation into different polarimetric features for sea ice classification using X-band synthetic aperture radar. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9 (7), pp. 3131–3143. External Links: ISSN 1939-1404, 2151-1535 Cited by: §IV-A.
  • [130] R. Ressel, S. Singha, and S. Lehner (2016-07) Neural network based automatic sea ice classification for CL-pol RISAT-1 imagery. In 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 4835–4838. External Links: ISBN 978-1-5090-3332-4 Cited by: §IV-A.
  • [131] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Cited by: §II-A1.
  • [132] T. Ross, S. Worrell, V. Velten, J. Mossing, and M. Bryant (1998) Standard SAR ATR evaluation experiments using the MSTAR public release data set. In Algorithms for Synthetic Aperture Radar Imagery, Cited by: 1st item, TABLE I.
  • [133] M. Rostami, S. Kolouri, E. Eaton, and K. Kim (2019) Deep transfer learning for few-shot SAR image classification. Remote Sensing 11 (11), pp. 1374. Cited by: §IV-B.
  • [134] J. Ruch, J. Anderssohn, T. Walter, and M. Motagh (2008) Caldera-scale inflation of the Lazufre volcanic area, south America: evidence from InSAR. Journal of Volcanology and Geothermal Research 174 (4), pp. 337–344. Cited by: §IV-E.
  • [135] M. Schmitt, L. H. Hughes, C. Qiu, and X. X. Zhu (2019-09) SEN12MS - A curated dataset of georeferenced multi-spectral Sentinel-1/2 imagery for deep learning and data fusion. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-2/W7, pp. 153–160. External Links: ISSN 2194-9050, Link, Document Cited by: 1st item, TABLE I.
  • [136] M. Schmitt, L. H. Hughes, and X. X. Zhu (2018) The SEN1-2 dataset for deep learning in SAR-Optical data fusion. In ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Cited by: 1st item, 2nd item, TABLE I.
  • [137] M. Schmitt, L. Hughes, M. Körner, and X. X. Zhu (2018) Colorizing Sentinel-1 SAR images using a variational autoencoder conditioned on Sentinel-2 imagery. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 42, pp. 2. Cited by: §IV-F.
  • [138] M. Schmitt and X. X. Zhu (2016) On the challenges in stereogrammetric fusion of SAR and optical imagery for urban areas. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 41 (B7), pp. 719–722. Cited by: §IV-F.
  • [139] C. Schwegmann, W. Kleynhans, J. Engelbrecht, L. Mdakane, and R. Meyer (2017) Subsidence feature discrimination using deep convolutional neural networks in synthetic aperture radar imagery. In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Cited by: §IV-E.
  • [140] C. Schwegmann, W. Kleynhans, B. Salmon, L. Mdakane, and R. Meyer (2016) Very deep learning for ship discrimination in synthetic aperture radar imagery. In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Cited by: §IV-B.
  • [141] M. Shahzad, M. Maurer, F. Fraundorfer, Y. Wang, and X. X. Zhu (2019) Buildings detection in VHR SAR images using fully convolution neural networks. IEEE Transactions on Geoscience and Remote Sensing 57 (2), pp. 1100–1116. Cited by: Fig. 5, §IV-B.
  • [142] J. Shermeyer, D. Hogan, J. Brown, A. Van Etten, N. Weir, F. Pacifici, R. Haensch, A. Bastidas, S. Soenen, T. Bacastow, et al. (2020) SpaceNet 6: multi-sensor all weather mapping dataset. arXiv preprint arXiv:2004.06500. Cited by: 2nd item, TABLE I.
  • [143] Y. Shi, X. X. Zhu, and R. Bamler (2015) Optimized parallelization of non-local means filter for image noise reduction of InSAR image. In IEEE International Conference on Information and Automation, Cited by: 2nd item.
  • [144] Y. Shi, Q. Li, and X. X. Zhu (2020) Building segmentation through a gated graph convolutional neural network with deep structured feature embedding. ISPRS Journal of Photogrammetry and Remote Sensing 159, pp. 184–197. Cited by: §II-C2.
  • [145] T. Silva An intuitive introduction to generative adversarial networks (GANs). Note: [Accessed May 26, 2020] External Links: Link Cited by: Fig. 1.
  • [146] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529 (7587), pp. 484. Cited by: §II-B3.
  • [147] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. Cited by: §I, §II-A1.
  • [148] S. Singha, M. Johansson, N. Hughes, S. M. Hvidegaard, and H. Skourup (2018-07) Arctic sea ice characterization using spaceborne fully polarimetric L-, C-, and X-band SAR with validation by airborne measurements. IEEE Transactions on Geoscience and Remote Sensing 56 (7), pp. 3715–3734. External Links: ISSN 0196-2892, 1558-0644 Cited by: §IV-A.
  • [149] T. Song, L. Kuang, L. Han, Y. Wang, and Q. H. Liu (2018-07) Inversion of rough surface parameters from SAR images using simulation-trained convolutional neural networks. IEEE Geoscience and Remote Sensing Letters 15 (7), pp. 1130–1134. External Links: ISSN 1545-598X, 1558-0571 Cited by: §IV-C.
  • [150] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: §II-B1.
  • [151] I. D. Stewart and T. R. Oke (2012) Local climate zones for urban temperature studies. Bulletin of the American Meteorological Society 93 (12), pp. 1879–1900. External Links: Link Cited by: 1st item.
  • [152] S. Stramondo, F. Del Frate, M. Picchiani, and G. Schiavon (2011-01) Seismic source quantitative parameters retrieval from InSAR data and neural networks. IEEE Transactions on Geoscience and Remote Sensing 49 (1), pp. 96–104. External Links: ISSN 0196-2892, 1558-0644 Cited by: §IV-E.
  • [153] S. Suri and P. Reinartz (2010) Mutual-information-based registration of TerraSAR-X and Ikonos imagery in urban areas. IEEE Transactions on Geoscience and Remote Sensing 48 (2), pp. 939–949. Cited by: §IV-F.
  • [154] X. Tang, L. Zhang, and X. Ding (2018) SAR image despeckling with a multilayer perceptron neural network. International Journal of Digital Earth, pp. 1–21. Cited by: 1st item, §IV-D.
  • [155] N. Teimouri, M. Dyrmann, and R. N. Jørgensen (2019-04-25) A novel spatio-temporal FCN-LSTM network for recognizing various crop types using multi-temporal radar images. Remote Sensing 11 (8), pp. 990. External Links: ISSN 2072-4292 Cited by: §IV-A.
  • [156] T. Tieleman and G. Hinton (2012) Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. Note: COURSERA: Neural Networks for Machine Learning Cited by: §II-A.
  • [157] R. Touzi, A. Lopes, and P. Bousquet (1988) A statistical and geometrical edge detector for SAR images. IEEE Transactions on Geoscience and Remote Sensing 26 (6), pp. 764–773. Cited by: 2nd item.
  • [158] E. Trasatti, F. Casu, C. Giunchi, S. Pepe, G. Solaro, S. Tagliaventi, P. Berardino, M. Manzo, A. Pepe, G. Ricciardi, E. Sansosti, P. Tizzani, G. Zeni, and R. Lanari (2008) The 2004–2006 uplift episode at Campi Flegrei caldera (Italy): constraints from SBAS-DInSAR ENVISAT data and Bayesian source inference. Geophysical Research Letters 35 (7), pp. 1–6. Cited by: §IV-E.
  • [159] F. Tupin, L. Denis, C. Deledalle, and G. Ferraioli (2019-07) Ten years of patch-based approaches for SAR imaging: a review. In IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 5105–5108. External Links: ISBN 978-1-5386-9154-0 Cited by: §IV-D.
  • [160] S. Uhlmann and S. Kiranyaz (2014) Integrating color features in polarimetric SAR image classification. IEEE Transactions on Geoscience and Remote Sensing 52 (4), pp. 2197–2216. Cited by: Fig. 2.
  • [161] Under the hood of the variational autoencoder (in prose and code). Note: [Accessed May 27, 2020] External Links: Link Cited by: Fig. 1.
  • [162] S. Valade, A. Ley, F. Massimetti, O. D’Hondt, M. Laiolo, D. Coppola, D. Loibl, O. Hellwich, and T. R. Walter (2019-06-27) Towards global volcano monitoring using multisensor Sentinel missions and artificial intelligence: the MOUNTS monitoring system. Remote Sensing 11 (13), pp. 1528. External Links: ISSN 2072-4292 Cited by: Fig. 8, §IV-E.
  • [163] S. Wagner (2016) SAR ATR by a combination of convolutional neural network and support vector machines. IEEE Transactions on Aerospace and Electronic Systems 52 (6), pp. 2861–2872. Cited by: §IV-B.
  • [164] L. Wang, A. Scott, L. Xu, and D. Clausi (2014) Ice concentration estimation from dual-polarized SAR images using deep convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing. Cited by: §I, §I, §IV-C.
  • [165] L. Wang, K. Scott, L. Xu, and D. Clausi (2016) Sea ice concentration estimation during melt from dual-pol SAR scenes using deep convolutional neural networks: a case study. IEEE Transactions on Geoscience and Remote Sensing 54 (8), pp. 4524–4533. Cited by: §IV-C.
  • [166] L. Wang (2016) Learning to estimate sea ice concentration from SAR imagery. Ph.D. Thesis, University of Waterloo. External Links: Link Cited by: §IV-C.
  • [167] P. Wang, H. Zhang, and V. Patel (2017) SAR image despeckling using a convolutional neural network. IEEE Signal Processing Letters 24 (12), pp. 1763–1767. Cited by: §I, 2nd item, §IV-D, §IV-D.
  • [168] S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, and L. Jiao (2018) A deep learning framework for remote sensing image registration. ISPRS Journal of Photogrammetry and Remote Sensing. Cited by: §IV-F.
  • [169] Y. Wang, C. He, X. Liu, and M. Liao (2018) A hierarchical fully convolutional network integrated with sparse and low-rank subspace representations for PolSAR imagery classification. Remote Sensing 10 (2), pp. 342. Cited by: §IV-A.
  • [170] Y. Wang, X. X. Zhu, S. Montazeri, J. Kang, L. Mou, and M. Schmitt (2017) Potential of the “SARptical” system. In FRINGE, Cited by: §IV-F.
  • [171] Y. Wang, X. X. Zhu, B. Zeisl, and M. Pollefeys (2017-01) Fusing Meter-Resolution 4-D InSAR Point Clouds and Optical Images for Semantic Urban Infrastructure Monitoring. IEEE Transactions on Geoscience and Remote Sensing 55 (1), pp. 14–26. External Links: ISSN 0196-2892, Document Cited by: 1st item, TABLE I.
  • [172] Y. Wang and X. X. Zhu (2018) The SARptical dataset for joint analysis of SAR and optical image in dense urban area. In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Cited by: §IV-F, 1st item, TABLE I.
  • [173] Y. Wang, C. Wang, H. Zhang, Y. Dong, and S. Wei (2019-03) A SAR Dataset of Ship Detection for Deep Learning under Complex Backgrounds. Remote Sensing 11 (7), pp. 765. External Links: ISSN 2072-4292, Link, Document Cited by: 3rd item, TABLE I.
  • [174] Wikipedia Long short-term memory. Note: [Accessed May 27, 2020] External Links: Link Cited by: Fig. 1.
  • [175] M. Wilmanski, C. Kreucher, and J. Lauer (2016) Modern approaches in deep learning for SAR ATR. In Algorithms for Synthetic Aperture Radar Imagery, Cited by: §IV-B.
  • [176] H. Xie, L. Pierce, and F. Ulaby (2002) SAR speckle reduction using wavelet denoising and Markov random field modeling. IEEE Transactions on Geoscience and Remote Sensing 40 (10), pp. 2196–2212. Cited by: §IV-D.
  • [177] H. Xie, S. Wang, K. Liu, S. Lin, and B. Hou (2014) Multilayer feature learning for polarimetric synthetic radar data classification. In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Cited by: Fig. 2, §IV-A, §IV-A, §IV-A.
  • [178] W. Xie, G. Ma, F. Zhao, H. Liu, and L. Zhang (2020-05) PolSAR image classification via a novel semi-supervised recurrent complex-valued convolution neural network. Neurocomputing 388, pp. 255–268. External Links: ISSN 09252312 Cited by: §IV-A, §IV-A.
  • [179] X. Su, C. Deledalle, F. Tupin, and H. Sun (2014-10) Two-step multitemporal nonlocal means for synthetic aperture radar images. IEEE Transactions on Geoscience and Remote Sensing 52 (10), pp. 6181–6196. External Links: ISSN 0196-2892, 1558-0644 Cited by: §IV-D.
  • [180] H. Xiyue, A. Wei, S. Qian, L. Jian, W. Haipeng, and X. Feng (2020) FUSAR-Ship: a high-resolution SAR-AIS matchup dataset of Gaofen-3 for ship detection and recognition. SCIENCE CHINA Information Sciences. Cited by: 4th item.
  • [181] R. Yamaki and A. Hirose (2009-01) Singular unit restoration in interferograms based on complex-valued Markov random field model for phase unwrapping. IEEE Geoscience and Remote Sensing Letters 6 (1), pp. 18–22. External Links: ISSN 1545-598X, 1558-0571 Cited by: §IV-E.
  • [182] W. Yang, D. Dai, J. Wu, and C. He (2010) Weakly supervised polarimetric SAR image classification with multi-modal Markov aspect model. ISPRS. Cited by: 1st item.
  • [183] W. Yao, D. Marmanis, and M. Datcu (2017) Semantic segmentation using deep neural networks for SAR and optical image pairs. Cited by: §IV-F.
  • [184] P. Yu, A. Qin, and D. Clausi (2012) Unsupervised polarimetric SAR image segmentation and classification using region growing with edge penalty. IEEE Transactions on Geoscience and Remote Sensing 50 (4), pp. 1302–1317. Cited by: 1st item.
  • [185] D. Yue, F. Xu, and Y. Jin (2018) SAR despeckling neural network with logarithmic convolutional product model. International Journal of Remote Sensing 39 (21), pp. 7483–7505. Cited by: §IV-D.
  • [186] N. Zakhvatkina, V. Smirnov, and I. Bychkova (2019-03-31) Satellite SAR data-based sea ice classification: an overview. Geosciences 9 (4), pp. 152. External Links: ISSN 2076-3263 Cited by: §IV-A.
  • [187] H. Zebker, C. Werner, P. Rosen, and S. Hensley (1994) Accuracy of topographic maps derived from ERS-1 interferometric radar. IEEE Transactions on Geoscience and Remote Sensing 32 (4), pp. 823–836. Cited by: §IV-E.
  • [188] F. Zhang, C. Hu, Q. Yin, W. Li, H. Li, and W. Hong (2017) SAR target recognition using the multi-aspect-aware bidirectional LSTM recurrent neural networks. arXiv:1707.09875. Cited by: Fig. 4, §IV-B.
  • [189] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155. Cited by: §IV-D.
  • [190] L. Zhang, W. Ma, and D. Zhang (2016) Stacked sparse autoencoder in PolSAR data classification using local spatial information. IEEE Geoscience and Remote Sensing Letters 13 (9), pp. 1359–1363. Cited by: §IV-A.
  • [191] Q. Zhang, Q. Yuan, J. Li, Z. Yang, and X. Ma (2018) Learning a dilated residual network for SAR image despeckling. Remote Sensing 10 (2), pp. 196. Cited by: §IV-D.
  • [192] Z. Zhang, H. Wang, F. Xu, and Y. Jin (2017) Complex-valued convolutional neural network and its application in polarimetric SAR image classification. IEEE Transactions on Geoscience and Remote Sensing 55 (12), pp. 7177–7188. Cited by: §IV-A, §IV-A, 3rd item.
  • [193] J. Zhao, M. Datcu, Z. Zhang, H. Xiong, and W. Yu (2019) Contrastive-regulated CNN in the complex domain: a method to learn physical scattering signatures from flexible PolSAR images. IEEE Transactions on Geoscience and Remote Sensing 57 (12), pp. 10116–10135. Cited by: §IV-C.
  • [194] J. Zhao, Z. Zhang, W. Yao, M. Datcu, H. Xiong, and W. Yu (2020) OpenSARUrban: A Sentinel-1 SAR Image Dataset for Urban Interpretation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13, pp. 187–203. External Links: ISSN 1939-1404, 2151-1535, Link, Document Cited by: Fig. 10, 2nd item, TABLE I.
  • [195] Q. Zhao and J. Principe (2001) Support vector machines for SAR automatic target recognition. IEEE Transactions on Aerospace and Electronic Systems 37 (2), pp. 643–654. Cited by: §I.
  • [196] Z. Zhao, L. Jiao, J. Zhao, J. Gu, and J. Zhao (2017) Discriminant deep belief network for high-resolution SAR image classification. Pattern Recognition 61, pp. 686–701. Cited by: §IV-A.
  • [197] Z. Zhao, P. Zheng, S. Xu, and X. Wu (2019-11) Object detection with deep learning: a review. IEEE Transactions on Neural Networks and Learning Systems 30 (11), pp. 3212–3232. External Links: ISSN 2162-237X, 2162-2388 Cited by: §I.
  • [198] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr (2015) Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537. Cited by: §IV-B.
  • [199] Y. Zhou, H. Wang, F. Xu, and Y. Jin (2016) Polarimetric SAR image classification using deep convolutional neural networks. IEEE Geoscience and Remote Sensing Letters 13 (12), pp. 1935–1939. Cited by: §IV-A.
  • [200] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017-10) Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2242–2251. External Links: ISBN 978-1-5386-1032-9, Link, Document Cited by: §IV-F.
  • [201] X. X. Zhu, R. Bamler, M. Lachaise, F. Adam, Y. Shi, and M. Eineder (2014) Improving TanDEM-X DEMs by non-local InSAR filtering. In European Conference on Synthetic Aperture Radar (EUSAR), Cited by: 2nd item.
  • [202] X. X. Zhu, D. Tuia, L. Mou, G. Xia, L. Zhang, F. Xu, and F. Fraundorfer (2017) Deep learning in remote sensing: a comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine 5 (4), pp. 8–36. Cited by: §I.
  • [203] X. X. Zhu and R. Bamler (2011-07) Let’s do the time warp: multicomponent nonlinear motion estimation in differential SAR tomography. IEEE Geoscience and Remote Sensing Letters 8 (4), pp. 735–739. External Links: ISSN 1545-598X, 1558-0571 Cited by: §IV-E.
  • [204] X. Zhu, J. Hu, C. Qiu, Y. Shi, J. Kang, L. Mou, H. Bagheri, M. Häberle, Y. Hua, R. Huang, L. D. Hughes, H. Li, Y. Sun, G. Zhang, S. Han, M. Schmitt, and Y. Wang (2020) So2Sat LCZ42: A benchmark dataset for global local climate zones classification. IEEE Geoscience and Remote Sensing Magazine, in press. Cited by: 1st item, TABLE I.
  • [205] M. Zitnik, M. Agrawal, and J. Leskovec (2018) Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34 (13), pp. 457–466. Cited by: Fig. 1.
  • [206] B. Zoph and Q. V. Le (2017-02) Neural Architecture Search with Reinforcement Learning. arXiv:1611.01578 [cs]. External Links: Link Cited by: Fig. 1.
  • [207] B. Zoph and Q. V. Le (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. Cited by: §II-C1.