Speckle reduction is a key step in many remote sensing applications. Speckle strongly affects synthetic aperture radar (SAR) images and makes them difficult to analyse. Because the spatial correlation of speckle is difficult to model, a self-supervised deep learning algorithm is proposed in this paper: SAR2SAR. Multi-temporal time series are leveraged and the neural network learns to restore SAR images by looking only at noisy acquisitions. To this purpose, the recently proposed noise2noise framework is employed. The strategy to adapt it to SAR despeckling is presented, based on a compensation of temporal changes and a loss function adapted to the statistics of speckle. A study with synthetic speckle noise is presented to compare the performance of the proposed method with other state-of-the-art filters. Then, results on real images are discussed, to show the potential of the proposed algorithm. The code is made available to allow testing and reproducible research in this field.
Synthetic Aperture Radar (SAR) is an active imaging technology widely used for Earth observation, thanks to its capability of acquiring images by day or night and in (almost) all weather conditions. Agriculture, forestry and oceanography are among the fields that benefit from the exploitation of SAR images, which are employed in a wide range of practical applications: urban monitoring, land-use mapping, biomass estimation, damage assessment, oil spill detection and ice monitoring, among others. This is achieved through advanced techniques such as interferometry and polarimetry.
However, interpreting SAR images is a challenging task, both for human observers and for automatic tools aiming at extracting useful information, because they are corrupted by speckle. Although commonly referred to as noise (a convention we also adopt in this paper), speckle is a physical phenomenon caused by the coherent sum of the contributions from the different elementary scatterers within the same resolution cell, which the radar cannot resolve. The phase differences induce fluctuations in the complex summation, and hence in the observed amplitude, which gives the image its granular appearance.
In this context, being capable of effectively removing speckle from SAR images is of crucial importance for the community, and great efforts have been devoted to this topic. Among the most sophisticated techniques recently developed, one can mention non-local (NL) algorithms [3, 4, 5, 6], generalized in NL-SAR, a fully automatic algorithm that handles any SAR modality, single- or multi-look images, by performing several non-local estimations to best restore speckle-free data.
MuLoG is a general framework proposed to apply any image denoiser originally designed for additive Gaussian noise within an iterative speckle removal procedure. Recent advances of deep learning for additive white Gaussian noise (AWGN) removal can thus be exploited. Training an end-to-end model for speckle reduction has also been studied recently [10, 11, 12, 13, 14, 15]. However, taking into account the peculiarities of SAR data remains challenging: the noise distribution differs from that of natural images, as do the content, the texture and the physical meaning of a pixel value. On top of that, there is an inherent scarcity of speckle-free references with which to train supervised deep learning algorithms to map a noisy SAR image to a speckle-free one. Thus, borrowing algorithms proposed in the computer vision field and extending them to the speckle removal task is not straightforward.
In this paper, we address the lack of noise-free references by extending the noise2noise approach proposed by Lehtinen et al. in order to take into account the peculiarities of SAR data. Thanks to a grounded loss function formulation, which depends on the speckle model, large stacks of multi-temporal images acquired over the same area are exploited in two ways. In the first instance, as described in [our SARCNN], a dataset of noiseless images is created and our model is trained with synthetic speckle, following Goodman's fully developed noise model. At a later stage, we feed the network with real acquisitions, allowing it to learn the spatial correlation introduced by the SAR processing steps, namely spectral windowing and oversampling. The problem of temporal changes is addressed by a change compensation strategy. This ensures robustness and generalization of the algorithm, since any temporal series of SAR images can be used at this transfer learning step.
SAR images lack noise-free references, which makes it non-trivial to adapt deep learning image denoising methods to speckle reduction. This section analyses the solutions recently proposed.
The first paper investigating the use of a CNN for SAR image despeckling was proposed by Chierchia et al. Inspired by Zhang et al., their SAR-CNN is an adaptation of the denoising CNN (DnCNN) to SAR images. The ground truth is created by exploiting temporal series of images: assuming that no change has occurred between acquisitions, the images are temporally multilooked to produce a reference image. While achieving high-quality results both on images with synthetic noise and on real SAR images, the method is difficult to reproduce. Not only is it rare to observe temporal stability, but the definition of absence of change is ambiguous. Only images with a short temporal baseline generally offer sufficient temporal stability, but this often comes with a strong temporal correlation of speckle, which undermines the ability of the network to efficiently remove speckle.
An alternative approach, considered by Wang et al. [11, 12] and by Zhang et al., consists in using natural images and producing SAR-like data by generating synthetic speckle, following Goodman's model. Only the case of multilooked images has been considered in the experiments, making the speckle fluctuations less prominent than in the most interesting case of single-look SAR acquisitions.
The use of natural images, combined with synthetic speckle noise, has the advantage that a huge dataset can be effortlessly generated, allowing the training of models with numerous parameters (i.e., deep architectures). However, peculiar characteristics of SAR images are neglected: content, geometry, resolution, scattering phenomena, etc. A compromise is proposed by Lattari et al.: while the network is initially trained on a synthetic dataset built from natural images, a fine-tuning on SAR images is subsequently carried out. To this purpose, stacks of images are temporally averaged to produce a target image, then corrupted with synthetic speckle. In this way, the model can better handle real SAR images. A U-Net architecture is employed in a residual fashion, along with the homomorphic approach.
At present, there is no clear strategy for training deep learning models for SAR image despeckling. Some insight is given in [our SARCNN], where a high-quality dataset of noise-free SAR images is built to train an end-to-end deep learning model.
Algorithms developed using speckle generated under Goodman's fully developed model generally assume an absence of spatial correlations, which is not the case in actual SAR images synthesized by space agencies [21, 17]. Thus, a careful pre-processing step must be performed before handling real images to prevent the appearance of strong artifacts. Whitening the spectrum [17, 22, 23] or down-sampling the image are possible strategies. Yet, they are either not easy to apply in a systematic fashion or they result in a loss of spatial resolution.
To overcome these issues, an end-to-end self-supervised deep learning model trained on real SAR images is proposed by Boulch et al. Their work is based on the intuition that, if no change occurs, randomly picking two images from a temporal stack and training a neural network to reproduce one image starting from the other eventually leads the network to output the underlying speckle-free reflectivity, as only the speckle realization is changing.
As temporal stability is rarely observed in practice, Molini et al. present an algorithm enabling direct training on real images, learning to denoise from a single image at a time. Their method however relies on the assumption that speckle is spatially uncorrelated: a whitening pre-processing stage is shown to be crucial in order to achieve good performance. The authors suggest further studies on real images, while showing promising preliminary results.
The fluctuations affecting SAR images arise from the 3-D spatial configuration and the nature of the scatterers inside a resolution cell. Echoes generated by each scatterer interfere either in a constructive or in a destructive way.
These perturbations are generally modeled as a multiplicative noise: the speckle. Assuming a large number of elementary scatterers producing echoes with independent and identically distributed (i.i.d.) complex amplitudes, the fully-developed speckle model proposed by Goodman relates the measured intensity $I$, the underlying reflectivity $R$, and the speckle $S$ as follows:

$$I = R \times S, \qquad p(S) = \frac{L^L S^{L-1}}{\Gamma(L)}\, e^{-L S}, \quad S \ge 0,$$

with $L$ the number of looks and $\Gamma$ the gamma function. It follows that $\mathbb{E}[S] = 1$ and $\mathrm{Var}[S] = 1/L$. Thus, averaging independent samples in intensity leads to an unbiased estimator of the underlying reflectivity, reducing the fluctuations by a factor proportional to the number of samples available.
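As an illustrative sketch (not the paper's code), the multiplicative model and the variance reduction obtained by averaging can be simulated with gamma-distributed speckle; the scene size, reflectivity value and number of averaged looks below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_speckle(reflectivity, looks=1):
    """Multiply a reflectivity image by gamma-distributed speckle
    (Goodman's fully-developed model): shape L, scale 1/L, so that
    E[S] = 1 and Var[S] = 1/L."""
    speckle = rng.gamma(shape=looks, scale=1.0 / looks,
                        size=reflectivity.shape)
    return reflectivity * speckle

# Averaging N independent single-look intensities keeps the mean and
# divides the variance by N (here N = 16).
R = np.full(100_000, 10.0)  # hypothetical constant-reflectivity scene
single_look = simulate_speckle(R)
averaged = np.mean([simulate_speckle(R) for _ in range(16)], axis=0)
```

Both estimates are unbiased, but the averaged one fluctuates far less around the true reflectivity.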
In order to stabilize the variance, i.e. to make it independent of the reflectivity, a logarithmic transformation (homomorphic transform) is often applied. The log-speckle has an additive behaviour:

$$y = x + s,$$

where $y = \log I$, $x = \log R$, and the log-speckle $s = \log S$ follows a Fisher-Tippett distribution described by:

$$p(s) = \frac{L^L}{\Gamma(L)}\, e^{L s} \exp\!\left(-L e^{s}\right).$$

The log-speckle has a variance that does not depend on the log-reflectivity, i.e., it is stationary throughout the image: $\mathrm{Var}[s] = \psi_1(L)$, where $\psi_1$ is the polygamma function of order 1. The mean of the log-speckle is not zero: $\mathbb{E}[s] = \psi(L) - \log L$, with $\psi$ the digamma function. Averaging log-transformed intensities therefore requires a compensation for this bias to obtain an unbiased estimator of the log-reflectivity.
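These moments can be checked numerically; for a single look ($L = 1$) the digamma and polygamma values reduce to the known constants $\psi(1) = -\gamma$ (Euler-Mascheroni) and $\psi_1(1) = \pi^2/6$. A toy check, not part of the method:

```python
import math
import random

random.seed(0)
EULER_GAMMA = 0.5772156649015329

# Single-look speckle: S ~ Exponential(1), so s = log S follows a
# Fisher-Tippett distribution with mean psi(1) - log 1 = -gamma
# and variance psi_1(1) = pi^2 / 6.
n = 200_000
s = [math.log(random.expovariate(1.0)) for _ in range(n)]

mean_s = sum(s) / n
var_s = sum((v - mean_s) ** 2 for v in s) / n

bias = -EULER_GAMMA              # theoretical E[log S] for L = 1
theory_var = math.pi ** 2 / 6    # theoretical Var[log S] for L = 1

debiased_mean = mean_s - bias    # approximately zero after compensation
```

The empirical mean matches $-\gamma$ and the empirical variance matches $\pi^2/6$, illustrating why a bias compensation is needed after log-domain averaging.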
The focusing of a SAR image involves a series of processing steps that introduce a spatial correlation between neighboring pixels. Goodman's speckle model does not take these correlations into account, requiring an adaptation of the algorithms relying on the fully-developed i.i.d. assumption.
In the supervised learning setting, pairs of noiseless and noisy images $(x_k, y_k)$ are available for training. A common approach to estimate the parameters $\theta$ of an estimator $f_\theta$ is to minimize the loss function:

$$\hat{\theta} = \arg\min_{\theta} \sum_k \ell\big(f_{\theta}(y_k),\, x_k\big),$$

where $x_k$ is a random realization of the random vector $X$ and $y_k$ is a random realization under the conditional distribution $p(y\,|\,x_k)$.
The self-supervised approach noise2noise introduced by Lehtinen et al. considers only noisy pairs $(y_k, y'_k)$, where $y_k$ and $y'_k$ are two independent realizations drawn under the same conditional distribution $p(\cdot\,|\,x_k)$. The authors suggest replacing the unknown realization $x_k$ with the noisy observation $y'_k$, given that it is much easier to obtain additional noisy measurements of a static scene than very high quality measurements (i.e., virtually noise-free images):

$$\hat{\theta} = \arg\min_{\theta} \sum_k \ell\big(f_{\theta}(y_k),\, y'_k\big).$$
Provided that the noise is centered, i.e. $\mathbb{E}[y\,|\,x] = x$, the two expected losses differ only by a term that is constant with respect to the parameters $\theta$. Therefore, if the training set is large enough, parameters estimated with the self-supervised procedure of (5) are equivalent to parameters estimated with the supervised procedure (6).
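This equivalence can be illustrated on a toy one-"pixel" example: under the squared-error loss, the best constant estimator is the mean of the targets, so training against noisy copies converges to the same value as training against the clean target. The values below are arbitrary and purely illustrative:

```python
import random

random.seed(1)

x_true = 5.0     # clean value of a single "pixel"
n = 200_000

# Self-supervised targets: independent noisy copies y' = x_true + centered noise.
y_prime = [x_true + random.gauss(0.0, 1.0) for _ in range(n)]

# For the squared-error loss, the minimizing constant is the mean of the
# targets: the supervised procedure uses the clean value, the self-supervised
# one uses the mean of the noisy copies, which converges to it as n grows.
theta_supervised = x_true
theta_self = sum(y_prime) / n
```

With centered noise, the two estimates agree up to a statistical fluctuation that vanishes with the training-set size.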
In practice, training sets are limited and it is therefore necessary to consider how fast the self-supervised estimator converges to the supervised estimator. Under non-Gaussian noise, other loss functions may be more efficient. This is in particular the case of the co-log-likelihood:

$$\hat{\theta} = \arg\min_{\theta} \sum_k -\log p\big(y'_k \,\big|\, f_{\theta}(y_k)\big).$$
Among the M-estimators, i.e. methods that estimate parameters by minimizing a loss function over the training set, the maximum likelihood estimators are known to be efficient. This is illustrated in the case of speckle in figure 1, where the root mean square error of the log-intensity is reported for the $\ell_2$ and the log-likelihood loss functions (since the log-speckle is not centered, a compensation is added to the $\ell_2$ loss to prevent a bias). The minimizer of the log-likelihood loss converges more quickly to the supervised estimator, which indicates that it should be preferred as a loss function for self-supervised training of a despeckling network; this is confirmed in our experiments described in section V.
When $y$ and $y'$ are noisy log-intensity images, section III recalled that the conditional distribution $p(y'\,|\,x)$ is a Fisher-Tippett distribution. Under a simplifying assumption of statistical independence between pixels, the loss function in (9) takes the form:

$$\mathcal{L}(\theta) = \sum_{p} f_{\theta}(y)_p - y'_p + e^{\,y'_p - f_{\theta}(y)_p},$$

where the constant offset and the multiplicative factor $L$ are dropped since they are irrelevant in the minimization problem (9), and the sum involves all the pixel values $f_{\theta}(y)_p$ and $y'_p$ of the image pair.
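A minimal sketch of this per-pixel loss, assuming the simplified form with the constant offset and the factor $L$ dropped (the function name and toy values are ours, not the paper's implementation):

```python
import math

def fisher_tippett_loss(x_hat, y):
    """Negative Fisher-Tippett co-log-likelihood, averaged over pixels,
    up to a constant offset and the factor L:
        l(x_hat, y) = x_hat - y + exp(y - x_hat),
    where x_hat is the estimated log-reflectivity and y a noisy
    log-intensity of the same scene."""
    return sum(xh - yv + math.exp(yv - xh)
               for xh, yv in zip(x_hat, y)) / len(y)

# The per-pixel loss is minimized when x_hat == y: the derivative
# 1 - exp(y - x_hat) vanishes there, and the minimum value is 1.
losses = [fisher_tippett_loss([x], [0.0]) for x in (-1.0, 0.0, 1.0)]
```

The asymmetry of the loss around its minimum reflects the heavy left tail of the Fisher-Tippett distribution, which the $\ell_2$ loss ignores.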
Beyond the adaptation of the loss function to the statistics of speckle, it is necessary to account for changes that occur between two SAR images of a scene. If an estimator is available to produce pre-estimations $\hat{x}_1$ and $\hat{x}_2$ of the log-reflectivity images corresponding to the two speckle-corrupted log-intensities $y_1$ and $y_2$, changes in the second image can be partially compensated by forming the image $\tilde{y}_2 = y_2 - \hat{x}_2 + \hat{x}_1$, which more closely resembles image $y_1$.
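The compensation is a simple pixel-wise operation in the log domain; a sketch with hypothetical toy values (two pixels, the second one unchanged between dates):

```python
def compensate_changes(y2, x1_hat, x2_hat):
    """Compensate temporal changes in the log domain: swap the estimated
    log-reflectivity of date 2 for that of date 1 while keeping the
    date-2 speckle: y2_tilde = y2 - x2_hat + x1_hat."""
    return [y + x1 - x2 for y, x1, x2 in zip(y2, x1_hat, x2_hat)]

y2 = [3.5, 2.0]       # noisy log-intensities at date 2
x2_hat = [3.0, 1.8]   # pre-estimated log-reflectivities at date 2
x1_hat = [2.5, 1.8]   # pre-estimated log-reflectivities at date 1
y2_tilde = compensate_changes(y2, x1_hat, x2_hat)
```

If the pre-estimations were perfect, the compensated image would contain the date-1 reflectivity corrupted by the date-2 speckle realization, which is exactly what the noise2noise pairing requires.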
The proposed SAR restoration method, named SAR2SAR since it extends the original ideas of noise2noise, considers several SAR time series, each accurately co-registered, and trains a despeckling network using both the idea of a self-supervised loss and that of change compensation. Figure 2 summarizes the principle of the method: the restoration is performed in the log-domain by a deep network. Since the change compensation requires the availability of pre-estimated reflectivities, the training of the network is performed in 3 steps: (A) first on images with synthetically generated speckle (in the self-supervised fashion of the loss of section IV-B); (B) then on pairs of images extracted randomly from a time series, the second image being compensated for changes based on reflectivities estimated with the network trained in (A); (C) finally, a refinement step is performed where the network weights obtained in (B) are used to obtain a better compensation for changes.
In our set of experiments, the model is a U-Net trained in a residual fashion. Images are fed to the network after a log transform; the network predicts the noise, which is subtracted from the input image to obtain the despeckled image.
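The residual scheme can be sketched as follows; the `predict_noise` callable stands in for the trained U-Net, replaced here by a toy oracle that happens to know the true log-speckle (illustration only, not the paper's code):

```python
import math

def despeckle_residual(intensity, predict_noise):
    """Residual despeckling in the log domain: the network predicts the
    log-speckle component, which is subtracted from the log-intensity;
    exponentiation returns the result to the intensity domain."""
    y = [math.log(v) for v in intensity]   # log transform of the input
    noise = predict_noise(y)               # network output (the residual)
    return [math.exp(v - n) for v, n in zip(y, noise)]

# Toy "network": an oracle returning the true log-speckle of this example.
true_log_speckle = [0.2, -0.1]
noisy = [math.exp(1.0 + 0.2), math.exp(1.0 - 0.1)]  # log-reflectivity is 1.0
restored = despeckle_residual(noisy, lambda y: true_log_speckle)
```

With a perfect residual prediction, both pixels are restored to the common reflectivity $e^{1.0}$.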
One of the main issues when using deep learning algorithms on SAR images is the scarcity of training data. To achieve the desired level of generalization, and given that applying the algorithm to time series requires careful adaptation, training is initially carried out on images corrupted with synthetic speckle noise. At each iteration, two independent speckle realizations (following the model described in section III) are used to create two noisy images: one is the input image and the other is used to compute the loss. The images are divided into patches with a stride of 32, and 3014 batches of 4 images compose our training set. The network has been trained for 30 epochs using the Adam optimizer, with a learning rate set to 0.001 and decreased by a factor of 10 after the first 10 epochs. The loss function of our SAR2SAR method is adapted to the distribution of SAR images using the loss of section IV-B.
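The stated learning-rate schedule (0.001, divided by 10 after the first 10 epochs) can be written as a small step function; the function and parameter names are ours:

```python
def learning_rate(epoch, base_lr=1e-3, drop_epoch=10, factor=10.0):
    """Step schedule for the synthetic-speckle phase: the base rate for
    the first `drop_epoch` epochs, then divided by `factor`."""
    return base_lr if epoch < drop_epoch else base_lr / factor

# Schedule over the 30 training epochs.
lrs = [learning_rate(e) for e in range(30)]
```

Any optimizer exposing a per-epoch learning rate (e.g. Adam in common deep learning frameworks) can consume such a schedule.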
Creating synthetic images from noise-free references, moreover, allows a more reliable evaluation. Results of several despeckling filters are presented in table I. The PSNR values of SAR2SAR are not only comparable to those obtained with SAR-CNN [our paper], but are superior to those obtained when the $\ell_2$ loss is adopted (with the proper debiasing step, as discussed in section IV-B), justifying the adaptation of the loss. Results on image Lely are displayed in figure 3. While the use of SAR2SAR is motivated by its direct application to real SAR images, it achieves state-of-the-art results even on images corrupted with synthetic speckle noise.
To fine-tune the network on real images, the SAR time series composing the training set need to be denoised to generate the images used to compensate for changes. An estimation can be obtained by using the network trained on synthetic speckle, subsampling the images to reduce the effect of the correlation. Training proceeds for 20 more epochs with a learning rate decreased by a factor of 100 w.r.t. the initial value. 5 time series of 53 (Limagne), 45 (Marais 1), 45 (Marais 2), 69 (Rambouillet) and 25 (Lely) dates compose the training set. 2896 image patches are organized into 724 batches of 4 patches each. Learning is thereby transferred to correlated speckle.
As the compensation images used at this step required a subsampling operation, they have a poor resolution that impacts the results produced by SAR2SAR. To overcome this issue, an iterative process has been studied: every 10 epochs, the compensation images given to the network are updated with the results of SAR2SAR itself. Given that, asymptotically, the function learned by the network tends to the identity, we found experimentally that one iteration is a good compromise between speckle reduction and improvement of the resolution (see table II). Results are shown in figure 4.
Single-look SAR images are difficult to denoise due to the strong spatial correlation. The proposed SAR2SAR algorithm learns the statistics of real speckle noise directly from the data, by devising a self-supervised algorithm leveraging deep learning and multi-temporal stacks of images.
While providing state-of-the-art results on images with synthetic speckle noise, it is on real single-look images that SAR2SAR shows a clear improvement over existing despeckling algorithms. Indeed, methods developed under fully developed speckle model assumptions need a careful adaptation in order to properly deal with correlated data. If a subsampling step is applied, images with a poorer resolution are produced. A more careful pre-processing, however, requires knowledge of the sensor's parameters and adds a computational burden. SAR2SAR learns the speckle model directly from the data, making it readily applicable to real images. Moreover, deep learning algorithms are computationally fast once they are trained.
The use of information to compensate for changes is the key step that allows the exploitation of SAR stacks. Our experiments showed that training in a scenario under which temporal stability is assumed did not lead to results as good. In that setting, given an image at one date as input, the closest image in time was chosen as the target. Even in this condition, the no-change hypothesis did not always hold, giving mixed results.
The general formulation of SAR2SAR suggests that it can be extended to Ground Range Detected (GRD) images and to any sensor (e.g. TerraSAR-X), once the training data are collected. The weights of the trained model are released along with this article (https://github.com/emanueledalsasso/SAR2SAR), to allow testing of our method and to foster research on SAR image denoising.
X. Yang, L. Denis, F. Tupin, and W. Yang, "SAR image despeckling using pre-trained convolutional neural network models," in 2019 Joint Urban Remote Sensing Event (JURSE). IEEE, 2019, pp. 1–4.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.