Domain Adaptive Adversarial Learning Based on Physics Model Feedback for Underwater Image Enhancement

02/20/2020 ∙ by Yuan Zhou, et al.

Owing to refraction, absorption, and scattering of light by suspended particles in water, raw underwater images suffer from low contrast, blurred details, and color distortion. These characteristics can significantly interfere with the visibility of underwater images and the results of visual tasks such as segmentation and tracking. To address this problem, we propose a new robust adversarial learning framework with physics model based feedback control and a domain adaptation mechanism for enhancing underwater images to obtain realistic results. A new method for simulating an underwater-like training dataset from RGB-D data via the underwater image formation model is proposed. On this synthetic dataset, a novel enhancement framework, which introduces a domain adaptive mechanism as well as physics model constrained feedback control, is trained to enhance underwater scenes. Enhancement results on synthetic and real underwater images demonstrate the superiority of the proposed method, which outperforms non-deep and deep learning methods in both qualitative and quantitative evaluations. Furthermore, we perform an ablation study to show the contribution of each proposed component.


I Introduction

With the development of science and technology, human sight is no longer limited to the terrestrial areas visible to the naked eye. The ocean, which covers 71% of the earth's surface, is currently one of the most actively explored regions. Recently, underwater images have become the most effective tool to explore this treasure trove of resources and have emerged in a wide spectrum of aquatic applications, such as deep ocean exploration, inspection of underwater infrastructure [1] and cables [2], sea life monitoring [3], archeology [4], and control of underwater robots [5].

Different from common images, underwater images suffer from poor visibility, such as low contrast, color casts, and blurred details, resulting from the attenuation of the propagated light, mainly due to wavelength-dependent light absorption and scattering as well as the effects of low-end optical imaging devices [6] [7]. Scattering and absorption attenuate the direct transmission and introduce surrounding scattered light. The attenuated direct transmission weakens the intensity of light from the scene and introduces color casts, while the surrounding scattered light washes out the appearance of the scene. In nature, the magnitude of this scattering and attenuation is highly non-linear and is affected by various factors, including the concentration of suspended particles, time of day, operating depth, weather (overcast versus sunny), and the imaging device [8]. Some examples of underwater images captured under different environmental conditions are illustrated in Fig. 1. Due to this serious degradation, it is hard to recover the realistic color and appearance of underwater images, yet color and appearance are crucial for underwater vision tasks [9] such as classification, detection, and tracking, to name a few. Therefore, developing an effective solution to enhance the contrast and restore the color of these images is desirable.

Fig. 1: Samples of underwater images with natural and man-made artifacts, displaying the diversity of distortions that can occur. As the camera-to-object distance varies across the images, the degree of distortion and loss of color also varies.

Various processing methods for images degraded by the underwater environment have been developed in the past years. Traditional methods include image restoration and image enhancement. Image restoration methods [10, 11, 12, 13, 14, 15] take the degradation model of underwater optical imaging into consideration to reconstruct the images. Most of these methods struggle to simulate and invert the complex underwater imaging process by estimating only a few parameters, and can only alleviate the color casts and blur of underwater images to a certain extent. Image enhancement techniques [6, 16, 17, 18], in contrast, focus on adjusting image pixel values to acquire satisfactory results without depending on an imaging model; they can only apply a single global enhancement effect to various underwater images, regardless of image style and imaging process. In addition, due to the lack of abundant training data, these traditional methods show poor generalization performance across different underwater images, and some of their results tend to be over-enhanced or under-enhanced.

Alternatively, deep neural networks have been shown to be powerful non-linear function approximators, especially in the field of vision. For low-level vision tasks, e.g., image super-resolution [19], image de-raining [20], image de-noising [21], and de-hazing [22], powerful supervised learning models have obtained convincing success. Generally, these networks require large amounts of labeled data. However, images obtained in harsh and complex underwater scenarios lack ground truth, which is a major hindrance to adopting a similar supervised approach for correction. Some researchers try to solve the lack of ground truth for underwater images (by synthesizing paired underwater images from in-air data or by proposing new weakly supervised constraints) and perform underwater image enhancement through deep learning methods [23, 24, 25, 26, 27, 28, 29]. However, underwater images synthesized by existing algorithms, e.g., through physical models, have a single style and exhibit visual and inter-domain differences from diverse real-world underwater images. Simultaneously, due to the complexity of the underwater image enhancement problem, e.g., the large solution space of the corresponding problem, a simple feed-forward network or generator with random initialization is unable to estimate the solution well, and a weakly supervised constraint or the adversarial loss alone does not ensure that the contents of the outputs are consistent with those of the inputs. Hence these algorithms are still not effective enough, and it remains necessary to develop underwater image synthesis and enhancement methods that yield superior underwater visual quality and improve the performance of high-level vision tasks.

In this paper, we design an end-to-end solution for the complex and nonlinear underwater image formation procedure. More specifically, we propose a novel adversarial learning [30] architecture based on domain adaptation and a physical model constraint, trained on image pairs synthesized with an underwater scene prior together with real-world underwater images. Extensive experiments validate the superior performance of the proposed framework in terms of robustness, flexibility, and realism across diverse water types. The main contributions of this paper are summarized as follows:

  1. We synthesize a novel training set based on the physical underwater imaging model, which is more in line with the visual appearance of real-world underwater images of multiple styles.

  2. To the best of our knowledge, we are the first to propose a physics model constrained learning algorithm that guides the estimation in a GAN framework for underwater image processing. The physics model acts as the feedback controller of the GAN based enhancement network, provides explicit constraints for this ill-posed problem, and ensures that the estimated results are consistent with the observed image and more realistic. The GAN and the physics model constrained learning algorithm are jointly trained in an end-to-end fashion.

  3. Compared with existing neural network based methods trained on synthesized underwater image pairs, to the best of our knowledge, this is the first attempt to introduce a domain adaptive mechanism to eliminate the domain gap between synthetic and real-world underwater images, which makes the network trained on synthetic datasets effective for enhancing real underwater images as well.

  4. Our method generalizes well to both synthetic and real-world underwater images with diverse color and visibility characteristics.

The remainder of this paper is organized as follows. Section II gives a brief overview of the background knowledge and previous art. Section III describes the proposed method. Section IV presents experimental results of the proposed method and analyzes its effectiveness by comparing with previous works. We conclude the paper in Section V.

II Background Knowledge and Previous Art

This section surveys the basic principles underlying the underwater image formation model and the role of domain adaptation, and then reviews the main approaches that have been proposed to restore or enhance images captured under water.

II-A Underwater Image Formation Model

According to the Jaffe-McGlamery imaging model [31], if the camera is not too far from the scene, an underwater image can be regarded as a linear superposition of two components: 1) light which has not been scattered or absorbed in the intervening water, called the direct component; and 2) light which enters the camera without having been reflected from the object, called backscatter. It can be formulated as follows:

$I_\lambda(x) = J_\lambda(x)\, t_\lambda(x) + B_\lambda \bigl(1 - t_\lambda(x)\bigr), \qquad (1)$

where $I_\lambda(x)$ is the captured underwater image; $J_\lambda(x)$ is the clear latent image, also called the scene radiance, that we aim to recover; $B_\lambda$ is the homogeneous global background light; $\lambda \in \{r, g, b\}$ is the wavelength of the light for the red, green, and blue channels; and $x$ is a point in the underwater scene. The medium energy ratio $t_\lambda(x)$ represents the percentage of the scene radiance that reaches the camera after reflecting from point $x$ in the underwater scene, and it is the source of color cast and contrast degradation. In other words, $t_\lambda(x)$ is a function of the wavelength of light $\lambda$ and the distance $d(x)$ from the camera to the object surface:

$t_\lambda(x) = Nrer(\lambda)^{d(x)}, \qquad (2)$

where $Nrer(\lambda)$ is the normalized residual energy ratio, i.e., the ratio of residual energy to initial energy per unit of distance, which depends on the wavelength of light. For example, the bluish tone of most underwater images is due to the fast attenuation of red light in open water, as it possesses a longer wavelength than blue and green light.
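To make the formation model concrete, here is a minimal NumPy sketch that applies Eqs. (1) and (2) to a clean image and its depth map. The function name and the example Nrer/B values are illustrative assumptions; the paper's actual parameter settings are listed later in Table I.

```python
import numpy as np

def synthesize_underwater(J, depth, nrer, B):
    """Apply the formation model of Eqs. (1)-(2) to a clean image.

    J     : float array (H, W, 3) in [0, 1], clean scene radiance.
    depth : float array (H, W), camera-to-object distance d(x).
    nrer  : per-channel normalized residual energy ratio Nrer(lambda), shape (3,).
    B     : per-channel homogeneous background light B_lambda, shape (3,).
    """
    nrer = np.asarray(nrer, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    # Eq. (2): medium energy ratio t_lambda(x) = Nrer(lambda) ** d(x)
    t = nrer[None, None, :] ** depth[..., None]              # (H, W, 3)
    # Eq. (1): direct component + backscatter
    I = J * t + B[None, None, :] * (1.0 - t)
    return np.clip(I, 0.0, 1.0)

if __name__ == "__main__":
    # Stand-ins for an NYU-v2 RGB image and its depth map; the Nrer/B values
    # below give a bluish open-water style and are illustrative only.
    J = np.random.rand(480, 640, 3)
    depth = np.random.uniform(1.0, 8.0, (480, 640))
    I = synthesize_underwater(J, depth, nrer=[0.80, 0.93, 0.95], B=[0.1, 0.7, 0.8])
```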

II-B Domain Adaptation

Conventional machine learning algorithms rely on the assumption that the training and test data are drawn i.i.d. from the same underlying distribution. In practice, however, there is commonly some discrepancy (domain gap) between training data and testing data. Domain adaptation aims to rectify this mismatch and tune the models toward better generalization at testing time [32, 33, 34, 35].

In the underwater image enhancement community, as in many other low-level vision tasks, various learning based methods rely on synthetic datasets to train enhancement models that are expected to generalize to real-world underwater images. However, these methods ignore the domain gap between the synthetic training data and the real-world testing data, which seriously affects the generalization ability of their models. To deal with this problem, we introduce a domain adaptive mechanism to eliminate the domain gap between synthetic and real-world underwater images, which makes the network trained on the synthetic dataset effective for enhancing real underwater images as well.

II-C Related Work on Underwater Image Processing

Given the importance of underwater vision, numerous methods towards improving underwater image quality have been proposed to address the degradation issues of underwater images. Generally, these algorithms can be categorized into three types including model-based restoration methods, model-free enhancement methods, and learning-based convolutional neural networks (CNNs).

Restoration methods regard the recovery of underwater images as an inverse problem, restoring them by estimating the parameters of the underwater image formation model. The dark-channel prior (DCP) [36] is the most commonly adopted prior and is used to estimate the scene depth from a single image; some researchers apply this prior to underwater images. For example, Drews et al. extended the classical DCP to underwater image restoration [10]. Chiang et al. applied DCP and extended the work to compensate the attenuated light according to the scene depth and the normalized residual energy ratio of each light channel [37]. Galdran et al. restored the red channel to recover the lost contrast of underwater images [11]. Peng et al. adopted image blurriness together with the image formation model to estimate the distance between the scene point and the camera, and thereby recovered underwater images [14]. Moreover, Fattal [38] proposed a novel method for single image dehazing that takes advantage of a color-lines pixel regularity, and Zhou et al. extended the color-lines model to underwater image restoration [15] with decent performance. However, many physical parameters and underwater optical properties are required, making these restoration-based methods inflexible to implement in harsh and complex real underwater environments.

Underwater image enhancement technologies focus on adjusting image pixel values to produce a subjectively and visually appealing image. Iqbal et al. proposed an integrated color model and an unsupervised color correction method to enhance the visual quality of underwater images [39] [40]. Ancuti et al. proposed a fusion-based method to increase the contrast of underwater images and videos [6]. Ghani and Isa improved contrast and reduced noise in underwater images by modifying the integrated color model [16]. Li and Guo proposed an underwater image enhancement method based on dehazing and color correction [41]. Fu et al. proposed a simple yet effective retinex-based (RB) approach to enhance a single underwater image [17]. Bianco et al. proposed a new color correction method for underwater imaging, which demonstrated the effectiveness of color correction in color space [42]. However, these enhancement-based methods improve underwater scene contrast and image quality only to some extent; output images in some scenes become over-enhanced or under-enhanced, and these methods do not take the underwater physical parameters into account.

Relying on abundant training data, deep learning [43] techniques are capable of improving image quality in different underwater scenes. By virtue of the physical model, WaterGAN [23] uses in-air images with corresponding depth information to generate synthetic images for specific underwater scenarios. Li et al. developed a weakly supervised underwater image color correction model based on the cycle-consistent adversarial network (CycleGAN) [44] and a multi-term loss function [28]. Considering that CycleGAN can translate an image from one domain to another without paired training data or depth pairings, the underwater GAN (UGAN) [24] employs it as a degradation process to generate paired training data, and then uses a model based on pix2pix [45] to improve underwater image quality. However, due to the complexity of the underwater image quality improvement problem, e.g., the large solution space of the corresponding problem, a simple feedforward network with random initialization is unable to estimate the solution well.

Different from previous methods, we propose an effective framework designed for underwater image enhancement with feedback control based on a physical model. It is trained on synthetic underwater images, and a domain adaptive mechanism is introduced to eliminate the domain gap between the synthetic training data and real-world underwater test images, whose effectiveness is demonstrated in the ablation study. In addition, the proposed framework performs well in terms of both subjective and objective evaluations on synthetic and real underwater images.

III Methodology

Fig. 2: Proposed framework. One discriminative network is used to classify whether the distributions of the outputs from the generator are close to those of the ground-truth images or not. The other discriminative network is used to classify whether the regenerated result is consistent with the observed image or not. All the networks are jointly trained in an end-to-end manner.

GANs [30] have attracted favorable attention in the machine learning community, not only for their ability to learn a target probability distribution but also for their theoretically attractive aspects. Inspired by GANs and the prior knowledge of the underwater image formation model, we propose a novel framework to learn an end-to-end nonlinear mapping between distorted and non-distorted images, which uses the fundamental constraint to guide the training of the GAN and ensures that the enhanced results are physically correct and look real. We use synthetic underwater images and their corresponding ground truth to train the network. The synthesized underwater images can cover a variety of underwater scene styles through reasonable randomization of parameters, but real underwater scenes are extremely complex. We therefore introduce a domain adaptive mechanism into our network to eliminate the domain gap between synthetic training images and real-world underwater images, which makes the network trained on synthetic datasets effective for enhancing real underwater images as well. As shown in the flowchart in Fig. 2, the proposed model contains three main components: a feedforward generator network, a physics model based feedback controller, and two discriminator networks. The training objective has four types of terms: adversarial loss, cycle consistency loss, pixel loss, and domain adaptation loss (CORAL loss).

We first describe our method of synthesizing underwater training images in Section III-A, then present the proposed underwater image enhancement framework in Section III-B, and finally describe the optimization objective in Section III-C.

III-A Synthesizing Underwater Images

To preserve the real color and content of the image, supervised methods are more suitable for underwater restoration. Unlike high-level visual tasks [46, 47, 48], where large labeled training datasets are often available, the lack of an underwater image dataset with corresponding ground truth constrains the development of deep learning based underwater image enhancement and quality evaluation. To solve this problem, we adopt a novel underwater image synthesis algorithm based on the underwater imaging physical model described above and observations of real underwater scenes. We simulate images based on the NYU-v2 dataset [49], which is relatively large, versatile, and, most importantly, includes ground-truth depth information for each image, which is essential for the method described later in this section. We modified this dataset to meet the requirements of abundant and varied underwater scenes.

To convert images taken in air into underwater styles, we apply Eqs. (1) and (2) to build three main types of underwater image datasets using the RGB-D NYU-v2 indoor dataset [49], which consists of 1449 images ($J_\lambda$ of Eq. (1)) and corresponding depth information ($d(x)$ of Eq. (2)). For the settings of the normalized residual energy ratio ($Nrer(\lambda)$) and the homogeneous global background light ($B_\lambda$) in the underwater image formation model, we made various attempts based on [50] and [51], and compared the synthesis results with a large number of real underwater images of different scenes in terms of style and tone. Finally, we selected three sets of parameter settings and randomization methods for $Nrer(\lambda)$ and $B_\lambda$ of the red, green, and blue channels for different water types, presented in Table I below. They cover different turbidity conditions, from clear offshore waters to the more turbid waters of the deep ocean, and the main tonal range of the underwater environment.

Through rational parameter settings and randomization, we synthesized a unified image training set containing different water types based on the physical imaging model of underwater scenes, which is more in line with the visual appearance of multiple real-world underwater images, as shown in Fig. 3. It is worth mentioning that each image pair in our synthetic dataset consists of four images: not only the synthetic underwater image and its ground-truth image taken in air, but also the corresponding images containing the background light information and the transmission map, which are used in the feedback control module of our proposed enhancement framework. We selected 8900 pairs of images as our training set, all resized to the canonical size of 256×256 pixels. To evaluate the effect of our proposed framework, we randomly select 80 images (keeping the original size of 480×640) from our synthetic image pairs; these images have different shades and styles and are not used in the training stage. In addition to these synthetic data, we also compare with state-of-the-art methods on commonly used real underwater images.

Fig. 3: Samples of underwater images synthesized from the NYU-v2 RGB-D dataset [49] using a sample image and its depth map with the attenuation coefficients and background light shown in Table I. (a) is the original image in the NYU-v2 RGB-D dataset [49], while (b), (c), and (d) are samples of synthetic underwater images with different degrees of degradation and background lights.
Type  Parameter   Red               Green             Blue
(b)   Nrer(λ)     0.79+0.06*rand()  0.92+0.06*rand()  0.94+0.05*rand()
      B_λ         0.05+0.15*rand()  0.60+0.30*rand()  0.70+0.29*rand()
(c)   Nrer(λ)     0.71+0.04*rand()  0.82+0.06*rand()  0.80+0.07*rand()
      B_λ         0.05+0.15*rand()  0.60+0.30*rand()  0.70+0.29*rand()
(d)   Nrer(λ)     0.67              0.73              0.67
      B_λ         0.15              0.80              0.70
TABLE I: Three sets of parameter settings and randomization methods for Nrer(λ) and B_λ of the red, green, and blue channels for different water types.
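As a sketch of how the randomized settings in Table I could be sampled and turned into the four-image training tuples described above (synthetic underwater image, ground truth, transmission map, background-light image). The mapping of each type's two rows to Nrer(λ) and B_λ follows the table as reconstructed above, and the uniform-in-[0,1) reading of rand() is an assumption; the function and variable names are ours.

```python
import numpy as np

# (offset, range) per channel for Nrer(lambda) and B_lambda, per water type, from Table I.
TABLE_I = {
    "b": {"nrer": [(0.79, 0.06), (0.92, 0.06), (0.94, 0.05)],
          "B":    [(0.05, 0.15), (0.60, 0.30), (0.70, 0.29)]},
    "c": {"nrer": [(0.71, 0.04), (0.82, 0.06), (0.80, 0.07)],
          "B":    [(0.05, 0.15), (0.60, 0.30), (0.70, 0.29)]},
    "d": {"nrer": [(0.67, 0.00), (0.73, 0.00), (0.67, 0.00)],
          "B":    [(0.15, 0.00), (0.80, 0.00), (0.70, 0.00)]},
}

def sample_params(water_type, rng=np.random):
    """Draw one randomized (Nrer, B) setting for the given water type."""
    cfg = TABLE_I[water_type]
    nrer = np.array([lo + rg * rng.rand() for lo, rg in cfg["nrer"]])
    B = np.array([lo + rg * rng.rand() for lo, rg in cfg["B"]])
    return nrer, B

def make_training_tuple(J, depth, water_type):
    """Return (underwater image I, ground truth J, transmission map t, background-light map)."""
    nrer, B = sample_params(water_type)
    t = nrer[None, None, :] ** depth[..., None]                    # Eq. (2)
    I = np.clip(J * t + B[None, None, :] * (1.0 - t), 0.0, 1.0)    # Eq. (1)
    B_map = np.broadcast_to(B[None, None, :], J.shape).copy()
    return I, J, t, B_map
```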

III-B Proposed Enhancement Framework

Our purpose is to obtain an effective and robust end-to-end underwater image enhancement framework for multiple water types. Although the network is trained on the synthetic underwater image training set mentioned above, it can be used effectively to enhance real-world underwater images of different styles and scenarios, truly restoring the color and content of the subjects. The robustness of the network to multiple water types and the authenticity of the visual effects of the output are the main remarkable characteristics of our method compared with other underwater image restoration and enhancement methods.

In order to make the network trained on the synthetic dataset effective for enhancing real underwater images as well, we introduce a domain adaptive mechanism to eliminate the domain gap between synthetic and real-world underwater images during the training procedure. Simultaneously, to produce physically correct results with more realistic visual effects for the ill-posed problem of underwater image enhancement, we add a feedback control module driven by the physics model constraint to the original end-to-end feedforward structure, which guides the training of the generator. To ensure that the output of the feedforward generator is consistent with the input under model (1), we introduce an additional discriminative network. The proposed framework is shown in Fig. 2. It contains one feedforward generative network with the domain adaptive mechanism, one feedback fundamental constraint based on the physics model, and two symmetrical discriminative networks.

To the best of our knowledge, this is the first time that domain adaptation has been used in an underwater image processing algorithm to eliminate the domain gap between the training set and test sets covering a variety of real-world underwater scenes. It is also the first attempt to introduce a physical model based feedback control system into this research area. Both are significant contributions to the development of underwater optical image processing.

III-B1 Feedforward Generative Network

Our generator is an end-to-end feedforward network whose purpose is to convert the input low-quality underwater image into a processed clear image as output. The impressive performance of image-to-image translation methods such as [44] encourages us to explore a similar generator structure. The generator G, based on a forward CNN, adopts an encoder-decoder structure [52] composed of residual blocks. It consists of three sections: a down-sampling feature extraction module, a feature-preserving reconstruction module, and an up-sampling image reconstruction module. By means of a nine-residual-block stack, the downsample-upsample model learns the essence of the input scene, and a synthesized version emerges at the original resolution after the de-convolution operations. The detailed network parameters are shown in Table II.

Parameters of the generative network
Layers          CINR  CINR  CINR  ResBlock … ResBlock (×9)  CTINR  CTINR  CINR
Filter size     7     3     3     3          3              3      3      7
Filter numbers  64    128   256   256        256            128    64     3
Stride          1     2     2     1          1              2      2      1

Parameters of the discriminative network
Layers          CINLR  CINLR  CINLR  CINLR  CINLR
Filter size     4      4      4      4      4
Filter numbers  64     128    256    512    1
Stride          2      2      2      1      1

TABLE II: Network parameters. "CINR" denotes a convolutional layer with instance normalization (IN) and ReLU; "ResBlock" denotes a residual block containing two convolutional layers with IN and ReLU; "CTINR" denotes a fractionally-strided convolutional layer with IN and ReLU; "CINLR" denotes a convolutional layer with IN and LeakyReLU.
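A PyTorch sketch of a generator matching the layer specification in Table II (three CINR layers, nine residual blocks, two CTINR layers, and a final 7×7 CINR layer). Padding values, the output activation, and the exposure of the down-sampled features (for the CORAL term) are our assumptions, not details stated in the paper.

```python
import torch
import torch.nn as nn

def cinr(in_ch, out_ch, k, s, p):
    """CINR: convolution + instance normalization + ReLU (Table II)."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, s, p),
                         nn.InstanceNorm2d(out_ch), nn.ReLU(inplace=True))

def ctinr(in_ch, out_ch):
    """CTINR: fractionally-strided convolution + instance normalization + ReLU."""
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2, padding=1, output_padding=1),
                         nn.InstanceNorm2d(out_ch), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions with instance norm and ReLU."""
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Down-sampling feature extraction module: CINR 7/64/s1, 3/128/s2, 3/256/s2.
        self.down = nn.Sequential(cinr(3, 64, 7, 1, 3), cinr(64, 128, 3, 2, 1), cinr(128, 256, 3, 2, 1))
        # Feature-preserving reconstruction module: nine residual blocks at 256 channels.
        self.blocks = nn.Sequential(*[ResBlock(256) for _ in range(9)])
        # Up-sampling image reconstruction module: CTINR 3/128/s2, 3/64/s2, CINR 7/3/s1
        # (in practice the last activation is often replaced by Tanh).
        self.up = nn.Sequential(ctinr(256, 128), ctinr(128, 64), cinr(64, 3, 7, 1, 3))
    def forward(self, x):
        feat = self.down(x)               # features also used for the CORAL domain loss
        return self.up(self.blocks(feat)), feat

if __name__ == "__main__":
    out, feat = Generator()(torch.randn(1, 3, 256, 256))
    print(out.shape, feat.shape)          # (1, 3, 256, 256) and (1, 256, 64, 64)
```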

As mentioned above, in order to make the network trained on synthetic datasets effective for enhancing real underwater scenes as well, we introduce a domain adaptive mechanism to eliminate the domain gap between synthetic and real-world underwater images. During training, our generator takes synthetic underwater images and randomly selected real-world underwater images as input, while the output is only the clear image, free of water tone, corresponding to the input synthetic underwater image. In the down-sampling module, features of an unpaired synthetic underwater image and a real-world underwater image are extracted. In general, both feature maps are 3D tensors of size C×H×W. For the sake of analysis, we consider each feature tensor as a set of C-dimensional local descriptors. The local descriptors of the synthetic underwater image and of the real-world underwater image are regarded as the source and target domains, respectively. Through a constraint added at the end of the down-sampling module, explained in detail in Section III-C, the feature extraction process is guided to align the second-order statistics of the source and target distributions. This makes the features of the synthetic training data produced by the down-sampling module similar to the feature representations of real underwater images, eliminating their inter-domain differences. The features of the synthetic underwater images are then passed through the feature-preserving reconstruction module, consisting of nine residual blocks, and the up-sampling image reconstruction module to generate the output images.

III-B2 Feedback Control

We note that CNN or GAN based methods that take the observed data as input have shown promising results in underwater image enhancement [23, 24, 25, 26, 27, 28, 29]. However, these methods do not guarantee that the solutions satisfy the physics model (1) and thus fail to generate sufficiently clear and realistic images, as illustrated in Section I. In this work, we develop a new method to improve the estimation result of the GAN under the guidance of the physics model (1).

The feedforward generative network mentioned above learns the mapping function and generates the intermediate enhanced image $G(I)$ from the input $I$. Then, we apply the physics model (1) to $G(I)$ to obtain the regenerated underwater-style image $\tilde{I}$:

$\tilde{I}_\lambda(x) = \bigl[G(I)\bigr]_\lambda(x)\, t_\lambda(x) + B_\lambda \bigl(1 - t_\lambda(x)\bigr), \qquad (3)$

where $t_\lambda(x)$ denotes the transmission map and $B_\lambda$ the background light, which are only used in the training process. Note that the transmission map and the background light in Eq. (3) are known, since they were also used to generate the underwater-style image from the original clear image when synthesizing the training data. As mentioned in Section III-A, each image pair in our dataset consists of four images: not only the synthetic underwater image and its ground-truth image taken in air, but also the corresponding images containing the background light and the transmission map information.

This physics model based module acts as the feedback controller of the GAN based enhancement network, provides explicit constraints for this ill-posed problem, and ensures that the estimated results are consistent with the observed image and look more realistic. We note that although the proposed framework is trained in an end-to-end manner, it is constrained by a physics model and thus is not fully blind in the training stage. With the learned generator, the test stage is blind: we can directly obtain the final results by applying the generator to the input real underwater images.

III-B3 Discriminators

Our discriminator is modeled as a PatchGAN [44, 52], which discriminates at the level of image patches with fewer parameters than a full-image discriminator and achieves state-of-the-art results in many vision problems. As opposed to a regular discriminator, which outputs a scalar value corresponding to real or fake, our PatchGAN discriminator outputs a 32×32×1 feature matrix, which provides a metric for high-level frequencies. The parameters of the discriminative network are shown in Table II.

There are two discriminators with the same structure in our proposed framework. The discriminative network $D_1$ takes the ground truth $J$ and the intermediate enhanced image $G(I)$ as inputs and is used to classify whether $G(I)$ is clear or not. The other discriminative network $D_2$ takes the synthetic underwater image $I$ and the regenerated image $\tilde{I}$ as inputs and is used to classify whether the generated results satisfy the physical model or not.
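A matching sketch of the PatchGAN discriminator suggested by Table II (five 4×4 CINLR layers with strides 2, 2, 2, 1, 1). The padding, the LeakyReLU slope, and dropping the normalization/activation on the last layer are our assumptions; with this padding a 256×256 input yields a 30×30 patch score map, close to the 32×32×1 output described above.

```python
import torch
import torch.nn as nn

def cinlr(in_ch, out_ch, stride):
    """CINLR: 4x4 convolution + instance normalization + LeakyReLU (Table II)."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 4, stride, padding=1),
                         nn.InstanceNorm2d(out_ch), nn.LeakyReLU(0.2, inplace=True))

class PatchDiscriminator(nn.Module):
    """Patch-level discriminator: outputs a score map instead of a single scalar."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            cinlr(3, 64, 2), cinlr(64, 128, 2), cinlr(128, 256, 2),
            cinlr(256, 512, 1),
            nn.Conv2d(512, 1, 4, stride=1, padding=1))   # per-patch real/fake logits
    def forward(self, x):
        return self.net(x)

if __name__ == "__main__":
    print(PatchDiscriminator()(torch.randn(1, 3, 256, 256)).shape)   # torch.Size([1, 1, 30, 30])
```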

III-C Optimization Objective

The fundamental GAN algorithm learns a generative model via an adversarial process. It simultaneously trains a generator network and a discriminator network by optimizing:

$\min_G \max_D V(D, G) = \mathbb{E}_{y \sim p_{data}(y)}\bigl[\log D(y)\bigr] + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr], \qquad (4)$

where $z$ denotes random noise, $y$ denotes a real image, $D$ denotes the discriminator network, and $G$ denotes the generator network. In the training process of a GAN, the generator generates samples $G(z)$ that can fool the discriminator, while the discriminator learns to distinguish the real data from the samples produced by the generator.

However, the contents of images generated only on the basis of this training loss may differ from those of the ground-truth images; the supervision provided by the discriminator and the adversarial loss alone is not strong enough. To ensure that the contents of the generated results are sufficiently close to those of the ground-truth images and also consistent with those of the inputs under the underwater image formation model (1), we use the $L_1$-norm regularized pixel-wise loss functions:

$\mathcal{L}_{p1} = \bigl\| G(I) - J \bigr\|_1 \qquad (5)$

and

$\mathcal{L}_{p2} = \bigl\| \tilde{I} - I \bigr\|_1, \qquad (6)$

so that

$\mathcal{L}_{pixel} = \mathcal{L}_{p1} + \mathcal{L}_{p2} \qquad (7)$

is used in the training stage, where $J$ denotes the ground-truth clear image corresponding to the synthetic input $I$ and $\tilde{I}$ is the regenerated image of Eq. (3). To make the learning process of the generative network more stable, we further use the adversarial loss function:

$\mathcal{L}_{adv} = \mathbb{E}_{J}\bigl[\log D_1(J)\bigr] + \mathbb{E}_{I}\bigl[\log\bigl(1 - D_1(G(I))\bigr)\bigr] \qquad (8)$

to regularize the generator G.

Finally, we propose a new objective function

$\mathcal{L}_{phy} = \mathbb{E}_{I}\bigl[\log D_2(I)\bigr] + \mathbb{E}_{I}\bigl[\log\bigl(1 - D_2(\tilde{I})\bigr)\bigr] \qquad (9)$

to ensure that the output of the GAN is consistent with the observed input under the underwater image formation model (1).

Based on the former objective functions, our network is able to learn powerful representations from large quantities of synthetic underwater images with ground truth; however, this cannot guarantee that the model generalizes well across changes in input distribution when tested on real underwater images, because the training and testing data are not independent and identically distributed (i.i.d.). Therefore, as mentioned in Section III-B, we introduce a domain adaptive mechanism into our framework to compensate for the performance degradation caused by domain shift.

In this work, we use a differentiable loss function named the CORAL loss [53], incorporating it directly into the down-sampling feature extraction module of our feedforward generative network; it minimizes the difference in second-order statistics (covariances) between the synthetic underwater image (source) and real-world underwater image (target) features:

$\mathcal{L}_{CORAL} = \frac{1}{4d^2}\,\bigl\| C_S - C_T \bigr\|_F^2, \qquad (10)$

where $\|\cdot\|_F^2$ denotes the squared matrix Frobenius norm, $C_S$ and $C_T$ denote the feature covariance matrices of the source and target domains, and $d$ denotes the number of channels of the features. The covariance matrices of the source and target data are given by:

$C_S = \frac{1}{n_S - 1}\Bigl( D_S^\top D_S - \frac{1}{n_S}\bigl(\mathbf{1}^\top D_S\bigr)^\top \bigl(\mathbf{1}^\top D_S\bigr) \Bigr), \qquad (11)$

$C_T = \frac{1}{n_T - 1}\Bigl( D_T^\top D_T - \frac{1}{n_T}\bigl(\mathbf{1}^\top D_T\bigr)^\top \bigl(\mathbf{1}^\top D_T\bigr) \Bigr), \qquad (12)$

where $\mathbf{1}$ is a column vector with all elements equal to 1, and $n_S$ and $n_T$ represent the numbers of local descriptors of the source image (synthetic underwater image) and the target image (real-world underwater image), respectively. $D_S$ and $D_T$ are the feature matrices of the synthetic and real-world underwater images generated by the down-sampling feature extraction module.

By optimizing the CORAL loss embedded in the generator, the features of the synthetic underwater training set and of the real-world underwater images can be mapped to a common feature subspace, eliminating their inter-domain differences.
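The CORAL term of Eqs. (10)-(12) can be written compactly in PyTorch by flattening the C×H×W feature maps of the down-sampling module into n×C matrices of local descriptors, following the Deep CORAL formulation of [53]; the function and variable names here are ours.

```python
import torch

def coral_loss(feat_s, feat_t):
    """CORAL loss (Eqs. (10)-(12)) between source and target feature maps.

    feat_s, feat_t: tensors of shape (B, C, H, W) from the down-sampling module
    (synthetic = source, real-world = target).
    """
    d = feat_s.size(1)                                    # number of channels C
    # Flatten each feature map into an n x C matrix of C-dimensional local descriptors.
    Ds = feat_s.permute(0, 2, 3, 1).reshape(-1, d)
    Dt = feat_t.permute(0, 2, 3, 1).reshape(-1, d)

    def covariance(D):
        n = D.size(0)
        ones = torch.ones(1, n, device=D.device, dtype=D.dtype)
        mean_term = (ones @ D).t() @ (ones @ D) / n       # (1^T D)^T (1^T D) / n
        return (D.t() @ D - mean_term) / (n - 1)          # Eqs. (11) / (12)

    Cs, Ct = covariance(Ds), covariance(Dt)
    return ((Cs - Ct) ** 2).sum() / (4.0 * d * d)         # Eq. (10)
```

During training, feat_s and feat_t would be the down-sampling features of a synthetic image and of a randomly selected real-world underwater image from the same iteration.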

Finally, the total optimization objective we propose is a linear combination of the above losses with weights:

$\mathcal{L}_{total} = \lambda_1 \bigl(\mathcal{L}_{adv} + \mathcal{L}_{phy}\bigr) + \lambda_2\, \mathcal{L}_{pixel} + \lambda_3\, \mathcal{L}_{CORAL}, \qquad (13)$

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weight parameters.
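To illustrate how the terms of Eq. (13) might be wired together in a single generator update, here is a heavily simplified training-step sketch. The BCE adversarial form, the L1 pixel terms, and the weight values are assumptions based on the descriptions above, not settings reported by the authors; the generator-style module returning (output, features), the two PatchGAN discriminators, and coral_loss refer to the earlier sketches.

```python
import torch
import torch.nn.functional as F

def generator_step(G, D1, D2, I_syn, J_gt, t_map, B_map, I_real,
                   lambdas=(1.0, 10.0, 1.0)):
    """One generator update combining the terms sketched in Eqs. (5)-(13).

    I_syn : synthetic underwater image; J_gt : its ground truth;
    t_map, B_map : transmission map and background light used to synthesize I_syn;
    I_real : an unpaired real-world underwater image (target domain for CORAL).
    """
    lam_adv, lam_pix, lam_coral = lambdas            # illustrative weights only

    J_hat, feat_s = G(I_syn)                         # intermediate enhanced image + source features
    _, feat_t = G(I_real)                            # features of the real image (target domain)

    # Feedback control (Eq. (3)): regenerate an underwater-style image with the known physics parameters.
    I_regen = J_hat * t_map + B_map * (1.0 - t_map)

    # Pixel / cycle-consistency terms (Eqs. (5)-(7)), assuming L1 norms.
    l_pixel = F.l1_loss(J_hat, J_gt) + F.l1_loss(I_regen, I_syn)

    # Adversarial terms for both discriminators (Eqs. (8)-(9)), non-saturating BCE form.
    p1, p2 = D1(J_hat), D2(I_regen)
    l_adv = F.binary_cross_entropy_with_logits(p1, torch.ones_like(p1)) + \
            F.binary_cross_entropy_with_logits(p2, torch.ones_like(p2))

    # Domain adaptation term (Eq. (10)); coral_loss is the function sketched after Eq. (12).
    l_coral = coral_loss(feat_s, feat_t)

    return lam_adv * l_adv + lam_pix * l_pixel + lam_coral * l_coral   # Eq. (13), schematically
```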

Fig. 4: Qualitative comparison between our method and contemporary approaches in terms of processing quality on samples from the synthetic underwater image test set. (a) Raw synthetic underwater images. (b) Results of DCP [36]. (c) Results of UDCP [10]. (d) Results of Fusion [6]. (e) Results of RB [17]. (f) Results of IBLA [14]. (g) Results of UCL [15]. (h) Results of DehazeNet [22]. (i) Results of CycleGAN [44]. (j) Results of DUIENet [25]. (k) Results of our method. (l) Ground truth. The underwater images in the first column have different degeneration degrees and background lights. The results illustrate that our method removes the light absorption effects and recovers the original colors without any artifacts.

IV Experiments

In this part, we perform qualitative and quantitative comparisons with state-of-the-art underwater image enhancement methods on both synthetic and real-world underwater images. The compared methods include the DCP method [36], the underwater DCP method [10], the improved retinex-based (RB) method [17], the image blurriness and light absorption (IBLA) based method [14], the fusion enhancement (Fusion) method [6], the color-line based underwater image restoration (UCL) method [15], DehazeNet [22], CycleGAN [44], and DUIENet [25], chosen for their representativeness in single image de-hazing, traditional underwater image restoration and enhancement, and deep learning based image style transfer or underwater image enhancement, respectively. We run the source codes provided by the authors with the recommended parameter settings to produce the best results for an objective evaluation.

To verify the performance of the different methods, subjective and objective evaluations, including quality metrics and a user study, are carried out. Finally, we conduct an ablation study to demonstrate the effect of each component of our framework. To validate the generalization ability of our proposed method for different underwater scenarios, we collected a test set of 80 real-world underwater images acquired from [6, 37] and the internet; these images have obvious characteristics of underwater image quality degradation (e.g., color casts, decreased contrast, and blurred details) and are taken in a diversity of underwater scenes. Some of the testing images are shown in Fig. 1, and the original underwater images presented in this paper are extracted from this collected dataset.

IV-A Training Details

In our training process, the training dataset consists of 8900 pairs of images with a resolution of 256×256. We train the models using the Adam optimizer [54] with an initial learning rate of 0.0002, β1 set to 0.50, and β2 set to 0.999, and a batch size of 1. After obtaining the generator output $G(I)$, since we know the paired training data and the corresponding physics model parameters (the transmission map $t$ and the background light $B$) used to synthesize $I$ from $J$, we apply the same physics model parameters to $G(I)$ and generate the regenerated image $\tilde{I}$. Then the discriminator $D_1$ takes the ground truth $J$ and $G(I)$ as input, while the other discriminative network $D_2$ takes $I$ and $\tilde{I}$ as input. We implemented our network with the PyTorch framework and trained it for 60 epochs using 64 GB of memory and an 11 GB Nvidia GeForce GTX 1080 Ti GPU.
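The stated optimizer settings map directly onto PyTorch; the snippet below fixes only the hyper-parameters given in the text (learning rate 0.0002, β1 = 0.5, β2 = 0.999), with placeholder modules standing in for the actual networks.

```python
import torch

# Placeholder modules standing in for the generator and the two discriminators
# (see the architecture sketches in Section III-B for fuller versions).
G  = torch.nn.Conv2d(3, 3, 3, padding=1)
D1 = torch.nn.Conv2d(3, 1, 4)
D2 = torch.nn.Conv2d(3, 1, 4)

# Adam with the stated hyper-parameters: learning rate 0.0002, beta1 = 0.50, beta2 = 0.999.
opt_G  = torch.optim.Adam(G.parameters(),  lr=2e-4, betas=(0.5, 0.999))
opt_D1 = torch.optim.Adam(D1.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D2 = torch.optim.Adam(D2.parameters(), lr=2e-4, betas=(0.5, 0.999))
# Batch size 1 and 60 epochs as reported; dataset construction is omitted here.
```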

Fig. 5: Qualitative comparison between our method and contemporary approaches in terms of processing quality on real-world underwater images. (a) Raw real-world underwater images. (b) Results of DCP [36]. (c) Results of UDCP [10]. (d) Results of Fusion [6]. (e) Results of RB [17]. (f) Results of IBLA [14]. (g) Results of UCL [15]. (h) Results of DehazeNet [22]. (i) Results of CycleGAN [44]. (j) Results of DUIENet [25]. (k) Results of our method. The underwater images in the first row have different degeneration degrees and background lights. The results illustrate that our method produces results without any visual artifacts, color deviations, or over-saturation. It also unveils spatial motifs and details. (Best viewed on a high-resolution display with zoom-in.)

IV-B Evaluation on Synthetic Underwater Images

Method MSE PSNR SSIM
Original 3864.3577 12.6176 0.7524
DCP 4604.2207 11.7758 0.7011
UDCP 5788.5313 10.9260 0.6553
Fusion 2528.0611 14.5415 0.8211
RB 1217.8306 17.6973 0.8521
IBLA 2291.2073 15.3116 0.7941
UCL 2035.3419 15.9243 0.8089
DehazeNet 4192.5230 12.2191 0.7355
CycleGAN 506.4429 21.7324 0.8707
DUIENet 2700.4965 14.1571 0.8254
Ours 90.2791 28.8487 0.9532
TABLE III: Quantitative evaluations on the synthetic test set. Our method achieves the best scores on all metrics.

We first evaluate the underwater image enhancement results of the proposed method on the synthesized underwater images described in Section III-A against non-deep and deep learning methods. The test dataset contains 80 synthetic images covering multiple degradation degrees, together with their ground truth. Some subjective results of the different methods are shown in Fig. 4. As we can see, DCP [36], UDCP [10], and DehazeNet [22] nearly fail to correct the underwater color, while Fusion [6], RB [17], IBLA [14], UCL [15], and DUIENet [25] introduce obvious artificial colors or color deviations; although these methods eliminate the underwater color and degradation effects to a certain extent, their results still retain an obvious underwater color style and haze-like blur. In contrast, our method not only enhances the visibility of the images but also restores an aesthetically pleasing texture and vibrant yet genuine colors. Compared with the other methods, the visual quality of our results is nearly the same as that of the ground truth.

Furthermore, we quantify the accuracy of the recovered images on the synthetic test set of 80 samples with different degradation degrees. In Table III, the accuracy is measured by three different metrics: mean square error (MSE), peak signal-to-noise ratio (PSNR), and the structural similarity index metric (SSIM) [55]. The quantitative results are obtained by comparing the result of each method with the corresponding ground-truth image. For the MSE and PSNR metrics, a lower MSE (higher PSNR) denotes a result closer to the ground truth in terms of image content, while for the SSIM metric, a higher score means the result is more similar to the ground truth in terms of image structure and texture. The presented results are average scores, and the values in bold represent the best results.
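The three full-reference metrics can be computed per image with scikit-image and averaged over the 80 test pairs; this is a minimal sketch assuming enhanced results and ground truths are loaded as uint8 RGB arrays of equal size (the paper does not state which implementation was used).

```python
import numpy as np
from skimage.metrics import mean_squared_error, peak_signal_noise_ratio, structural_similarity

def evaluate_pair(result, gt):
    """Full-reference scores for one enhanced image against its ground truth (uint8 RGB)."""
    mse = mean_squared_error(gt, result)
    psnr = peak_signal_noise_ratio(gt, result, data_range=255)
    ssim = structural_similarity(gt, result, channel_axis=-1, data_range=255)
    return mse, psnr, ssim

def evaluate_set(results, gts):
    """Average MSE / PSNR / SSIM over a list of (result, ground truth) pairs."""
    scores = np.array([evaluate_pair(r, g) for r, g in zip(results, gts)])
    return scores.mean(axis=0)
```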

Table III shows that the proposed method achieves the best performance in terms of all full-reference image quality assessment metrics against state-of-the-art methods, demonstrating its effectiveness and robustness; regarding all three metrics, our method outperforms the compared methods by a large margin.

IV-C Evaluation on Real-World Underwater Images

Method UISM UICM UIConM UIQM
Original 3.7823 0.8239 0.5607 3.1449
DCP 4.0432 1.5436 0.7230 3.8224
UDCP 3.7733 2.0890 0.7712 3.9303
Fusion 5.1587 3.7882 0.8005 4.4923
RB 4.8236 3.2928 0.7728 4.2801
IBLA 3.9638 2.9803 0.5809 3.3314
UCL 3.8997 4.3608 0.7272 3.8748
DehazeNet 4.0402 0.7002 0.6164 3.4167
CycleGAN 6.1765 2.1636 0.7036 4.4006
DUIENet 4.5795 3.0063 0.6844 3.8842
Ours 6.2318 1.6609 0.8264 4.8418
TABLE IV: Underwater image quality evaluation of different processing methods on real-world underwater images. The best result is in bold.
Method Original DCP UDCP Fusion RB IBLA UCL DehazeNet CycleGAN DUIENet Ours
Scores 4.0 3.6 3.9 5.7 6.2 5.0 4.8 4.2 5.6 5.9 8.0
TABLE V: User study on real-world underwater image dataset. The best result is in bold.

In this part, we evaluate the proposed method on real-world underwater images. The subjective evaluation against competitive methods, namely visual comparison, is presented in Fig. 5. As we can see, the original real-world underwater images suffer from poor visibility, and all methods improve the image visibility to some extent. Although the haze in the raw underwater images is removed by DCP, UDCP, IBLA, and UCL, the visibility, color, and details are still not good enough; their results retain a distinct underwater color style with greenish and bluish tones. The RB and Fusion methods, as typical image enhancement methods, do not take the underwater image formation model into consideration; their results lack edge and detail information when zoomed in, and they introduce color distortion and noise, so the colors of the images do not conform to the laws of nature.

For the deep neural network methods, the results of DehazeNet are not visually satisfying; it cannot change the color and content of the turbid underwater images. This network was developed and trained for removing haze from images taken in air, which is probably the reason for its very poor performance on underwater images. The CycleGAN used here was retrained and fine-tuned on the same synthetic underwater image training set as our method, so it works well in changing the overall scene style of underwater images. However, it alters the colors of the generated images, and the content and structure of turbid underwater images are slightly distorted. As a recently proposed baseline model for underwater image enhancement, DUIENet removes the haze of underwater images and reduces color casts quite effectively, but some of its results still retain a distinct underwater style, especially for inputs with a larger depth of field, which may affect the results of high-level tasks. In contrast, our method shows promising results on all of the real-world images. The greenish and bluish tone is completely removed, as if our results had been taken on the ground, without introducing any artificial colors, color casts, or over- or under-enhanced areas, which matches natural underwater scenes. At the same time, the proposed algorithm removes the haze effect well and enhances the detailed information as cleanly as possible.

To make our evaluation more convincing, we choose the underwater image quality measure (UIQM) [56], a no-reference metric, to evaluate the underwater image enhancement and restoration methods. This metric comprises three underwater image attribute measures: the underwater image colorfulness measure (UICM), the underwater image sharpness measure (UISM), and the underwater image contrast measure (UIConM). Each attribute is used to assess one aspect of underwater image degradation. The UIQM is given as follows:

$\mathrm{UIQM} = c_1 \times \mathrm{UICM} + c_2 \times \mathrm{UISM} + c_3 \times \mathrm{UIConM}, \qquad (14)$

where the colorfulness, sharpness, and contrast measures are linearly combined, and the three parameters $c_1$, $c_2$, and $c_3$ are set to 0.0282, 0.2953, and 3.5753, respectively, according to [56]. Therefore, we believe that this metric can provide a comprehensive assessment of the effectiveness of the various methods. Table IV lists the average values obtained by the different methods on our test set, which contains 80 real-world underwater images. The best result of the final index (UIQM) is marked in bold. It can be seen that the UIQM of the proposed method is remarkably larger than those of the other methods.
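Given the three attribute measures, the final score of Eq. (14) is just the stated linear combination. The snippet below encodes only that combination with the coefficients from [56]; computing UICM, UISM, and UIConM themselves is left outside this sketch.

```python
# Coefficients c1, c2, c3 of Eq. (14), taken from Panetta et al. [56].
C1, C2, C3 = 0.0282, 0.2953, 3.5753

def uiqm(uicm: float, uism: float, uiconm: float) -> float:
    """Eq. (14): linear combination of colorfulness, sharpness, and contrast measures."""
    return C1 * uicm + C2 * uism + C3 * uiconm

# Example with the 'Ours' row of Table IV: uiqm(1.6609, 6.2318, 0.8264) is about 4.84.
```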

For a more objective assessment, we conduct a user study to provide realistic feedback and quantify subjective visual quality. We randomly selected 30 real-world underwater images from our collected test set, which cover a diversity of underwater scenes, different characteristics of quality degradation, and a broad range of image content; samples from this dataset are shown in Fig. 1, and some corresponding results have been presented in Fig. 5. The results of the different methods were randomly displayed on the screen and compared with the corresponding raw underwater images. We then invited 20 participants with experience in image processing to score the results. There was no time limit for each participant, and the participants did not know which results were produced by our method. The scores ranged from 1 (worst) to 10 (best). As a baseline, we set the scores of the raw underwater images to 4.0; one expects that results with high contrast, good visibility, natural color, and authentic texture should receive higher scores, while results with over-enhancement/exposure, under-enhancement/exposure, color casts, and artifacts should receive lower scores. The average subjective scores are given in Table V. As we can see, our method receives the highest ranking, which indicates that it generates visually pleasing results.

IV-D Ablation Study

To demonstrate the effect of each component in our framework, we carry out an ablation study involving the following experiments:

(i) our method without the domain adaptation operation (-DA);

(ii) our method without the physical model based feedback control system (-PF);

(iii) our method without the pixel-wise losses (-PL).

Method UISM UICM UIConM UIQM
-PL 6.5320 1.5542 0.7235 4.5593
-PF 6.6536 1.1653 0.8093 4.8911
-DA 6.6841 1.6401 0.8397 5.0222
Ours 6.7606 1.4928 0.8528 5.0874
TABLE VI: Underwater image quality evaluation of different variants of the proposed method. The best result is in bold.

We carry out the test on 50 real-world underwater images with quantitative evaluation. The average scores in terms of the underwater image quality measure (UIQM) metric are reported in Table VI, and the best result is marked in bold. We notice that both the physical model based feedback controller and the domain adaptation mechanism improve the final UIQM results of the enhanced images.

V Conclusion

In this paper, we presented an underwater image enhancement network inspired by an underwater scene prior. First, a new method for simulating underwater-like images that better match real underwater scenes was proposed. Based on this, an adversarial learning architecture with a domain adaptive mechanism and physics model constrained feedback control was trained to enhance underwater images. Finally, numerous experiments were performed to demonstrate the superiority of the proposed method on both synthetic and real underwater images. In addition, the experimental results of the ablation study demonstrate that the proposed physical model based feedback control and domain adaptation mechanism boost the performance both quantitatively and qualitatively. Furthermore, our method can serve as a guide for subsequent research on learning-based underwater image processing and similar low-level tasks such as image dehazing and super-resolution reconstruction.

References

  • [1] G. L. Foresti, “Visual inspection of sea bottom structures by an autonomous underwater vehicle,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 31, no. 5, pp. 691–705, 2001.
  • [2] A. Ortiz, M. Simó, and G. Oliver, “A vision system for an underwater cable tracker,” Machine vision and applications, vol. 13, no. 3, pp. 129–140, 2002.
  • [3] C. H. Mazel, “In situ measurement of reflectance and fluorescence spectra to support hyperspectral remote sensing and marine biology research,” in OCEANS 2006.   IEEE, 2006, pp. 1–4.
  • [4] Y. Kahanov and J. G. Royal, “Analysis of hull remains of the dor d vessel, tantura lagoon, israel,” The International journal of nautical archaeology, vol. 30, no. 2, pp. 257–265, 2001.
  • [5] B. A. Levedahl and L. Silverberg, “Control of underwater vehicles in full unsteady flow,” IEEE Journal of Oceanic Engineering, vol. 34, no. 4, pp. 656–668, 2009.
  • [6] C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert, “Enhancing underwater images and videos by fusion,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition.   IEEE, 2012, pp. 81–88.
  • [7] A. S. A. Ghani and N. A. M. Isa, “Underwater image quality enhancement through integrated color model with rayleigh distribution,” Applied soft computing, vol. 27, pp. 219–230, 2015.
  • [8] C. Fabbri, M. J. Islam, and J. Sattar, “Enhancing underwater imagery using generative adversarial networks,” in 2018 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2018, pp. 7159–7165.
  • [9] M. Shortis and E. H. D. Abdo, “A review of underwater stereo-image measurement for marine biology and ecology applications,” in Oceanography and marine biology.   CRC Press, 2016, pp. 269–304.
  • [10] P. L. Drews, E. R. Nascimento, S. S. Botelho, and M. F. M. Campos, “Underwater depth estimation and image restoration based on single images,” IEEE computer graphics and applications, vol. 36, no. 2, pp. 24–35, 2016.
  • [11] A. Galdran, D. Pardo, A. Picón, and A. Alvarez-Gila, “Automatic red-channel underwater image restoration,” Journal of Visual Communication and Image Representation, vol. 26, pp. 132–145, 2015.
  • [12] N. Wang, H. Zheng, and B. Zheng, “Underwater image restoration via maximum attenuation identification,” IEEE Access, vol. 5, pp. 18 941–18 952, 2017.
  • [13] Y. Wang, H. Liu, and L.-P. Chau, “Single underwater image restoration using adaptive attenuation-curve prior,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 3, pp. 992–1002, 2017.
  • [14] Y.-T. Peng and P. C. Cosman, “Underwater image restoration based on image blurriness and light absorption,” IEEE transactions on image processing, vol. 26, no. 4, pp. 1579–1594, 2017.
  • [15] Y. Zhou, Q. Wu, K. Yan, L. Feng, and W. Xiang, “Underwater image restoration using color-line model,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 3, pp. 907–911, 2018.
  • [16] A. S. A. Ghani and N. A. M. Isa, “Underwater image quality enhancement through integrated color model with rayleigh distribution,” Applied soft computing, vol. 27, pp. 219–230, 2015.
  • [17] X. Fu, P. Zhuang, Y. Huang, Y. Liao, X.-P. Zhang, and X. Ding, “A retinex-based enhancing approach for single underwater image,” in 2014 IEEE International Conference on Image Processing (ICIP).   IEEE, 2014, pp. 4572–4576.
  • [18] C.-Y. Li, J.-C. Guo, R.-M. Cong, Y.-W. Pang, and B. Wang, “Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior,” IEEE Transactions on Image Processing, vol. 25, no. 12, pp. 5664–5677, 2016.
  • [19] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2015.
  • [20] H. Zhang, V. Sindagi, and V. M. Patel, “Image de-raining using a conditional generative adversarial network,” IEEE transactions on circuits and systems for video technology, 2019.
  • [21] C. Dong, Y. Deng, C. Change Loy, and X. Tang, “Compression artifacts reduction by a deep convolutional network,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 576–584.
  • [22] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet: An end-to-end system for single image haze removal,” IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5187–5198, 2016.
  • [23] J. Li, K. A. Skinner, R. M. Eustice, and M. Johnson-Roberson, “Watergan: Unsupervised generative network to enable real-time color correction of monocular underwater images,” IEEE Robotics and Automation letters, vol. 3, no. 1, pp. 387–394, 2017.
  • [24] C. Fabbri, M. J. Islam, and J. Sattar, “Enhancing underwater imagery using generative adversarial networks,” in 2018 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2018, pp. 7159–7165.
  • [25] C. Li, C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao, “An underwater image enhancement benchmark dataset and beyond,” IEEE transactions on image processing, no. 11, pp. 1–1, 2019.
  • [26] X. Yu, Y. Qu, and M. Hong, “Underwater-gan: Underwater image restoration via conditional generative adversarial network,” in International Conference on Pattern Recognition.   Springer, 2018, pp. 66–75.
  • [27] Y.-S. Shin, Y. Cho, G. Pandey, and A. Kim, “Estimation of ambient light and transmission map with common convolutional architecture,” in OCEANS 2016 MTS/IEEE Monterey.   IEEE, 2016, pp. 1–7.
  • [28] C. Li, J. Guo, and C. Guo, “Emerging from water: Underwater image color correction based on weakly supervised color transfer,” IEEE Signal processing letters, vol. 25, no. 3, pp. 323–327, 2018.
  • [29] Y. Guo, H. Li, and P. Zhuang, “Underwater image enhancement using a multiscale dense generative adversarial network,” IEEE Journal of Oceanic Engineering, 2019.
  • [30] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
  • [31] J. S. Jaffe, “Computer modeling and the design of optimal underwater imaging systems,” IEEE Journal of Oceanic Engineering, vol. 15, no. 2, pp. 101–111, 1990.
  • [32] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural networks,” in Domain Adaptation in Computer Vision Applications.   Springer, 2017, pp. 189–209.
  • [33] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, “Simultaneous deep transfer across domains and tasks,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4068–4076.
  • [34] M. Long, Y. Cao, J. Wang, and M. I. Jordan, “Learning transferable features with deep adaptation networks,” arXiv preprint arXiv:1502.02791, 2015.
  • [35] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, “Domain separation networks,” in Advances in neural information processing systems, 2016, pp. 343–351.
  • [36] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 12, pp. 2341–2353, 2010.
  • [37] J. Y. Chiang and Y.-C. Chen, “Underwater image enhancement by wavelength compensation and dehazing,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1756–1769, 2011.
  • [38] R. Fattal, “Dehazing using color-lines,” ACM transactions on graphics (TOG), vol. 34, no. 1, p. 13, 2014.
  • [39] K. Iqbal, R. A. Salam, A. Osman, and A. Z. Talib, “Underwater image enhancement using an integrated colour model.” IAENG International Journal of Computer Science, vol. 34, no. 2, 2007.
  • [40] K. Iqbal, M. Odetayo, A. James, R. A. Salam, and A. Z. H. Talib, “Enhancing the low quality images using unsupervised colour correction method,” in 2010 IEEE International Conference on Systems, Man and Cybernetics.   IEEE, 2010, pp. 1703–1709.
  • [41] C. Li and J. Guo, “Underwater image enhancement by dehazing and color correction,” Journal of Electronic Imaging, vol. 24, no. 3, p. 033023, 2015.
  • [42] G. Bianco, M. Muzzupappa, F. Bruno, R. Garcia, and L. Neumann, “A new color correction method for underwater imaging,” The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 40, no. 5, p. 25, 2015.
  • [43] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, p. 436, 2015.
  • [44] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2223–2232.
  • [45] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1125–1134.
  • [46] W. Wang, J. Shen, F. Guo, M.-M. Cheng, and A. Borji, “Revisiting video saliency: A large-scale benchmark and a new model,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4894–4903.
  • [47] J. Y. Yuan Zhou, T. C. Hongru Li, and S.-Y. Kun, “Adversarial learning for multiscale crowd counting under complex scenes,” IEEE Transactions on Cybernetics, no. 1, pp. 1–10, 2020.
  • [48] X. Du, Y. Zhou, Y. Chen, Y. Zhang, J. Yang, and D. Jin, “Dense-connected residual network for video super-resolution,” in 2019 IEEE International Conference on Multimedia and Expo (ICME).   IEEE, 2019, pp. 592–597.
  • [49] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” in European Conference on Computer Vision.   Springer, 2012, pp. 746–760.
  • [50] D. Berman, T. Treibitz, and S. Avidan, “Diving into haze-lines: Color restoration of underwater images,” in Proc. British Machine Vision Conference (BMVC), vol. 1, no. 2, 2017.
  • [51] C. Li, S. Anwar, and F. Porikli, “Underwater scene prior inspired deep underwater image and video enhancement,” Pattern Recognition, vol. 98, p. 107038, 2020.
  • [52] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4681–4690.
  • [53] B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep domain adaptation,” in European Conference on Computer Vision.   Springer, 2016, pp. 443–450.
  • [54] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference for Learning Representations, 2015, pp. 1–1.
  • [55] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli et al., “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [56] K. Panetta, C. Gao, and S. Agaian, “Human-visual-system-inspired underwater image quality measures,” IEEE Journal of Oceanic Engineering, vol. 41, no. 3, pp. 541–551, 2015.