1 Introduction
Underwater image enhancement (UIE) aims to recover a clean image from a distorted underwater image, which is essential for many underwater applications such as biology, archaeology, underwater robotics and infrastructure inspection. As shown in Fig.1, the major problems in underwater images are low contrast, color casts and blurry details caused by wavelength-dependent light absorption and scattering. When travelling through water, light of different wavelengths is attenuated at different rates, leading to various degrees of color casts. Underwater images often appear bluish or greenish since red light, which has a longer wavelength, is absorbed more quickly than green and blue light. In addition, small particles in the water absorb most of the light energy and change the direction of light before it reaches the camera, resulting in low contrast and haze-like effects. In the past few years, many excellent algorithms have been proposed to address these problems.
Generally, UIE methods can be roughly divided into three categories: model-free, model-based and data-driven methods. Among them, model-free methods [1, 2, 3, 4, 5, 6, 7, 8] mainly adjust pixel values to enhance underwater images. These methods can improve contrast, brightness and saturation. However, they ignore the physical degradation process of underwater images, so the visual quality improvement of the enhanced results is limited. In contrast, model-based methods [9, 10, 11, 12, 13, 14, 15] usually estimate key parameters of physical models with various hand-crafted priors to restore clean images, which performs well in some cases. However, these methods tend to produce inaccurate parameter estimates and unsatisfactory results in complex scenarios where their priors fail. Recently, data-driven methods
[16, 17, 18, 19, 20] have been employed to estimate physical parameters or to directly predict clear images. Usually, these approaches rely on a large number of real underwater images and their clean counterparts for training. However, it is impractical to acquire large quantities of corresponding ground-truths in underwater scenarios. Consequently, most data-driven methods heavily depend on synthetic datasets built on underwater image formation models to train their models. Although these methods have shown remarkable improvements, they still face two major challenges.

Firstly, existing underwater image synthesis models have an inherent limitation in synthesizing training samples and cannot adequately express the characteristics of real underwater images. They usually generate the homogeneous global ambient light randomly when synthesizing data for various water types. This operation introduces significant errors since the intensity values of the ambient light in the RGB channels are not independent but are influenced by many dependencies, such as the water type and the surface-object depth. In this paper, we establish a new underwater synthetic dataset using a revised ambient light synthesis equation that takes these dependencies into account. The revised equation clearly defines the mathematical relationship among the intensity values of the ambient light in the RGB channels, the absorption coefficient, the scattering coefficient and the surface-object depth, and is therefore more accurate in simulating the color, contrast and blurriness of real underwater scenes.
Secondly, most deep models designed from engineering experience are highly dependent on the scale and quality of training data, so their performance cannot always be guaranteed. There are many unique priors in the underwater field that can be used to help networks learn more discriminative features, recover better images and improve the generalization of the model [21]. Two examples are shown in Fig.2, including the distorted images and three corresponding priors (i.e., the color tone, water-type and structure priors). The color tone of an underwater image reflects, to some extent, the distortion information associated with color, which is beneficial for color correction. Similarly, the structural information provides high-frequency local features of edges, which guides networks to better restore texture and geometry. In addition, by incorporating the water-type prior, which embeds water body information, into the network, the learned model can achieve better robustness to different water body conditions.
Based on the above analysis, a novel analysis-synthesis framework called ANA-SYN is proposed for UIE tasks, which effectively enhances underwater images by combining priors (underwater domain knowledge) and data information (underwater distortion distribution). The proposed framework consists of two parts, namely an analysis network and a synthesis network. The analysis network is employed to explore underwater image priors (e.g., the color tone prior, water-type prior and structure prior). However, for images with different degrees of distortion, the importance of each prior is not constant and varies with the underwater scene. As shown in Fig.3, underwater images in the first row are mainly affected by color distortions, while images in the second row suffer from blur-dominated distortions. For the former, the color tone prior is more important, and for the latter, the structure prior is more significant. Therefore, a novel adaptive weighting module is embedded to dynamically learn the weight of each prior and recalibrate the priors into more accurate weighted priors. The synthesis network aggregates priors and data information to perform image enhancement, in which a new prior guidance module is introduced to adequately fuse prior and data features through a pair of learnable prior modulation parameters. As shown in Fig.1, our ANA-SYN can effectively handle real underwater images with different types of quality degradation and produces much better results.
The main contributions of this paper can be summarized as follows:
-
We build a new underwater synthetic dataset based on a revised ambient light synthesis equation. To the best of our knowledge, it is the first dataset that takes the relationship among the ambient light intensity values, water-type and surface-object depth into account during the data synthesis process, and the synthesized images are more realistic and natural.
-
We design a new framework, named ANA-SYN, which effectively integrates underwater priors and data information for solving underwater image enhancement tasks, and performs well in real-world scenarios.
-
We propose a novel adaptive weighting module in the analysis network, which adaptively assigns different weights to various priors based on their importance to the input image and recalibrates them into more accurate weighted priors.
-
A prior guidance module is introduced in the synthesis network, which effectively fuses prior and data features by learning a pair of modulation parameters, guiding the meaningful interaction of priors and data information.
2 Related work
2.1 Underwater Image Synthetic Datasets
One challenge of UIE tasks is the lack of real datasets with corresponding ground-truths. Li et al. [18] construct a real underwater image enhancement benchmark dataset, including 890 images and their respective references (not the actual ground truth). Liu et al. [22] build a real underwater dataset consisting of three subsets targeting three challenging aspects, i.e., visibility degradation, color casts and higher-level tasks, respectively. Although these images are real and reliable, they are usually limited in number, content and types of quality distortion. More importantly, they do not provide the corresponding ground-truth images. In general, it is impossible to obtain the ground-truth corresponding to a real underwater image. Thus, most existing deep models resort to training on synthetic datasets.
Underwater image synthesis is usually approached from two different perspectives: Generative Adversarial Networks (GANs) [23, 24] and physical models [25, 26]. Li et al. [23] propose a deep model, named WaterGAN, to generate underwater-like images from in-air images and depth maps in an unsupervised manner. Similarly, Fabbri et al. [24] employ CycleGAN to generate distorted images from clean images based on weakly supervised distribution transfer. GAN-based synthesis methods have the advantage of generating underwater-like images in an unsupervised pipeline, transferring the overall distribution of images rather than explicitly modeling colors, contrast and blur details. However, these methods heavily depend on training samples and easily produce unrealistic and unnatural artifacts.

Recently, several underwater image synthesis methods based on physical models have been proposed. Li et al. [25] establish an underwater synthetic image database via a simplified physical model, including ten subsets for different water types. Ding et al. [26] modify the work of [25] by introducing color shift and object blurriness effects, producing relatively good synthetic data. Nevertheless, the homogeneous ambient light is usually generated randomly in the synthesis process, which leads to a large gap between synthetic and real underwater images. In this paper, the relationship among the intensity values of the ambient light in the RGB channels at one depth point in the water is defined by a revised ambient light synthesis equation. By embedding the revised equation into the synthesis process, the synthesized data become closer to real underwater data.
2.2 Underwater Image Enhancement Methods
Existing underwater image enhancement algorithms can be roughly classified into three types: model-free
[1, 2, 3, 4, 5, 6, 7, 8], model-based [9, 10, 11, 12, 13, 14, 15] and data-driven methods [16, 17, 18, 19, 20].

Model-free methods mainly change image pixel values to improve the visual quality of underwater images, for example by pixel stretching and adjustment [1, 2, 3, 8], retinex decomposition [4, 5] and image fusion [6, 7]. For example, Ancuti et al. [6] propose an underwater image enhancement method in which a contrast-enhanced image and a color-corrected image are fused by a multi-scale mechanism to generate an enhanced image with better global contrast and detail information. Based on underwater optical imaging theory, Ancuti et al. [7] modify the color balance method to effectively fuse images, leading to more natural results. Recently, Ancuti et al. [8] design a color channel compensation (3C) pre-processing method based on the observation that the information contained in at least one color channel is almost completely lost underwater, which can effectively remove color artifacts. Model-free methods can improve the contrast and saturation of underwater images to some extent. However, they rely only on human perception of brightness and color, while ignoring the complexity of underwater scenarios. It is difficult for them to achieve promising results on images captured in complex underwater degradation environments and lighting conditions.
Model-based methods usually deduce key parameters of a physical model via hand-crafted priors and then recover clean images by inverting the degradation process. These priors include the underwater dark channel prior [9], the attenuation curve prior [10], the blurriness prior [11] and the minimum information prior [12]. For example, Peng et al. [11] propose a method to better estimate depth maps for underwater scenes based on the intrinsic characteristics of underwater image blurriness and light absorption. Li et al. [12] integrate the minimum information loss and histogram distribution priors for depth estimation to effectively recover underwater images. Recently, Akkaynak et al. take many underwater dependencies into account and present a revised underwater image formation model [13, 14], which is more physically accurate. Based on this model, they propose a color correction algorithm [15] using underwater RGB-D images to restore images. Although these methods perform relatively well in some cases, they heavily depend on hand-crafted priors. Thus, they tend to fail when the hand-crafted priors are not valid for a specific image.
With the advance of deep learning and large-scale synthetic datasets, data-driven methods have received significant attention in recent years. Current methods either design end-to-end models, or use deep models to estimate physical parameters and then restore the clean image based on the degradation model. To alleviate the need for paired training data, Li et al. [16] develop a weakly supervised learning method to enhance underwater images. Guo et al. [17] design a multi-scale dense GAN which combines a GAN loss with L1 and gradient losses to better learn the feature distribution for robust image enhancement. Recently, Li et al. [18] employ three weights learned by a gating network to fuse three pre-processed versions of a distorted image into a better output. Furthermore, a cross-color-space and cross-channel method is proposed in [19], which disentangles color and haze effects to avoid unsatisfactory colors and blurring in some areas.

Although data-driven methods have made significant progress, current network architectures ignore many available underwater priors and depend heavily on training data. Their generalization still falls behind some conventional methods and their performance is limited in the real world. Rich underwater priors can provide proper guidance to the network and make the trained model more robust [21]. In this paper, we effectively aggregate both underwater priors and data information to better solve UIE tasks.
3 Proposed Underwater Dataset
In this section, we first analyze the limitations of existing underwater synthesis models, and propose a revised ambient light synthesis equation. Subsequently, the construction details of our new dataset are introduced.
3.1 Limitations of existing underwater synthesis models
Among the many studied models, the one most widely used to describe the degradation process of underwater images is the Jaffe-McGlamery model [27]. Mathematically, it is modeled as:

$$U_\lambda(x) = J_\lambda(x)\, e^{-\eta_\lambda d(x)} + B_\lambda \left(1 - e^{-\eta_\lambda d(x)}\right), \tag{1}$$

where $x$ denotes a point in the underwater scene, $U_\lambda(x)$ is the total signal received by the camera, $J_\lambda(x)$ represents the clean image at point $x$, $\lambda \in \{r, g, b\}$ denotes the color channels, $d(x)$ is the object–camera distance, $B_\lambda$ is the homogeneous ambient light and $\eta_\lambda$ is the sum of the absorption coefficient $a_\lambda$ and the scattering coefficient $b_\lambda$ [28], i.e., $\eta_\lambda = a_\lambda + b_\lambda$. Li et al. [25] synthesize the first underwater-like dataset based on Eq.1. Ding et al. [26] introduce a color-shift component to improve the work of [25], in which $J_\lambda(x)$ is redefined as the scene irradiance at point $x$; this is closer to the characteristics of light propagation in the water.
When light propagates over the surface-object distance $D$ and reaches the underwater scene point $x$, the energy attenuation can be written as

$$E_\lambda(D) = E_\lambda(0)\, e^{-\eta_\lambda D}, \tag{2}$$

where $E_\lambda(0)$ is the initial energy at the water surface above point $x$ and $D$ is the distance from the surface to the object, i.e., the surface-object distance. Therefore, an underwater image formation model that takes the color shift into account can be modeled as:

$$U_\lambda(x) = J_\lambda(x)\, e^{-\eta_\lambda D}\, e^{-\eta_\lambda d(x)} + B_\lambda \left(1 - e^{-\eta_\lambda d(x)}\right). \tag{3}$$
Ding et al. [26] establish a new underwater-like dataset based on Eq.3 with multiple parameters, including the absorption coefficient $a_\lambda$, the scattering coefficient $b_\lambda$, the object–camera distance $d(x)$, the surface-object distance $D$ and the ambient light $B_\lambda$. They deem low visibility and color casts to be the key degradation factors for image quality and ignore the effect of the ambient light. Existing synthesis models usually generate the homogeneous ambient light randomly in the data synthesis process and ignore many important dependencies.
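To make the formation model concrete, the following NumPy sketch applies Eq.3 to a clean image with a known depth map. The function name, argument layout and value ranges are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def synthesize_underwater(clean, depth, eta, B, D):
    """Apply Eq. 3 (illustrative sketch).

    clean : H x W x 3 array in [0, 1], clean image J.
    depth : H x W array, object-camera distance d(x) in meters.
    eta   : length-3 array, attenuation coefficients (a + b) for the RGB channels.
    B     : length-3 array, homogeneous ambient light for the RGB channels.
    D     : scalar, surface-object distance in meters.
    """
    # Transmission along the object-camera path, per channel.
    t = np.exp(-depth[..., None] * eta[None, None, :])
    # Color shift accumulated along the surface-object path.
    color_shift = np.exp(-eta * D)
    # Direct (attenuated, color-shifted) component plus backscatter.
    return clean * color_shift * t + B * (1.0 - t)
```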
To illustrate the drawback of existing synthesis models, ten different types of real underwater images and some synthetic images produced by various synthesis models are shown in Fig.4. Comparing the 1st, 2nd and 3rd rows of Fig.4, it can be clearly observed that the images synthesized with [25] and [26] cannot simulate the characteristics of real underwater images well, especially the color casts. In our opinion, this is mainly caused by incorrect ambient light settings. Although the ambient light values of the RGB channels at a point $x$ in the underwater scene are constants, they are related to many dependencies including the water type, the attenuation coefficients and the surface-object distance [13, 14, 15], since light is gradually absorbed along the path from the surface to the object and the absorption rate changes with the water type. Therefore, exploring the potential relationships between the ambient light intensity values in the RGB channels and these dependencies is important for synthesizing more accurate underwater-like data.
3.2 Proposed Revised Ambient Light Synthesis Equation
In this section, a revised ambient light synthesis equation is developed. Assuming that there is a small water disk with thickness $\Delta z$ located at a distance $z$ from the camera along the line of sight, the radiation scattered toward the camera by this small disk from all other directions can be written as [29]:

$$\Delta B_\lambda(z) = b_\lambda\, E_\lambda(D)\, \Delta z, \tag{4}$$

where $E_\lambda(D)$ is the ambient light value at the surface-object depth $D$, $x$ is the scene object point, $z$ is the distance between the disk and the camera, and $b_\lambda$ is the scattering coefficient. Based on Beer's Law of exponential decay, the radiance received from the disk [30] at a distance $z$ is

$$\hat{B}_\lambda(z) = \Delta B_\lambda(z)\, e^{-\eta_\lambda z}. \tag{5}$$

By substituting Eq.4 into Eq.5 and integrating from $0$ to $d(x)$, the ambient light can be rewritten as

$$B_\lambda = \int_{0}^{d(x)} b_\lambda\, E_\lambda(D)\, e^{-\eta_\lambda z}\, dz = \frac{b_\lambda}{\eta_\lambda}\, E_\lambda(D) \left(1 - e^{-\eta_\lambda d(x)}\right). \tag{6}$$

When $d(x) \rightarrow \infty$, the ambient light value at one point can be calculated as [13, 14, 15]

$$B_\lambda = \frac{b_\lambda}{\eta_\lambda}\, E_\lambda(D) = \frac{b_\lambda}{\eta_\lambda}\, E_\lambda(0)\, e^{-\eta_\lambda D}, \tag{7}$$

where $E_\lambda(0)$ is the ambient light at the sea surface.
It can be observed that $B_\lambda$ depends on the scattering and attenuation coefficients $b_\lambda$ and $\eta_\lambda$, the surface-object depth $D$ and the ambient light value at the sea surface $E_\lambda(0)$. Thus, the intensity values of the ambient light in the RGB channels at one point in the underwater scene, ($B_r$, $B_g$, $B_b$), are

$$B_r = \frac{b_r}{\eta_r}\, E_r(0)\, e^{-\eta_r D}, \tag{8}$$

$$B_g = \frac{b_g}{\eta_g}\, E_g(0)\, e^{-\eta_g D}, \tag{9}$$

$$B_b = \frac{b_b}{\eta_b}\, E_b(0)\, e^{-\eta_b D}. \tag{10}$$
The energy of the red, green and blue channels above the water surface is assumed to be the same, i.e., $E_r(0) = E_g(0) = E_b(0)$ [27]. The intensity ratios of the ambient light values in the RGB channels are therefore

$$\frac{B_r}{B_g} = \frac{b_r\, \eta_g}{b_g\, \eta_r}\, e^{-(\eta_r - \eta_g) D}. \tag{11}$$

Similarly,

$$\frac{B_b}{B_g} = \frac{b_b\, \eta_g}{b_g\, \eta_b}\, e^{-(\eta_b - \eta_g) D}. \tag{12}$$

Thus, the intensity values of the ambient light in the RGB channels can be written as

$$B_r = \frac{b_r\, \eta_g}{b_g\, \eta_r}\, e^{-(\eta_r - \eta_g) D}\, B_g, \qquad B_b = \frac{b_b\, \eta_g}{b_g\, \eta_b}\, e^{-(\eta_b - \eta_g) D}\, B_g. \tag{13}$$
In the data synthesis process, the ambient light value of one channel is set based on Eq.13, and the ambient light values of the other two channels are then directly calculated with the revised relationship (see the sketch below). Compared with existing synthesis models that generate the ambient light randomly, our revised equation explicitly accounts for these dependencies and thus reduces the gap between synthetic and real images, which makes the synthesized data better reflect the characteristics of real underwater images. During synthesis, the ambient light value of the green channel is empirically set as a random variable according to application needs [25, 26].
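As a concrete illustration of Eq.13, the following sketch derives the red and blue ambient light values from a sampled green value. The helper name and the per-channel coefficient ordering (red, green, blue) are our own assumptions.

```python
import numpy as np

def ambient_light_rgb(B_g, b, eta, D):
    """Compute (B_r, B_g, B_b) from a sampled green value via Eq. 13 (sketch).

    B_g : scalar, ambient light value of the green channel.
    b   : length-3 array, scattering coefficients (b_r, b_g, b_b) of one water type.
    eta : length-3 array, attenuation coefficients (eta_r, eta_g, eta_b).
    D   : scalar, surface-object distance in meters.
    """
    b_r, b_gr, b_b = b
    eta_r, eta_g, eta_b = eta
    # Eq. 13: express the red and blue channels through their ratios to green.
    B_r = (b_r * eta_g) / (b_gr * eta_r) * np.exp(-(eta_r - eta_g) * D) * B_g
    B_b = (b_b * eta_g) / (b_gr * eta_b) * np.exp(-(eta_b - eta_g) * D) * B_g
    return np.array([B_r, B_g, B_b])
```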
3.3 Dataset Construction Details
A new underwater dataset is synthesized from the NYU-depth2 RTTS dataset (https://sites.google.com/view/reside-dehaze-datasets/reside-v0) based on Eq.3 and our proposed revised ambient light synthesis equation. The NYU-depth2 RTTS dataset includes 4322 clean indoor images and the corresponding scene depth maps. 1000 clean images are randomly chosen as ground-truths, and the scene depths, varying from 0.25m to 20m, are used as the object-camera distances $d(x)$. Besides, the surface-object distance is set from 0m to 5m, i.e., $D \in [0, 5]$ m. The absorption coefficient $a_\lambda$ and scattering coefficient $b_\lambda$ at the red (650nm), green (525nm) and blue (450nm) wavelengths for each water type are taken from [28].
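Putting the pieces together, a minimal sampling routine for one training pair could look as follows. It reuses the hypothetical synthesize_underwater and ambient_light_rgb helpers sketched above; the coefficient values and the sampling range of the green ambient light are placeholders, since the paper takes the coefficients from [28] and chooses the green value empirically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder per-channel coefficients (r, g, b) for one water type;
# the actual values at 650/525/450 nm are taken from [28].
b_type = np.array([0.20, 0.15, 0.10])     # scattering
eta_type = np.array([0.45, 0.25, 0.20])   # attenuation (absorption + scattering)

def make_sample(clean, depth):
    """Synthesize one underwater-like training pair (illustrative sketch)."""
    D = rng.uniform(0.0, 5.0)             # surface-object distance in [0, 5] m
    B_g = rng.uniform(0.5, 1.0)           # placeholder range for the green ambient light
    B = ambient_light_rgb(B_g, b_type, eta_type, D)                  # Eq. 13
    distorted = synthesize_underwater(clean, depth, eta_type, B, D)  # Eq. 3
    return np.clip(distorted, 0.0, 1.0), clean
```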
Based on the description above, a new underwater synthetic dataset containing ten subsets (i.e., Type I, II, III, IA, IB, 1C, 3C, 5C, 7C and 9C) is built. Some synthetic samples with different synthesis models are shown in Fig.4, where it is obvious that the appearance and features of our proposed dataset are closer to real underwater images compared to the synthesized datasets based on [25] and [26]. In addition, we verify the effectiveness of the proposed dataset in Section 5.3 and more results are given in the supplementary material.
4 Proposed ANA-SYN Framework
In this paper, we aim to integrate rich underwater priors and data information to solve UIE tasks. A unified analysis-synthesis framework, termed ANA-SYN, is designed, which contains two parts: an analysis network for prior exploitation and a synthesis network for prior integration. The overview of our ANA-SYN is shown in Fig.5. Specifically, the analysis network first adopts pre-trained prior modules (Prior-CNNs) to extract various underwater priors. Then, a novel adaptive weighting module dynamically learns a set of prior-specific weights to recalibrate the priors according to their usefulness for the input image. The synthesis network takes the original image and the weighted priors as inputs to perform image enhancement, in which a prior encoder and a data encoder-decoder are employed to extract prior features and data features, respectively. A new prior guidance module is exploited to effectively fuse prior and data features at each layer. More details are presented in the following subsections.
4.1 Analysis Network
The analysis network aims at obtaining and recalibrating several underwater priors, as shown in Fig.6. It includes two parts: a pre-trained prior module (Prior-CNNs) for extracting various priors, followed by an adaptive weighting module for recalibrating them based on their importance for the input image. As described in Section 3.3, the synthesized dataset provides several underwater parameter pairs, including color tones, water types, etc. Convolutional neural networks trained on these parameter pairs, called Prior-CNNs, are employed to extract the various priors from the input image. It is worth noting that these pre-trained networks can be designed based on commonly used network structures (e.g., ResNet blocks and DenseNet blocks) and that the parameters of the pre-trained Prior-CNNs are kept frozen in the analysis network. More detailed architectures of the prior extraction networks can be found in the supplementary material. In this paper, the color tone prior, water-type prior and structure prior are extracted to validate the applicability of our ANA-SYN framework (some examples of the extracted priors are shown in Fig.2).
The detailed structure of our proposed adaptive weighting module is shown in Fig.6; it is achieved by an SENet architecture [31]. Denote the input image and each prior as $I$ and $P_i$ ($i = 1, \dots, n$), respectively. We first process them by separate 1×1 convolutions with 16 channels so that every input has the same number of channels, avoiding unbalanced learning of the adaptive weighting module, i.e., $F_0 = \mathrm{Conv}_{1\times1}(I)$ and $F_i = \mathrm{Conv}_{1\times1}(P_i)$. The module then applies a global average pooling (GAP) layer to the input and each prior map to squeeze their spatial information. The pooled features of the input image and each prior are concatenated together, obtaining the aggregated information $S = \mathrm{Concat}\left(\mathrm{GAP}(F_0), \mathrm{GAP}(F_1), \dots, \mathrm{GAP}(F_n)\right)$, where $n$ is the number of priors. The module adopts two fully connected (FC) layers to process the feature $S$, with a Rectified Linear Unit (ReLU) non-linearity after the first layer; the last layer is followed by a sigmoid function, which predicts a set of weight maps $W = \{w_1, \dots, w_n\}$. The weight of each prior is a one-dimensional parameter, $w_i \in \mathbb{R}^{B \times 1}$, in which $B$ denotes the batch-size, i.e.,

$$W = \mathcal{F}_{AWM}(I, P_1, \dots, P_n), \tag{14}$$

where $\mathcal{F}_{AWM}$ is the proposed adaptive weighting module. With the fusion weights $W$, a set of dynamically weighted priors $\hat{P}_i = w_i \cdot P_i$ is presented as the input of the synthesis network to guide image enhancement. As described above, three priors are chosen in this paper, i.e., $n = 3$.
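A minimal PyTorch sketch of such an SE-style weighting module is given below. The class name, the hidden width of the FC layers and the assumption that the image and each prior map have 3 channels are ours, not the released implementation.

```python
import torch
import torch.nn as nn

class AdaptiveWeighting(nn.Module):
    """Predict one scalar weight per prior and recalibrate the priors (Eq. 14 sketch)."""

    def __init__(self, num_priors=3, in_channels=3, mid_channels=16):
        super().__init__()
        # Separate 1x1 convolutions map the image and every prior to 16 channels.
        self.embed = nn.ModuleList(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1)
            for _ in range(num_priors + 1)
        )
        self.pool = nn.AdaptiveAvgPool2d(1)            # global average pooling
        squeezed = mid_channels * (num_priors + 1)
        self.fc = nn.Sequential(                       # two FC layers: ReLU, then sigmoid
            nn.Linear(squeezed, squeezed // 2),
            nn.ReLU(inplace=True),
            nn.Linear(squeezed // 2, num_priors),
            nn.Sigmoid(),
        )

    def forward(self, image, priors):
        feats = [self.pool(conv(x)).flatten(1)         # B x 16 per input
                 for conv, x in zip(self.embed, [image] + list(priors))]
        weights = self.fc(torch.cat(feats, dim=1))     # B x num_priors, each in (0, 1)
        weighted = [w.view(-1, 1, 1, 1) * p            # recalibrated priors
                    for w, p in zip(weights.unbind(dim=1), priors)]
        return weighted, weights
```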
4.2 Synthesis Network
The synthesis network aims to effectively aggregate priors and data information to perform underwater image enhancement, and consists of two sub-networks: a prior encoder and a data encoder-decoder. The entire architecture of the synthesis network is shown in Fig.7. Specifically, the prior encoder and the data encoder-decoder extract prior features and data features, respectively. In our implementation, the prior encoder has the same structure as the data encoder. The shallow and deep features from the prior encoder are introduced into each layer of the data encoder and decoder, respectively. Such an architecture allows prior and data features to implicitly benefit each other in a layer-by-layer manner.
A novel prior guidance module (GM), achieved by an SFT layer [32], is designed to effectively introduce priors into the data stream, which has two advantages. One is to regularize the data information to avoid overfitting to synthesized images, which is achieved by normalizing the data feature maps. The other is to guide the data encoder-decoder to perform image enhancement and better optimize the network, which is accomplished by learning a pair of prior modulation parameters, $\gamma$ and $\beta$.
Specifically, as shown in the GM of Fig.7, the data feature maps $F \in \mathbb{R}^{H \times W \times C}$ are normalized using instance normalization [33], where $H$, $W$ and $C$ denote the height of the feature maps, the width of the feature maps and the number of channels, respectively. The normalization can be expressed as:

$$\bar{F}_c = \frac{F_c - \mu_c}{\sigma_c}, \tag{15}$$

where $\mu_c$ and $\sigma_c$ are the mean and standard deviation of $F$ in channel $c$. The prior features are also transformed into two learnable prior parameters to fine-tune the data features, namely $(\gamma, \beta) = \mathcal{G}(F_{prior})$, where $\mathcal{G}$ represents the learning network. In our implementation, a 3×3 convolution with 64 channels followed by a ReLU activation is first adopted to process the prior features from the prior encoder. Two further 3×3 convolutions with 64 channels are used to generate $\gamma$ and $\beta$ in a pixel-wise manner, respectively. Finally, by fusing the priors and data, the output of the prior guidance module is

$$F_{out} = \gamma \odot \bar{F} + \beta + F, \tag{16}$$

where $\gamma$ and $\beta$ can be regarded as self-adaptive scale and bias factors, respectively, holding the same shape as the data feature map $F$. By learning this pair of parameters from the prior information, different channels and pixels obtain different scale and bias factors, and more attention is paid to useful information. In addition, Eq.16 introduces a residual branch (the $+ F$ term) to better maintain the data information and improve the gradient flow.
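The guidance module can be sketched in PyTorch as follows; the class name and the fixed channel count of 64 are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PriorGuidance(nn.Module):
    """SFT-style fusion of prior and data features with a residual branch (Eqs. 15-16)."""

    def __init__(self, channels=64):
        super().__init__()
        # Eq. 15: channel-wise instance normalization of the data features.
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # 3x3 conv + ReLU on the prior features, then two 3x3 convs that
        # produce the pixel-wise modulation parameters gamma and beta.
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, data_feat, prior_feat):
        shared = self.shared(prior_feat)
        gamma, beta = self.to_gamma(shared), self.to_beta(shared)
        # Eq. 16: modulate the normalized data features and keep a residual path.
        return gamma * self.norm(data_feat) + beta + data_feat
```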
4.3 Training Losses
Taking both pixel-level and feature-level losses into consideration, a reconstruction loss and a perceptual loss are adopted in our model. The reconstruction loss is defined as follows,

$$\mathcal{L}_{rec} = \left\| \hat{J} - J^{gt} \right\|_1, \tag{17}$$

where $\hat{J}$ and $J^{gt}$ denote the enhanced and the ground-truth image, respectively. The pixel-level reconstruction loss is conducive to content fidelity. However, it is limited in capturing high-level semantics and is not fully consistent with human perception.
The perceptual loss is defined as the sum of the distances between designated output features of a VGG-19 network pre-trained on ImageNet, which is written as:

$$\mathcal{L}_{per} = \frac{1}{H_\phi W_\phi} \sum_{i=1}^{H_\phi} \sum_{j=1}^{W_\phi} \left\| \phi\!\left(\hat{J}\right)_{i,j} - \phi\!\left(J^{gt}\right)_{i,j} \right\|_1, \tag{18}$$

where $\phi(\cdot)$ is the feature map of the pool-3 layer of the pre-trained VGG-19 network, and $H_\phi$, $W_\phi$ are the height and width of the feature maps. The overall loss function is defined as follows,

$$\mathcal{L} = \lambda_{rec}\, \mathcal{L}_{rec} + \lambda_{per}\, \mathcal{L}_{per}, \tag{19}$$

where $\lambda_{rec}$ and $\lambda_{per}$ are trade-off weights. In our work, they are set as $\lambda_{rec} = 0.8$ and $\lambda_{per} = 0.2$.
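A compact PyTorch sketch of this combined objective is given below; the l1 form of both terms and the omission of ImageNet input normalization are our simplifying assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class EnhancementLoss(nn.Module):
    """Reconstruction + perceptual loss of Eqs. 17-19 with weights 0.8 / 0.2."""

    def __init__(self, lambda_rec=0.8, lambda_per=0.2):
        super().__init__()
        self.lambda_rec = lambda_rec
        self.lambda_per = lambda_per
        # VGG-19 features up to (and including) the pool-3 layer, frozen.
        vgg = vgg19(weights="IMAGENET1K_V1").features[:19].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.l1 = nn.L1Loss()

    def forward(self, enhanced, target):
        rec = self.l1(enhanced, target)                       # Eq. 17
        per = self.l1(self.vgg(enhanced), self.vgg(target))   # Eq. 18 on pool-3 features
        return self.lambda_rec * rec + self.lambda_per * per  # Eq. 19
```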
5 Experimental results
In this section, the implementation details and experiment settings of our framework are first presented. Then, we evaluate the proposed underwater synthetic dataset and analyze the experimental results on both synthetic and real underwater images. Finally, a series of ablation studies is provided to verify each component of our ANA-SYN, and the model complexity and running time are analyzed.
Methods | UIQM | NIQE | ||||||||
RUIE | SQUID | UIEB | EUVP | UFO-120 | RUIE | SQUID | UIEB | EUVP | UFO-120 | |
U-net based on [25] | 3.080 | 2.369 | 3.091 | 3.234 | 3.061 | 4.897 | 8.661 | 4.869 | 7.886 | 6.280 |
U-net based on [26] | 3.137 | 2.547 | 3.122 | 3.249 | 3.081 | 4.506 | 7.298 | 4.646 | 7.726 | 5.963 |
U-net based on our dataset | 3.109 | 2.513 | 3.135 | 3.260 | 3.092 | 4.522 | 7.952 | 4.577 | 7.545 | 5.878 |
UGAN based on [25] | 3.034 | 2.960 | 3.122 | 3.136 | 3.111 | 6.905 | 6.759 | 6.418 | 6.853 | 6.316 |
UGAN based on [26] | 3.141 | 3.016 | 3.247 | 3.278 | 3.223 | 5.015 | 5.486 | 4.935 | 5.957 | 5.187 |
UGAN based on our dataset | 3.177 | 3.064 | 3.267 | 3.314 | 3.271 | 4.910 | 5.479 | 4.886 | 6.097 | 5.181 |
ANA-SYN based on [25] | 3.088 | 2.364 | 3.121 | 3.247 | 3.094 | 4.925 | 9.095 | 4.900 | 7.650 | 6.079 |
ANA-SYN based on [26] | 3.128 | 2.478 | 3.140 | 3.259 | 3.096 | 4.763 | 8.360 | 4.794 | 7.892 | 6.061 |
ANA-SYN based on our dataset | 3.158 | 2.617 | 3.151 | 3.250 | 3.105 | 4.542 | 6.216 | 4.608 | 7.255 | 5.737 |
5.1 Implementation Details
For training, we select synthetic images of nine water types (Type I, IA, IB, II and III for open ocean water and Type 1C, 3C, 5C and 7C for coastal water), excluding type 9C since it is too turbid. The dataset (9000 pairs) is divided into two groups: 6300 (700×9) pairs of training images, denoted as Train-S6300, and 2700 (300×9) pairs of testing images, denoted as Test-S2700. All input images are resized to 256×256 and their pixel values are normalized. In addition, the training examples are augmented by random rotations and horizontal flipping.

Our proposed ANA-SYN is implemented with PyTorch and trained for 100 epochs on two NVIDIA Titan V GPUs. Adam is used as the optimizer with a mini-batch size of 6 and a fixed learning rate. The default values of $\beta_1$ and $\beta_2$ are set as 0.5 and 0.999, respectively, and the weight decay is set to 0.00005.

5.2 Experiment Settings
For testing, we compare our method with seven state-of-the-art underwater image enhancement methods on both the synthetic data Test-S2700 and five real underwater benchmarks: RUIE (3930 real-world underwater images) [22], SQUID (57 real-world underwater images) [34], UIEB (950 real-world underwater images) [18], EUVP (1910 real-world underwater images) [35] and UFO-120 (3255 real-world underwater images) [36]. The compared methods include traditional methods (Fusion [6], UIBLA [11] and HE-Prior [12]) and deep-learning methods (UGAN [24], FUIE-GAN [35], Water-Net [18] and Ucolor [20]).
For synthetic data Test-S2700, we retrain all deep-learning methods on Train-S6300
and evaluate them using the full-reference metrics Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM). For PSNR, the higher scores mean the result is closer to the image content of the ground truth. For SSIM, the larger values denote that the result is more similar to the structure and texture of the ground truth.
For five real underwater benchmarks, all competing methods are tested using the corresponding test models and parameters released by their authors. We employ the no-reference metrics UIQM and NIQE scores as references to compare the performance of different methods since these real underwater images do not have corresponding ground-truths. A higher UIQM or a lower NIQE score represents a better human visual perception. It should be pointed out that these scores do not accurately reflect the performance of various underwater enhancement methods in some cases [18, 20].
We also conduct a user study to more accurately score the performance of the different methods on the five real benchmarks, denoted as “Perceptual Scores”. Specifically, 12 participants with experience in image processing are invited to score the enhanced results. Each participant has no time limitation and does not know which method produced each result. The scores have 5 levels ranging from 1 to 5, indicating worst to best quality. In addition, we measure the color restoration accuracy on the 16 representative examples presented on the project page of SQUID (http://csms.haifa.ac.il/profiles/tTreibitz/datasets/ambient_forwardlooking/index.html) [34] to evaluate the color accuracy of the different methods.
5.3 Dataset Evaluation
In this part, we aim to evaluate the effectiveness of our proposed synthetic dataset by retraining the compared methods. Three networks (U-net, UGAN and ANA-SYN) are selected and trained on our proposed dataset and other synthetic datasets [25, 26], respectively. The number and content of training samples remain the same and only the synthesis model is different. The trained networks are tested on five real underwater benchmarks.
Due to limited space, only the visual comparison results of U-net on the UFO-120 dataset are presented in Fig.8, and more experimental results are given in the supplementary material. It can be observed that the network trained on our proposed dataset not only performs well in color correction and contrast enhancement but also enhances details, achieving a higher generalization ability on real underwater data.

Additionally, we perform quantitative comparisons on the five real underwater benchmarks using the UIQM and NIQE metrics, and the results are presented in Table 1. As shown, the three models trained on our dataset almost always achieve the best results in terms of the UIQM and NIQE metrics. Such results further demonstrate the superiority of our proposed dataset.
5.4 Network Architecture Evaluation
In this section, we perform quantitative and visual comparisons on both synthetic and real underwater images.
Methods | Fusion [6] | UIBLA [11] | HE-Prior [12] | UGAN [24] | FUIE-GAN [35] | Water-Net [18] | Ucolor [20] | Our ANA-SYN |
PSNR | 13.496 | 12.059 | 14.767 | 23.393 | 19.373 | 17.114 | 24.189 | 26.869 |
SSIM | 0.774 | 0.642 | 0.723 | 0.903 | 0.836 | 0.821 | 0.947 | 0.963 |
Methods | UIQM | NIQE | ||||||||||
RUIE | SQUID | UIEB | EUVP | UFO-120 | Avg | RUIE | SQUID | UIEB | EUVP | UFO-120 | Avg | |
Fusion [6] | 3.048 | 1.991 | 2.870 | 2.779 | 2.769 | 2.691 | 4.250 | 4.767 | 4.300 | 8.853 | 5.606 | 5.555 |
UIBLA [11] | 2.378 | 1.047 | 2.262 | 2.053 | 2.061 | 1.960 | 5.024 | 7.350 | 4.689 | 8.157 | 6.099 | 6.264 |
HE-Prior [12] | — | — | 2.637 | 2.656 | 2.565 | 2.619 | 4.269 | 5.902 | 4.371 | 9.165 | 5.986 | 5.939 |
UGAN [24] | 3.192 | 2.927 | 3.180 | 3.237 | 3.152 | 3.138 | 5.474 | 6.253 | 4.705 | 5.730 | 4.978 | 5.428 |
FUIE-GAN [35] | 3.002 | 1.775 | 2.949 | 3.079 | 2.954 | 2.752 | 5.913 | 7.736 | 5.745 | 7.818 | 6.000 | 6.642 |
Water-Net [18] | 3.081 | 2.316 | 3.002 | 3.052 | 2.908 | 2.872 | 5.164 | 7.244 | 5.313 | 6.992 | 5.821 | 6.107 |
Ucolor [20] | 2.984 | 2.292 | 3.048 | 3.171 | 3.001 | 2.899 | 5.304 | 7.314 | 5.004 | 6.395 | 5.523 | 5.908 |
Our ANA-SYN | 3.158 | 2.617 | 3.151 | 3.250 | 3.105 | 3.056 | 4.542 | 6.216 | 4.608 | 7.255 | 5.737 | 5.672 |
Methods | Color Error | Perceptual Scores | |||||||||
Katzaa | Michmoret | Nachsholim | Satil | Avg | RUIE | SQUID | UIEB | EUVP | UFO-120 | Avg | |
Fusion [6] | 31.161 | 28.246 | 31.985 | 35.220 | 31.653 | 2.483 | 2.272 | 2.624 | 2.483 | 2.616 | 2.496 |
UIBLA [11] | 34.214 | 32.742 | 32.559 | 36.019 | 33.883 | 2.104 | 1.959 | 2.059 | 2.180 | 2.105 | 2.081 |
HE-Prior [12] | 21.586 | 17.526 | 10.542 | 26.634 | 19.072 | 1.782 | 1.091 | 1.724 | 2.083 | 1.902 | 1.716 |
UGAN [24] | 10.158 | 11.714 | 10.206 | 7.650 | 9.932 | 1.389 | 1.575 | 1.519 | 1.803 | 1.561 | 1.569 |
FUIE-GAN [35] | 26.847 | 25.735 | 24.820 | 29.663 | 26.766 | 2.672 | 2.313 | 2.518 | 2.542 | 2.744 | 2.558 |
Water-Net [18] | 23.352 | 21.695 | 20.438 | 22.190 | 21.918 | 2.987 | 2.482 | 2.764 | 2.722 | 2.847 | 2.760 |
Ucolor [20] | 15.782 | 18.189 | 22.456 | 13.820 | 17.562 | 2.979 | 2.626 | 2.909 | 3.011 | 2.913 | 2.888 |
Our ANA-SYN | 10.413 | 15.550 | 11.297 | 5.847 | 10.776 | 3.038 | 3.763 | 3.042 | 3.013 | 2.902 | 3.152 |
5.4.1 Experiment on Synthetic Datasets
Quantitative comparisons are conducted on Test-S2700 and the corresponding results are reported in Table 2. It can be observed that our ANA-SYN achieves the best performance in terms of the PSNR and SSIM metrics and surpasses the other state-of-the-art methods by a wide margin. Such results demonstrate that the effective integration of priors and data information can better enhance underwater images. For Water-Net, its performance is limited by the three pre-processing versions: the white balance method it uses cannot pre-process the input data well, resulting in a low PSNR value. For FUIE-GAN, its purpose is to achieve a fast and lightweight model with fewer parameters, which easily reaches a performance bottleneck on complex and distorted training samples. UGAN is trained in an end-to-end manner without introducing underwater priors, and its performance on the two metrics is relatively poor. Ucolor only introduces the transmission map prior and does not adaptively weight multiple priors based on their usefulness for the image, leading to limited performance.
Some visual comparison results on synthetic images sampled from Test-S2700 are shown in Fig.9. It can be observed that the enhanced results generated by our method have clear details with fewer color artifacts and high-fidelity object regions, and are closer to the ground-truths. In comparison, the competing methods tend to introduce color artifacts or fail to remove the haze on the images. HE-Prior exhibits serious color artifacts, producing visually over-enhanced results. Fusion and UIBLA can improve the contrast of images to some extent, but they cannot effectively remove the color shift and fail to recover the complete scene structure. Although Water-Net, FUIE-GAN, UGAN and Ucolor can provide a relatively good color appearance, they often suffer from local over-enhancement and leave some haze in the results.
5.4.2 Experiment on Real Datasets
To evaluate the generalization ability of our proposed method on real images, we conduct comprehensive experiments on the five real underwater benchmarks. The average UIQM and NIQE scores are reported in Table 3, where it can be observed that our method achieves relatively good UIQM and NIQE scores. For the UIQM metric, UGAN achieves the highest score on almost all five real datasets, while our ANA-SYN ranks second best. For the NIQE metric, our method ranks third best, only slightly behind Fusion and UGAN.
To further analyze the robustness of color accuracy, we report the average color error on SQUID in the left part of Table 4. For images from the Satil dive site, our method achieves the lowest color error. For images from the Katzaa and Nachsholim dive sites, our results are only slightly worse than those of UGAN. Observing the perceptual scores in the right part of Table 4, we can see that the learning-based methods achieve relatively higher performance. Among them, our ANA-SYN significantly outperforms the other competing methods, obtaining the best performance. Such results demonstrate that our method generalizes well to different real-world underwater scenarios.
It is interesting that the UGAN method has similar or even slightly better UIQM/NIQE/color error values compared to our ANA-SYN, but its perceptual score is far worse than ours. In our opinion, this is mainly because these metrics are biased toward certain characteristics (not the entire image), do not consider color shift and artifacts, and thus are not sensitive to artifacts generated on the boundaries of objects [18]. As shown in the enhanced results of UGAN in Fig.10 to Fig.14, we can clearly observe that the edges of objects are blurred since the model is trained on data generated by CycleGAN-based distribution transfer.
We also present results of the different methods on images sampled from RUIE and SQUID in Fig.10 and Fig.11. As shown, our method not only effectively removes the severe greenish and bluish color tones but also restores clear structures, which is credited to the effective combination of priors and data. By contrast, UIBLA, UGAN, FUIE-GAN and Ucolor retain some unexpected color casts, UGAN even produces artifacts at the boundaries of objects, and HE-Prior introduces over-enhancement in its results.
Some enhanced results of different methods on UIEB are also presented in Fig.12. For the image with yellowish tone in shallow water areas or the image with greenish tone in deep water areas, our method significantly corrects the color casts and dehazes the input image, which shows the advantage of our specially designed adaptive weighting module. In comparison, most comparison methods still suffer from obvious color artifacts and blurry details.
Visual comparisons on challenging underwater images sampled from EUVP and UFO-120 are shown in Fig.13 and Fig.14. For these low- and high-quality underwater images, our method can effectively enhance the foreground structures, mitigate color casts and remove the haze, leading to more natural and visually plausible results. By contrast, the competing methods often introduce annoying artifacts, destroy the image structure and introduce unexpected color casts into the recovered results because they ignore many favorable underwater priors. Due to limited space, more comparative results on the five real benchmarks are provided in the supplementary material.
Modules | Models | Synthetic data | Real benchmarks | ||
PSNR | SSIM | UIQM | NIQE | ||
full model | 26.869 | 0.963 | 3.056 | 5.672 | |
UPriors | BL | 25.803 | 0.955 | 3.022 | 6.095 |
BL+E | 26.415 | 0.958 | 3.056 | 5.824 | |
BL+C | 26.536 | 0.961 | 3.053 | 5.742 | |
BL+W | 26.758 | 0.962 | 3.035 | 5.827 | |
AWM | BL+A-CWE-G | 26.419 | 0.959 | 3.050 | 5.857 |
BL+C-CWE-G | 26.464 | 0.958 | 3.051 | 5.918 | |
GM | BL+W-CWE-A | 26.717 | 0.960 | 3.058 | 5.783 |
BL+W-CWE-C | 26.818 | 0.960 | 3.046 | 5.847 |
Methods | Synthetic data | Real benchmarks | |
PSNR | SSIM | Perceptual Score | |
w/o perc loss | 24.995 | 0.931 | 3.085 |
w/ perc loss | 26.869 | 0.963 | 3.152 |
Methods | Our | Ucolor | UGAN | FUIE-GAN | Water-Net |
Flops (G) | 268.7 | 43365 | 3887 | 0.008 | 1937 |
Parameters (M) | 110.1 | 157.4 | 57.17 | 4.216 | 1.091 |
Running time (s) | 0.097 | 1.345 | 0.009 | 0.081 | 0.582 |
5.5 Ablation Study and Analysis
To prove the effectiveness of each component, we conduct a series of ablation studies on Test-S2700 and five real benchmarks using PSNR/SSIM and UIQM/NIQE metrics. Four factors are mainly considered including the priors (UPriors), the adaptive weighting module (AWM), the prior guidance module (GM) and loss function, as follows:
-
full model: recalibrating three priors by our proposed adaptive weighting module and using the prior guidance module to fuse.
-
BL: a baseline U-net network without priors.
-
BL+E: adding the structure prior by the proposed prior guidance module.
-
BL+C: adding the color tone prior by the proposed prior guidance module.
-
BL+W: adding the water-type prior by the proposed prior guidance module.
-
BL+A-CWE-G: directly adding three priors as the final priors and using the prior guidance module to fuse.
-
BL+C-CWE-G: directly concatenating three priors as the final priors and using the prior guidance module to fuse.
-
BL+W-CWE-A: calibrating three priors by our proposed adaptive weighting module and using summation to fuse.
-
BL+W-CWE-C: calibrating three priors by our proposed adaptive weighting module and using concatenation to fuse.
The quantitative results are reported in Table 5. The models BL, BL+E, BL+C and BL+W can be employed to prove the effectiveness of priors, where it can be observed that the color tone, water-type and structure priors improve the overall performance by approximately 0.733dB, 0.955dB and 0.615dB, respectively. The average UIQM and NIQE scores of five real benchmarks are also relatively good overall. Such results also show the usefulness of priors for enhancement.
The models BL+A-CWE-G and BL+C-CWE-G can be used to analyze the effectiveness of the proposed adaptive weighting module. Obviously, the results show that a straightforward method (concatenation or summation) cannot adaptively combine multiple priors into more discriminative priors according to the needs of the input image, which also shows the importance of our proposed module.
As presented in BL+W-CWE-A and BL+W-CWE-C, compared with the direct concatenation or summation of prior features and data features, our proposed prior guidance module can improve performance by 0.05-0.15dB and generate more natural results (see UIQM and NIQE metrics). Such results demonstrate the superiority of our guidance module.
The performance of ANA-SYN with and without the perceptual loss is also compared in Table 5. The network trained with the perceptual loss restores more realistic colors (see the PSNR and SSIM metrics on the synthetic data) and improves the visual quality of the final results (see the average perceptual scores on the five real benchmarks), outperforming the network trained without it.
5.6 Model Complexity Analysis
Table 7 reports the parameters, FLOPs and running time of our ANA-SYN and other representative deep methods. All methods are tested on a PC with an Intel(R) i5-10500 CPU, 16.0GB RAM and an NVIDIA GeForce RTX 2080 Super, and the test image size is 256×256×3. Our method achieves relatively low parameters, FLOPs and running time. Although the FLOPs, model size and time cost of FUIE-GAN are smaller than ours, its generalization capability in real underwater scenarios is far worse than that of our method, even though it is trained on real data (EUVP and UFO-120); see 'Experiment on Real Datasets' above. UGAN obtains the shortest running time among the compared methods, but it has more FLOPs and its performance on real images is limited. Water-Net has only 1.091M parameters, fewer than ours, but its 1937G FLOPs are much larger than ours. For Ucolor, its parameters, FLOPs and time cost far exceed ours. Such results demonstrate that our ANA-SYN achieves superior performance with moderate complexity.
6 Conclusion
In this paper, a new underwater synthetic dataset is first proposed, in which a revised ambient light synthesis equation defining the relationship among the ambient light values of the RGB channels and their dependencies is embedded. Extensive evaluation results show that our proposed dataset is closer to real underwater data and can be used as training data for various enhancement and restoration algorithms in underwater vision applications. Secondly, a new framework, called ANA-SYN, is proposed for underwater image enhancement under the collaboration of priors (underwater domain knowledge) and data information (underwater distortion distribution). In addition, a novel adaptive weighting module is designed to adaptively recalibrate the priors according to the importance of each prior for the input image, and a new prior guidance module is introduced to effectively fuse prior and data features. Extensive experimental results on both synthetic data and five publicly available real underwater benchmarks demonstrate the effectiveness of our ANA-SYN.
References
- [1] K. Iqbal, M. Odetayo, A. James, R. A. Salam, and A. Z. H. Talib, “Enhancing the low quality images using unsupervised colour correction method,” in Proc. 2010 IEEE Int. Conf. Systems, Man and Cybernetics, pp. 1703–1709, Oct. 2010.
- [2] A. S. A. Ghani and N. A. M. Isa, “Underwater image quality enhancement through integrated color model with rayleigh distribution,” Applied soft computing, vol. 27, pp. 219–230, Nov. 2015.
- [3] A. S. A. Ghani and N. A. M. Isa, “Enhancement of low quality underwater image through integrated global and local contrast correction,” Applied Soft Computing, vol. 37, pp. 332–344, Aug. 2015.
- [4] X. Fu, P. Zhuang, Y. Huang, Y. Liao, X.-P. Zhang, and X. Ding, “A retinex-based enhancing approach for single underwater image,” in Proc. IEEE Int. Conf. Image Processing (ICIP), pp. 4572–4576, 2014.
- [5] S. Zhang, T. Wang, J. Dong, and H. Yu, “Underwater image enhancement via extended multi-scale retinex,” Neurocomputing, vol. 245, pp. 1–9, Jul 2017.
- [6] C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert, “Enhancing underwater images and videos by fusion,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 81–88, 2012.
- [7] C. O. Ancuti, C. Ancuti, C. De Vleeschouwer, and P. Bekaert, “Color balance and fusion for underwater image enhancement,” IEEE Trans. Image Process., vol. 27, pp. 379–393, Oct. 2017.
- [8] C. O. Ancuti, C. Ancuti, C. De Vleeschouwer, and M. Sbert, “Color channel compensation (3C): A fundamental pre-processing step for image enhancement,” IEEE Trans. Image Process., vol. 29, pp. 2653–2665, 2020.
- [9] P. L. Drews, E. R. Nascimento, S. S. Botelho, and M. F. M. Campos, “Underwater depth estimation and image restoration based on single images,” IEEE Comput. Graph. Appl., vol. 36, pp. 24–35, Mar. 2016.
- [10] Y. Wang, H. Liu, and L.-P. Chau, “Single underwater image restoration using adaptive attenuation-curve prior,” IEEE Trans. Circuits Syst. I, vol. 65, pp. 992–1002, Sept. 2017.
- [11] Y.-T. Peng and P. C. Cosman, “Underwater image restoration based on image blurriness and light absorption,” IEEE Trans. Image Process., vol. 26, pp. 1579–1594, Feb. 2017.
- [12] C.-Y. Li, J.-C. Guo, R.-M. Cong, Y.-W. Pang, and B. Wang, “Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior,” IEEE Trans. Image Process., vol. 25, pp. 5664–5677, Sept. 2016.
- [13] D. Akkaynak, T. Treibitz, T. Shlesinger, Y. Loya, R. Tamir, and D. Iluz, “What is the space of attenuation coefficients in underwater computer vision?,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 568–577, 2017.
- [14] D. Akkaynak and T. Treibitz, “A revised underwater image formation model,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 6723–6732, 2018.
- [15] D. Akkaynak and T. Treibitz, “Sea-thru: A method for removing water from underwater images,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1682–1691, 2019.
- [16] C. Li, J. Guo, and C. Guo, “Emerging from water: Underwater image color correction based on weakly supervised color transfer,” IEEE Signal Process. Lett., vol. 25, pp. 323–327, Jan. 2018.
- [17] Y. Guo, H. Li, and P. Zhuang, “Underwater image enhancement using a multiscale dense generative adversarial network,” IEEE J. Ocean. Eng., 2019.
- [18] C. Li, C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao, “An underwater image enhancement benchmark dataset and beyond,” IEEE Trans. Image Process., vol. 29, pp. 4376–4389, Nov. 2019.
- [19] X. Xue, Z. Hao, L. Ma, Y. Wang, and R. Liu, “Joint luminance and chrominance learning for underwater image enhancement,” IEEE Signal Process. Lett., vol. 28, pp. 818–822, 2021.
- [20] C. Li, S. Anwar, J. Hou, R. Cong, C. Guo, and W. Ren, “Underwater image enhancement via medium transmission-guided multi-color space embedding,” IEEE Trans. Image Process., vol. 30, pp. 4985–5000, 2021.
- [21] R. Liu, L. Ma, Y. Wang, and L. Zhang, “Learning converged propagations with deep prior ensemble for image enhancement,” IEEE Trans. Image Process., vol. 28, pp. 1528–1543, Oct. 2018.
- [22] R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo, “Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 12, pp. 4861–4875, 2020.
- [23] J. Li, K. A. Skinner, R. M. Eustice, and M. Johnson-Roberson, “Watergan: Unsupervised generative network to enable real-time color correction of monocular underwater images,” IEEE Robot. Autom. Lett., vol. 3, no. 1, pp. 387–394, 2017.
- [24] C. Fabbri, M. J. Islam, and J. Sattar, “Enhancing underwater imagery using generative adversarial networks,” in Proc. IEEE Int. Conf. Robotics and Automation (ICRA), pp. 7159–7165, May 2018.
- [25] C. Li, S. Anwar, and F. Porikli, “Underwater scene prior inspired deep underwater image and video enhancement,” Pattern Recognit., vol. 98, 2020.
- [26] X. Ding, Y. Wang, Y. Yan, Z. Liang, Z. Mi, and X. Fu, “Jointly adversarial network to wavelength compensation and dehazing of underwater images,” arXiv preprint arXiv:1907.05595, 2019.
- [27] J. Y. Chiang and Y.-C. Chen, “Underwater image enhancement by wavelength compensation and dehazing,” IEEE Trans. Image Process., vol. 21, pp. 1756–1769, Dec. 2011.
- [28] M. G. Solonenko and C. D. Mobley, “Inherent optical properties of jerlov water types,” Applied optics, vol. 54, no. 17, pp. 5392–5401, 2015.
- [29] X. Liu and B. M. Chen, “A systematic approach to synthesize underwater images benchmark dataset and beyond,” in Proc. IEEE Int. Conf. Control and Automation (ICCA), pp. 1517–1522, 2019.
- [30] Y. Y. Schechner and N. Karpel, “Clear underwater vision,” in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, vol. 1, pp. I–I, 2004.
- [31] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141, 2018.
- [32] X. Wang, K. Yu, C. Dong, and C. Change Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 606–615, 2018.
- [33] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” arXiv preprint arXiv:1607.08022, 2016.
- [34] D. Berman, D. Levy, S. Avidan, and T. Treibitz, “Underwater single image color restoration using haze-lines and a new quantitative dataset,” IEEE Trans. Pattern Anal. Mach. Intell., 2020.
- [35] M. J. Islam, Y. Xia, and J. Sattar, “Fast underwater image enhancement for improved visual perception,” IEEE Robot. Autom. Lett., vol. 5, pp. 3227–3234, Feb. 2020.
- [36] M. J. Islam, M. Fulton, and J. Sattar, “Toward a generic diver-following algorithm: Balancing robustness and efficiency in deep visual detection,” IEEE Robot. Autom. Lett., vol. 4, pp. 113–120, Jan. 2019.