Images captured underwater often suffer from several kinds of degradation, including blurriness, color casts and low contrast. As light travels through water, red light, which has a longer wavelength than green and blue light, is absorbed faster, so underwater images typically take on a bluish or greenish tone. Furthermore, large amounts of suspended particles change the direction of light in the water, resulting in dim and fuzzy images. Effective underwater image enhancement methods are expected to improve visibility, remove color casts and stretch contrast, thereby improving the visual quality of input images. The enhanced visibility also makes scenes and objects more prominent, providing a better starting point for high-level computer vision tasks such as object detection and recognition.
In the past decades, many algorithms have been proposed to enhance underwater images, ranging from traditional methods (image-based [1, 2, 3, 4, 5, 6, 7, 8] and physical-based [9, 10, 11, 12, 13, 14]) to learning-based methods [15, 16, 17, 18, 19, 20, 21]. Compared to traditional methods, learning-based methods tend to design end-to-end modules or integrate networks with physical priors, and they obtain better feature representations by benefiting from large-scale data and powerful computation. Unfortunately, it is impractical in the real world to collect a large number of real underwater images with distortion-free counterparts. Compared to real underwater data, synthetic data is much easier to obtain. Thus, most deep methods train their models on synthetic data and achieve relatively promising performance. However, most of them ignore the domain shift from synthetic to real data, i.e., the inter-domain gap, as shown in Fig.1. Models learned from synthetic data often suffer a severe performance drop on real underwater images with different distortion distributions.
Apart from this, another challenging problem in underwater image enhancement is the diversity of real image distributions. Generally, the quality of images captured in water is severely affected by many factors, such as illumination conditions, water bodies, water depth, and seasonal and weather changes. As shown in Fig.2(a), these factors lead to various kinds of degradation and a large gap among the real images themselves, i.e., the intra-domain gap. Few studies have addressed the distribution diversity within real underwater images. Four representative real examples and the corresponding results produced by a deep model are presented in Fig.2(b). The model performs satisfactorily on some images (good results). However, it performs poorly on others (poor results), introducing obvious local artifacts, noise and over-enhancement. Without considering the intra-domain gap, it is hard for a deep model to effectively handle real underwater images with such varied degradation distributions.
Motivated by the above analysis, this paper proposes a novel Two-phase Underwater Domain Adaptation network (TUDA) for underwater image enhancement that jointly bridges the inter-domain gap and the intra-domain gap. It consists of two phases: inter-domain adaptation and intra-domain adaptation. Specifically, a new dual-alignment network is designed in the first phase, including a translation part for synthetic-to-real translation and an enhancement part for image enhancement. Coupled with both image-level and feature-level adaptations in an end-to-end manner, the two parts cooperate with each other to learn more domain-invariant representations and thus better reduce the inter-domain gap.
In the second phase, a simple yet efficient rank-based underwater quality assessment algorithm (RUIQA) is proposed, which can better evaluate the perceptual quality of enhanced images by learning to rank. The proposed RUIQA is strongly sensitive to various artifacts and can be easily plugged into both the training and testing pipelines. Based on the assessed quality of enhanced images, we divide the real data into two categories, easy and hard samples, and obtain a trustworthy real image set with pseudo-labels. Subsequently, using the easy-pseudo pairs and unpaired hard samples, an easy/hard domain adaptation technique is performed to close the intra-domain gap between easy and hard samples. The overview of our TUDA is presented in Fig.3. To the best of our knowledge, this is the first work that jointly explores inter-domain and intra-domain adaptation in the underwater image enhancement community. The main contributions of this paper are summarized as follows:
We propose a novel two-phase underwater domain adaptation network, called TUDA, to simultaneously reduce the inter-domain and intra-domain gaps, which sheds new light on future directions for enhancing underwater images.
A novel dual-alignment architecture is designed in the inter-domain adaptation phase, which can effectively perform image-level and feature-level adaptations using joint adversarial learning. The two alignment parts can improve each other, and their combination can better build invariance across domains and thus bridge the inter-domain gap.
A rank-based underwater quality assessment method is developed in the intra-domain adaptation phase, which can effectively assess the perceptual quality of enhanced images with the help of learning to rank. With this method, we perform an easy-hard classification and an easy/hard adaptation technique to significantly reduce the intra-domain gap.
2 Related work
In this section, we briefly review previous related works in two aspects, i.e. underwater image enhancement and domain adaptation.
2.1 Underwater Image Enhancement
Underwater image enhancement approaches can be roughly categorized into three branches, i.e., image-based methods, physical-based methods and learning-based methods.
Image-based methods [1, 2, 3, 4, 5, 6, 7, 8] mainly modify pixel values of underwater images to improve visual quality, including pixel value adjustment [1, 2, 3, 8], retinex decomposition [5, 6] and image fusion [4, 7]. For example, Zhang et al. propose an extended multi-scale retinex-based underwater image enhancement method, in which the input image is processed in three steps: color correction, layer decomposition and enhancement. Ancuti et al. propose a novel multi-scale fusion strategy, which blends a color-compensated and white-balanced version of the given image to generate a better result. Recently, based on the severely non-uniform color spectrum distribution in underwater images, Ancuti et al. introduce a new color-channel-compensation pre-processing step in the opponent color channel to better overcome artifacts. Image-based methods can improve visual effects to some extent. However, they often fail to provide high-quality results in complex scenarios because they ignore the domain knowledge of underwater imaging.
Physical-based methods [9, 10, 11, 12, 13, 14] rely on a physical model of underwater imaging, in which the background light and transmission map are estimated by some priors. The priors include the underwater dark channel prior, minimum information prior, blurriness prior and color-line prior. For example, built on underwater image blurriness and light absorption, Peng et al. propose an underwater image restoration method combined with a blurriness prior to estimate more accurate scene depth. Inspired by the minimal information loss principle, Li et al. estimate an optimal transmission map to restore underwater images, and exploit a histogram distribution prior to effectively improve the contrast and brightness. Recently, Berman et al. incorporate the color-line prior and multiple spectral profiles of different water types into the physical model, and employ the gray-world assumption to choose the best result, showing great performance on image dehazing. These methods can restore underwater images well in some cases. However, when the priors are invalid, undesired artifacts and color casts still inevitably appear in some regions.
Recently, with the development of deep learning, learning-based methods [15, 16, 17, 18, 19, 20, 21] have made significant progress in underwater image enhancement. Many methods improve performance by training their models on real underwater images. For example, to relax the need for paired training data, Li et al. develop a weakly supervised underwater color transfer model based on a cycle-consistent generative adversarial network (CycleGAN) and real data to correct color. As a pioneering work, Li et al. build a real underwater image enhancement dataset, including a total of 950 raw underwater images and reference images. The reference images are produced by 12 enhancement algorithms and scored by 50 volunteers to choose the final results. With these images, Li et al. design a gated fusion network, in which three confidence maps are learned to fuse three pre-processed versions into a decent result. Recently, Li et al. develop an underwater image enhancement network in medium transmission-guided multi-color space for more robust enhancement. Methods trained on real data can produce visually pleasing results. However, they cannot restore the color and structure of objects well and tend to produce inauthentic results, since the reference images are not actual ground truths.
Many other algorithms train their networks using data synthesized by Generative Adversarial Networks [16, 17] or physical models [22, 23]. For example, combined with the knowledge of underwater imaging, Li et al. design a generative adversarial network for generating realistic underwater-like images from in-air images and depth maps, and then utilize the generated data to correct color casts in a supervised manner. Fabbri et al. directly employ a CycleGAN to generate paired training data, and then a fully convolutional encoder-decoder is trained to improve underwater image quality. In addition, Li et al. propose to synthesize ten types of underwater images based on an underwater image formation model and some scene parameters. With the synthetic data, Li et al. develop an end-to-end model to directly recover the clear underwater latent image first, and then apply post-processing to improve subjective visual effects. Dudhane et al. improve this line of work by introducing object blurriness and color shift components to synthesize more accurate underwater-like data.
Synthetic data can simulate different underwater types and degradation levels, and comes with corresponding reference images as guidance for network training. However, due to the domain discrepancy between synthetic and real-world data, deep models trained on synthetic data often fail to generalize well to real underwater scenarios.
2.2 Domain Adaptation
Domain adaptation, which has been extensively explored recently, aims to reduce the distribution gap between two different domains and can be performed at the image level or the feature level. To the best of our knowledge, domain adaptation has seldom been systematically studied in the underwater image enhancement field. However, it has a wide range of applications in other fields such as image dehazing, semantic segmentation [25, 26] and depth prediction [27, 28]. For example, Shao et al. propose a domain adaptation framework for single image dehazing based on CycleGAN, in which a new bidirectional translation network is designed to reduce the gap between synthetic and real images through joint synthetic-to-real and real-to-synthetic image-level adaptations. Zhao et al. propose a novel geometry-aware symmetric domain adaptation framework that jointly exploits the labels in the synthetic data and the epipolar geometry in the real data to better bridge the gap between synthetic and real domains, and thus generate high-quality depth maps.
More recently, Pan et al. propose an unsupervised intra-domain adaptation through self-supervision for semantic segmentation. To obtain extra performance gains, the authors first train the model using inter-domain adaptation from existing approaches, and decompose the target domain into two small subdomains based on the mean value of entropy maps from the predicted segmentation maps. Then an alignment on the entropy maps of both subdomains is conducted to further reduce the intra-domain gap. Inspired by this work, the concepts of inter- and intra-domain adaptation are introduced to underwater image enhancement. In this paper, we propose a different domain adaptation method, with a new dual-alignment network for inter-domain adaptation and a novel underwater image quality assessment algorithm for intra-domain adaptation. The detailed architectures of the proposed method are introduced in the following sections.
3 Proposed Method
Given a set of synthetic images and a set of real underwater images, we aim to reduce the inter-domain gap between the synthetic and real data and the intra-domain gap among the real data themselves. A novel two-phase underwater domain adaptation network is proposed, which consists of two parts: inter-domain adaptation and intra-domain adaptation. As shown in Fig.3, in the inter-domain phase, a new dual-alignment network is developed to jointly perform image-level and feature-level alignment, including an image translation part and an image enhancement part. The former learns a more robust transformation from synthetic to real underwater images, and the latter performs image enhancement using both translated and real images. Details are introduced in Section 3.1. After this adaptation, a rank-based underwater quality assessment method (i.e., RUIQA) is designed to evaluate the perceptual quality of the enhanced images. Based on the predicted quality scores, we separate the raw real underwater images into easy and hard samples, and then conduct the intra-domain adaptation in a manner similar to the inter-domain adaptation. Details of this phase are described in Section 3.2.
3.1 Inter-domain Adaptation
Our proposed dual-alignment network aims to reduce the inter-domain gap between the synthetic and real data domains at both the image level and the feature level, as shown in Fig.4. The proposed network is composed of two parts: an image translation module for enhancing the realism of input images, followed by an enhancement module. The translation module takes synthetic samples and their corresponding ground-truth labels as inputs, and generates translated images. The translated images are expected to follow a distribution as close as possible to that of the real images. Meanwhile, the discriminator is encouraged to identify the difference between translated and real images. To stabilize the gradients and improve performance, the WGAN-GP adversarial loss is adopted to perform image-level alignment, set as:
where the sampling distribution for the gradient penalty is sampled uniformly between translated and real images, and the gradient penalty is weighted by a penalty parameter.
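For reference, the critic objective of WGAN-GP is commonly written in the form below. The symbols are notation assumed for this sketch and are not necessarily the paper's own: $x_s$ is a synthetic image, $x_t = T(x_s)$ its translation, $x_r$ a real image, $D$ the image-level discriminator and $\lambda$ the penalty parameter.

```latex
\mathcal{L}_{adv}^{img}
  = \mathbb{E}_{x_t}\big[ D(x_t) \big]
  - \mathbb{E}_{x_r}\big[ D(x_r) \big]
  + \lambda \, \mathbb{E}_{\hat{x}}\Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \big)^2 \Big],
\qquad
\hat{x} = \epsilon \, x_r + (1 - \epsilon) \, x_t, \quad \epsilon \sim U[0, 1]
```

In this form, $\hat{x}$ is a random interpolation between a real and a translated image, which is one way to read "sampled uniformly" in the description above.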
Color cast is one of the main characteristics of underwater images, which can generally be divided into three tones: blue, green and blue-green. Inspired by this, the synthetic and real images are divided into three color tone subsets according to the average value of the blue (b) channel in the CIELAB color space. The synthetic-to-real translation is performed between synthetic and real images of the same color tone, which greatly speeds up the convergence of the model. In addition, intuitively, the gap between the synthetic and real data mainly comes from low-level differences, such as color and texture. Thus, the translated images should retain the same semantic content as the synthetic inputs, but with a different appearance. Therefore, a semantic content loss component is incorporated along with the adversarial loss, set as:
where the layer-wise feature extractor is taken from the VGG-19 network pretrained on ImageNet, and the set of layers includes conv1-1, conv2-1, conv3-1, conv4-1 and conv5-1. Each layer is assigned its own weight.
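The color-tone split described above (grouping images by the average b value in CIELAB) can be sketched as follows. This is only an illustration under stated assumptions: the thresholds `low` and `high`, the function names and the mapping of the sign of b to tone labels are hypothetical choices, not values taken from the paper.

```python
def srgb_to_lab_b(r, g, b):
    """Return the CIELAB b* value of one 8-bit sRGB pixel (D65 white point)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ; b* only depends on Y and Z.
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    # Normalize by the D65 white point (Yn = 1.0, Zn = 1.08883).
    return 200.0 * (f(y / 1.0) - f(z / 1.08883))


def mean_b(pixels):
    """Average b* over an iterable of (r, g, b) tuples."""
    pixels = list(pixels)
    return sum(srgb_to_lab_b(*p) for p in pixels) / len(pixels)


def color_tone(pixels, low=-10.0, high=10.0):
    """Assign a tone label from the mean b*; thresholds are hypothetical."""
    m = mean_b(pixels)
    if m < low:
        return "blue"        # strongly negative b*: bluish cast
    if m > high:
        return "green"       # positive b*: greenish/yellowish cast
    return "blue-green"      # in between


if __name__ == "__main__":
    print(color_tone([(30, 90, 160), (20, 80, 150)]))
```

In practice the thresholds would be tuned on the data so that the three subsets match the visual tones.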
After the synthetic images are translated, the generated realistic images are obtained. The paired translated data are used to train the enhancement network, which is trained in a supervised way with a content loss and a perceptual loss, set as:
where the enhanced image is the output of the enhancement network applied to the translated image. The weights of the content loss and the perceptual loss are set as 0.8 and 0.2, respectively.
To better minimize the inter-domain gap, a feature-level adversarial loss is also introduced into the enhancement part, set as:
where the encoder of the enhancement network shares identical weights in both the real and translated input pipelines. The sampling distribution for the gradient penalty is sampled uniformly between the features of translated and real images. The penalty parameter is set as 10 in our experiments.
With both image-level and feature-level alignments in an end-to-end manner, our dual-alignment network can better build invariance between domains and thus close the inter-domain gap. The overall loss function for the inter-domain adaptation phase is expressed as follows:
where the four trade-off weights are set as 1, 100, 10 and 0.0005, respectively, in our work.
3.2 Intra-domain Adaptation
As mentioned above, an intra-domain gap exists among the real underwater images themselves, and thus a straightforward approach is to divide and conquer. Images whose distribution is similar to the training data are easy to enhance and are called easy samples; the others are hard samples. Therefore, real underwater images can be separated into easy samples and hard samples according to the assessed quality of the enhanced images. The enhanced results of easy samples are trustworthy and can be used as pseudo-labels. Using easy samples and their corresponding pseudo-labels, easy/hard adaptation is learned in an unsupervised way to close the intra-domain gap between easy and hard samples. To reasonably separate real underwater images into easy and hard parts based on the quality of the enhanced images, an effective assessment method is required. One may attempt to use existing underwater image quality assessment methods for the separation, such as UCIQE and UIQM. However, previously reported experimental results show that these methods cannot accurately evaluate image quality in some cases. This paper therefore presents a novel and effective underwater quality assessment method that uses rank information learned from rankings to assess the quality of enhanced images, named rank-based underwater image quality assessment (RUIQA).
3.2.1 Rank-based Underwater Image Quality Assessment (RUIQA)
Existing deep IQA methods usually initialize their model parameters using models pre-trained on the ImageNet dataset [33, 34]. Although these metrics achieve good results on ground images to some extent, their performance is unsatisfactory when facing images with various underwater distortions. In our opinion, this is mainly because pre-trained models capture information that is conducive to ground image processing instead of the unique prior information of underwater images, and thus they cannot easily adapt to the characteristics of underwater image quality assessment tasks.
Inspired by prior work in image super-resolution, this paper uses an underwater ranking dataset to train a large network that extracts ranking information by learning to rank, which is closely related to perceptual quality. We then fine-tune it to predict the perceptual quality of enhanced images more accurately. The difference is that, in the super-resolution work, a Ranker is trained to learn the behavior of perceptual metrics and a novel rank-content loss is then introduced to optimize perceptual quality, whereas our method trains an underwater ranker and uses it as the model initialization to help assess perceptual quality. As shown in Fig.5, our RUIQA consists of three stages: generating rank images, training the ranker and fine-tuning the network.
Generating rank images: A large number of underwater images are first collected online and from some public datasets [19, 30, 36], and then carefully selected and refined. Most of the collected images are weeded out, and about 3900 candidate images remain. We randomly choose 800 pictures to construct an underwater ranking dataset. With the candidate underwater images, the enhanced images are generated by 8 image enhancement methods, including Fusion-12, Fusion-18, Two-step-based, Histogram prior, Blurriness-based, Water-Net, FUIE-GAN and a commercial application for enhancing underwater images (i.e., dive+). Each enhanced image is assessed on a continuous quality scale ranging from 1 to 5. Afterwards, the quality scale is mapped to a continuous score between 1 and 100. 20 volunteers are invited to conduct the evaluation in the same monitor environment. Following previous work, the raw scores are refined by means of some standardized settings [39, 40] and the Mean Opinion Score (MOS) is calculated [41, 42], yielding reliable subjective ratings. Our dataset and code will be publicly released on https://github.com/Underwater-Lab/TUDA.
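The score mapping and averaging described above can be sketched as below. This is a minimal illustration that assumes a linear mapping from the 1 to 5 scale onto 1 to 100 and a plain mean over raters; the refinement of raw scores with standardized settings is not reproduced here, and the function names are hypothetical.

```python
from statistics import mean


def rescale(score, lo=1.0, hi=5.0, new_lo=1.0, new_hi=100.0):
    """Linearly map a rating in [lo, hi] onto [new_lo, new_hi] (assumed mapping)."""
    if not lo <= score <= hi:
        raise ValueError("rating outside the quality scale")
    return new_lo + (score - lo) * (new_hi - new_lo) / (hi - lo)


def mean_opinion_score(raw_ratings):
    """MOS of one enhanced image: mean of the rescaled ratings of all raters."""
    return mean(rescale(r) for r in raw_ratings)


if __name__ == "__main__":
    # Ratings of one enhanced image from five hypothetical volunteers.
    print(mean_opinion_score([3.0, 3.5, 4.0, 2.5, 3.0]))
```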
Training ranker: With the obtained MOS values, pair-wise images and the corresponding ranking order labels can be obtained. ResNet-50 is employed as the Siamese network architecture to extract ranking information. The Siamese network is trained with a margin-ranking loss, which helps the model learn the ranking information. After training, a single branch of the Siamese network, i.e., the ResNet-50 model parameters pre-trained on the ranking images, is extracted to initialize our backbone network.
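To make the pair construction and the ranking objective concrete, the sketch below builds ordered pairs from MOS values and evaluates the standard margin-ranking loss, max(0, -y (s_a - s_b) + m), on the two branch scores. The margin value, the tie handling and the function names are assumptions made for illustration; the network itself is omitted.

```python
from itertools import combinations


def make_ranking_pairs(mos):
    """Build (id_a, id_b, label) triples from a dict image_id -> MOS.

    label is +1 if image a should rank higher than image b, -1 otherwise.
    Pairs with equal MOS carry no ordering information and are skipped.
    """
    pairs = []
    for a, b in combinations(sorted(mos), 2):
        if mos[a] == mos[b]:
            continue
        pairs.append((a, b, 1 if mos[a] > mos[b] else -1))
    return pairs


def margin_ranking_loss(score_a, score_b, label, margin=0.5):
    """Standard margin-ranking loss; the margin of 0.5 is a hypothetical value."""
    return max(0.0, -label * (score_a - score_b) + margin)


if __name__ == "__main__":
    pairs = make_ranking_pairs({"img1": 72.0, "img2": 41.5, "img3": 55.0})
    print(pairs)
    # Scores that both Siamese branches might output for the first pair.
    print(margin_ranking_loss(0.9, 0.2, pairs[0][2]))
```

The loss is zero once the higher-ranked image scores at least `margin` above the other, so only wrongly ordered or barely separated pairs contribute gradients.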
In our RUIQA, the last global average pooling (GAP) and fully connected (FC) layers of the pre-trained ResNet-50 model are removed. To better handle distortion diversity, multi-scale features extracted from four layers (conv2-10, conv3-12, conv4-18 and the last layer) are treated as the inputs of four blocks. Each block is composed of a 1×1 convolution, a GAP layer and an FC layer, mapping the multi-scale features into the corresponding perceptual quality vectors. Finally, these predicted quality vectors are regressed into a quality score. In the training phase, the network is fine-tuned by minimizing the loss between the predicted score and the MOS label.
Using the proposed RUIQA, the quality score of each enhanced image is predicted. The higher the score, the more confident the model is about the corresponding real-world image (i.e., an easy sample). This step is referred to as easy-hard classification. Some classification results are shown in Fig. 6. It can be observed that the enhanced results of easy samples have higher perceptual quality and are closer to human perception. In practice, a ratio is introduced to help the separation, defined as the ratio of easy samples to total samples. The MOS value corresponding to the specified ratio is set as a threshold to pick out easy samples, and the remaining images are treated as hard samples for training. How to obtain the best ratio is explored in Section 4.4.4. This ratio is very important for both the intra-domain training and the final TUDA testing pipeline.
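The ratio-based easy-hard classification can be sketched as follows. This is an illustrative reading of the step above, not the authors' implementation: the function name is hypothetical, scores are assumed to be the predicted quality scores of the enhanced images, and ties at the threshold are assigned to the easy set.

```python
def split_easy_hard(scores, ratio):
    """Split image ids into easy and hard sets.

    scores: dict image_id -> predicted quality score of the enhanced image.
    ratio:  fraction of samples to treat as easy (0 < ratio < 1).
    Returns (easy_ids, hard_ids, threshold).
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_easy = max(1, round(ratio * len(ranked)))
    # The score at the ratio cut-off serves as the threshold.
    threshold = scores[ranked[n_easy - 1]]
    easy = [i for i in ranked if scores[i] >= threshold]
    hard = [i for i in ranked if scores[i] < threshold]
    return easy, hard, threshold


if __name__ == "__main__":
    demo = {"a": 90.0, "b": 70.0, "c": 50.0, "d": 30.0}
    print(split_easy_hard(demo, ratio=0.5))
```

Storing the threshold makes it possible to apply the same cut-off to unseen images at test time.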
3.2.2 Easy/Hard Adaptation
For easy samples, the enhanced results are used as pseudo-labels to obtain paired real underwater data. Using these pairs, we adopt an easy/hard adaptation technique to close the intra-domain gap between easy and hard samples, which is composed of an intra-domain translation part and an intra-domain enhancement part. The translation part tries to translate easy samples so that they are indistinguishable from hard images. Meanwhile, a discriminator aims to differentiate between the translated images and hard images. This minimax game can be modeled using an adversarial loss as
where the sampling distribution for the gradient penalty is sampled uniformly between translated and hard images, and the gradient penalty is weighted by a penalty parameter.
As in the inter-domain phase, a good translation should keep the translated image "similar" in content to the original easy image. Thus, a semantic content loss is incorporated to better achieve content preservation, set as:
where the set of layers consists of conv1-1, conv2-1, conv3-1, conv4-1 and conv5-1, and the layer-wise feature maps are taken from the pre-trained VGG-19 model. Each layer is assigned its own weight.
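A semantic content loss of this kind is typically a weighted sum of feature distances over the selected VGG-19 layers. The form below is a sketch under assumed notation: $x_e$ is an easy sample, $T$ the intra-domain translation part, $\phi_l$ the feature map of layer $l$, $w_l$ its weight and $\mathcal{S}$ the set of layers. The choice of the $\ell_1$ norm is also an assumption.

```latex
\mathcal{L}_{sc}
  = \sum_{l \in \mathcal{S}} w_l \,
    \big\lVert \phi_l\big(T(x_e)\big) - \phi_l(x_e) \big\rVert_1,
\qquad
\mathcal{S} = \{\text{conv1-1}, \text{conv2-1}, \text{conv3-1}, \text{conv4-1}, \text{conv5-1}\}
```

The same form would apply to the inter-domain phase with a synthetic image in place of $x_e$.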
Then, the translated image is fed into the intra-domain enhancement part, and the enhanced image is obtained. The enhancement part is trained in a supervised manner with a content loss and a perceptual loss, set as:
where the trade-off weights of the content loss and the perceptual loss are set as 0.8 and 0.2, respectively.
To better minimize the intra-domain gap between easy and hard samples in the real-world domain, we also perform a feature-level adaptation, where a discriminator is introduced to align the distributions of the feature maps of translated images and hard images. The loss is defined as:
where the encoder of the intra-domain enhancement part shares the same weights in both the translated-input and hard-image pipelines. The sampling distribution for the gradient penalty is sampled uniformly between the features of translated and hard images. The penalty factor is set as 10 in this work. The two intra-domain parts are trained in an end-to-end manner, and thus the full loss function is as follows:
where the four trade-off weights are set as 1, 100, 10 and 0.0005, respectively, in our work.
3.3 Architecture Details
The detailed architecture of the two translation modules is shown in Fig.7. Down-sampling layers are not employed in the translator, to avoid losing valuable information. For the image discriminators and feature discriminators, PatchGAN is employed, which can better discriminate locally whether image patches are real or fake. A simple network architecture (dense blocks stacked in a U-Net structure; see https://github.com/Underwater-Lab-SHU/ANA-SYN) is used for our enhancement parts. It is worth mentioning that our test pipeline only needs the enhancement parts and the proposed rank-based IQA method, as shown in Fig.8.
4 Experiments
In this section, we first describe the implementation details and experiment settings of our TUDA. Then, we compare it with existing representative methods on four publicly available real underwater benchmarks. Finally, a series of ablation studies are provided to verify the advantages of each component, and the model complexity and running time are analyzed.
4.1 Implementation Details
For training, a synthetic underwater image dataset is generated following the physical model proposed in the project page of ANA-SYN. The synthetic dataset contains 9 water types (types I, II, III, IA and IB for open ocean water, and types 1C, 3C, 5C and 7C for coastal water), and each type has 1000 images randomly chosen from the RTTS dataset. The constructed dataset is divided into two parts: 7200 images for training, denoted as Train-S7200, and 1800 images for testing, denoted as Test-S1800. For real underwater images, as mentioned above, a large real-world underwater database of 3900 images is built. The database is divided into two parts: 2900 images for training, denoted as Train-R2900, and 1000 images for testing, denoted as Test-R1000. All images are resized to 256 × 256 and the pixel values are normalized. Furthermore, several data augmentation techniques are applied in the training phase, such as random rotation and horizontal flipping.
Our TUDA and RUIQA are implemented in the PyTorch framework, and all experiments are carried out on two NVIDIA Titan V GPUs. All networks are trained with the Adam optimizer, using separate learning rates for two groups of four networks each. The Adam parameters $\beta_1$ and $\beta_2$ are set as 0.5 and 0.999. The batch size is set to 4. Models are trained for 200 epochs, and their learning rates decay linearly to zero over the next 100 epochs.
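The learning-rate schedule described above (constant for 200 epochs, then linear decay to zero over the next 100) can be written as a small function. This is a sketch: the base learning rate used in the example is a placeholder, and the exact epoch at which decay starts is an assumed convention.

```python
def learning_rate(epoch, base_lr, constant_epochs=200, decay_epochs=100):
    """Learning rate at a given 0-based epoch.

    The rate stays at base_lr for the first `constant_epochs` epochs and then
    decays linearly, reaching zero after `decay_epochs` further epochs.
    """
    if epoch < constant_epochs:
        return base_lr
    progress = (epoch - constant_epochs) / decay_epochs
    return base_lr * max(0.0, 1.0 - progress)


if __name__ == "__main__":
    base = 1e-4  # placeholder value for illustration only
    for e in (0, 199, 200, 250, 299, 300):
        print(e, learning_rate(e, base))
```

In PyTorch, the same multiplier could be passed to a lambda-based scheduler, with the factor `learning_rate(epoch, 1.0)` applied to the optimizer's initial rate.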
4.2 Experiment Settings
For testing, we conduct comprehensive experiments on four publicly available real-world underwater benchmarks, i.e., SQUID (57 real underwater images), UIEB (950 real underwater images), EUVP (about 1910 real underwater images) and UFO-120 (about 3255 synthetic and real images). The compared algorithms include Fusion-12, Fusion-18, HE-Prior, UIBLA, UGAN, FUIE-GAN and Water-Net. The first four algorithms are traditional methods, while the remaining ones are deep-learning methods. For all the above-mentioned methods, we use the released test models and parameters to produce their results.
[Table 2. Columns: Methods, Color Error, RUIQA, Perceptual Scores.]
For results on real images, performances are measured by three no-reference underwater quality assessment metrics: UCIQE, UIQM and our proposed RUIQA. For the three metrics, a higher score denotes a better human visual perception. It should be pointed out that UCIQE and UIQM are not sufficient to reflect the performance of various underwater image enhancement methods in some cases [19, 21], and thus the scores of UCIQE and UIQM are only for reference.
In addition, a user study is conducted to evaluate the visual quality of the results more accurately, in which 30 images are randomly selected from each testing dataset to be scored. 15 volunteers are invited for this evaluation, and the scoring scale ranges from 0 to 5, with levels referred to as Bad, Poor, Fair, Good and Excellent. To evaluate the color restoration accuracy of different methods, we also compute the average angular reproduction error on the 16 representative examples presented in the project page of SQUID (http://csms.haifa.ac.il/profiles/tTreibitz/datasets/ambient_forwardlooking/index.html). The smaller the color error, the better the performance.
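As an illustration of the color error metric, the sketch below computes the angle between a measured RGB color and the achromatic axis (1, 1, 1), averaged over a set of patches. It assumes the error is evaluated on gray patches of a color chart, whose ideal restored color is neutral; this setup, the patch values and the function names are assumptions for the example rather than details given in the text.

```python
import math


def angular_error_deg(rgb, reference=(1.0, 1.0, 1.0)):
    """Angle in degrees between an RGB vector and a reference color vector."""
    dot = sum(a * b for a, b in zip(rgb, reference))
    norm = math.sqrt(sum(a * a for a in rgb)) * math.sqrt(sum(b * b for b in reference))
    if norm == 0.0:
        raise ValueError("zero-length color vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(patches):
    """Average angular error over the mean RGB values of several gray patches."""
    patches = list(patches)
    return sum(angular_error_deg(p) for p in patches) / len(patches)


if __name__ == "__main__":
    # Mean RGB of three hypothetical gray patches measured in an enhanced image.
    print(mean_angular_error_deg([(0.42, 0.45, 0.47), (0.61, 0.60, 0.66), (0.20, 0.22, 0.21)]))
```

Because the angle ignores vector length, the measure reflects color cast and is insensitive to overall brightness.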
4.3 Comparisons with State-of-the-Art Methods
In this section, we conduct quantitative and visual comparisons on diverse challenging testing datasets. The results of different methods are reported in the following subsections.
Quantitative Comparisons: The quantitative comparison results of different methods on the real challenging sets are listed in Table 1 and Table 2. HE-Prior achieves the highest scores in terms of UCIQE, while our TUDA ranks fourth on all challenging sets. For the UIQM scores, our method achieves nearly the best performance across all datasets, and UGAN ranks second. The average color restoration accuracy of different methods on the 16 representative examples of SQUID is reported in the second column of Table 2. Our TUDA achieves a relatively low average error, indicating a more effective recovery of scene colors. Fusion-18 obtains the lowest color error of all methods. However, its performance is not as good as that of our TUDA in terms of RUIQA and Perceptual Scores, on which our TUDA achieves the best performance. These results demonstrate that our TUDA produces visually more convincing results and is more robust in handling images taken in diverse underwater scenes.
The deep methods trained on real data, including Water-Net and FUIE-GAN, perform relatively well, but they fail to restore greenish or excessively distorted images because they ignore the intra-domain gap among the real underwater images themselves, and thus their performance is limited. UGAN trains its model using synthetic data generated by GAN methods. Since the inter-domain gap is not effectively reduced, its results often contain various artifacts, so the subjective quality is poor and the score is relatively low.
There is an interesting finding in the quantitative results. HE-Prior achieves nearly the highest UCIQE scores on all real datasets. However, its perceptual score is the lowest, which means it has the worst subjective quality. In our opinion, this is mainly because UCIQE pays too much attention to local features (color) and ignores the image as a whole, and thus it is not consistent with human visual perception in some cases, especially when the enhanced image is over-enhanced (please refer to Fig.14).
Visual Comparisons: Some visual comparisons on Test-R1000 and SQUID are shown in Fig.9 and Fig.10, respectively. For these images, except for Fusion-18 and our TUDA, the competing methods cannot achieve satisfactory results. Some of them even introduce undesirable color artifacts to some extent, such as Fusion-12, HE-Prior, FUIE-GAN and UIBLA. Meanwhile, most methods fail to restore the structural details of the underwater scene, and UGAN even introduces serious artifacts at object boundaries. Fusion-18 restores better color than the other methods, but it does not recover object details as well as our TUDA.
The results of different methods on challenging underwater images sampled from UIEB and EUVP are presented in Fig.11 and Fig.12. As presented, for the images with a greenish tone, our TUDA significantly removes the haze and color casts and effectively recovers details, producing visually pleasing results. In comparison, none of the competing methods produces realistic colors. Most of them suffer from obvious over-enhancement or under-enhancement, such as Fusion-12, Fusion-18, HE-Prior, UIBLA and Water-Net. Fusion-18 even loses the original color of the object in the second image. For the low-light underwater images, most methods generate unrealistic results with color artifacts, cannot effectively improve the visibility of objects, and often amplify noise. Our TUDA not only effectively increases the brightness of images but also refines object edges, producing realistic results with correct colors even from extremely noisy inputs.
Visual comparisons on challenging underwater images sampled from UFO-120 are shown in Fig.13. Compared to most existing methods, our TUDA significantly reduces color distortion and satisfactorily removes blurriness. The images enhanced by HE-Prior have an obvious reddish color shift and artifacts in some regions. Besides, UGAN often introduces undesirable artifacts at object boundaries. Most methods cannot correct the colors well and even amplify the color deviation (e.g., the color of the background). Fusion-18 produces relatively good results. However, its results still contain considerable noise and color distortion.
All the above quantitative and visual comparisons demonstrate that reducing both the inter-domain gap and the intra-domain gap, as our TUDA does, produces visually pleasing results and more robust performance. Due to limited space, more experimental results are given in the supplementary material.
4.4 Ablation Studies and Analysis
In this section, we first evaluate the performance of our proposed RUIQA and analyze its superiority. Subsequently, a series of ablation studies is conducted to analyze the contribution of each proposed component. In addition, we study the influence of different values of the ratio on intra-domain adaptation training and TUDA testing.
4.4.1 Effectiveness of the Proposed RUIQA method
As mentioned above, 7200 image pairs of the underwater ranking data are randomly selected as training data, and the other 800 image pairs are used for IQA testing. To validate the generalization ability of our RUIQA, we compare it with two state-of-the-art methods, UCIQE and UIQM, in terms of two metrics: the Spearman rank-order correlation coefficient (SROCC) and Pearson’s linear correlation coefficient (PLCC). Table 3 lists the corresponding results. As shown, our method achieves the best performance: its correlation with MOS is on the order of 0.900, a gain of 0.5 to 0.65 over UCIQE and UIQM, showing the superiority of our RUIQA metric. A visual comparison is also shown in Fig.14, where larger values indicate better perceptual quality. Our RUIQA reflects the perceptual quality of the image more accurately.
In addition, an ablation study is conducted to analyze the contribution of each individual component using the following settings: 1) UIQA: using the IQA network to directly predict the image quality score; 2) PUIQA: using the ResNet50 network pre-trained on ImageNet data as our initialization backbone model; 3) RUIQA: using the ResNet50 network pre-trained on our rank data as our initialization backbone model. As presented in Table 3, our RUIQA achieves the best evaluation performance and is significantly better than UIQA and PUIQA. It is worth mentioning that ImageNet has more than 1.28 million images, whereas our rank training dataset contains only 720 image pairs. This indicates that the ResNet50 network pre-trained on the rank dataset can capture sufficient perceptual quality information of underwater images and thus helps the IQA task predict the image quality score better.
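As a minimal illustration of the two correlation metrics used above, the sketch below computes SROCC and PLCC between predicted quality scores and MOS values in plain Python. The data are hypothetical, and the sketch omits the nonlinear (logistic) mapping that IQA studies often apply before computing PLCC; whether the paper applies such a mapping is not stated here.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient (PLCC) of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): PLCC of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical MOS values and predicted scores (not taken from the paper).
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 4.0, 9.0, 16.0, 25.0]

srocc = spearman(predicted, mos)  # monotonic agreement only
plcc = pearson(predicted, mos)    # linear agreement
print(f"SROCC = {srocc:.3f}, PLCC = {plcc:.3f}")
```

In this toy case the predictions are a monotonic but nonlinear function of MOS, so SROCC is perfect while PLCC is slightly lower, which is why the two metrics are usually reported together.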
| Method | RUIQA | Perceptual score |
|---|---|---|
| BL + ITE | 59.250 | 3.390 |
| BL + ITE + ITA | 61.436 | 3.595 |
4.4.2 Effectiveness of the inter-domain adaptation phase
In this part, we perform an ablation study on 60 images randomly chosen from the enhanced images of Test-R1000 to evaluate the effectiveness of the inter-domain adaptation, with the following settings: 1) BL: baseline network (trained on synthetic data); 2) BL+ITE: baseline network with the inter-domain adaptation, i.e., the dual-alignment network. Results are listed in Table 4. It can be seen that the baseline network has only slightly higher PSNR than our dual-alignment network (0.219 dB higher on average), but its perceptual quality is far worse (3.716 and 0.244 lower on average in RUIQA and the perceptual score of the user study, respectively). Such results show that our inter-domain adaptation part generates enhanced results with well-reconstructed details (high fidelity) and good perceptual quality. Some examples are shown in Fig.15, in which our inter-domain adaptation phase corrects color casts and avoids over-enhancement better than the baseline network.
|Running time (s)||0.051||0.009||0.083||0.582|
4.4.3 Effectiveness of the intra-domain adaptation phase
In the intra-domain adaptation part, we conduct an ablation study on 60 images randomly selected from the enhanced results of Test-R1000 with the following settings: 1) BL+ITE: baseline network with the inter-domain adaptation; 2) BL+ITE+ITA: baseline network with the inter-domain and the intra-domain adaptation. The averaged RUIQA value and the averaged perceptual score are reported in Table 5. It can be seen that BL+ITE+ITA achieves better performance, with average gains of 2.186 and 0.205 in the two metrics, respectively. This indicates that intra-domain adaptation can effectively process hard samples and significantly improve the perceptual quality of the image, making the enhanced results more consistent with human preferences. In addition, a few samples are illustrated in Fig.15. If only the inter-domain adaptation phase is conducted, the results of some hard samples still contain noise and over-enhancement artifacts in some regions. In other words, our intra-domain adaptation part is robust for extremely hard real-world underwater images, producing visually more pleasing results.
4.4.4 Analysis of Hyperparameter
The real underwater images used for inter-domain training serve as the input set. The inter-domain enhancement part receives each input and outputs the corresponding enhanced image. The proposed RUIQA is then used to evaluate the perceptual quality score of each enhanced image. We rank these scores and select the score at the position given by the ratio as the threshold that separates the real underwater data into easy and hard samples for intra-domain training and for testing the whole framework. Thus, different values of the ratio have a significant impact on the subsequent operations. Here, some experiments are conducted to decide the optimal ratio in our framework. For a selected ratio, we first conduct intra-domain adaptation training. Then, the 2900 real training images (Test-S2900) are used as validation data in the test pipeline (see Fig.8). Finally, we compute the average RUIQA score of the 2900 enhanced images and use it as the criterion for selecting the ratio. Results are shown in Table 6, which identifies the ratio at which the proposed method achieves the best performance.
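The easy/hard separation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `split_easy_hard`, the interpretation of the ratio as the fraction of samples treated as easy, and the example scores are all hypothetical. The only facts taken from the text are that scores are ranked, that a threshold is read off at the ratio, and that larger RUIQA values indicate better quality.

```python
def split_easy_hard(scores, ratio):
    """Split sample indices into easy and hard sets by ranking quality scores.

    Assumptions (not from the paper): `ratio` is the fraction of samples
    treated as easy, and a higher score means better perceptual quality.
    Returns (easy_indices, hard_indices, threshold).
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    # Rank sample indices from the highest score to the lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Number of easy samples, clamped so the easy set is never empty.
    n_easy = max(1, min(len(scores) - 1, round(ratio * len(scores))))
    # The threshold is the score of the last sample still counted as easy.
    threshold = scores[ranked[n_easy - 1]]
    return ranked[:n_easy], ranked[n_easy:], threshold


# Hypothetical quality scores of eight enhanced images.
scores = [55.0, 62.5, 48.0, 70.1, 59.9, 66.3, 51.2, 60.0]
easy, hard, threshold = split_easy_hard(scores, ratio=0.25)
print("easy:", easy, "hard:", hard, "threshold:", threshold)
```

Repeating this split for several candidate ratios, retraining the intra-domain phase each time, and comparing the average quality score on held-out images corresponds to the selection procedure described in the text.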
4.5 Model Complexity Analysis
We compare the FLOPs, parameters and time cost of different learning-based methods on a PC with an Intel(R) i5-10500 CPU, 16.0GB RAM, and an NVIDIA GeForce RTX 2080 Super. The test dataset is the UIEB benchmark, which includes 950 images, each of size 256x256x3. The source codes and test parameters of all the compared methods are provided by their authors, and the results are presented in Table 7.
As presented, the computational cost and time cost of our method are favorable. UGAN has the shortest running time, but it has the most FLOPs and parameters, far exceeding our method. The model size, computation and time cost of FUIE-GAN are lower than those of our TUDA. However, its generalization performance on the four real underwater benchmarks is limited and not as good as that of our method. Water-Net has the fewest parameters, but its FLOPs and time cost are large. Such results demonstrate that our TUDA achieves a good balance between performance and efficiency.
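The text does not describe the timing protocol, so the following is only a hedged sketch of one common way to obtain an average running time per image. The helper `mean_runtime`, the warm-up count, and the stand-in `enhance` function are hypothetical; for a GPU model, the device would also need to be synchronized before each clock reading, which this CPU-only sketch does not do.

```python
import time


def mean_runtime(enhance, images, warmup=2):
    """Average wall-clock seconds per image for a callable `enhance`.

    A few warm-up calls are made first so that one-time setup costs
    (caching, lazy initialization) do not distort the measurement.
    """
    if not images:
        raise ValueError("images must not be empty")
    for image in images[:warmup]:
        enhance(image)
    start = time.perf_counter()
    for image in images:
        enhance(image)
    return (time.perf_counter() - start) / len(images)


def enhance(image):
    """Stand-in for an enhancement model: inverts 8-bit pixel values."""
    return [255 - pixel for pixel in image]


# Hypothetical workload: ten flat "images" of 256 * 256 * 3 values each.
images = [[128] * (256 * 256 * 3) for _ in range(10)]
seconds_per_image = mean_runtime(enhance, images)
print(f"{seconds_per_image:.4f} s per image")
```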
In this paper, a novel two-phase underwater domain adaptation method is proposed for enhancing underwater images, which contains an inter-domain adaptation phase and an intra-domain adaptation phase to jointly reduce the inter-domain gap and the intra-domain gap. Firstly, a dual-alignment network is introduced to jointly perform image-level and feature-level alignment using adversarial learning to better close the inter-domain gap. Secondly, a simple yet efficient rank-based underwater IQA method, named RUIQA, is developed, which evaluates the perceptual quality of underwater images with the aid of rank information. Finally, coupled with the proposed RUIQA, an easy/hard adaptation technique is conducted to effectively reduce the intra-domain gap between easy and hard samples. Extensive experiments on four real underwater benchmarks demonstrate that our TUDA performs favorably against other state-of-the-art algorithms, particularly in eliminating color deviation, increasing contrast and avoiding over-enhancement.
-  R. Hummel, “Image enhancement by histogram transformation,” Computer Graphics and Image Processing, vol. 6, pp. 184–195, 1975.
-  K. Zuiderveld, “Contrast limited adaptive histogram equalization,” in Graphics Gems, 1994.
-  M. S. Hitam, E. A. Awalludin, W. N. Jawahir Hj Wan Yussof, and Z. Bachok, “Mixture contrast limited adaptive histogram equalization for underwater image enhancement,” in Proc. Int. Conf. Computer Applications Technology (ICCAT), pp. 1–5, 2013.
-  C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert, “Enhancing underwater images and videos by fusion,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 81–88, June 2012.
-  X. Fu, P. Zhuang, Y. Huang, Y. Liao, X.-P. Zhang, and X. Ding, “A retinex-based enhancing approach for single underwater image,” in Proc. IEEE Int. Conf. Image Processing (ICIP), pp. 4572–4576, June 2014.
-  S. Zhang, T. Wang, J. Dong, and H. Yu, “Underwater image enhancement via extended multi-scale retinex,” Neurocomputing, vol. 245, pp. 1–9, 2017.
-  C. O. Ancuti, C. Ancuti, C. De Vleeschouwer, and P. Bekaert, “Color balance and fusion for underwater image enhancement,” IEEE Trans. Image Process., vol. 27, no. 1, pp. 379–393, 2018.
-  C. Ancuti, C. D. Vleeschouwer, and M. Sbert, “Color channel compensation (3c): A fundamental pre-processing step for image enhancement,” IEEE Trans. Image Process., vol. 29, pp. 2653–2665, 2020.
-  J. Chiang and Y. Chen, “Underwater image enhancement by wavelength compensation and dehazing,” IEEE Trans. Image Process., vol. 21, pp. 1756–1769, 2012.
-  P. Drews, E. R. Nascimento, S. Botelho, and M. Campos, “Underwater depth estimation and image restoration based on single images,” IEEE Comput. Graph. Appl., vol. 36, pp. 24–35, Mar. 2016.
-  C. Li, J. Guo, R. Cong, Y. Pang, and B. Wang, “Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior,” IEEE Trans. Image Process., vol. 25, pp. 5664–5677, 2016.
-  Y.-T. Peng and P. Cosman, “Underwater image restoration based on image blurriness and light absorption,” IEEE Trans. Image Process., vol. 26, pp. 1579–1594, Apr. 2017.
-  D. Berman, D. Levy, S. Avidan, and T. Treibitz, “Underwater single image color restoration using haze-lines and a new quantitative dataset,” IEEE Trans. Pattern Anal. Mach. Intell., 2020.
-  D. Akkaynak and T. Treibitz, “Sea-thru: A method for removing water from underwater images,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1682–1691, 2019.
-  C. Li, S. Anwar, and F. Porikli, “Underwater scene prior inspired deep underwater image and video enhancement,” Pattern Recognit., vol. 98, 2020.
-  J. Li, K. Skinner, R. Eustice, and M. Johnson-Roberson, “Watergan: Unsupervised generative network to enable real-time color correction of monocular underwater images,” IEEE Robot. Autom. Lett., vol. 3, pp. 387–394, 2018.
-  C. Fabbri, M. J. Islam, and J. Sattar, “Enhancing underwater imagery using generative adversarial networks,” in Proc. IEEE Int. Conf. Robotics and Automation (ICRA), pp. 7159–7165, 2018.
-  C. Li, J. Guo, and C. Guo, “Emerging from water: Underwater image color correction based on weakly supervised color transfer,” IEEE Signal Process. Lett., vol. 25, pp. 323–327, 2018.
-  C. Li, C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao, “An underwater image enhancement benchmark dataset and beyond,” IEEE Trans. Image Process., vol. 29, pp. 4376–4389, 2020.
-  X. Chen, J. Yu, S. Kong, Z. Wu, X. Fang, and L. Wen, “Towards real-time advancement of underwater visual quality with gan,” IEEE Trans. Ind. Electron., vol. 66, pp. 9350–9359, 2019.
-  C. Li, S. Anwar, J. Hou, R. Cong, C. Guo, and W. Ren, “Underwater image enhancement via medium transmission-guided multi-color space embedding,” IEEE Trans. Image Process., vol. 30, pp. 4985–5000, 2021.
-  J. Jaffe, “Computer modeling and the design of optimal underwater imaging systems,” IEEE J. Ocean. Eng., vol. 15, pp. 101–111, 1990.
-  A. Dudhane, P. Hambarde, P. Patil, and S. Murala, “Deep underwater image restoration and beyond,” IEEE Signal Process. Lett., vol. 27, pp. 675–679, Apr. 2020.
-  Y. Shao, L. Li, W. Ren, C. Gao, and N. Sang, “Domain adaptation for image dehazing,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 2805–2814, 2020.
-  F. Pan, I. Shin, F. Rameau, S. Lee, and I. S. Kweon, “Unsupervised intra-domain adaptation for semantic segmentation through self-supervision,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 3763–3772, 2020.
-  I. Shin, S. Woo, F. Pan, and I.-S. Kweon, “Two-phase pseudo label densification for self-training based domain adaptation,” ArXiv, vol. abs/2012.04828, 2020.
-  C. Zheng, T.-J. Cham, and J. Cai, “T2net: Synthetic-to-realistic translation for solving single-image depth estimation tasks,” in Proc. European Conference on Computer Vision (ECCV), pp. 798–814, Springer International Publishing, 2018.
-  S. Zhao, H. Fu, M. Gong, and D. Tao, “Geometry-aware symmetric domain adaptation for monocular depth estimation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 9780–9790, 2019.
-  I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of wasserstein gans,” in Neural Information Processing Systems, pp. 5767–5777, Mar. 2017.
-  R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo, “Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, pp. 4861–4875, 2020.
-  M. Yang and A. Sowmya, “An underwater color image quality evaluation metric,” IEEE Trans. Image Process., vol. 24, pp. 6062–6071, 2015.
-  K. Panetta, C. Gao, and S. Agaian, “Human-visual-system-inspired underwater image quality measures,” IEEE J. Ocean. Eng., vol. 41, pp. 541–551, July 2016.
-  D. Li, T. Jiang, W. Lin, and M. Jiang, “Which has better visual quality: The clear blue sky or a blurry animal?,” IEEE Trans. Multimedia, vol. 21, pp. 1221–1234, May 2019.
-  S. Su, Q. Yan, Y. Zhu, C. Zhang, X. Ge, J. Sun, and Y. Zhang, “Blindly assess image quality in the wild guided by a self-adaptive hyper network,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), pp. 3664–3673, June 2020.
-  W. Zhang, Y. Liu, C. Dong, and Y. Qiao, “Ranksrgan: Generative adversarial networks with ranker for image super-resolution,” in Proc. IEEE/CVF Int. Conf. Computer Vision (ICCV), pp. 3096–3105, Oct. 2019.
-  M. J. Islam, Y. Xia, and J. Sattar, “Fast underwater image enhancement for improved visual perception,” IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 3227–3234, 2020.
-  X. Fu, Z. Fan, M. Ling, Y. Huang, and X. Ding, “Two-step approach for single underwater image enhancement,” in Proc. Int. Symp. Intelligent Signal Processing and Communication Systems (ISPACS), pp. 789–794, 2017.
-  Q. Wu, L. Wang, K. N. Ngan, H. Li, F. Meng, and L. Xu, “Subjective and objective de-raining quality assessment towards authentic rain image,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 11, pp. 3883–3897, 2020.
-  ITU-R Recommendation BT.500, “Methodology for the subjective assessment of the quality of television pictures,” International Telecommunication Union, 2002.
-  ITU-T Recommendation P.910, “Subjective video quality assessment methods for multimedia applications,” International Telecommunication Union, 1999.
-  H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3440–3451, 2006.
-  N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, L. Jin, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, et al., “Color image database tid2013: Peculiarities and preliminary results,” in Proc. European Workshop on Visual Information Processing (EUVIP), pp. 106–111, IEEE, 2013.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
-  P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976, 2017.
-  B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, “Benchmarking single-image dehazing and beyond,” IEEE Trans. Image Process., vol. 28, pp. 492–505, Jan. 2019.
-  M. J. Islam, M. Fulton, and J. Sattar, “Toward a generic diver-following algorithm: Balancing robustness and efficiency in deep visual detection,” IEEE Robot. Autom. Lett., vol. 4, pp. 113–120, Jan. 2019.