HybrUR: A Hybrid Physical-Neural Solution for Unsupervised Underwater Image Restoration

Robust vision restoration for underwater images remains a challenging problem. Because aligned underwater-terrestrial image pairs are lacking, unsupervised methods are better suited to this task. However, purely data-driven unsupervised methods usually have difficulty achieving realistic color correction because they lack optical constraints. In this paper, we propose a data- and physics-driven unsupervised architecture that learns underwater vision restoration from unpaired underwater-terrestrial images. For sufficient domain transformation and detail preservation, the underwater degradation needs to be explicitly constructed based on optically unambiguous physical laws. Thus, we employ the Jaffe-McGlamery degradation theory to design the generation models, and use neural networks to describe the process of underwater degradation. Furthermore, to overcome the problem of invalid gradients when optimizing the hybrid physical-neural model, we investigate the intrinsic correlation between the scene depth and the degradation factors for backscattering estimation, improving restoration performance through physical constraints. Our experimental results show that the proposed method is able to perform high-quality restoration for unconstrained underwater images without any supervision. On multiple benchmarks, we outperform several state-of-the-art supervised and unsupervised approaches. We also demonstrate that our method yields encouraging results in real-world applications.




I Introduction

Fig. 1: Example of the restoration process. For any original underwater picture (with no annotated restoration result), and ignoring the camera halo model, the image can be split into backscattering and attenuation components for the R, G, and B channels according to the Jaffe-McGlamery image degradation model. Inverting this model then yields the theoretical overwater picture.

With the rapid development of underwater robots in recent years, deep-sea exploration has accelerated steadily. Meanwhile, real-time and accurate underwater perception has become a vital technique in most diving applications. Among the numerous candidate methods, vision-based methods have attracted growing attention because of their low cost and rich information [1, 2]. However, the quality of underwater images is degraded by several optical effects, including floating particulate matter, reflections, and optical caustics, which hinder downstream processing of underwater visual signals, such as simultaneous localization and mapping (SLAM) [3] and object detection [4].

Unfortunately, it is hard to collect aligned underwater-terrestrial image pairs for supervised learning. Therefore, researchers usually apply traditional enhancement methods to generate pseudo ground truth [5, 6, 7, 8, 9, 10]. Thereafter, an encoder-decoder structure, such as U-Net [11], is used for image translation [14, 15, 16, 17, 18, 12, 13, 19]. This kind of method is complicated to train and sensitive to parameter settings. Most critically, its capability for color correction and contrast enhancement is bounded by the quality of the artificial labels, leading to modest restoration performance. The unsupervised approach is therefore worth exploring in depth.

Unsupervised methods usually leverage unpaired images to overcome the above-mentioned problems of supervised approaches [20, 22, 23, 24, 21]. However, only a few works have successfully applied unsupervised learning to underwater image restoration [20, 25]. There are two main reasons for this. First, underwater image restoration is not a simple generative task, nor is it a naive variation of style transfer based on disentanglement theory. Disentanglement theory has been widely employed in classical style transfer problems, such as seasonal change and painting style change [26, 21]. Compared with other style transfer tasks, however, image restoration is closely tied to an explicit degradation model. Nevertheless, most existing unsupervised restoration networks still adopt the traditional training architecture with a single image-to-image generator, which can hardly achieve realistic color correction because it lacks optical constraints (as shown in our experiments). Hence, we implement a cyclic disentanglement architecture with a specific optical model instead of a naive transformation network. The second reason is that most previous unsupervised restoration efforts are purely data-driven, whereas the training dataset is usually too scarce to cover the rich data domains of underwater images. We observe that underwater shots of any kind basically conform to the optical degradation process, and that far less input information is needed to learn the degradation coefficients than to learn a whole underwater image. Decomposing a one-step restoration procedure into several modular physics-driven networks also makes it easier to visualize and adjust the intermediate features. To this end, we propose an unsupervised data- and physics-driven underwater image restoration training architecture (see Fig. 


By leveraging representation disentanglement theory [22, 27], we formulate underwater image restoration as the closed-loop training of two synthetic encoder-decoder structures that internally decompose the image into attenuation, scattering, the spectrum of veiling light, and the scene depth map, without supervision for any of these factors [6]. Unlike the deep convolutional neural network (DCNN) used in most other unsupervised methods, our generator is constructed by simulating the degradation process with submodules and combining them into an integrated network. The factors are related to one another only through the physical model, which makes the network highly interpretable. In detail, the scene depth network is designed as a typical encoder-decoder network. The remaining factors are learned by three encoders that each output values for the R, G, and B channels.

Intuitively, in terms of disentanglement, the scene depth map is responsible for preserving image content, while attenuation, scattering, and veiling light mainly describe the image style. Accordingly, our four-module generator produces a two-dimensional (2D) map for the depth structure and three groups of three-dimensional (3D) degradation coefficients. These generation modules differ in task and model size, so differences in their convergence speed can lead to mode collapse. To coordinate the training of the four modules and improve restoration quality, we propose two hypotheses. The first is the correlation hypothesis: attenuation and scattering are related not only to the input underwater image but also to the scene depth. We therefore concatenate the raw image and the generated depth map into a four-channel input to predict attenuation and scattering. Second, we assume that the backscattering estimate for the R, G, and B channels, computed from the scene depth, scattering coefficient, and veiling light, equals the RGB values of the darkest pixels in the dark channel prior (DCP) map. On the one hand, this hypothesis yields an additional loss function that strengthens the gradients of the depth network. On the other hand, the constraint conforms to the formation of an underwater image and forces the degradation coefficients to converge within a reasonable range.

Our contributions are summarized as follows:

  • We propose a hybrid physical-neural framework for underwater visual restoration, which can be trained in an unsupervised way without the need for aligned underwater-terrestrial image pairs.

  • We propose a correlation hypothesis between scene depth and two degradation factors, aiming to accelerate modular training and avoid mode collapse.

  • We assume an equivalence relationship between the backscattering estimation and DCP map, producing an accurate backscattering mask for contrast enhancement.

  • Experiments on RUIE and U45 datasets show that the proposed method outperforms state-of-the-art unsupervised visual restoration methods and even some supervised methods in terms of various metrics [29, 28, 30, 19].

The remainder of the paper is organized as follows. Section II briefly introduces the related work according to categories of implementation. Thereafter, the theoretical model and methodology are presented in Section III. Section IV illustrates a particular ablation study and comparative experimental results of our method and contemporary state-of-the-art approaches. Finally, the conclusion and future work are summarized in Section V.

II Related Works

To assess our contribution relative to the vast literature on deep-learning-based underwater image restoration and quality enhancement, it is important to consider two aspects of each approach: whether it uses artificial supervisory information and whether it relies on a physical model. Previous research can be broadly divided into three categories based on the availability of models and supervision, and our approach belongs to unsupervised learning for restoration.

II-A Supervised Enhancement

In the early stage, when credible underwater image models were scarce, researchers mainly studied pixel-level operations to improve subjective perception. Such methods include classical histogram equalization, white balance, wavelet transform, and some color correction methods designed specifically for underwater images [6, 9, 14]. These methods mainly equalize pixel values numerically and do not account for scene depth or illumination, so they tend to produce strong artifacts. Building on traditional methods, researchers introduced deep learning to learn an end-to-end enhancement network directly. The DCNN has been shown to work as a generic image converter with proper loss functions and is widely used in image quality enhancement frameworks [8, 15, 16]. Wang et al. proposed UIE-Net to address color correction and dehazing with two individual modular networks [18]. Hou et al. combined a CNN with the gray world algorithm to reduce the image blurring caused by backscattering [31]. Li et al. proposed an underwater image enhancement CNN model based on a prior of underwater scenes, which does not need to estimate the parameters of the underwater model [13]. These methods must trade off generalization against enhancement and are sensitive to loss functions. Although the CNN strengthens generalization to some extent, visual qualities such as color distribution and contrast can at best match the traditional enhancement results used as labels.

II-B Supervised Restoration

In contrast, model-based methods usually offer strong interpretability through a physical model, but their restoration results vary considerably among models. In general, optical models widely employed in image restoration include the modified atmospheric turbulence model, the dehazing model, and the Jaffe-McGlamery degradation model [19, 12, 32]. Most traditional model-based approaches are limited in color correction for some extreme aquatic environments [33, 19]. To cope with this problem, the Sea-Thru algorithm adds several assumptions to the Jaffe-McGlamery model and obtains nearly realistic overwater outputs by using a credible depth map [32]. However, Sea-Thru relies heavily on accurate depth estimation, so it can produce unreasonable color distributions when combined with off-the-shelf depth estimation methods. In addition, most model-based restoration approaches depend on parameter search and optimization, which costs considerable computation time.

Learning approaches that conform to an optical law have been investigated in recent years. Most of them concentrate on supervised training, such as WaterGAN and UWGAN [15, 17]. These methods separately train optical feature maps, including an attenuation layer, a scattering layer, and camera halation, by decoupling the raw image. Given overwater visual signals, the generation network produces fake underwater images, which are then used as annotations to train a restoration network with supervision. In contrast, Chen et al. first used an atmospheric turbulence model and a filtering-based restoration scheme (FRS) to obtain enhanced images, and then trained a GAN in a supervised manner [19]. In all of these methods, the second step is supervised learning. They therefore use enhanced or synthetic underwater images as the pseudo ground truth, which probably becomes the bottleneck of network performance. Besides, some modified CycleGAN architectures are also used as the backbone of the annotation generation network, such as deep residual CycleGAN and edge-enhanced CycleGAN [16, 24]. However, such methods can hardly obtain a clear restored image through simple forward inference, and they are more sensitive to the loss function than a feed-forward CNN.

II-C Unsupervised Restoration

Researchers have naturally considered learning a direct restoration network through style transfer. Unpaired terrestrial-underwater images can form a closed loop with self-supervision, thus avoiding the generation of annotations. Li et al. proposed a restoration transfer based on weak supervision, but they did not consider any optical law [20]. Instead, they introduced specific evaluation information, such as contrast and details, into the loss function, which distinctly improved the generative effect. However, some color deviation remained, and the color distribution in some extreme image regions was unreasonable. To solve this problem, we fuse the Jaffe-McGlamery model into the style transfer framework. Taking underwater RGB images as input, we design a modular generator that produces the scene depth map, backscattering coefficient, attenuation coefficient, and underwater veiling light. We leverage the forward and inverse optical models to build a cyclic generation network. To improve the final restoration effect, we jointly optimize multiple loss functions and apply two strong hypotheses to strengthen the intrinsic constraints of the network.

III Method

Fig. 2: The architecture of HybrUR. The upper part shows the unsupervised cyclic training mode, in which the underwater and overwater images serve as the initial and target image domains, respectively, and together form a closed loop. In this architecture, adversarial loss, reconstruction loss, and perceptual loss are employed. The lower part presents the structure of a single generator, which decomposes the image into attenuation, scattering, veiling light, and scene depth; the latter three elements are combined to compute a backscattering estimation loss.

III-A Overall Architecture

Given an arbitrary underwater snapshot, we aim to make the restored result as realistic as a true terrestrial image. Because no natural water-free image exists as a canonical reference, and the only alternative is artificially restored labels, we discard the step of generating supervised annotations and learn the forward and inverse image degradation procedure directly with a cyclic network. Based on the training architecture of CycleGAN, our proposed network disentangles the content and style of images according to the Jaffe-McGlamery model. The overall training framework and the prototype of a single generator are visualized in Fig. 2.

We consider the restoration problem as a style transfer between the underwater image domain and the terrestrial image domain that simultaneously retains the details and structure of the original image. Unlike most typical style transfer tasks, such as seasonal change or painting style transfer, underwater image degradation can be formulated with optical processes, namely attenuation and scattering:

$$I_c = D_c + B_c \tag{1}$$

where $I_c$ indicates an underwater image in color channel $c \in \{R, G, B\}$, and $D_c$ and $B_c$ represent the attenuation and scattering components. Attenuation, also known as light absorption, is the main cause of image color deviation. Scattering includes two modes, forward scattering and backscattering, both caused by suspended particles. In general, forward scattering decreases the brightness of the image, while backscattering mainly leads to low contrast and blur.

In underwater images shot under natural light, backscattering tends to have a stronger influence on image formation than forward scattering. To construct our generator network, we adopt a more detailed Jaffe-McGlamery model and derive its inverse expression:

$$I_c = J_c\, e^{-\beta_c^D(\mathbf{v}_D)\, z} + B_c^\infty \left(1 - e^{-\beta_c^B(\mathbf{v}_B)\, z}\right) \tag{2}$$

$$J_c = \left[ I_c - B_c^\infty \left(1 - e^{-\beta_c^B(\mathbf{v}_B)\, z}\right) \right] e^{\beta_c^D(\mathbf{v}_D)\, z} \tag{3}$$

Eq. (2) describes the transformation of a terrestrial image into a synthetic underwater image, while Eq. (3) represents the reverse operation. $J_c$ represents the unattenuated image restored from the raw input $I_c$. $z$ is the distance between the camera and the scene along the line of sight. $B_c^\infty$ indicates the veiling light. The vectors $\mathbf{v}_D$ and $\mathbf{v}_B$ represent the dependencies of the coefficients $\beta_c^D$ and $\beta_c^B$ on the scene depth $z$, reflectance, spectrum of ambient light, physical scattering coefficient, and beam attenuation coefficient. Note that some previous literature assumed $\beta_c^D = \beta_c^B$ to simplify parameter optimization [15, 18]. Instead, we keep the two coefficients independent to distinguish their correlations and describe the degradation in detail [32, 12].
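To make the forward and inverse mappings concrete, the following minimal Python sketch applies the image formation model to a single pixel and checks that restoration undoes degradation. It assumes the commonly used form in which the direct signal decays as exp(-beta_d * z) and the backscatter saturates as b_inf * (1 - exp(-beta_b * z)). The function names `degrade` and `restore` and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def degrade(J, z, beta_d, beta_b, b_inf):
    """Forward model: clear pixel -> underwater pixel, one value per channel.

    J, beta_d, beta_b and b_inf are 3-element sequences (R, G, B);
    z is the scalar range between camera and scene.
    """
    out = []
    for j, bd, bb, binf in zip(J, beta_d, beta_b, b_inf):
        direct = j * math.exp(-bd * z)                   # attenuated direct signal
        backscatter = binf * (1.0 - math.exp(-bb * z))   # veiling light added along the path
        out.append(direct + backscatter)
    return out

def restore(I, z, beta_d, beta_b, b_inf):
    """Inverse model: subtract the backscatter, then undo the attenuation."""
    out = []
    for i, bd, bb, binf in zip(I, beta_d, beta_b, b_inf):
        backscatter = binf * (1.0 - math.exp(-bb * z))
        out.append((i - backscatter) * math.exp(bd * z))
    return out

# Hypothetical per-channel values, chosen only for illustration.
J = [0.8, 0.6, 0.4]            # clear (terrestrial) pixel
beta_d = [0.40, 0.12, 0.08]    # attenuation coefficients (red decays fastest)
beta_b = [0.30, 0.15, 0.10]    # backscattering coefficients
b_inf = [0.05, 0.35, 0.45]     # veiling light

I = degrade(J, 3.0, beta_d, beta_b, b_inf)
J_restored = restore(I, 3.0, beta_d, beta_b, b_inf)
```

Keeping `beta_d` and `beta_b` as separate arguments mirrors the choice to treat the two coefficients independently rather than assuming they are equal.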

Fig. 3: Scene depth estimation and restoration of Sea-Thru and HybrUR. The restored results above show that HybrUR achieves consistent color correction and high brightness. The comparison of depth maps illustrates that the color confusion of the Sea-Thru algorithm stems from the limitations of underwater depth estimation, which HybrUR improves significantly.

Unlike the traditional style transfer pipeline, we do not disentangle the style (color, illumination, etc.) and content (texture, edges, etc.) of a single underwater image by building two complicated subnetworks. Since the scene depth map carries the 2D image information, while the degradation factors consist only of numerical coefficients for the different channels, we hypothesize that in the Jaffe-McGlamery model the scene depth expresses image content, whereas attenuation, scattering, and veiling light express image style. We therefore design four corresponding subnetworks, which together compose an image style converter. In practice, the depth map $z$ is generated by an encoder-decoder, while the attenuation coefficient $\beta_c^D$, backscattering coefficient $\beta_c^B$, and veiling light $B_c^\infty$ are produced by three individual encoders. Note that the two inherent flaws of a naive image-to-image generator, namely color deviation and detail loss, are exactly the problems associated with style and content. Accordingly, the depth map is primarily responsible for preserving the details of the restored image, while the remaining coefficients implement optics-based style changes. Moreover, we assume that both $\beta_c^D$ and $\beta_c^B$ depend on a concatenation of the raw image $I_c$ and the depth $z$, which our ablation study shows to improve restoration performance. Besides, Eq. (2) is originally formulated for imaging in the horizontal direction, whereas we apply it to scenes in different directions under the hypothesis of small deviations [32]. As shown in Fig. 2, two sets of generators and discriminators are initialized symmetrically to build a closed training loop. The implementation of the submodules is detailed as follows.

III-B Scene Depth Map

According to the generator structure, the depth map is embedded in both the attenuation and scattering processes and constitutes the content layer of the raw inputs. Nevertheless, we do not push the depth network to generate an over-refined depth map, since we find that loss functions targeting excessive detail easily lead to mode collapse during training. We also found that state-of-the-art monocular depth estimation algorithms are not suitable for underwater scenes and impose a large computational burden on an embedded platform. Hence, our scene depth estimation is not designed to provide an extremely accurate depth map, but to learn a range map that fits the degradation coefficients and contributes to better perception through joint optimization.

In practice, we build the backbone of the generator with 2 down-sampling layers and 6 residual blocks applied to the input images. Moreover, we remove the final activation function and de-centering step, which may distort the generative distribution of the depth map. Because the output of the depth network represents the real distance between the scene and the camera, we restrict it to a fixed range chosen according to the training set. In a pre-experiment, we qualitatively compared the restoration performance and depth estimation of Sea-Thru and HybrUR, as shown in Fig. 3. The four Sea-Thru results all contain unreasonable color deviations to different degrees, while our method provides consistent color correction. Since Sea-Thru employs an off-the-shelf monocular depth estimator, it can produce inaccurate depth for underwater images. By contrast, the proposed depth network in HybrUR learns more effective depth maps through joint optimization, which also improves color restoration.

III-C Veiling Light

$B_c^\infty$ indicates the underwater veiling light in Eq. (2). In underwater imaging, veiling light arises when sunlight penetrates the water surface from the atmosphere: the light is reflected by the surface and refracted at the air-sea interface. To determine which factors are needed to estimate the veiling light, we start from its formulation, which is given by


In most methods, the intensity of veiling light is mainly related to the optical parameters and the spectral response of the camera [37]. Because the formulation depends on the wavelength of visible light, the veiling light must be distinguished across color channels. It also involves the amount of light at a given depth and at the surface, the physical scattering coefficient, the beam attenuation coefficient, and a vertical attenuation factor that should be distinguished from the beam attenuation coefficient. Since the employed dataset basically does not include any camera information, camera halation is ignored in our modeling. We therefore obtain the veiling light values for the R, G, and B channels by parsing a single underwater picture. Besides, we keep them physically meaningful by limiting the veiling light to a bounded range, in line with the suggestion in [34, 25].

III-D Attenuation and Scattering

As mentioned above, attenuation and scattering coefficients principally carry the style characteristics of underwater images, which need to be separately modeled and trained. Their explicit expressions are as follows:


Both coefficients depend on several common quantities that are hard to model. However, the attenuation coefficient $\beta_c^D$ is more strongly affected by the depth $z$, while the scattering coefficient $\beta_c^B$ is tied to a specific physical property of the water. There are no fixed experimental values for $\beta_c^D$ and $\beta_c^B$, but both can be constrained to a bounded range once the depth range is fixed. In detail, we employ two encoders with Sigmoid activation to produce the attenuation and scattering coefficients for the R, G, and B channels. However, the coefficient networks may converge rapidly at the early stage of depth network training. The resulting mismatch between the depth map and the attenuation and scattering coefficients can then cause exploding gradients and lead to mode collapse. To cope with this problem, we introduce two hypotheses based on the physical model. The first is that both attenuation and scattering are directly related to the scene depth, so we concatenate the raw image and the depth map as the input of the coefficient networks. According to Eq. (5) and Eq. (6), attenuation and scattering vary with depth following a definite trend, which the coefficient networks can regress. More importantly, the coefficient networks keep updating as the depth network trains, which relieves the problem of mode collapse.

The second hypothesis is that the estimates of the total veiling light, scattering coefficient, and depth map satisfy an explicit relationship with the DCP map of both real and synthetic underwater images. Backscattering increases exponentially with the depth $z$ and eventually saturates [32]. In general, the RGB intensity $I_c$ approaches the backscattering $B_c$ where the reflectance $\rho_c \to 0$ or where the ambient light $E \to 0$ (complete shadow). Hence, we use the DCP map to estimate the backscattering. Unlike the traditional DCP, we select only the darkest 1% of pixels rather than the whole DCP map to compute a loss against the backscattering estimate, which we model as


Here, the backscattering estimate is computed from the veiling light, scattering coefficient, and depth map. In detail, we first create a mask in which the pixels whose DCP values fall in the bottom 1% are set to 1 and all others are set to 0. Then, across the whole image, the masked raw image serves as an overestimate of backscattering. Used as a constraint, the hypothesis naturally provides an additional loss function. Scene depth, veiling light, and backscattering can be further optimized by reducing the difference between the left-hand and right-hand sides of Eq. (7).
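The dark-pixel selection can be sketched as follows. This is a minimal illustration, assuming a dark channel computed as the per-pixel minimum over R, G, and B (a patch size of 1) and a selection fraction of 1%. The helper names `dark_channel` and `darkest_mask` are hypothetical, and the exact DCP settings used by the authors are not specified in this section.

```python
def dark_channel(image):
    """Per-pixel minimum over the R, G, B values.

    `image` is a nested list of shape H x W x 3. A patch size of 1 is
    assumed here for brevity; a classical DCP also takes a local minimum
    over a spatial window.
    """
    return [[min(pixel) for pixel in row] for row in image]

def darkest_mask(image, fraction=0.01):
    """Binary H x W mask: 1 for the darkest `fraction` of pixels, else 0."""
    dark = dark_channel(image)
    values = sorted(v for row in dark for v in row)
    k = max(1, int(len(values) * fraction))   # number of pixels to keep
    threshold = values[k - 1]
    # Ties at the threshold are all kept, so slightly more than k pixels
    # can be selected when several pixels share the same dark value.
    return [[1 if v <= threshold else 0 for v in row] for row in dark]

# Hypothetical 10 x 10 gray ramp: every pixel has a distinct brightness.
image = [[[(r * 10 + c + 1) / 100.0] * 3 for c in range(10)] for r in range(10)]
mask = darkest_mask(image, fraction=0.01)
```

With 100 distinct pixels and a 1% fraction, only the single darkest pixel (top-left in this ramp) is selected.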

III-E Loss and Training

Under the overall generative framework, the synthetic loss function of training is as follows:


Here, the total loss includes the traditional adversarial loss and the cycle consistency loss. A perceptual loss is added to cope with edge imperfections and strengthen the details of the generated results. A further term, defined in Eq. (8), improves stability during training. Besides, four weighting coefficients are defined to fine-tune the effect of each loss term on the final results.

We keep the classic CycleGAN loss to benefit from the self-supervision mechanism of a cyclic architecture, which is expressed as


Here, the two generators perform the forward and inverse mappings, and the inputs are the raw underwater and terrestrial images. Mao et al. suggested that a loss function based on cross entropy may lead to vanishing gradients, and instead trained the GAN with a least-squares (mean square error) objective, known as LSGAN [35], which is given by


Here, the loss measures the discrimination error of the two discriminators. Both discriminators are based on the Convolution-BatchNorm-ReLU (CBR) module. Inspired by PatchGAN [23], we construct the discriminator branches on image patches, where the patch size is determined by the number of CBR modules. In our approach, we use 3 CBR modules to build each discriminator. Moreover, multi-scale discrimination is additionally applied to improve restoration [26].
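The least-squares adversarial objective can be sketched in a few lines of Python. This is a minimal illustration of the general LSGAN idea, assuming the common convention of target label 1 for real samples, 0 for generated samples, and a factor of 0.5; these conventions and the function names are assumptions, not details confirmed by this section. The inputs are lists of discriminator scores, for example one score per image patch.

```python
def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss.

    d_real: discriminator scores on real images (pushed toward 1).
    d_fake: discriminator scores on generated images (pushed toward 0).
    """
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)

def lsgan_generator_loss(d_fake):
    """Least-squares generator loss: generated samples should score 1."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)

# Hypothetical patch scores from one discriminator.
d_loss = lsgan_discriminator_loss([0.9, 0.8], [0.2, 0.1])
g_loss = lsgan_generator_loss([0.2, 0.1])
```

Because the penalty grows quadratically with the distance from the target label, samples that are far from the decision boundary still produce a gradient, which is the motivation given for replacing the cross-entropy loss.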

However, these typical losses do not always yield ideal restored images, especially when the network is trained with improper initialization. Since the loss function is sensitive to depth map imperfections and to mismatches among the degradation coefficients, we add a perceptual loss to address this problem. We employ an off-the-shelf image encoder (VGG16) to extract feature representations of the images over the corresponding spatial domain [36]. The perceptual loss is expressed as


Here, the loss compares the input image with the image reconstructed by the cyclic network. We calculate the mean square error at each pixel index of the selected feature layer. In practice, we found that the restored results are better when using the features from a single layer of VGG16.

In addition, the second hypothesis yields a loss function that constrains the depth, veiling light, and scattering terms (see Section III-D). We formulate this loss as the error between the backscattering estimate and the RGB values of the darkest pixels in the raw images. Note that this constraint is only valid for real and generated underwater images. The loss function is given by


In particular, for images with poor lighting or extensive shadows, not all of the darkest 1% of pixels are used. We select no more than 10 thousand pixels in a 256 × 256 image to calculate the loss.
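A rough sketch of such a masked loss with a pixel cap is given below. It assumes a mean squared error between the backscattering estimate b_inf * (1 - exp(-beta_b * z)) and the raw RGB values at the masked pixels, and it keeps the darkest pixels first when the cap is reached. The error norm, the tie-breaking rule, and the name `backscatter_loss` are assumptions made for illustration.

```python
import math

def backscatter_loss(image, depth, mask, beta_b, b_inf, max_pixels=10000):
    """Masked error between estimated backscatter and raw RGB values.

    image: H x W x 3 nested list, depth: H x W, mask: H x W of 0/1.
    beta_b and b_inf hold one value per channel (R, G, B).
    """
    coords = [(r, c) for r, row in enumerate(mask)
              for c, m in enumerate(row) if m]
    # Keep the darkest pixels first so the cap drops the brighter ones.
    coords.sort(key=lambda rc: min(image[rc[0]][rc[1]]))
    coords = coords[:max_pixels]
    if not coords:
        return 0.0
    total = 0.0
    for r, c in coords:
        for ch in range(3):
            b_hat = b_inf[ch] * (1.0 - math.exp(-beta_b[ch] * depth[r][c]))
            total += (b_hat - image[r][c][ch]) ** 2
    return total / (3 * len(coords))

# Hypothetical 2 x 2 example: the masked pixel equals its backscatter estimate.
beta_b = [0.30, 0.15, 0.10]
b_inf = [0.05, 0.35, 0.45]
depth = [[2.0, 2.0], [2.0, 2.0]]
dark_pixel = [b_inf[ch] * (1.0 - math.exp(-beta_b[ch] * 2.0)) for ch in range(3)]
image = [[dark_pixel, [0.9, 0.9, 0.9]], [[0.9, 0.9, 0.9], [0.9, 0.9, 0.9]]]
loss_dark_only = backscatter_loss(image, depth, [[1, 0], [0, 0]], beta_b, b_inf)
loss_all = backscatter_loss(image, depth, [[1, 1], [1, 1]], beta_b, b_inf)
```

Restricting the loss to the darkest pixels matters here: bright pixels contain direct signal as well as backscatter, so including them (as in `loss_all`) would penalize a correct backscattering estimate.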

IV Experiments and Discussion

In this section, we present the details of the configuration and the trade-offs in training. To demonstrate the advantage of the data- and physics-driven restoration architecture, we conduct a comprehensive ablation study covering the proposed network structure and the two hypotheses. The visualizations of backscattering and attenuation illustrate the modeling and interpretability of the generator, indirectly confirming the advantage of the hybrid physical-neural restoration network. By comparing restored results on the public dataset U45 and the nearshore turbid image dataset RUIE, we demonstrate that the proposed method achieves better results on a variety of underwater image quality indicators. In particular, the subjective quality of our restored results is more prominent. Besides, we verify our enhancement performance on several underwater vision-based application tests, including keypoint extraction and matching.

IV-A Training Details

We mainly use turbid underwater photos of the offshore seabed as the training set, targeting offshore underwater operation tasks. Specifically, we select 2076 images from the RUIE dataset as the underwater image domain. Unlike the natural scenery pictures used in [20], we sample the same number of images from the indoor vision dataset SUN RGB-D as the terrestrial image domain. All images are used directly without data augmentation. We apply a batch size of 16. To better test generalization, the underwater inputs are not aligned with the terrestrial images.

Since the proposed generator is modularized, our training parameters are set differently from those of CycleGAN. For the scene depth network, the degradation coefficient networks, and the discriminators, the learning rates are set to 0.0002, 0.0001, and 0.0001, respectively, and decay linearly from the 30th epoch. The networks are optimized with Adam. Note that the specific weights of the different loss functions are crucial because they strongly affect convergence and the restoration effect. In general, the adversarial loss governs the visual quality of the generated results, while the other three losses primarily guarantee stability during training. If the latter three terms are weighted too heavily, the depth network tends to converge quickly and produce an unreasonable depth map, resulting in mode collapse. To alleviate color deviation as much as possible, we emphasize the generative error and relatively weaken the cycle consistency error and perception error. The four loss weights are determined experimentally based on model performance.
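The per-module learning-rate schedule can be sketched as below. The base rates and the decay start at epoch 30 come from the text; the total number of epochs (60 here), the decay to zero at the final epoch, and the zero-based epoch index are assumptions made only for illustration, as are the names `BASE_LR` and `linear_decay_lr`.

```python
# Base learning rates stated in the text, keyed by module group.
BASE_LR = {
    "depth": 0.0002,
    "coefficients": 0.0001,
    "discriminators": 0.0001,
}

def linear_decay_lr(base_lr, epoch, decay_start=30, total_epochs=60):
    """Constant rate until `decay_start`, then linear decay to zero.

    `total_epochs` is a hypothetical value; the section does not state
    how long training runs.
    """
    if epoch < decay_start:
        return base_lr
    remaining = total_epochs - epoch
    span = total_epochs - decay_start
    return base_lr * max(0.0, remaining / span)

# Learning rates of all three module groups at a few sample epochs.
schedule = {
    epoch: {name: linear_decay_lr(lr, epoch) for name, lr in BASE_LR.items()}
    for epoch in (0, 30, 45, 60)
}
```

Keeping the depth network on a higher base rate than the coefficient networks is the only asymmetry stated in the text; the decay shape is shared across all three groups in this sketch.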

IV-B Comparison Methods

To assess the proposed method against well-established underwater restoration and enhancement networks, we compare it with several advanced off-the-shelf methods. These include three model-based supervised approaches: UWGAN-UIE, GAN-RS, and the deep image formation model (DIFM) [18, 19, 12]. In addition, we compare with an unsupervised method, Water-Net, to demonstrate the improvement in unsupervised training brought by our approach [20]. All these methods are implemented with their open-source codes. We compare them separately on the U45 dataset, which consists of various water types, and on the RUIE dataset, which is made up of turbid underwater images. Notably, UWGAN-UIE provides pre-trained models for three different water types; we select only the one closest to our training set.

IV-C Ablation Study

Fig. 4: Qualitative comparison of traditional CycleGAN and HybrUR in ablation study. (a) Raw visual input. (b) is the result of baseline. (c) indicates the output of the baseline attached with SESS structural constraint. (d) is the result of HybrUR. Traditional style transfer is easily affected by the input image domain, and the edge details are extremely destroyed. HybrUR exhibits better color consistency and content retention due to the use of degradation model and assumed constraints.

To evaluate the restoration enhancement of the proposed HybrUR and its relative hypotheses, structural similarity index measure (SSIM) and underwater color image quality evaluation (UCIQE) are employed as evaluation indexes in the ablation experiment. SSIM indicates brightness, contrast, and structure scores between original image and restored image. UCIQE is a linear combination of color concentration, the contrast of brightness, and mean of saturation. It is a non-reference image quality evaluation index, which is used to quantitatively evaluate the color deviation, blur, and brightness contrast of underwater images. UCIQE is formulated via the following expression

\[ \mathrm{UCIQE} = c_1 \sigma_c + c_2 \, con_l + c_3 \mu_s \]

where $\sigma_c$, $con_l$, and $\mu_s$ represent the standard deviation of chroma, contrast of brightness, and mean value of saturation, respectively. $c_1$, $c_2$, and $c_3$ are corresponding weights given by [28].
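As a small worked example of the weighted sum above, the sketch below evaluates UCIQE from its three components. The weight values are those reported by Yang and Sowmya [28]; the function and argument names are illustrative assumptions. The inputs are the chroma standard deviation, brightness contrast, and mean saturation listed for the RUIE baseline and the full model in Tab. I, and the rounded outputs agree with the UCIQE column of that table (0.457 and 0.495).

```python
# Weights reported in [28] for the UCIQE linear combination.
C1, C2, C3 = 0.4680, 0.2745, 0.2576


def uciqe(sigma_c: float, con_l: float, mu_s: float) -> float:
    """Weighted sum of chroma std, brightness contrast, and mean saturation."""
    return C1 * sigma_c + C2 * con_l + C3 * mu_s


# Component values taken from the RUIE rows of Tab. I.
baseline_ruie = uciqe(0.024, 0.902, 0.771)
full_ruie = uciqe(0.061, 0.966, 0.780)
print(round(baseline_ruie, 3), round(full_ruie, 3))
```

Because the contrast term is close to 1 while the chroma term is below 0.1, most of the spread in UCIQE between methods comes from the contrast and chroma components rather than from saturation.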

Tab. I uses the RUIE and U45 datasets to compare the underwater image quality of a CycleGAN-based baseline and our proposed method with or without the two hypotheses. CycleGAN serves as a model-free benchmark for our method. In detail, the baseline is trained with the classical CycleGAN architecture and the same training set, and all loss functions except the backscattering error (see Section III-E) are employed. As shown in Fig. 4, the baseline performs a relatively acceptable color correction with our proposed loss functions, but the generator cannot distinguish the foreground from the background well, so it presents an indistinct color style. Moreover, the results of CycleGAN show that it is difficult to learn texture and image style well at the same time through pixel-level operations, which is consistent with the low SSIM values. For this reason, some researchers fuse SSIM or strong edge and structural similarity (SESS) into the loss function to preserve detail [20]. Such an approach refines the image structure moderately, but the result remains blurred and color-shifted, as shown in Fig. 4. By contrast, our method, equipped with the Jaffe-McGlamery model, presents advantages in both structural information and UCIQE.

Furthermore, to understand the influence of each hypothesis, we remove them one by one and evaluate the performance. For each dataset, the second row denotes a pure model-based architecture without any hypotheses. The third row applies the depth correlation hypothesis, so that more training weights and gradients are produced by embedding the depth map into the attenuation and scattering networks. As a result, it presents a relatively high structural score. Note that this variant obtains the best saturation on both datasets. However, we do not consider saturation a key assessment index of a restored image, since high saturation is not positively correlated with good subjective perception and could lead to distortion. The fourth row applies the backscattering restraint hypothesis, which draws the estimated backscattering mask toward the true backscattering of . Backscattering is the main cause of contrast loss, so the contrast of brightness is distinctly improved by this hypothesis. Finally, our full network achieves the highest UCIQE value, with the best contrast of brightness and color correction. Notably, we train our framework only on the RUIE dataset, yet it performs well on the test sets of both RUIE and U45.


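| Dataset | Method | SSIM | $\sigma_c$ | $con_l$ | $\mu_s$ | UCIQE |
|---|---|---|---|---|---|---|
| RUIE | Baseline | 0.422 | 0.024 | 0.902 | 0.771 | 0.457 |
| RUIE | w/o HYP | 0.644 | 0.044 | 0.892 | 0.790 | 0.470 |
| RUIE | w/ HYP (depth correlation) | 0.712 | 0.053 | 0.898 | 0.793 | 0.476 |
| RUIE | w/ HYP (backscattering constraint) | 0.709 | 0.055 | 0.912 | 0.790 | 0.480 |
| RUIE | Ours full | 0.748 | 0.061 | 0.966 | 0.780 | 0.495 |
| U45 | Baseline | 0.330 | 0.031 | 0.843 | 0.711 | 0.429 |
| U45 | w/o HYP | 0.618 | 0.077 | 0.877 | 0.731 | 0.465 |
| U45 | w/ HYP (depth correlation) | 0.665 | 0.091 | 0.882 | 0.781 | 0.486 |
| U45 | w/ HYP (backscattering constraint) | 0.648 | 0.080 | 0.914 | 0.730 | 0.476 |
| U45 | Ours full | 0.713 | 0.099 | 0.917 | 0.771 | 0.497 |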


TABLE I: Ablation study.

IV-D Correlation Between Depth and Degradation Factors

Fig. 5: Visualization and statistics of generated depth in the first 5 epochs. (a) displays the generated depth map sequences. The generator with HYP shows a reasonable and fast convergence result, which qualitatively proves the necessity of HYP. (b) illustrates the mean depth and standard deviation at the first 5 epochs of the ablation study.

We consider a latent relationship between scene depth and underwater degradation factors. Experimentally, we concatenate the raw image and the generated depth map as the input of the attenuation and scattering networks. We separately compare the four training architectures in the ablation study and illustrate the depth maps generated for a random underwater visual input during the first 5 epochs. As shown in Fig. 5, the original image includes nearby reefs and misty scenery in the distance, while the diagrams on the right represent four depth map sequences. The first row shows the results generated by the naive physics-driven network, which lacks effective gradient updating of the scene depth. In early training, it is difficult for the network to learn distance information from the comprehensive loss. For example, w/o HYP incorrectly marks the close and distant views in Epoch 5. The wrong depth map thereby drives the degradation factors to converge prematurely, and eventually leads to a mode collapse. According to the illustrations in the second and third rows, both hypotheses have a positive influence on enhancing the gradient constraint. In particular, the depth correlation hypothesis significantly improves the underwater depth map, so it is necessary for steadying the early training process.

Besides, we calculate the mean depth and standard deviation over the RUIE dataset in the first 5 epochs, as shown in Fig. 5. The solid line represents the mean depth values, and the dashed line denotes the standard deviation of the depth map. The results adopting HYP are closer to 3 m, which is the middle value of the effective range. Moreover, the increased standard deviation also indicates that HYP supports the production of informative scene depth maps.

Furthermore, we provide another controlled experiment for depth correlation with an example diagram (see Fig. 6). Fig. 6(a) is an example result of the full structure, and (b) corresponds to w/ HYP. We use red and black boxes to highlight two typical areas. The comparison results at the bottom emphasize an unreasonable brightness caused by the mismatch between depth and degradation factors. Several pixels exceed the largest RGB value of 255 in the calculation of the Jaffe-McGlamery model, thus compromising the edge detail. On the other hand, the qualitative comparison on the right side shows that the background color concentration of (a) is more reasonable, thus avoiding image distortion.
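The overflow caused by mismatched depth and degradation factors can be illustrated with a simplified single-channel formation model, I = J·exp(-β_d·z) + B_∞·(1 - exp(-β_b·z)). This simplified form, the function names, and every numeric value below are assumptions chosen for illustration; they are not the exact generator equations or parameters of HybrUR.

```python
import math


def restore_pixel(i: float, depth: float, beta_d: float,
                  beta_b: float, b_inf: float) -> float:
    """Invert the assumed model for one channel: J = (I - B) * exp(beta_d * z)."""
    backscatter = b_inf * (1.0 - math.exp(-beta_b * depth))
    direct = i - backscatter
    return direct * math.exp(beta_d * depth)


def clip_to_8bit(value: float) -> float:
    """Clamp a restored intensity to the displayable range [0, 255]."""
    return min(max(value, 0.0), 255.0)


observed = 120.0  # hypothetical observed intensity
# Depth consistent with the attenuation coefficient: result stays in range.
matched = restore_pixel(observed, depth=2.0, beta_d=0.4, beta_b=0.4, b_inf=90.0)
# Overestimated depth with the same coefficients: the exponential gain explodes.
mismatched = restore_pixel(observed, depth=6.0, beta_d=0.4, beta_b=0.4, b_inf=90.0)

print(matched, mismatched, clip_to_8bit(mismatched))
```

Once a value is clipped at 255, neighbouring pixels that differed before restoration become identical, which is one way to read the loss of edge detail reported for the highlighted regions.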

Fig. 6: Qualitative comparison of depth correlation hypothesis. The sections selected by boxes highlight a reduction of mismatches between depth and degradation coefficients by depth correlation hypothesis.

IV-E Backscattering Constraint

The second hypothesis is derived from the backscattering constraint of the underwater image degradation model, which provides extra gradients for the depth map, the degradation coefficients, and the veiling light. To verify the improvement based on this hypothesis, we also use a diagram to illustrate the improvement in blur removal and contrast enhancement of the generated results. As plotted in Fig. 7, the depth map, the scattering factor, and the veiling light are estimated from the original image, and the backscattering is then formulated from them under the constraint of the estimated backscattering. In general, the DCP map is usually regarded as the reference of the estimation, but in this paper, we employ the RGB values of the lowest pixel in the DCP map as the reference. Thus, the estimated backscattering appears dark green, which is associated with the color distribution of the input data. By dissociating the estimated backscattering from the original image, we obtain the attenuated image, and then get the final result after the reverse attenuation process.
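One possible reading of the reference selection described above is sketched below. It assumes that the DCP map is the usual dark channel (the minimum over RGB within a local patch) and that the "lowest pixel" is the location with the smallest dark-channel value; both interpretations, as well as the function names and the patch radius, are assumptions rather than the authors' stated procedure.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def dark_channel(image: Image, radius: int = 0) -> List[List[float]]:
    """Per-pixel minimum over RGB and over a (2*radius+1)^2 neighbourhood."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = min(
                min(image[yy][xx])
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return out


def reference_colour(image: Image, radius: int = 0) -> Pixel:
    """RGB of the first pixel (scan order) with the lowest dark-channel value."""
    dark = dark_channel(image, radius)
    h, w = len(image), len(image[0])
    y, x = min(((yy, xx) for yy in range(h) for xx in range(w)),
               key=lambda p: dark[p[0]][p[1]])
    return image[y][x]


# Tiny hypothetical greenish image, values in [0, 255].
toy = [[(10.0, 200.0, 50.0), (30.0, 40.0, 50.0)],
       [(5.0, 90.0, 80.0), (60.0, 70.0, 80.0)]]
print(reference_colour(toy))
```

Using a single RGB triple as the reference, rather than the whole dark-channel map, gives the constraint a colour as well as an intensity, which is consistent with the dark green appearance noted in the text.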

Fig. 7: Qualitative comparison of backscattering constraint. The process of eliminating backscattering is described graphically. This constraint can help the network to estimate an accurate backscattering mask, which effectively improves the edges and tiny details of the generated image.

We take Fig. 7(a)-(b) as examples to compare the details of the raw images, the restored results of the framework with HYP, and those of the full scheme. First, we quantitatively analyze the contrast and the mean variance of the Laplacian of the images (see Tab. II). Since these images are captured in a turbid aquatic environment, their contrast and sharpness are very low. By subtracting the estimated backscattering from the original image, both the contrast and the sharpness are enhanced for the two images. In particular, the result of the full framework shows significant improvements over the framework with only HYP, owing to the backscattering constraint. Besides, we illustrate several edge details in the patches marked by red boxes. From a qualitative perspective, some dim shadows in the distance caused by floating particles disappear, and the details of the reef and seabed stand out once the estimated backscattering is dissociated from the original image. Consequently, the backscattering constraint improves the scattering estimation accuracy of the background region, enhances the edge detail of the misty region, and preserves the color concentration well.
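The two sharpness measures discussed above can be sketched in plain Python. The exact definitions used for Tab. II are not given in this section, so the choices here are assumptions: contrast is taken as the standard deviation of grey levels (RMS contrast), and the Laplacian variance uses the 4-neighbour discrete Laplacian over interior pixels. The toy images are hypothetical.

```python
from statistics import pstdev, pvariance
from typing import List

Gray = List[List[float]]


def rms_contrast(gray: Gray) -> float:
    """Standard deviation of all grey levels (assumed contrast definition)."""
    return pstdev([v for row in gray for v in row])


def laplacian_variance(gray: Gray) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4.0 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


# A sharp checkerboard and a hazy copy with compressed range plus an offset.
sharp = [[255.0 * ((x + y) % 2) for x in range(4)] for y in range(4)]
hazy = [[100.0 + 0.1 * v for v in row] for row in sharp]

print(rms_contrast(hazy), rms_contrast(sharp))
print(laplacian_variance(hazy), laplacian_variance(sharp))
```

A constant veil added to an image leaves the Laplacian untouched, so the gain in Laplacian variance after backscattering removal comes from the range stretching that follows, which is why both measures are reported together.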

IV-F Comparison on Restoration Quality

The qualitative comparison results for the U45 dataset, shown in Fig. 8, support the qualitative advantage of the proposed HybrUR. Compared with several prior and contemporary methods, our method achieves a brighter view with balanced color and stretched contrast. To assess these restoration and enhancement methods objectively, we present the results under different water quality conditions and divide the methods into supervised and unsupervised groups for separate discussion.


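| Label | Terms | Contrast | Laplacian Variance |
|---|---|---|---|
| (a) | | 20.001 | 43.209 |
| (a) | | 24.799 | 67.209 |
| (a) | | 25.457 | 87.084 |
| (a) | | 73.493 | 240.354 |
| (a) | | 88.997 | 324.938 |
| (b) | | 24.870 | 82.201 |
| (b) | | 36.245 | 137.512 |
| (b) | | 37.816 | 157.545 |
| (b) | | 99.638 | 436.524 |
| (b) | | 122.227 | 501.414 |

  • Note: (a) and (b) refer to the panel labels in Fig. 7.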

TABLE II: Quantitative comparison of image contrast and average variance of Laplacian operator.

First, UWGAN-UIE, GAN-RS, and DIFM are all model-based approaches with supervision. The latter two are specially designed to enhance image details, so their textures and edges are clearer than those of the unsupervised methods. However, restoration results for different scenes should vary rather than share a single style, and unsupervised methods can better produce realistic colors. As can be seen, UWGAN-UIE largely fails to achieve acceptable color correction, since its generated results present a bluish hue. DIFM adopts an image dehazing model, which is similar to the Jaffe-McGlamery model but ignores attenuation and scattering. It therefore retains a strong yellow-green cast in restoration results such as Fig. 8(a), 8(c), and 8(j), which are collected from a turbid water environment. Moreover, in terms of dehazing, our backscattering estimation provides an even clearer view than DIFM. As a supervised restoration algorithm, GAN-RS shows remarkable edge enhancement for misty underwater images. However, it produces restored images in a similar cyan style for any underwater visual input, which leads to misleading colors, such as the green seaweed in Fig. 8(g) and the cyan ground in Fig. 8(i). Furthermore, as shown in Fig. 8(c), the end-to-end pixel-level generator produces repeated textures in the distance of some horizontal scenes, which may obscure details. Water-Net shows a reasonable color correction effect by utilizing the CycleGAN architecture, though its contrast is lower than that of our results. In addition, Fig. 8(c), 8(f), and 8(k) show that some mist caused by floating particles has not been eliminated at all. By contrast, our HybrUR works well in a variety of aquatic environments although it is trained with only one water type. Notably, some of our restored results, such as Fig. 8(h) and 8(i), almost achieve a complete water removal effect. Nevertheless, excessive brightness is a specific problem we have to address; it arises mainly because the overwater pictures in the training set are taken under indoor artificial light.

Fig. 8: Qualitative comparison between our method and contemporary approaches in terms of restoration quality. Images in each row are restored by the denoted method. The middle three rows are the results of supervised learning, and the last two rows are unsupervised methods.


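| Dataset | Method | | | | | $\sigma_c$ | $con_l$ | $\mu_s$ | UCIQE |
|---|---|---|---|---|---|---|---|---|---|
| U45 | Origin | 0.386 | 0.059 | 53.192 | 0.200 | 0.043 | 0.569 | 0.779 | 0.377 |
| U45 | GAN-RS | 0.129 | 0.138 | 49.285 | 0.020 | 0.057 | 0.777 | 0.818 | 0.450 |
| U45 | Water-Net | 0.252 | 0.082 | 46.188 | 0.074 | 0.064 | 0.777 | 0.821 | 0.455 |
| U45 | UWGAN-UIE | 0.393 | 0.077 | 56.141 | 0.110 | 0.049 | 0.606 | 0.756 | 0.384 |
| U45 | DIFM | 0.288 | 0.083 | 53.402 | 0.084 | 0.065 | 0.838 | 0.790 | 0.464 |
| U45 | HybrUR | 0.364 | 0.255 | 54.239 | 0.032 | 0.099 | 0.917 | 0.771 | 0.497 |
| RUIE | Origin | 0.431 | 0.016 | 51.273 | 0.639 | 0.017 | 0.476 | 0.797 | 0.344 |
| RUIE | GAN-RS | 0.124 | 0.087 | 50.766 | 0.031 | 0.041 | 0.806 | 0.810 | 0.449 |
| RUIE | Water-Net | 0.200 | 0.045 | 48.846 | 0.119 | 0.047 | 0.810 | 0.818 | 0.455 |
| RUIE | UWGAN-UIE | 0.430 | 0.040 | 49.037 | 0.235 | 0.041 | 0.657 | 0.780 | 0.401 |
| RUIE | DIFM | 0.239 | 0.027 | 55.884 | 0.191 | 0.035 | 0.780 | 0.782 | 0.432 |
| RUIE | HybrUR | 0.205 | 0.149 | 57.955 | 0.028 | 0.061 | 0.966 | 0.780 | 0.495 |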


TABLE III: Quantitative comparison using no-reference quality assessment.

Furthermore, the numerical comparison is shown in Tab. III. Several no-reference quality assessment indexes are employed, including UCIQE and an underwater index in LAB color space. This index was proposed by Chen et al. to evaluate the color distribution in LAB color space, and is formulated as follows [19]:


The terms of this index are the mean value of the lightness channel, the distance between the center of the image color distribution and the origin of the LAB space, and the image color spans along the a and b axes. Consequently, the index is regarded as the degree to which the color distribution of the image in the LAB space is uniformly concentrated at the origin. Tab. III reports the mean values of the test results on the two datasets. Some methods work well from a particular perspective; e.g., GAN-RS is closest to the origin in the LAB color space, but its span of color distribution is not large enough, which is the primary reason for its single restoration style. The other unsupervised method, Water-Net, obtains the highest mean saturation, but its color correction and light index are even lower than those of the supervised enhancement methods, which demonstrates that an unsupervised pixel-level image-to-image generator has difficulty in achieving promising color restoration for lack of optical constraint. By contrast, HybrUR provides the largest color span, which leads to a better subjective perception. On this basis, it also achieves superior color correction and illumination variance compared to the other approaches, thereby obtaining the best UCIQE score. Therefore, it can be concluded that the comprehensive performance of the proposed HybrUR is better in terms of restoration quality.
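The ingredients of the LAB-space index can be computed as in the sketch below. Only the four quantities named in the text are evaluated; how [19] combines and normalizes them into a single index is not reproduced here. Taking the span as the max-minus-min range along each axis, the input format (a flat list of already converted LAB pixels), and all names are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

LabPixel = Tuple[float, float, float]  # (L, a, b), already converted from RGB


def lab_statistics(pixels: List[LabPixel]) -> Dict[str, float]:
    """Mean lightness, distance of the a/b centroid to the origin, and spans."""
    n = len(pixels)
    a_values = [p[1] for p in pixels]
    b_values = [p[2] for p in pixels]
    mean_a = sum(a_values) / n
    mean_b = sum(b_values) / n
    return {
        "mean_lightness": sum(p[0] for p in pixels) / n,
        "origin_distance": math.hypot(mean_a, mean_b),
        # Span taken as the range along each axis (assumed definition).
        "span_a": max(a_values) - min(a_values),
        "span_b": max(b_values) - min(b_values),
    }


# Hypothetical pixels: a balanced pair and a single colour-cast pixel.
balanced = lab_statistics([(50.0, 3.0, 4.0), (60.0, -3.0, -4.0)])
cast = lab_statistics([(50.0, 3.0, 4.0)])
print(balanced)
print(cast)
```

The two toy inputs show why distance to the origin alone is not sufficient: a single-colour image can sit near the origin while having zero span, which matches the observation that a method can be closest to the origin and still produce a single restoration style.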

IV-G Feature-Extraction Tests

In this subsection, we focus on the representation performance of these underwater image enhancement and restoration algorithms. For the sake of a more comprehensive evaluation, we evaluate their generated results in terms of low-level expression and detailed information, including SIFT keypoints and Harris corners. As a visual example of the scale-invariant feature transform (SIFT) matching test, Fig. 9 shows the good matching pairs between a raw image and its restored results from five different networks. As can be seen, the better color correction by HybrUR offers more matched keypoint pairs than the other methods, which illustrates that HybrUR can explicitly improve feature extraction with a more realistic restoration. Furthermore, we measure the average numbers of SIFT keypoints and Harris corners for all the approaches over the two datasets; the results are summarized in Tab. IV, where SIFT and Harris denote the numbers of detected SIFT keypoints and Harris corners. GAN-RS performs the best on both datasets, while HybrUR also ranks near the top of the table. It is worth noting that we do not append any dedicated detail enhancement design to our algorithm, which will be the focus of our follow-up work.
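The selection of "good matching pairs" can be illustrated with a brute-force matcher followed by a nearest-neighbour ratio test. The text does not state which matching criterion was used, so the ratio test, the 0.75 threshold, and the function name are assumptions; descriptors are represented as plain lists of floats rather than real SIFT output.

```python
import math
from typing import List, Sequence, Tuple


def good_matches(desc_a: Sequence[Sequence[float]],
                 desc_b: Sequence[Sequence[float]],
                 ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Keep (i, j) when the best match is clearly closer than the runner-up."""
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Hypothetical 2-D descriptors: two distinctive points and one ambiguous case.
raw = [[0.0, 0.0], [10.0, 10.0]]
restored = [[0.1, 0.0], [10.0, 10.2], [5.0, 5.0]]
print(good_matches(raw, restored))
print(good_matches([[5.0, 5.0]], [[5.0, 6.0], [5.0, 4.0]]))
```

Counting the pairs that survive such a filter, rather than all detected keypoints, rewards restorations whose local structure remains consistent with the raw image.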


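| Dataset | Method | SIFT | Harris |
|---|---|---|---|
| RUIE | Origin | 182 | 222 |
| RUIE | GAN-RS | 883 | 367 |
| RUIE | UWGAN-UIE | 306 | 245 |
| RUIE | DIFM | 664 | 283 |
| RUIE | Water-Net | 443 | 247 |
| RUIE | HybrUR | 608 | 251 |
| U45 | Origin | 34 | 247 |
| U45 | GAN-RS | 425 | 375 |
| U45 | UWGAN-UIE | 200 | 256 |
| U45 | DIFM | 352 | 267 |
| U45 | Water-Net | 259 | 254 |
| U45 | HybrUR | 356 | 262 |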


TABLE IV: Average numbers of SIFT keypoints and Harris corners.
Fig. 9: Comparison of SIFT key points matching. The results shown correspond to all the methods in Fig. 8.

IV-H Discussion

Underwater image quality restoration has been regarded as an essential technology for underwater vision-based localization and navigation. We propose a novel style transfer network based on the Jaffe-McGlamery model to achieve color correction and brightness improvement of underwater images in an unsupervised manner. Through a thorough ablation experiment, we demonstrate that the two hypotheses, namely the correlation between depth and degradation factors and the backscattering constraint, contribute to color and contrast refinement. Both the quantitative and qualitative comparisons support the advantage of HybrUR. All models in HybrUR are implemented using PyTorch, and are trained and tested on a GeForce GTX 1080 Ti GPU. The inference speed of HybrUR on a GPU reaches 33.17 FPS, well ahead of the 0.16 FPS for Sea-Thru using the same Jaffe-McGlamery model, and 5.88 FPS for the other unsupervised method, Water-Net. However, we do observe shortcomings in two aspects: 1) an unsupervised generation network commonly causes a loss of edge details, so we need to impose specific loss functions or super-resolution methods to increase the pixel-level accuracy; 2) the restoration effect is inevitably affected in picture areas with an extremely large water quality span. This may be improved by referring to a many-to-many style transfer method [38].

V Conclusion and Future Work

We have presented a method that learns a restoration model for turbid underwater images from an unconstrained underwater image dataset. The model is able to obtain a bright restored output with good color correction from a single input frame. It is trained without any supervision using a modular cyclic architecture based on the Jaffe-McGlamery model, resembling an integrated autoencoder. Any underwater image is actively decoupled into content and style by splitting it into scene depth, scattering, absorption, and veiling light. These individual elements are then composed in reverse to generate a dewatered picture, resulting in better restoration performance than a single end-to-end generator. We have demonstrated that the scene depth correlation and the backscattering constraint are strong cues for contrast enhancement and color correction, which help the generator converge to a meaningful restoration. The comparison shows that our method outperforms the current state-of-the-art unsupervised underwater image restoration network that uses the CycleGAN architecture. In addition, the improvement in low-level feature applications demonstrates the engineering value of the proposed method.

Regarding future work, we will primarily focus on ensuring that the restored image retains more detail and edges to improve the quality of the image. Furthermore, an underwater image style transfer from multiple input domains to multiple robust output domains will be considered to fuse with our method to improve the generalization of the applied waters.


  • [1] G. L. Foresti and S. Gentili, “A vision based system for object detection in underwater images,” Intern. J. Pattern Recognit. Artif. Intell., vol. 14, no. 2, pp. 167–188, 2000.
  • [2] M. Myint, K. Yonemori, K. N. Lwin, A. Yanou, and M. Minami, “Dual-eyes vision-based docking system for autonomous underwater vehicle: an approach and experiments,” J. Intell. Robot. Syst., vol. 92, no. 1, pp. 159–186, 2018.
  • [3] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: an accurate open-source library for visual, visual-inertial multimap SLAM,” IEEE Trans. Robot., vol. 29, pp. 5557–5570, 2020.
  • [4] X. Chen, J. Yu, S. Kong, Z. Wu, and L. Wen, “Joint anchor-feature refinement for real-time accurate object detection in images and videos,” IEEE Trans. Circ. Syst. Vid., vol. 31, no. 2, pp. 594–607, 2021.
  • [5] P. Sahu, N. Gupta, and N. Sharma, “A survey on underwater image enhancement techniques,” Intern. J. Comput. Appl., vol. 87, no 13, pp. 19–23, 2014.
  • [6] C. O. Ancuti, C. Ancuti, C. De Vleeschouwer, and P. Bekaert, “Color balance and fusion for underwater image enhancement,” IEEE Trans. Image Process., vol. 27, no. 1, pp. 379–393, 2018.
  • [7] E. Provenzi, C. Gatta, M. Fierro, and A. Rizzi, “A spatially variant white-patch and gray-world method for color image enhancement driven by local contrast,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1757–1770, 2008.
  • [8] M. J. Islam, Y. Xia, and J. Sattar, “Fast underwater image enhancement for improved visual perception,” IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 3227–3234, 2020.
  • [9] C. Tang, U. F. von Lukas, M. Vahl, S. Wang, Y. Wang, and M. Tan, “Efficient underwater image and video enhancement based on Retinex,” Signal Image Video Process., vol. 13, no. 5, pp. 1011–1018, 2019.
  • [10] Y. T. Peng and P. C. Cosman, “Underwater image restoration based on image blurriness and light absorption,” IEEE Trans. Image Process., vol. 26, no. 4, pp. 1579–1594, 2017.
  • [11] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Int. Conf. Med. Image Comput. Computer-Assisted Interv. (MICCAI), Munich, Germany, Oct. 2015, pp. 234–241.
  • [12] X. Chen, P. Zhang, L. Quan, C. Yi, and C. Lu, “Underwater image enhancement based on deep learning and image formation model,” arXiv preprint, arXiv:2101.00991, 2021.
  • [13] C. Li, S. Anwar, and F. Porikli, “Underwater scene prior inspired deep underwater image and video enhancement,” Pattern Recog., vol. 98, Art. No. 107038, 2020.
  • [14] C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert, “Enhancing underwater images and videos by fusion,” in IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Providence, Rhode Island, Jun. 2012, pp. 81–88.
  • [15] F. Farhadifard, Z. Zhou, and U. F. von Lukas, “Learning-based underwater image enhancement with adaptive color mapping,” in 9th Int. Symp. Image Signal Process. Anal. (ISPA), Zagreb, Croatia, Sept. 2015, pp. 48–53.
  • [16] P. Liu, G. Wang, H. Qi, C. Zhang, H. Zheng, and Z. Yu, “Underwater image enhancement with a deep residual framework,” IEEE Access, vol. 7, pp. 94614–94629, 2019.
  • [17] J. Li, K. A. Skinner, R. M. Eustice, and M. Johnson-Roberson, “WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images,” IEEE Robot. Autom. Lett., vol. 3, no. 1, pp. 387–394, 2017.
  • [18] N. Wang, Y. Zhou, F. Han, H. Zhu, and Y. Zheng, “UWGAN: Underwater GAN for real-world underwater color restoration and dehazing”, arXiv preprint, arXiv:1912.10269, 2019.
  • [19] X. Chen, J. Yu, S. Kong, Z. Wu, X. Fang, and L. Wen, “Towards real-time advancement of underwater visual quality with GAN,” IEEE Trans. Ind. Electron., vol. 66, no. 12, pp. 9350–9359, 2019.
  • [20] C. Li, J. Guo, and C. Guo, “Emerging from water: Underwater image color correction based on weakly supervised color transfer,” IEEE Signal Process. Lett., vol. 25, no. 3, pp. 323–327, 2018.
  • [21] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 2223–2232.
  • [22] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: Interpretable representation learning by information maximizing generative adversarial nets,” in Adv. neural inf. proces. syst., Barcelona, Spain, Dec. 2016, pp. 2180–2188.
  • [23] P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Honolulu, HI, United States, Jul. 2017, pp. 1125–1134.
  • [24] C. Fabbri, M. J. Islam, and J. Sattar, “Enhancing underwater imagery using generative adversarial networks,” in IEEE Int. Conf. Rob. Autom. (ICRA), Brisbane, QLD, Australia, Sept. 2018, pp. 7159–7165.
  • [25] D. Akkaynak and T. Treibitz, “A revised underwater image formation model,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), Salt Lake City, UT, United States, Jun. 2018, pp. 6723–6732.
  • [26] X. Huang, M. Y. Liu, S. Belongie, and J. Kautz, “Multimodal unsupervised image-to-image translation,” in Proc. Europ. Conf. Comput. Vis. (ECCV), Munich, Germany, Sept. 2018, pp. 172–189.
  • [27] B. Lu, J. C. Chen, and R. Chellappa, “Unsupervised domain-specific deblurring via disentangled representations,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., Long Beach, CA, United States, Jun. 2019, pp. 10225–10234.
  • [28] M. Yang and A. Sowmya, “An underwater color image quality evaluation metric,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 6062–6071, 2015.
  • [29] R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo, “Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 12, pp. 4861–4875, 2019.
  • [30] H. Li, J. Li, and W. Wang, “A fusion adversarial underwater image enhancement network with a public test dataset,” arXiv preprint, arXiv:1906.06819, 2019.
  • [31] M. Hou, R. Liu, X. Fan, and Z. Luo, “Joint residual learning for underwater image enhancement,” in 25th IEEE Int. Conf. Image Process. (ICIP), Athens, Greece, Oct. 2018, pp. 4043–4047.
  • [32] D. Akkaynak and T. Treibitz, “Sea-thru: A method for removing water from underwater images,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. (CVPR), Long Beach, CA, United States, Jun. 2019, pp. 1682–1691.
  • [33] F. Fan, K. Yang, M. Xia, W. Li, B. Fu, and W. Zhang, “Comparative study on several blind deconvolution algorithms applied to underwater image restoration,” Optical review, vol. 17, no. 3, pp. 123–129, 2010.
  • [34] J. Y. Chiang and Y. C. Chen, “Underwater image enhancement by wavelength compensation and dehazing,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1756–1769, 2011.
  • [35] X. Mao, Q. Li, H. Xie, R. Lau, and S. P. Smolley, “Least squares generative adversarial networks,” in IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Dec. 2017, pp. 2813–2821.
  • [36] J. Johnson, A. Alahi, and F. Li, “Perceptual losses for real-time style transfer and super-resolution,” in Europ. Conf. Comput. Vis. (ECCV), Amsterdam, Netherlands, Oct. 2016, pp. 694–711.
  • [37] J. Kannala and S. S. Brandt, “A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1335–1340, 2006.
  • [38] H. Y. Lee, H. Y. Tseng, J. Huang, M. Singh, and M. H. Yang, “DRIT++: Diverse image-to-image translation via disentangled representations,” Int. J. Comput. Vis., vol. 128, no. 10, pp. 2402–2417, 2020.