Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception (RSS 2020)
In this paper, we introduce and tackle the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision and provide an efficient solution for near real-time applications. We present Deep SESR, a residual-in-residual network-based generative model that can learn to restore perceptual image qualities at 2x, 3x, or 4x higher spatial resolution. We supervise its training by formulating a multi-modal objective function that addresses the chrominance-specific underwater color degradation, lack of image sharpness, and loss in high-level feature representation. It is also supervised to learn salient foreground regions in the image, which in turn guides the network to learn global contrast enhancement. We design an end-to-end training pipeline to jointly learn the saliency prediction and SESR on a shared hierarchical feature space for fast inference. Moreover, we present UFO-120, the first dataset to facilitate large-scale SESR learning; it contains over 1500 training samples and a benchmark test set of 120 samples. By thorough experimental evaluation on the UFO-120 and other standard datasets, we demonstrate that Deep SESR outperforms the existing solutions for underwater image enhancement and super-resolution. We also validate its generalization performance on several test cases that include underwater images with diverse spectral and spatial degradation levels, and also terrestrial images with unseen natural objects. Lastly, we analyze its computational feasibility for single-board deployments and demonstrate its operational benefits for visually-guided underwater robots. The model and dataset information will be available at: https://github.com/xahidbuffon/Deep-SESR.
Automatic generation of high resolution (HR) images from low resolution (LR) sensory measurements is a well-studied problem in the domains of computer vision and robotics due to its usefulness for detailed scene understanding and image synthesis [65, 73, 34]. For visually-guided robots, in particular, this single image super-resolution (SISR) capability allows zooming into regions of interest (RoIs) for detailed perception, to eventually make navigational and other operational decisions. However, if the LR images suffer from noise and optical distortions, those get amplified by SISR, resulting in uninformative RoIs. Hence, restoring perceptual and statistical image qualities is essential for robust visual perception in noisy environments (e.g., underwater [9, 25]). Although large bodies of literature on perceptual image enhancement and SISR offer solutions for each problem separately, a unified approach is more viable for computationally constrained real-time applications; such an approach has not yet been explored in depth.
To this end, we introduce simultaneous enhancement and super-resolution (SESR), and demonstrate its effectiveness for both underwater and terrestrial imagery. SESR is particularly useful in the underwater domain due to its unique optical properties, e.g., attenuation, refraction, and backscatter. These artifacts cause range- and wavelength-dependent non-linear distortions that severely affect vision despite often using high-end cameras. Specifically, the captured images exhibit various levels of hue distortion, blurriness, low contrast, and color degradation based on the waterbody types, distances of light sources, etc. Some of these aspects can be modeled and estimated by physics-based solutions, particularly for dehazing, color correction, water removal, etc. However, these methods are often computationally too demanding for real-time robotic deployments. Besides, dense scene depth and optical waterbody measures are not always available in practical applications.
The learning-based approaches attempt to address the practicalities by approximating the underlying solution to the ill-posed problem of underwater image restoration with RGB data alone. Several existing models based on convolutional neural networks (CNNs) [48, 64] and generative adversarial networks (GANs) [35, 44, 20] provide state-of-the-art (SOTA) performance for perceptual color enhancement, dehazing, deblurring, and contrast adjustment. Additionally, inspired by the success of deep residual networks for terrestrial SISR [73, 42, 30], several models have been proposed for underwater SISR in recent years [12, 34], which report exciting results with reasonable computational overhead. Contemporary research work [35, 34]
further demonstrates that the perceptually enhanced underwater images provide significantly improved performance for widely-used object detection and human body-pose estimation tasks; moreover, detailed perception on salient image regions facilitates better scene understanding and attention modeling. However, as mentioned, separately processing visual data for these capabilities, even with the fastest available solutions, is not computationally feasible on single-board platforms.
In this paper, we present the first unified approach for SESR with an end-to-end trainable model. The proposed Deep SESR architecture incorporates dense residual-in-residual sub-networks to facilitate multi-scale hierarchical feature learning for SESR and saliency prediction. For supervision, we formulate a multi-modal objective function that evaluates the degree of chrominance-specific color degradation and loss in image sharpness, contrast, and high-level feature representation. As demonstrated in Fig. 1, it learns to restore perceptual image qualities at higher spatial scales (up to 4x); as a byproduct, it learns to identify salient foreground regions in the image. We also present the UFO-120 dataset, which contains over 1500 annotated samples for large-scale SESR training, and a test set with an additional 120 samples.
Furthermore, we evaluate the perceptual enhancement and super-resolution performance of Deep SESR on UFO-120 and several other standard datasets. The results suggest that it provides superior performance over SOTA methods on respective tasks, and achieves considerably better generalization performance on unseen natural images. It also achieves competitive performance on standard terrestrial datasets without additional training or tuning, which indicates that SESR methods can be potentially effective for terrestrial applications as well. Finally, we specify several design choices for Deep SESR, analyze their computational aspects, and discuss the usability benefits for its robotic deployments.
Underwater image enhancement is an active research problem that deals with correcting optical image distortions to recover true pixel intensities [3, 10]. Classical approaches use hand-crafted filters to improve local contrast and enforce color constancy. These approaches are inspired by the Retinex theory of human visual perception [37, 72, 23], and mainly focus on restoring background illumination and lightness rendition. Another class of physics-based approaches uses an atmospheric dehazing model to estimate true transmission and ambient light in a scene [15, 27]. Additional prior knowledge or statistical assumptions (e.g., haze-lines, dark channel prior, etc.) are often utilized for global enhancements. Recent work by Akkaynak et al. [2, 3] introduces a revised image formation model that accounts for the unique characteristics of underwater light propagation; this contributes to a more accurate estimation of range-dependent attenuation and backscatter.
While accurate underwater image recovery remains a challenge, the learning-based approaches for perceptual enhancement have made remarkable progress in recent years. Driven by large-scale supervised training [35, 69], these approaches learn sequences of non-linear filters to approximate the underlying pixel-to-pixel mapping between the distorted and enhanced image domains. The contemporary deep CNN-based generative models provide SOTA performance in learning such image-to-image translation for both terrestrial [14, 11] and underwater domains [35, 48]. Moreover, the GAN-based models attempt to improve generalization performance by employing a two-player min-max game, where an adversarial discriminator evaluates the generator-enhanced images compared to ground truth samples. This forces the generator to learn realistic enhancement while evolving with the discriminator toward equilibrium. Several GAN-based underwater image enhancement models have reported impressive results from both paired [20, 45] and unpaired training. However, they are prone to training instability, and hence require careful hyper-parameter choices and intuitive loss function adaptation [5, 53] to ensure convergence.
The single image super-resolution (SISR) problem deals with automatically generating a sharp HR image from its LR measurements. Although SISR is relatively less studied in the underwater domain, a rich body of literature exists for terrestrial imagery. In particular, existing deep CNN-based models [18, 42] and GAN-based models [58, 60] provide good solutions for SISR. Researchers have also exploited contemporary techniques [39, 40, 62] such as gradient clipping, dense skip connections, and sub-pixel convolution to improve SISR performance on standard datasets. Moreover, deep residual networks [42, 30] and residual-in-residual networks [65, 47] are known to be very effective for learning SISR. Such networks employ skip connections to preserve the identity mapping within repeated blocks of convolutional layers; this contributes to a stable training of very deep models. Zhang et al. further demonstrated that dense skip connections within a residual block allow combining hierarchical features from each layer, which substantially boosts the SISR performance.
In recent years, similar ideas have been effectively applied for underwater imagery as well. For instance, Chen et al. adopt residual-in-residual learning for underwater SISR, whereas Islam et al. introduce a deep residual multiplier model that can be dynamically configured for various SISR scales. Although these models report inspiring results, they do not account for underwater image distortions, and hence rely on a secondary network for enhancement. On the contrary, traditional approaches primarily focus on enhancing underwater image reconstruction quality by deblurring/denoising [13, 56], or descattering. Hence, their applicability for end-to-end SESR is limited.
Visual attention-based saliency prediction refers to finding interesting foreground regions in the image space [51, 64]. The classical stimulus-driven approaches use features such as luminance, color, texture, and often depth information to quantify feature contrast in a scene. This feature contrast is subsequently exploited for spatial saliency computation. Automatic saliency prediction over a sequence of frames is also explored extensively because spatio-temporal features capture information about the motion and interaction among objects in a scene, which are important cues for attention modeling. Another genre of approaches deals with goal-driven saliency prediction for visual question answering, i.e., finding the image regions that are relevant to a query.
In the underwater domain, however, existing research work mainly focuses on salient feature extraction for enhanced object detection performance [19, 52, 71]. Hence, they do not provide a general solution for attention modeling that can facilitate faster visual search or better scene understanding. Nevertheless, finding salient RoIs in distorted underwater images and generating corresponding enhanced HR patches can be extremely useful for visually-guided robots. We attempt to contribute to these aspects in this paper.
SESR refers to the task of generating perceptually enhanced HR images from their LR and possibly distorted (LRD) input measurements. We formulate the problem as learning a pixel-to-pixel mapping from a source domain (of LRD images) to its target domain (of enhanced HR images); we represent this mapping as a generative function. We adopt an extended formulation by considering the task of learning SESR and saliency prediction on a shared feature space. Specifically, in addition to the final SESR output, the Deep SESR generator produces a predicted saliency map and an enhanced image in the same resolution as the input. It offers up to 4x SESR for the final output.
We utilize several existing underwater image enhancement and super-resolution datasets to supervise the SESR learning. We follow standard procedures [35, 46, 20] for optical/spatial image degradation, and use human-labeled saliency maps to create paired training samples; further details on the existing datasets are provided in Section 5.1.
In addition, we contribute over 1500 samples for training (and another 120 for testing) in the UFO-120 dataset. It contains images collected from oceanic explorations in multiple locations having different water types, as seen in Fig. 1(a). The salient foreground pixels of each image are annotated by human participants. Moreover, we adopt a widely used style-transfer technique [35, 20] to generate their respective distorted images. Subsequently, we generate the LRD samples by Gaussian blurring (GB) and bicubic down-sampling (BD); based on their relative order, we group the data into three sets:
Set-U: GB is followed by BD.
Set-F: the order is chosen at random.
Set-O: BD is followed by GB.
We use a fixed Gaussian kernel for GB. Additionally, as Fig. 1(b) illustrates, we use 2x, 3x, and 4x BD to generate the LRD samples. Hence, there are nine available training combinations for SESR. The UFO-120 dataset can also be used for training underwater SISR, image enhancement, or saliency prediction models.
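The LRD sample generation described above (GB and BD in either order) can be sketched as follows; this is a minimal illustration using SciPy, where the blur strength `sigma` and the cubic-spline interpolation used to approximate bicubic down-sampling are assumptions rather than the paper's exact settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def make_lrd(hr, scale=2, sigma=1.5, order="U", rng=None):
    """Generate a low-resolution distorted (LRD) sample from an HR image
    (H x W x 3 float array) by Gaussian blurring (GB) and cubic
    down-sampling (BD). order: "U" = GB then BD, "O" = BD then GB,
    "F" = pick one of the two orders at random."""
    rng = rng or np.random.default_rng(0)
    if order == "F":
        order = rng.choice(["U", "O"])
    blur = lambda x: gaussian_filter(x, sigma=(sigma, sigma, 0))
    down = lambda x: zoom(x, (1.0 / scale, 1.0 / scale, 1), order=3)
    return down(blur(hr)) if order == "U" else blur(down(hr))
```

With 2x, 3x, and 4x down-sampling and the three set orderings, this yields the nine training combinations mentioned above.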
As shown in Figure 3, the major components of our Deep SESR model are: residual dense blocks (RDBs), a feature extraction network (FENet), and an auxiliary attention network (AAN). These components are tied to an end-to-end architecture for the combined SESR learning.
Residual Dense Blocks (RDBs) consist of three sets of convolutional (conv) layers, each followed by Batch Normalization (BN) and ReLU non-linearity. As Figure 2(a) illustrates, the input and output of each layer are concatenated to subsequent layers. This architecture is inspired by Zhang et al., who demonstrated that such dense skip connections facilitate improved hierarchical feature learning. Each conv layer learns filters of a given kernel size; their outputs are then fused by another conv layer for local residual learning.
Feature Extraction Network (FENet) uses RDBs as building blocks to incorporate two-stage residual-in-residual learning. As shown in Figure 2(b), in the first stage, two parallel branches use eight RDB blocks each to separately learn filters of two different kernel sizes in the input image space; these filters are then concatenated and passed to a common branch for the second stage of learning. Four RDB blocks are used in the later stage, which eventually generates the extracted feature maps. Our motive for such a design is to have the capacity to learn locally dense informative features while still maintaining a globally shallow architecture to ensure fast feature extraction.
Auxiliary Attention Network (AAN) learns to model visual attention in the FENet-extracted feature space. As shown in Figure 2(c), two sequential conv layers learn to generate a single channel output that represents saliency (probabilities) for each pixel. We show the predicted saliency map as green intensity values; the black pixels represent background regions.
The Deep SESR learning is guided along the primary branch by a series of conv and deconv (de-convolutional) layers. As Figure 2(c) demonstrates, the enhanced image (LR) and the SESR image (HR) are generated by separate output layers at different stages in the network. The enhanced image is generated from the conv layer that immediately follows FENet; it is supervised to learn enhancement by dedicated loss functions applied at the shallow output layer. The enhanced features are also propagated to another conv layer, followed by deconv layers for upsampling. The final SESR output is generated from upsampled features based on the given scale: 2x, 3x, or 4x. Other model parameters, e.g., the number of filters, kernel sizes, etc., are annotated in Figure 3.
The end-to-end training of Deep SESR is supervised by seven loss components that address various aspects of learning the SESR function. Denoting the generated saliency map, enhanced image, and SESR output accordingly, we formulate the loss terms as follows:
1) Information Loss for saliency prediction is measured by a standard cross-entropy function [51, 64]. It quantifies the dissimilarity in pixel intensity distributions between the generated saliency map and its ground truth, averaged over all pixels in the image.
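A minimal NumPy sketch of this pixel-wise cross-entropy; the `eps` clipping constant is an assumption added for numerical stability:

```python
import numpy as np

def saliency_bce(s_true, s_pred, eps=1e-7):
    """Pixel-wise binary cross-entropy between the ground-truth saliency
    map s_true and the predicted map s_pred (both in [0, 1]), averaged
    over all pixels; eps clipping avoids log(0)."""
    s_pred = np.clip(s_pred, eps, 1.0 - eps)
    return -np.mean(s_true * np.log(s_pred) +
                    (1.0 - s_true) * np.log(1.0 - s_pred))
```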
2) Contrast loss (LR) evaluates the hue and luminance recovery in the enhanced images. The dominating green/blue hue in distorted underwater images often causes low-contrast and globally dim foreground pixels. We quantify this loss of relative strength (i.e., intensity) of foreground pixels in RGB space by utilizing a differentiable function: the Contrast Measurement Index (CMI) [59, 63]. The CMI measures the average intensity of foreground pixels relative to the background of an image. We exploit the (predicted or ground-truth) saliency map to mask out the foreground pixels via element-wise multiplication; subsequently, we compute the contrast loss as the disparity between the CMI of the generated image and that of its ground truth.
An immediate consequence of this formulation is that the AAN can directly influence the enhancement learning despite being on a separate branch. Such coupling also provides better training stability (otherwise, the AAN tends to converge too early and starts over-fitting). Moreover, in Figure 3(a), we show the distributions of CMI for training samples of the UFO-120 dataset, which suggest that the distorted samples' CMI scores are skewed to much lower values compared to the ground truth. Hence, the contrast loss forces the CMI distribution to shift toward higher values, driving the learning of contrast enhancement.
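The saliency-driven contrast evaluation can be illustrated as follows; this is a simplified stand-in for the CMI described above, and its exact normalization is an assumption of this sketch:

```python
import numpy as np

def cmi(img, sal, eps=1e-7):
    """Simplified contrast measure: mean intensity of the salient
    foreground (img * sal) relative to the background (img * (1 - sal)),
    with masking done by element-wise multiplication."""
    fg = img * sal[..., None]
    bg = img * (1.0 - sal)[..., None]
    fg_mean = fg.sum() / (sal.sum() * img.shape[-1] + eps)
    bg_mean = bg.sum() / ((1.0 - sal).sum() * img.shape[-1] + eps)
    return fg_mean - bg_mean

def contrast_loss(y_true, y_pred, sal):
    """Penalize the disparity between the CMI of the generated image
    and that of its ground truth."""
    return abs(cmi(y_true, sal) - cmi(y_pred, sal))
```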
3) Color Loss (LR/HR) evaluates the global similarity of the enhanced and SESR outputs with their respective ground truth measurements in RGB space, using standard L2 terms. Additionally, we formulate two perceptual loss functions that are particularly designed for learning underwater image enhancement and super-resolution. Specifically, we utilize two wavelength-dependent chrominance terms, RG = R - G and YB = (R + G)/2 - B, which are core elements of the Underwater Image Colorfulness Measure (UICM) [55, 49]; the loss penalizes the per-channel disparities of these chrominance terms between the generated image and its ground truth. Finally, we adopt the combined color loss terms for both enhancement and SESR.
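A hedged NumPy sketch of such a chrominance-aware color loss; the blending weight `w_chroma` is illustrative, not the paper's tuned value:

```python
import numpy as np

def chrominance(img):
    """UICM chrominance components, per pixel: RG = R - G and
    YB = (R + G)/2 - B."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return r - g, (r + g) / 2.0 - b

def color_loss(y_true, y_pred, w_chroma=0.5):
    """Standard per-channel L2 term plus the disparity of the RG/YB
    chrominance terms; w_chroma is an assumed blending weight."""
    l2 = np.mean((y_true - y_pred) ** 2)
    rg_t, yb_t = chrominance(y_true)
    rg_p, yb_p = chrominance(y_pred)
    chroma = np.mean((rg_t - rg_p) ** 2) + np.mean((yb_t - yb_p) ** 2)
    return l2 + w_chroma * chroma
```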
4) Content loss (LR/HR) forces the generator to restore similar feature content as the ground truth in terms of high-level representation. Such feature preservation has been found to be very effective for image enhancement, style transfer, and SISR problems [31, 34]. Following the literature, we define the image content as the high-level features extracted by the last conv layer of a pre-trained VGG-19 network, and formulate the content loss for enhancement and SESR as the disparity between the feature representations of the generated images and their ground truth.
5) Sharpness loss (HR) measures the blurriness recovery in the SESR output by exploiting local image gradients. The literature offers several solutions for evaluating image sharpness based on norms/histograms of gradients or frequency-domain analysis. In particular, the notions of Just Noticeable Blur (JNB) and Perceptual Sharpness Index (PSI) are widely used; they apply non-linear transformation and thresholding on local contrast or gradient-based features to quantify perceived blurriness based on the characteristics of the human visual system. However, we found better results and numeric stability by using the norm of image gradients directly; specifically, we use the standard Sobel operator for computing the spatial gradient of an image. Subsequently, we formulate the sharpness loss for SESR as the disparity in gradient norm between the generated image and its ground truth.
In Figure 3(b), we present statistical validity of the sharpness loss as a loss component; also, edge gradient features for a particular sample are provided in Figure 3(c). As shown, the numeric disparities in the norm of gradients between distorted images and their HR ground truth are significant, which the sharpness loss quantifies to encourage sharper image generation.
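A minimal SciPy/NumPy sketch of this gradient-based sharpness term; averaging the color channels to a single plane before applying the Sobel operator is an assumption of this illustration:

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_magnitude(img):
    """Spatial gradient magnitude via the standard Sobel operator,
    computed on the channel-averaged image."""
    gray = img.mean(axis=-1)
    gx, gy = sobel(gray, axis=0), sobel(gray, axis=1)
    return np.hypot(gx, gy)

def sharpness_loss(y_true, y_pred):
    """Disparity in the norm of image gradients between the generated
    image and its ground truth."""
    return abs(np.linalg.norm(gradient_magnitude(y_true)) -
               np.linalg.norm(gradient_magnitude(y_pred)))
```

A blurred or smoothly varying image has a smaller gradient norm than its sharp counterpart, so minimizing this disparity pushes the generator toward sharper outputs.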
We use a linear combination of the above-mentioned loss components to formulate the unified objective function; the scaling factors that represent the contributions of the respective loss components are empirically tuned as hyper-parameters.
As mentioned in Section 3.2, Deep SESR training is supervised by paired data. We use TensorFlow libraries to implement the optimization pipeline (of Eq. 10); a Linux host with two Nvidia™ GTX 1080 graphics cards is used for training. The Adam optimizer is used for the global iterative learning; with our choices of learning rate, momentum, and batch size, the network converges within a reasonable number of epochs in this setup. In the following sections, we present the experimental results based on qualitative analysis, quantitative evaluations, and ablation studies. Since there are no existing SESR methods, we compare the Deep SESR performance separately with SOTA image enhancement and super-resolution models. Note that all models in comparison are trained on the same train-validation splits (of respective datasets) by following their recommended parameter settings. Also, for datasets other than UFO-120, the AAN (and its information loss) is not used by Deep SESR, as their ground truth saliency maps are not available.
We first qualitatively analyze the Deep SESR-generated images in terms of color, contrast, and sharpness. As Fig. 5 illustrates, the enhanced images are perceptually similar to the respective ground truth. Specifically, the greenish underwater hue is rectified, true pixel colors are mostly restored, and the global image sharpness is recovered. Moreover, the generated saliency map suggests that it focused on the right foreground regions for contrast improvement. We further demonstrate the contributions of each loss term for learning the enhancement. We observe that the color rendition gets impaired without the color loss components, whereas the content loss contributes to learning finer texture details. We also notice considerably low-contrast image generation without the contrast loss, which validates the utility of saliency-driven contrast evaluation via CMI (see Section 4.2).
Next, we compare the perceptual image enhancement performance of Deep SESR with the following models: (i) relative global histogram stretching (RGHS), (ii) unsupervised color correction (UCM), (iii) multi-scale fusion (MS-Fusion), (iv) multi-scale Retinex (MS-Retinex), (v) Water-Net, (vi) UGAN, (vii) Fusion-GAN, and (viii) FUnIE-GAN. The first four are physics-based models and the rest are learning-based models; they provide SOTA performance for underwater image enhancement in RGB space (without requiring scene depth or optical waterbody measures). Their performance is quantitatively evaluated on common test sets of each dataset based on standard metrics [35, 49]: peak signal-to-noise ratio (PSNR), structural similarity measure (SSIM), and underwater image quality measure (UIQM). The PSNR and SSIM quantify reconstruction quality and structural similarity of generated images (with respect to ground truth), whereas the UIQM evaluates image qualities based on colorfulness, sharpness, and contrast. The evaluation is summarized in Table 1; moreover, a few qualitative comparisons are shown in Fig. 6.
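For reference, PSNR, the simplest of these metrics, can be computed as follows (a standard definition, assuming 8-bit pixel intensities):

```python
import numpy as np

def psnr(y_true, y_pred, max_val=255.0):
    """Peak signal-to-noise ratio (in dB) of a generated image with
    respect to its ground truth; higher is better."""
    mse = np.mean((np.asarray(y_true, dtype=np.float64) -
                   np.asarray(y_pred, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```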
As Fig. 6 demonstrates, UCM and MS-Retinex often suffer from over-saturation, whereas RGHS, MS-Fusion, and Water-Net fall short in hue rectification. In comparison, the color restoration and contrast enhancement of UGAN, Fusion-GAN, and FUnIE-GAN are generally better. In addition to achieving comparable color recovery and hue rectification, the Deep SESR-generated images are considerably sharper. Since the boost in performance is rather significant for the UFO-120 dataset (suggested by the results of Table 1), it is likely that the additional knowledge about foreground pixels through saliency supervision helps in this regard. Deep SESR achieves competitive and often better performance in terms of PSNR and SSIM as well. In particular, it generally attains better UIQM scores; we postulate that the chrominance-specific color loss contributes to this enhancement, as it is designed to improve the UICM (see Section 4.2). Further ablation investigations reveal a drop in UIQM values without using it in the learning objective.
We follow similar experimental procedures for evaluating the super-resolution performance of Deep SESR. We consider the existing underwater SISR models named RSRGAN, SRDRM, and SRDRM-GAN for performance comparison. We also include the standard (terrestrial) SISR models named SRCNN, SRResNet, and SRGAN in the evaluation as benchmarks. We compare their SISR performance at multiple scales on two large-scale datasets: UFO-120 and USR-248. The results are presented in Table 2, and a few samples are shown in Fig. 7. Note that the test images of the USR-248 dataset are left undistorted for a fair comparison.
As Table 2 demonstrates, Deep SESR outperforms the other models in comparison by considerable margins on UIQM. This is due to the fact that it enhances perceptual image qualities in addition to spatial resolution. As shown in Fig. 7, Deep SESR generates much sharper and better quality HR images from both distorted and undistorted LR input patches, which contributes to its competitive PSNR and SSIM scores on the USR-248 dataset. Fig. 8 further demonstrates that it does not introduce noise by unnecessary over-correction, which is a prevalent limitation of existing automatic image enhancement solutions. Lastly, we observe similar performance trends for all three types of spatial down-sampling, i.e., for Set-U, Set-F, and Set-O (see Section 3.2); we present the relative quantitative scores in Table 3.
Due to the ill-posed nature of modeling underwater image distortions without scene-depth and optical waterbody measurements, learning-based solutions often fail to generalize beyond supervised data. In addition to the already-presented results, we demonstrate the color and texture recovery of Deep SESR on unseen natural images in Fig. 9. As seen in Fig. 8(a), Deep SESR-enhanced pixel intensities are perceptually similar to a comprehensive physics-based approximation. Additionally, it generates the respective HR images and saliency maps, and still offers a considerably faster run-time.
Deep SESR also provides reasonable performance on terrestrial images. As demonstrated in Fig. 8(b), the color and texture enhancement of unseen objects (e.g., grass, face, clothing, etc.) is perceptually coherent. Moreover, as Table 4 indicates, its performance in terms of sharpness and contrast recovery for 2x, 3x, and 4x SISR is competitive with SOTA benchmark results [46, 73]. Note that much-improved performance can be achieved by further tuning and training on terrestrial datasets. Nevertheless, these results validate that the proposed architecture has the capacity to learn a generalizable solution to the underlying SESR problem.
Deep SESR's on-board memory requirement is small, and it offers near real-time per-frame inference on a single-board computer: the Nvidia™ AGX Xavier. As shown in Table 5, it provides much faster speeds with the following design choices:
1) Learning the enhancement and SESR outputs on separate branches facilitates a faster run-time when HR perception is not required. Specifically, we can decouple the SESR branches from the frozen model, which then operates at a considerably higher frame rate to perform enhancement and saliency prediction only. As shown in Fig. 10, the predicted saliency map can be exploited for automatic RoI selection by using density gradient estimation techniques such as mean-shift. The SESR output corresponding to the RoI can then be generated with a small amount of additional processing time.
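The RoI selection step can be sketched as a simple mean-shift over the predicted saliency map; the Gaussian kernel, its `bandwidth`, and the iteration count are assumptions of this illustration:

```python
import numpy as np

def saliency_roi_center(sal, iters=20, bandwidth=3.0):
    """Locate an RoI center by mean-shift on the saliency map: starting
    from the saliency-weighted centroid, repeatedly move the center to
    the Gaussian-kernel-weighted mean of nearby salient pixels."""
    ys, xs = np.indices(sal.shape)
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    w = sal.ravel()
    center = (coords * w[:, None]).sum(0) / (w.sum() + 1e-7)
    for _ in range(iters):
        d2 = ((coords - center) ** 2).sum(1)
        k = np.exp(-d2 / (2.0 * bandwidth ** 2)) * w
        center = (coords * k[:, None]).sum(0) / (k.sum() + 1e-7)
    return tuple(int(round(c)) for c in center)
```

A fixed-size crop around the returned center would then serve as the RoI patch passed to the SESR branch.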
2) FENet-1d and FENet-2d are two design choices for the FENet (see Fig. 2(b)); FENet-2d is the default architecture that learns filters of two different kernel sizes in two parallel branches, whereas FENet-1d refers to using a single branch of filters. As shown in Table 5, faster feature extraction by FENet-1d facilitates a speed-up for Deep SESR. However, we observe a slight drop in performance, i.e., lower PSNR/SSIM/UIQM scores on the UFO-120 dataset. Nevertheless, the generated images are qualitatively indistinguishable, and the trade-off is admissible in practical applications.
Overall, Deep SESR offers use-case-specific design choices and ensures computational efficiency with robust SESR performance. These features make it suitable for near real-time robotic deployments; further demonstration is available in the supplementary material.
In this paper, we introduce the problem of simultaneous enhancement and super-resolution (SESR) and present an efficient learning-based solution for underwater imagery. The proposed generative model, named Deep SESR, can learn SESR and saliency prediction on a shared feature space. We also present its detailed network architecture, associated loss functions, and end-to-end training pipeline. Additionally, we contribute over 1500 annotated samples to facilitate large-scale SESR training on the UFO-120 dataset. We perform a series of qualitative and quantitative experiments, which suggest that Deep SESR: i) provides SOTA performance on underwater image enhancement and super-resolution, ii) exhibits significantly better generalization performance on natural images than existing solutions, iii) provides competitive results on terrestrial images, and iv) achieves fast inference on single-board platforms. The inspiring performance, computational efficiency, and availability of application-specific design choices make Deep SESR suitable for near real-time use by visually-guided underwater robots. In the future, we seek to incorporate spatial upscaling capability at higher scales into the model with reasonable performance trade-offs.
We acknowledge the support of the MnDrive initiative at the University of Minnesota111https://mndrive.umn.edu/ for our research. We are also grateful to the Bellairs Research Institute222https://www.mcgill.ca/bellairs/ of Barbados for providing us with the facilities for field experiments. Additionally, we thank Nvidia™ for donating two GPUs for our work. Finally, we acknowledge our colleagues at the IRVLab333http://irvlab.cs.umn.edu/ for their assistance in data collection, annotation, and in preparation of media files.
The UFO-120 dataset: http://irvlab.cs.umn.edu/resources/ufo-120-dataset.
The USR-248 dataset: http://irvlab.cs.umn.edu/resources/usr-248-dataset.
The Set5 , Set14 , Sun80 , and other terrestrial datasets: https://github.com/ChaofWang/Awesome-Super-Resolution.
David Gantt. A Sea Fan at Karpata. 2012. (Flickr).
SeaPics.com. Diver Over a Reef. 2015. (SeaPics).
Vincent POMMEYROL. Underwater Photography. 2012. (Flickr): https://www.flickr.com/photos/vincentpommeyrol/8301150254/.
BelleDeesse. Watercraft. 2013. (WallpaperUP).
Nick Shaw. Leptomithrax gaimardii (Great Spider Crab). 2018. (Atlas of Living Australia): https://images.ala.org.au/image/details?imageId=8b7d6bad-1ffb-4740-8161-d3c6d4f35018.
David Piano. City of Grand Rapids Shipwreck watermarked-2. 2018. (Flickr): https://www.flickr.com/photos/davidpiano/44403874204/.
The Pond Experts. Koi Photos Underwater View. 2014. (PondExperts): https://www.pondexperts.ca/wp-content/uploads/2014/07/img_3870.jpg.
Cat Trumpet. 2 Hours of Beautiful Coral Reef Fish, Relaxing Ocean Fish, 1080p HD. 2016. (YouTube).
Nature Relaxation Films. 3 Hours of Stunning Underwater Footage, French Polynesia, Indonesia. 2018. (YouTube).
Calm Cove Club - Relaxing Videos. 4K Beautiful Ocean Clown Fish Turtle Aquarium. 2017. (YouTube).
Scubasnap.com. 4K Underwater at Stuart Cove’s. 2014. (YouTube): https://youtu.be/kiWfG31YbXo.
4.000 PIXELS. Beautiful Underwater Nature. 2017. (YouTube): https://youtu.be/1-Cn0b1MKrM.
Magnus Ryan Diving. SCUBA Diving Egypt Red Sea. 2017. (YouTube): https://youtu.be/CaLfMHl3M2o.
TheSilentWatcher. 4K Coral World-Tropical Reef. 2018. (YouTube): https://youtu.be/uyb0wW0ln_g.
Awesome Video. 4K- The Most Beautiful Coral Reefs and Undersea Creature on Earth. 2017. (YouTube).
Earth Touch. Celebrating World Oceans Day in 4K. 2015. (YouTube): https://youtu.be/IXxfIMNgMJA.
BBC Earth. Deep Ocean: Relaxing Oceanscapes. 2018. (YouTube): https://youtu.be/t_S_cN2re4g.
Alegra Chetti. Let’s Go Under the Sea I Underwater Shark Footage I Relaxing Underwater Scene. 2016. (YouTube): https://youtu.be/rQB-f5BHn5M.
Underwater 3D Channel - Barry Chall Films. Planet Earth, The Undersea World (4K). 2018. (YouTube).
Undersea Productions. “ReefScapes: Nature’s Aquarium” Ambient Underwater Relaxing Natural Coral Reefs and Ocean Nature. 2009. (YouTube).
BBC Earth. The Coral Reef: 10 Hours of Relaxing Oceanscapes. 2018. (YouTube).
Robby Michaelle. Scuba Diving the Great Barrier Reef Red Sea Egypt Tiran. 2014. (YouTube).
Bubble Vision. Diving in Bali. 2012. (YouTube).
Vic Stefanu - Amazing World Videos. EXPLORING The GREAT BARRIER REEF, fantastic UNDERWATER VIDEOS (Australia). 2015. (YouTube).
Our Coral Reef. Breathtaking Dive in Raja Ampat, West Papua, Indonesia Coral Reef. 2018. (YouTube).
GoPro. GoPro Awards: Great Barrier Reef with Fusion Overcapture in 4K. 2018. (YouTube).
GoPro. GoPro: Freediving with Tiger Sharks in 4K. 2017. (YouTube): https://youtu.be/Zy3kdMFvxUU.
TFIL. SCUBA DIVING WITH SHARKS! 2017. (YouTube): https://youtu.be/v8eSPf4RzTU.
Vins and Annette Singh. Stunning Salt Water Fishes in a Marine Aquarium. 2019. (YouTube).
Akouris. H.M.Submarine Perseus. 2014. (YouTube): https://youtu.be/4-oP0sX723k.
Gung Ho Vids. U.S. Navy Divers View An Underwater Wreck. 2014. (YouTube).
Martcerv. Truk lagoon deep wrecks, GoPro black with SRP tray and lights. 2013. (YouTube).
Dmireiy. Shipwreck Diving, Nassau Bahamas. 2012. (YouTube): https://youtu.be/CIQI3isddbE.
Frank Lame. diving WWII Wrecks around Palau. 2010. (YouTube): https://youtu.be/vcI63XQsNlI.
Stevanurk. Wreck Dives Malta. 2014. (YouTube).
Stevanurk. Diving Malta, Gozo and Comino 2015 Wrecks Caves. 2015. (YouTube).
Octavio velazquez lozano. SHIPWRECK Scuba Diving BAHAMAS. 2017. (YouTube).
Drew Kaplan. SCUBA Diving The Sunken Ancient Roman City Of Baiae, the Underwater Pompeii. 2018. (YouTube).
Octavio velazquez lozano. LIBERTY SHIPWRECK scuba dive destin florida. 2017. (YouTube).
Blue Robotics. BlueROV2 Dive: Hawaiian Open Water. 2016. (YouTube).
JerryRigEverything. Exploring a Plane Wreck - UNDER WATER! 2018. (YouTube).
Rovrobotsubmariner. Home-built Underwater Robot ROV in Action! 2010. (YouTube).
Oded Ezra. Eca-Robotics H800 ROV. 2016. (YouTube).
Geneinno Tech. Titan Diving Drone. 2019. (YouTube).
Scubo. Scubo - Agile Multifunctional Underwater Robot - ETH Zurich. 2016. (YouTube).
Learning with Shedd. Student-built Underwater Robot at Shedd ROV Club Event. 2017. (YouTube).
HMU-CSRL. SQUIDBOT sea trials. 2015. (YouTube).
MobileRobots. Aqua2 Underwater Robot Navigates in a Coral Reef - Barbados. 2012. (YouTube).
Daniela Rus. Underwater Robot. 2015. (YouTube).
JohnFardoulis. Sirius - Underwater Robot, Mapping. 2014. (YouTube): https://youtu.be/fXxVcucOPrs.