
Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception

In this paper, we introduce and tackle the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision and provide an efficient solution for near real-time applications. We present Deep SESR, a residual-in-residual network-based generative model that can learn to restore perceptual image qualities at 2x, 3x, or 4x higher spatial resolution. We supervise its training by formulating a multi-modal objective function that addresses the chrominance-specific underwater color degradation, lack of image sharpness, and loss in high-level feature representation. It is also supervised to learn salient foreground regions in the image, which in turn guides the network to learn global contrast enhancement. We design an end-to-end training pipeline to jointly learn the saliency prediction and SESR on a shared hierarchical feature space for fast inference. Moreover, we present UFO-120, the first dataset to facilitate large-scale SESR learning; it contains over 1500 training samples and a benchmark test set of 120 samples. By thorough experimental evaluation on the UFO-120 and other standard datasets, we demonstrate that Deep SESR outperforms the existing solutions for underwater image enhancement and super-resolution. We also validate its generalization performance on several test cases that include underwater images with diverse spectral and spatial degradation levels, and also terrestrial images with unseen natural objects. Lastly, we analyze its computational feasibility for single-board deployments and demonstrate its operational benefits for visually-guided underwater robots. The model and dataset information will be available at:





Code repository: Simultaneous Enhancement and Super-Resolution (#RSS2020).

1 Introduction

Automatic generation of high resolution (HR) images from low resolution (LR) sensory measurements is a well-studied problem in the domains of computer vision and robotics due to its usefulness for detailed scene understanding and image synthesis [65, 73, 34]. For visually-guided robots, in particular, this single image super-resolution (SISR) capability allows zooming into regions of interest (RoIs) for detailed perception, to eventually make navigational and other operational decisions. However, if the LR images suffer from noise and optical distortions, those get amplified by SISR, resulting in uninformative RoIs. Hence, restoring perceptual and statistical image qualities is essential for robust visual perception in noisy environments (e.g., underwater [9, 25]). Although large bodies of literature on perceptual image enhancement and SISR offer solutions for each problem separately, a unified approach is more viable for computationally constrained real-time applications; such an approach has not yet been explored in depth.

To this end, we introduce simultaneous enhancement and super-resolution (SESR), and demonstrate its effectiveness for both underwater and terrestrial imagery. SESR is particularly useful in the underwater domain due to its unique optical properties [2], e.g., attenuation, refraction, and backscatter. These artifacts cause range- and wavelength-dependent non-linear distortions that severely affect vision despite often using high-end cameras [35]. Specifically, the captured images exhibit various levels of hue distortion, blurriness, low contrast, and color degradation based on the waterbody types, distances of light sources, etc. Some of these aspects can be modeled and estimated by physics-based solutions, particularly for dehazing [7], color correction [10], water removal [3], etc. However, these methods are often computationally too demanding for real-time robotic deployments. Besides, dense scene depth and optical waterbody measures are not always available in practical applications.

The learning-based approaches attempt to address these practicalities by approximating the underlying solution to the ill-posed problem of underwater image restoration with RGB data alone. Several existing models based on convolutional neural networks (CNNs) [48, 64] and generative adversarial networks (GANs) [35, 44, 20] provide state-of-the-art (SOTA) performance for perceptual color enhancement, dehazing, deblurring, and contrast adjustment. Additionally, inspired by the success of deep residual networks for terrestrial SISR [73, 42, 30], several models have been proposed for underwater SISR in recent years [12, 34], which report exciting results with reasonable computational overhead. Contemporary research work [35, 34] further demonstrates that perceptually enhanced underwater images provide significantly improved performance for widely-used object detection and human body-pose estimation tasks; moreover, detailed perception of salient image regions facilitates better scene understanding and attention modeling. However, as mentioned, separately processing visual data for these capabilities, even with the fastest available solutions, is not computationally feasible on single-board platforms.

In this paper, we present the first unified approach for SESR with an end-to-end trainable model. The proposed Deep SESR architecture incorporates dense residual-in-residual sub-networks to facilitate multi-scale hierarchical feature learning for SESR and saliency prediction. For supervision, we formulate a multi-modal objective function that evaluates the degree of chrominance-specific color degradation and the loss in image sharpness, contrast, and high-level feature representation. As demonstrated in Fig. 1, the model learns to restore perceptual image qualities at higher spatial scales (up to 4×); as a byproduct, it learns to identify salient foreground regions in the image. We also present the UFO-120 dataset, which contains over 1500 annotated samples for large-scale SESR training, and a test set with an additional 120 samples.

Furthermore, we evaluate the perceptual enhancement and super-resolution performance of Deep SESR on UFO-120 and several other standard datasets. The results suggest that it provides superior performance over SOTA methods on respective tasks, and achieves considerably better generalization performance on unseen natural images. It also achieves competitive performance on standard terrestrial datasets without additional training or tuning, which indicates that SESR methods can be potentially effective for terrestrial applications as well. Finally, we specify several design choices for Deep SESR, analyze their computational aspects, and discuss the usability benefits for its robotic deployments.

2 Background

Underwater image enhancement is an active research problem that deals with correcting optical image distortions to recover true pixel intensities [3, 10]. Classical approaches use hand-crafted filters to improve local contrast and enforce color constancy. These approaches are inspired by the Retinex theory of human visual perception [37, 72, 23], and mainly focus on restoring background illumination and lightness rendition. Another class of physics-based approaches uses an atmospheric dehazing model to estimate the true transmission and ambient light in a scene [15, 27]. Additional prior knowledge or statistical assumptions (e.g., haze-lines, the dark channel prior [7], etc.) are often utilized for global enhancements. Recent work by Akkaynak et al. [2, 3] introduces a revised image formation model that accounts for the unique characteristics of underwater light propagation; this contributes to a more accurate estimation of range-dependent attenuation and backscatter [57].

While accurate underwater image recovery remains a challenge, the learning-based approaches for perceptual enhancement have made remarkable progress in recent years. Driven by large-scale supervised training [35, 69], these approaches learn sequences of non-linear filters to approximate the underlying pixel-to-pixel mapping [36] between the distorted and enhanced image domains. The contemporary deep CNN-based generative models provide SOTA performance in learning such image-to-image translation for both terrestrial [14, 11] and underwater domains [35, 48]. Moreover, the GAN-based models attempt to improve generalization performance by employing a two-player min-max game [26], where an adversarial discriminator evaluates the generator-enhanced images against ground truth samples. This forces the generator to learn realistic enhancement while evolving with the discriminator toward equilibrium. Several GAN-based underwater image enhancement models have reported impressive results from both paired [20, 45] and unpaired training [35]. However, they are prone to training instability, and hence require careful hyper-parameter choices and intuitive loss function adaptation [5, 53] to ensure convergence.

The single image super-resolution (SISR) problem deals with automatically generating a sharp HR image from its LR measurements. Although SISR is relatively less studied in the underwater domain, a rich body of literature exists for terrestrial imagery [67]. In particular, existing deep CNN-based models [18, 42] and GAN-based models [58, 60] provide good solutions for SISR. Researchers have also exploited contemporary techniques [39, 40, 62] such as gradient clipping, dense skip connections, and sub-pixel convolution to improve SISR performance on standard datasets. Moreover, deep residual networks [42, 30] and residual-in-residual networks [65, 47] are known to be very effective for learning SISR. Such networks employ skip connections to preserve the identity mapping within repeated blocks of convolutional layers; this contributes to stable training of very deep models. Zhang et al. [73] further demonstrated that dense skip connections within a residual block allow combining hierarchical features from each layer, which substantially boosts SISR performance.

In recent years, similar ideas have been effectively applied to underwater imagery as well. For instance, Chen et al. [12] adopt residual-in-residual learning for underwater SISR, whereas Islam et al. [34] introduce a deep residual multiplier model that can be dynamically configured for multiple SISR scales. Although these models report inspiring results, they do not account for underwater image distortions, and hence rely on a secondary network for enhancement. On the contrary, traditional approaches primarily focus on enhancing underwater image reconstruction quality by deblurring/denoising [13, 56] or descattering [50]. Hence, their applicability for end-to-end SESR is limited.

Visual attention-based saliency prediction refers to finding interesting foreground regions in the image space [51, 64]. The classical stimulus-driven approaches use features such as luminance, color, texture, and often depth information to quantify feature contrast in a scene. This feature contrast is subsequently exploited for spatial saliency computation. Automatic saliency prediction over a sequence of frames is also explored extensively [6] because spatio-temporal features capture information about the motion and interaction among objects in a scene, which are important cues for attention modeling. Another genre of approaches deals with goal-driven saliency prediction for visual question answering [68], i.e., finding the image regions that are relevant to a query.

In the underwater domain, however, existing research work mainly focuses on salient feature extraction for enhanced object detection performance [19, 52, 71]. Hence, these methods do not provide a general solution for attention modeling that can facilitate faster visual search or better scene understanding. Nevertheless, finding salient RoIs in distorted underwater images and generating corresponding enhanced HR patches can be extremely useful for visually-guided robots. We attempt to contribute to these aspects in this paper.

3 Problem Formulation

3.1 Learning SESR

SESR refers to the task of generating perceptually enhanced HR images from their LR and possibly distorted (LRD) input measurements. We formulate the problem as learning a pixel-to-pixel mapping from a source domain X (of LRD images) to its target domain Y (of enhanced HR images); we represent this mapping as a generative function G. We adopt an extended formulation by considering the task of learning SESR and saliency prediction on a shared feature space. Specifically, Deep SESR learns the generative function G: X → {E, S, Y}; here, the additional outputs S and E denote the predicted saliency map and the enhanced image (in the same resolution as the input X), respectively. Additionally, it offers up to 4× SESR for the final output Y.

(a) A few sample ground truth images and corresponding saliency maps are shown on the top and bottom rows, respectively.
(b) Two particular instances are shown: the HR ground truth images and their corresponding LR distorted (LRD) images at 2×, 3×, and 4× lower spatial resolution.
Figure 2: The UFO-120 dataset facilitates paired training of 2×, 3×, and 4× SESR models; it also contains salient pixel annotations for all training samples. The combined data is used for the supervised training of the Deep SESR model.

3.2 Data Preparation: The UFO-120 Dataset

We utilize several existing underwater image enhancement and super-resolution datasets to supervise the SESR learning. We follow standard procedures [35, 46, 20] for optical/spatial image degradation, and use human-labeled saliency maps to create paired training data; further details on the existing datasets are provided in Section 5.1.

In addition, we contribute over 1500 samples for training (and another 120 for testing) in the UFO-120 dataset. It contains images collected from oceanic explorations in multiple locations having different water types, as seen in Fig. 2(a). The salient foreground pixels of each image are annotated by human participants. Moreover, we adopt a widely used style-transfer technique [35, 20] to generate their respective distorted images. Subsequently, we generate the LRD samples by Gaussian blurring (GB) and bicubic down-sampling (BD); based on their relative order, we group the data into three sets:

  • Set-U: GB is followed by BD.

  • Set-F: the order of GB and BD is chosen at random.

  • Set-O: BD is followed by GB.

We use a fixed Gaussian kernel and noise level for GB. Additionally, as Fig. 2(b) illustrates, we use 2×, 3×, and 4× BD to generate the LRD samples. Hence, there are nine available training combinations for SESR. The UFO-120 dataset can also be used for training underwater SISR, image enhancement, or saliency prediction models.
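The degradation pipeline above can be sketched as follows. This is an illustrative stand-in, not the paper's exact procedure: a small Gaussian kernel and stride-based sub-sampling replace the paper's (unspecified here) kernel, noise level, and bicubic interpolation, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(img, k):
    """Naive 2D convolution with edge padding (single-channel image)."""
    p = k.shape[0] // 2
    padded = np.pad(img, p, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def downsample(img, s):
    """Stride-based sub-sampling (stand-in for bicubic down-sampling)."""
    return img[::s, ::s]

def make_lrd(hr, s, order="U", rng=None):
    """Set-U: GB then BD; Set-O: BD then GB; Set-F: random order."""
    k = gaussian_kernel()
    if order == "F":
        rng = rng or np.random.default_rng(0)
        order = "U" if rng.random() < 0.5 else "O"
    if order == "U":
        return downsample(blur(hr, k), s)
    return blur(downsample(hr, s), k)
```

Note that Set-U and Set-O generally produce different images for the same scale, since blurring and sub-sampling do not commute.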

(a) A residual dense block (RDB) [73].
(b) The feature extraction network (FENet).
(c) The end-to-end architecture is shown. FENet-extracted feature maps are propagated along two branches: i) to AAN for learning saliency, and ii) to an intermediate convolutional layer for learning enhancement. Another convolutional layer and subsequent upsampling layers learn SESR along the main branch.
Figure 3: Network architecture and detailed parameter specification of the proposed Deep SESR model.

4 Deep SESR Model

4.1 Network Architecture

As shown in Figure 3, the major components of our Deep SESR model are: residual dense blocks (RDBs), a feature extraction network (FENet), and an auxiliary attention network (AAN). These components are tied to an end-to-end architecture for the combined SESR learning.

Residual Dense Blocks (RDBs) consist of three sets of convolutional (conv) layers, each followed by Batch Normalization (BN) [32] and ReLU non-linearity [54]. As Figure 3(a) illustrates, the input and output of each layer are concatenated to subsequent layers. This architecture is inspired by Zhang et al. [73], who demonstrated that such dense skip connections facilitate improved hierarchical feature learning. Each conv layer learns filters of a given kernel size; their outputs are then fused by a final conv layer for local residual learning.
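The dense-skip structure of an RDB can be illustrated with a toy forward pass. For simplicity, all convolutions are replaced by per-pixel 1×1 channel mixes (plain matrix multiplies over the channel axis) and BN is omitted, so this sketch only demonstrates the concatenation, fusion, and residual-add pattern; all names are hypothetical.

```python
import numpy as np

def rdb_forward(x, layer_weights, fuse_w):
    """Toy RDB forward pass: each layer sees the concatenation of the block
    input and all previous layer outputs (dense skip connections); a final
    fusion projects back to the input channel width, then a residual add."""
    feats = [x]                                   # (H, W, C) tensors
    for w in layer_weights:
        inp = np.concatenate(feats, axis=-1)      # dense skip connection
        feats.append(np.maximum(inp @ w, 0.0))    # "conv" + ReLU
    fused = np.concatenate(feats, axis=-1) @ fuse_w  # fusion layer
    return x + fused                              # local residual learning
```

With input width C and growth rate g, the l-th layer mixes C + l·g channels down to g, and the fusion layer maps C + 3g channels back to C so the residual add is shape-compatible.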

Feature Extraction Network (FENet) uses RDBs as building blocks to incorporate two-stage residual-in-residual learning. As shown in Figure 3(b), in the first stage, two parallel branches use eight RDBs each to separately learn filters of two different kernel sizes in the input image space; these filters are then concatenated and passed to a common branch for the second stage of learning. Four RDBs are used in the later stage, which eventually generates the shared feature maps. Our motive for such a design is to have the capacity to learn locally dense informative features while still maintaining a globally shallow architecture to ensure fast feature extraction.

Auxiliary Attention Network (AAN) learns to model visual attention in the FENet-extracted feature space. As shown in Figure 3(c), two sequential conv layers learn to generate a single-channel output that represents the saliency (probability) of each pixel. We show the predicted saliency map as green intensity values; the black pixels represent background regions.

The Deep SESR learning is guided along the primary branch by a series of conv and deconv (de-convolutional) layers. As Figure 3(c) demonstrates, the enhanced image (LR) and the SESR image (HR) are generated by separate output layers at different stages in the network. The enhanced image is generated from the conv layer that immediately follows FENet; it is supervised to learn enhancement by dedicated loss functions applied at this shallow output layer. The enhanced features are also propagated to another conv layer, followed by deconv layers for upsampling. The final SESR output is generated from the upsampled features based on the given scale: 2×, 3×, or 4×. Other model parameters, e.g., the number of filters, kernel sizes, etc., are annotated in Figure 3.

4.2 Loss Function Formulation

The end-to-end training of Deep SESR is supervised by seven loss components that address various aspects of learning the function G. Denoting the generated outputs as Ŝ (saliency map), Ê (enhanced LR image), and Ŷ (SESR output), with corresponding ground truths S, E, and Y, we formulate the loss terms as follows:

1) Information Loss for saliency prediction is measured by a standard cross-entropy function [51, 64]. It quantifies the dissimilarity in pixel intensity distributions between the generated saliency map (Ŝ) and its ground truth (S). For a total of N pixels, it is calculated as

L_Saliency = -(1/N) Σ_p [ S_p log(Ŝ_p) + (1 − S_p) log(1 − Ŝ_p) ].
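As a concrete reference, the pixel-wise cross-entropy above can be computed as in this minimal sketch (function name hypothetical):

```python
import numpy as np

def saliency_bce(s_true, s_pred, eps=1e-7):
    """Pixel-wise binary cross-entropy between a ground-truth and a
    predicted saliency map (values in [0, 1]), averaged over all N pixels.
    Predictions are clipped away from 0/1 for numerical stability."""
    s_pred = np.clip(s_pred, eps, 1.0 - eps)
    return float(np.mean(-(s_true * np.log(s_pred)
                           + (1.0 - s_true) * np.log(1.0 - s_pred))))
```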
2) Contrast loss (LR) evaluates the hue and luminance recovery in the enhanced images. The dominating green/blue hue in distorted underwater images often causes low-contrast and globally dim foreground pixels. We quantify this loss of relative strength (i.e., intensity) of foreground pixels in RGB space by utilizing a differentiable function: the Contrast Measurement Index (CMI) [59, 63]. The CMI measures the average intensity of foreground pixels relative to the background for an image. We exploit the saliency map S (or Ŝ) to find the foreground pixels in E (or Ê), as E_F = S ⊙ E and Ê_F = Ŝ ⊙ Ê; here, ⊙ denotes element-wise multiplication. Subsequently, we compute the contrast loss as the disparity between CMI(E) and CMI(Ê).
An immediate consequence of using the contrast loss is that AAN can directly influence learning enhancement despite being on a separate branch. Such coupling also provides better training stability (otherwise AAN tends to converge too early and starts over-fitting). Moreover, in Figure 4(a), we show the distributions of CMI for training samples of the UFO-120 dataset, which suggest that the distorted samples' CMI scores are skewed to much lower values compared to the ground truth. Hence, the contrast loss forces the CMI distribution to shift toward higher values for learning contrast enhancement.
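A minimal sketch of the saliency-masked contrast comparison follows; the ratio-of-means form of CMI used here is an illustrative stand-in for the exact formulation of [59, 63], and all names are hypothetical.

```python
import numpy as np

def cmi(img, sal):
    """Contrast Measurement Index sketch: mean intensity of salient
    (foreground) pixels relative to the background, using a [0, 1]
    saliency map as a soft foreground mask."""
    fg = (img * sal).sum() / max(sal.sum(), 1e-7)
    bg = (img * (1.0 - sal)).sum() / max((1.0 - sal).sum(), 1e-7)
    return (fg - bg) / (fg + bg + 1e-7)

def contrast_loss(e_true, e_pred, s_true, s_pred):
    """Disparity between the CMI of the ground-truth image (masked by its
    saliency map) and that of the generated image (masked by the
    predicted saliency map)."""
    return abs(cmi(e_true, s_true) - cmi(e_pred, s_pred))
```

This is where the coupling noted above shows up: the predicted saliency map enters the contrast term, so gradients from the enhancement objective reach the AAN branch.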

(a) Contrast measure: CMI.
(b) Sharpness measure: norm of the image gradient.
(c) Image contrast and sharpness properties of a particular sample compared to its ground truth measurement.
Figure 4: The lack of contrast and sharpness in LRD samples of the UFO-120 dataset (compared to their ground truth) is shown in (a) and (b); as seen, the distributions for LRD samples are densely skewed to lower values, whereas the ground truth distributions span considerably higher values. A qualitative interpretation of this numeric disparity is illustrated in (c).

3) Color Loss (LR/HR) evaluates the global similarity of the enhanced output (Ê) and the SESR output (Ŷ) with their respective ground truth measurements in RGB space, using standard L2 loss terms. Additionally, we formulate two perceptual loss functions that are particularly designed for learning underwater image enhancement and super-resolution. First, we utilize two wavelength-dependent chrominance terms: RG = R − G, and YB = (R + G)/2 − B, which are core elements of the Underwater Image Colorfulness Measure (UICM) [55, 49]. The resulting chrominance loss penalizes the per-channel numeric differences between E and Ê projected onto these opponent-color components.

On the other hand, being inspired by [17, 34], we evaluate the perceptual color similarity at HR from the per-channel disparities between Y and Ŷ. Finally, we adopt the color loss terms for enhancement and SESR as weighted combinations of the respective global (L2) and perceptual (chrominance) terms.

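The chrominance comparison can be sketched as follows, computing the RG and YB opponent-color components of two RGB images; the exact weighting used in the paper's loss is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def chrominance_loss(img_true, img_pred):
    """UICM-style chrominance penalty (sketch): compare the RG = R - G and
    YB = (R + G)/2 - B opponent-color components of two RGB images
    (arrays of shape (H, W, 3)) via mean squared differences."""
    def chrom(img):
        r, g, b = img[..., 0], img[..., 1], img[..., 2]
        return r - g, (r + g) / 2.0 - b
    rg_t, yb_t = chrom(img_true)
    rg_p, yb_p = chrom(img_pred)
    return float(np.mean((rg_t - rg_p) ** 2 + (yb_t - yb_p) ** 2))
```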

4) Content loss (LR/HR) forces the generator to restore similar feature content as the ground truth in terms of high-level representation. Such feature preservation has been found to be very effective for image enhancement, style transfer, and SISR problems [31, 34]; as suggested in [38], we define the image content function Φ(·) as the high-level features extracted by the last conv layer of a pre-trained VGG-19 network. Then, we formulate the content loss for enhancement and SESR as the disparities ||Φ(E) − Φ(Ê)|| and ||Φ(Y) − Φ(Ŷ)||, respectively.


5) Sharpness loss (HR) measures the blurriness recovery in the SESR output by exploiting local image gradients. The literature offers several solutions for evaluating image sharpness based on the norm/histogram of gradients or frequency-domain analysis. In particular, the notions of Just Noticeable Blur (JNB) [22] and the Perceptual Sharpness Index (PSI) [21] are widely used; they apply non-linear transformations and thresholding on local contrast or gradient-based features to quantify perceived blurriness based on the characteristics of the human visual system. However, we found better results and numeric stability by using the norm of image gradients directly; specifically, we use the standard Sobel operator [24] for computing the spatial gradient of an image. Subsequently, we formulate the sharpness loss for SESR as the disparity between the gradient norms of Y and Ŷ.

In Figure 4(b), we present a statistical validity of the sharpness term as a loss component; edge gradient features for a particular sample are also provided in Figure 4(c). As shown, the numeric disparities between the gradient norms of distorted images and their HR ground truth are significant, which we quantify by the sharpness loss to encourage sharper image generation.
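A dependency-light sketch of the gradient-based sharpness comparison follows (naive Sobel convolution; function names hypothetical):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def grad_mag(img):
    """Per-pixel gradient magnitude via Sobel operators (naive 3x3
    convolution with edge padding, single-channel image)."""
    p = np.pad(img, 1, mode="edge")
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = p[i:i + 3, j:j + 3]
            gx[i, j] = (win * SOBEL_X).sum()
            gy[i, j] = (win * SOBEL_Y).sum()
    return np.sqrt(gx ** 2 + gy ** 2)

def sharpness_loss(y_true, y_pred):
    """Disparity between the gradient magnitudes of the ground-truth and
    generated images; blurry outputs have weaker gradients at edges."""
    return float(np.mean(np.abs(grad_mag(y_true) - grad_mag(y_pred))))
```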

4.3 End-to-end Training Objective

We use a linear combination of the above-mentioned loss components to formulate the unified objective function as

L = λ_s · L_Saliency + λ_e · L^LR + λ_y · L^HR,

where L^LR aggregates the contrast, color, and content terms for enhancement, and L^HR aggregates the color, content, and sharpness terms for SESR. Here, the λ symbols are scaling factors that represent the contributions of the respective loss components; their values are empirically tuned as hyper-parameters.
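The linear combination itself is straightforward; a sketch with hypothetical lambda names:

```python
def total_loss(terms, weights):
    """Weighted linear combination of named loss components; `weights`
    holds the empirically tuned lambda scaling factors (names are
    hypothetical, not the paper's)."""
    return sum(weights[k] * v for k, v in terms.items())
```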

Figure 5: Each row demonstrates perceptual enhancement and saliency prediction by Deep SESR on respective LRD input images; the corresponding results of an ablation experiment show the contributions of various loss terms in the learning.
Table 1: Quantitative performance comparison for enhancement on the UFO-120, EUVP, and UImNet test sets; models compared: RGHS, UCM, MS-Fusion, MS-Retinex, Water-Net, UGAN, Fusion-GAN, and FUnIE-GAN, against Deep SESR. Scores are shown as PSNR / SSIM / UIQM; the first and second best scores (in each row) are colored red and blue, respectively. [Numeric table values were not recovered from the source.]
Figure 6: Qualitative comparison of Deep SESR-enhanced images with SOTA models: RGHS [29], UCM [33], MS-Fusion [4], MS-Retinex [72], Water-Net [43], UGAN [20], Fusion-GAN [44], and FUnIE-GAN [35].


Table 2: Quantitative performance comparison for super-resolution; models compared: SRGAN, RSRGAN, SRDRM, and SRDRM-GAN, against Deep SESR. Scores are shown as PSNR / SSIM / UIQM; the first and second best scores (in each column per dataset) are colored red and blue, respectively. (One of the compared models does not support every scale.) [Numeric table values were not recovered from the source.]
Figure 7: Qualitative comparison for SISR performance of Deep SESR with existing solutions and SOTA models: SRCNN [18], SRResNet [42], SRGAN [42], RSRGAN [12], SRDRM [34], and SRDRM-GAN [34].

5 Experimental Results

5.1 Implementation Details

As mentioned in Section 3.2, Deep SESR training is supervised by paired data of LRD inputs and their enhanced HR ground truth (with saliency maps). We use TensorFlow libraries [1] to implement the optimization pipeline; a Linux host with two Nvidia™ GTX 1080 graphics cards is used for training. The Adam optimizer [41] is used for the global iterative learning with empirically tuned learning rate and momentum; the network converges within a moderate number of training epochs in this setup. In the following sections, we present the experimental results based on qualitative analysis, quantitative evaluations, and ablation studies. Since there are no existing SESR methods, we compare the Deep SESR performance separately with SOTA image enhancement and super-resolution models. Note that all models in comparison are trained on the same train-validation splits (of the respective datasets) following their recommended parameter settings. Also, for datasets other than UFO-120, the AAN (and the associated saliency loss) is not used by Deep SESR, as their ground truth saliency maps are not available.

5.2 Evaluation: Enhancement

We first qualitatively analyze the Deep SESR-generated images in terms of color, contrast, and sharpness. As Fig. 5 illustrates, the enhanced images are perceptually similar to the respective ground truth. Specifically, the greenish underwater hue is rectified, true pixel colors are mostly restored, and the global image sharpness is recovered. Moreover, the generated saliency maps suggest that the model focused on the right foreground regions for contrast improvement. We further demonstrate the contributions of the individual loss terms for learning the enhancement. We observe that the color rendition gets impaired without the color loss terms, whereas the content loss contributes to learning finer texture details. We also notice considerably low-contrast image generation without the contrast loss, which validates the utility of saliency-driven contrast evaluation via CMI (see Section 4.2).

Next, we compare the perceptual image enhancement performance of Deep SESR with the following models: (i) relative global histogram stretching (RGHS) [29], (ii) unsupervised color correction (UCM) [33], (iii) multi-scale fusion (MS-Fusion) [4], (iv) multi-scale Retinex (MS-Retinex) [72], (v) Water-Net [43], (vi) UGAN [20], (vii) Fusion-GAN [44], and (viii) FUnIE-GAN [35]. The first four are physics-based models and the rest are learning-based models; they provide SOTA performance for underwater image enhancement in RGB space (without requiring scene depth or optical waterbody measures). Their performance is quantitatively evaluated on common test sets of each dataset based on standard metrics [35, 49]: peak signal-to-noise ratio (PSNR) [28], structural similarity measure (SSIM) [66], and underwater image quality measure (UIQM) [55]. The PSNR and SSIM quantify the reconstruction quality and structural similarity of generated images (with respect to ground truth), whereas the UIQM evaluates image qualities based on colorfulness, sharpness, and contrast. The evaluation is summarized in Table 1; moreover, a few qualitative comparisons are shown in Fig. 6.
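For reference, PSNR is computed from the mean squared error against the ground truth; a minimal sketch:

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """Peak signal-to-noise ratio (in dB) between a reference image and a
    generated image; higher is better, infinite for identical images."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```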

As Fig. 6 demonstrates, UCM and MS-Retinex often suffer from over-saturation, whereas RGHS, MS-Fusion, and Water-Net fall short in hue rectification. In comparison, the color restoration and contrast enhancement of UGAN, Fusion-GAN, and FUnIE-GAN are generally better. In addition to achieving comparable color recovery and hue rectification, the Deep SESR-generated images are considerably sharper. Since the boost in performance is rather significant for the UFO-120 dataset (suggested by the results of Table 1), it is likely that the additional knowledge about foreground pixels through the saliency-driven contrast loss helps in this regard. Deep SESR achieves competitive and often better performance in terms of PSNR and SSIM as well. In particular, it generally attains better UIQM scores; we postulate that the chrominance-specific color loss contributes to this enhancement, as it is designed to improve the UICM (see Section 4.2). Further ablation investigations reveal a drop in UIQM values when it is excluded from the learning objective.

Table 3: Deep SESR performance on the UFO-120 dataset; set-wise mean scores are shown for 2× / 3× / 4× SESR.
Figure 8: Color and texture recovery of Deep SESR: comparison shown with the two best-performing SISR models (from Table 2).

5.3 Evaluation: Super-Resolution

We follow similar experimental procedures for evaluating the super-resolution performance of Deep SESR. We consider the existing underwater SISR models named RSRGAN [12], SRDRM [34], and SRDRM-GAN [34] for performance comparison. We also include the standard (terrestrial) SISR models named SRCNN [18], SRResNet [42], and SRGAN [42] in the evaluation as benchmarks. We compare their SISR performance at multiple scales on two large-scale datasets: UFO-120 and USR-248. The results are presented in Table 2, and a few samples are shown in Fig. 7. Note that the test images of the USR-248 dataset are left undistorted for a fair comparison.

As Table 2 demonstrates, Deep SESR outperforms the other models in comparison by considerable margins on UIQM. This is due to the fact that it enhances perceptual image qualities in addition to spatial resolution. As shown in Fig. 7, Deep SESR generates much sharper and better-quality HR images from both distorted and undistorted LR input patches, which contributes to its competitive PSNR and SSIM scores on the USR-248 dataset. Fig. 8 further demonstrates that it does not introduce noise by unnecessary over-correction, which is a prevalent limitation of existing automatic image enhancement solutions. Lastly, we observe similar performance trends for all three types of spatial down-sampling, i.e., for Set-U, Set-F, and Set-O (see Section 3.2); we present the relative quantitative scores in Table 3.

(a) Comparison with a physics-based color restoration method [7] that uses spectral waterbody measures and a haze-lines prior.
(b) Performance for 2×, 3×, and 4× SESR on terrestrial images.
Figure 9: Demonstration of generalization performance of Deep SESR model (trained on UFO-120 dataset).

6 Generalization Performance

Due to the ill-posed nature of modeling underwater image distortions without scene-depth and optical waterbody measurements, learning-based solutions often fail to generalize beyond the supervised data. In addition to the already-presented results, we demonstrate the color and texture recovery of Deep SESR on unseen natural images in Fig. 9. As seen in Fig. 9(a), Deep SESR-enhanced pixel intensities are perceptually similar to a comprehensive physics-based approximation [7]. Additionally, it generates the respective HR images and saliency maps, and still offers a many-times faster run-time.

Deep SESR also provides reasonable performance on terrestrial images. As demonstrated in Fig. 9(b), the color and texture enhancement of unseen objects (e.g., grass, face, clothing) is perceptually coherent. Moreover, as Table 4 indicates, its sharpness and contrast recovery for 2×, 3×, and 4× SISR is competitive with SOTA benchmark results [46, 73]. Note that much-improved performance can be achieved by further tuning and training on terrestrial datasets. Nevertheless, these results validate that the proposed architecture has the capacity to learn a generalizable solution to the underlying SESR problem.

Table 4: Deep SESR performance on terrestrial test data (Set5 [8], Set14 [70], and Sun80 [61]); blue (and boldfaced) scores represent margins with SOTA benchmark results for 2×/3×/4× SISR [46, 73].

7 Operational Feasibility & Design Choices

Deep SESR has a small on-board memory requirement and offers fast per-frame run-times, i.e., near real-time frame rates (FPS), on a single-board computer: the Nvidia™ AGX Xavier. As shown in Table 5, it provides much faster speeds for the following design choices:

1) Learning enhancement and SESR on separate branches facilitates a faster run-time when HR perception is not required. Specifically, we can decouple the SESR branches from the frozen model, which then performs enhancement and saliency prediction at a considerably higher frame rate. As shown in Fig. 10, the predicted saliency map can be exploited for automatic RoI selection by using density gradient estimation techniques such as mean-shift [16]. The SESR output corresponding to the RoI can then be generated with a small amount of additional processing time.

Table 5: Run-time comparison for various design choices of Deep SESR (on the Nvidia™ AGX Xavier): per-frame run-times in ms (and the corresponding FPS) with FENet-1d and FENet-2d.
Figure 10: Demonstration of automatic RoI selection based on local intensity values in the saliency map; Deep SESR can be applied again on the enhanced RoI for detailed perception.
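To illustrate the RoI-selection step, the sketch below runs a simple mean-shift iteration (Gaussian kernel, hypothetical bandwidth and threshold values) over thresholded saliency-map coordinates to locate the density mode and return a bounding box around it; this is a toy stand-in, not the paper's exact procedure.

```python
import numpy as np

def roi_from_saliency(saliency, bandwidth=8.0, iters=20, thresh=0.5):
    """Locate a salient RoI by mean-shift over thresholded saliency-map pixels."""
    ys, xs = np.nonzero(saliency >= thresh)
    pts = np.stack([ys, xs], axis=1).astype(np.float64)
    mode = pts.mean(axis=0)                       # initialize at the centroid
    for _ in range(iters):                        # shift toward the density mode
        d2 = ((pts - mode) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weights
        mode = (pts * w[:, None]).sum(axis=0) / w.sum()
    # Bounding box over points near the converged mode (within 3 bandwidths).
    near = pts[((pts - mode) ** 2).sum(axis=1) <= (3 * bandwidth) ** 2]
    (y0, x0), (y1, x1) = near.min(axis=0), near.max(axis=0)
    return int(y0), int(x0), int(y1), int(x1)

# Toy saliency map with one bright blob; the RoI should enclose it.
sal = np.zeros((64, 64))
sal[20:30, 40:52] = 1.0
print(roi_from_saliency(sal))  # → (20, 40, 29, 51)
```

For multi-modal saliency maps, a full clustering implementation (e.g., `sklearn.cluster.MeanShift`) would separate multiple salient regions rather than returning a single mode.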

2) FENet-1d and FENet-2d are two design choices for the FENet (see Fig. 2(b)); FENet-2d is the default architecture, which learns two sets of filters in parallel branches, whereas FENet-1d uses a single branch of filters. As shown in Table 5, the faster feature extraction of FENet-1d yields a speed-up for Deep SESR. However, we observe slightly lower PSNR/SSIM/UIQM scores on the UFO-120 dataset. Nevertheless, the generated images are qualitatively indistinguishable, and the trade-off is admissible in practical applications.
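The one-branch versus two-branch distinction can be sketched as a toy filter bank in NumPy: FENet-2d-style extraction runs two parallel filter banks and concatenates their outputs channel-wise, while the FENet-1d analog keeps a single branch. The filter sizes and naive convolution here are placeholders, not the paper's actual learned kernels or architecture.

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 'same'-padded 2D convolution for a single-channel image."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def fenet_features(img, two_branch=True):
    """Toy analog of FENet-2d (two parallel branches) vs FENet-1d (one branch)."""
    k_small = np.ones((3, 3)) / 9.0        # small-receptive-field branch
    feats = [conv2d(img, k_small)]
    if two_branch:
        k_large = np.ones((5, 5)) / 25.0   # wider branch: more context, more cost
        feats.append(conv2d(img, k_large))
    return np.stack(feats, axis=-1)        # concatenate branch outputs channel-wise

img = np.arange(64, dtype=np.float64).reshape(8, 8)
print(fenet_features(img, True).shape, fenet_features(img, False).shape)
# → (8, 8, 2) (8, 8, 1)
```

The extra output channels of the two-branch variant feed richer features downstream, which is the source of both its quality gain and its added run-time cost.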

Overall, Deep SESR offers use-case-specific design choices and ensures computational efficiency with robust SESR performance. These features make it suitable for near real-time robotic deployments; further demonstration is available in the supplementary material.
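For profiling deployments such as the single-board runs discussed above, a simple timing harness can estimate per-frame latency and FPS. This is a generic sketch, not the paper's benchmarking code; the lambda stands in for actual model inference.

```python
import time
import numpy as np

def measure_fps(infer_fn, frame, warmup=3, runs=20):
    """Average per-frame latency (ms) and throughput (FPS) of an inference callable."""
    for _ in range(warmup):        # warm-up iterations (cache / lazy-init effects)
        infer_fn(frame)
    t0 = time.perf_counter()
    for _ in range(runs):
        infer_fn(frame)
    ms = (time.perf_counter() - t0) * 1000.0 / runs
    return ms, 1000.0 / ms

# Stand-in "model": a cheap NumPy op in place of Deep SESR inference.
frame = np.zeros((240, 320, 3), dtype=np.float32)
ms, fps = measure_fps(lambda x: x * 0.5 + 0.1, frame)
print(ms > 0.0, fps > 0.0)
```

On an actual robot, `infer_fn` would wrap the frozen-graph forward pass, and warm-up iterations matter more on GPU/accelerator targets than in this CPU toy.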

8 Conclusion

In this paper, we introduce the problem of simultaneous enhancement and super-resolution (SESR) and present an efficient learning-based solution for underwater imagery. The proposed generative model, named Deep SESR, can learn SESR and saliency prediction on a shared feature space. We also present its detailed network architecture, associated loss functions, and end-to-end training pipeline. Additionally, we contribute over 1,500 annotated samples to facilitate large-scale SESR training on the UFO-120 dataset. We perform a series of qualitative and quantitative experiments, which suggest that Deep SESR: i) provides SOTA performance on underwater image enhancement and super-resolution, ii) exhibits significantly better generalization performance on natural images than existing solutions, iii) provides competitive results on terrestrial images, and iv) achieves fast inference on single-board platforms. The inspiring performance, computational efficiency, and availability of application-specific design choices make Deep SESR suitable for near real-time use by visually-guided underwater robots. In the future, we seek to incorporate higher spatial up-scaling capabilities into the model with reasonable performance trade-offs.


We acknowledge the support of the MnDrive initiative at the University of Minnesota for our research. We are also grateful to the Bellairs Research Institute of Barbados for providing us with the facilities for field experiments. Additionally, we thank Nvidia™ for donating two GPUs for our work. Finally, we acknowledge our colleagues at the IRVLab for their assistance in data collection, annotation, and in the preparation of media files.


  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, et al. (2016) TensorFlow: A System for Large-scale Machine Learning. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 265–283. Cited by: §5.1.
  • [2] D. Akkaynak and T. Treibitz (2018) A Revised Underwater Image Formation Model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6723–6732. Cited by: §1, §2.
  • [3] D. Akkaynak and T. Treibitz (2019) Sea-Thru: A Method for Removing Water From Underwater Images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1682–1691. Cited by: §1, §2.
  • [4] C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert (2012) Enhancing Underwater Images and Videos by Fusion. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 81–88. Cited by: Figure 6, §5.2.
  • [5] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein Generative Adversarial Networks. In International Conference on Machine Learning (ICML), pp. 214–223. Cited by: §2.
  • [6] L. Bazzani, H. Larochelle, and L. Torresani (2017) Recurrent Mixture Density Network for Spatiotemporal Visual Attention. In International Conference on Learning Representations (ICLR), Cited by: §2.
  • [7] D. Berman, D. Levy, S. Avidan, and T. Treibitz (2018) Underwater Single Image Color Restoration using Haze-Lines and a New Quantitative Dataset. arXiv preprint arXiv:1811.01343. Cited by: §1, §2, 8(a), §6.
  • [8] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel (2012) Low-complexity Single-image super-resolution based on Nonnegative Neighbor Embedding. In British Machine Vision Conference, pp. 135.1–135.10. Cited by: Table 4, 4th item.
  • [9] B. Bingham, B. Foley, H. Singh, R. Camilli, K. Delaporta, R. Eustice, et al. (2010) Robotic Tools for Deep Water Archaeology: Surveying an Ancient Shipwreck with an Autonomous Underwater Vehicle. Journal of Field Robotics (JFR) 27 (6), pp. 702–717. Cited by: §1.
  • [10] M. Bryson, M. Johnson-Roberson, O. Pizarro, and S. B. Williams (2016) True Color Correction of Autonomous Underwater Vehicle Imagery. Journal of Field Robotics (JFR) 33 (6), pp. 853–874. Cited by: §1, §2.
  • [11] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao (2016) DehazeNet: An End-to-end System for Single Image Haze Removal. IEEE Transactions on Image Processing 25 (11), pp. 5187–5198. Cited by: §2.
  • [12] Y. Chen, J. Sun, W. Jiao, and G. Zhong (2019) Recovering Super-Resolution Generative Adversarial Network for Underwater Images. In International Conference on Neural Information Processing, pp. 75–83. Cited by: §1, §2, Figure 7, §5.3.
  • [13] Y. Chen, B. Yang, M. Xia, W. Li, K. Yang, and X. Zhang (2012) Model-based Super-resolution Reconstruction Techniques for Underwater Imaging. In Photonics and Optoelectronics Meetings (POEM): Optoelectronic Sensing and Imaging, Vol. 8332, pp. 83320G. Cited by: §2.
  • [14] Z. Cheng, Q. Yang, and B. Sheng (2015) Deep Colorization. In IEEE International Conference on Computer Vision (ICCV), pp. 415–423. Cited by: §2.
  • [15] Y. Cho, J. Jeong, and A. Kim (2018) Model-assisted Multiband Fusion for Single Image Enhancement and Applications to Robot Vision. IEEE Robotics and Automation Letters (RA-L) 3 (4), pp. 2822–2829. Cited by: §2.
  • [16] D. Comaniciu and P. Meer (1999) Mean Shift Analysis and Applications. In IEEE International Conference on Computer Vision (ICCV), Vol. 2, pp. 1197–1203. Cited by: §7.
  • [17] CompuPhase (2019) Perceptual Color Metric. Note: accessed 12-12-2019. Cited by: §4.2.
  • [18] C. Dong, C. C. Loy, K. He, and X. Tang (2015) Image Super-resolution using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2), pp. 295–307. Cited by: §2, Figure 7, §5.3.
  • [19] D. R. Edgington, K. A. Salamy, M. Risi, RE. Sherlock, D. Walther, and C. Koch (2003) Automated Event Detection in Underwater Video. In Oceans, Vol. 5, pp. 2749–2753. Cited by: §2.
  • [20] C. Fabbri, M. J. Islam, and J. Sattar (2018) Enhancing Underwater Imagery using Generative Adversarial Networks. In IEEE International Conference on Robotics and Automation (ICRA), pp. 7159–7165. Cited by: §1, §2, §3.2, §3.2, Figure 6, §5.2, 3rd item.
  • [21] C. Feichtenhofer, H. Fassold, and P. Schallauer (2013) A Perceptual Image Sharpness Metric based on Local Edge Gradient Analysis. IEEE Signal Processing Letters 20 (4), pp. 379–382. Cited by: §4.2.
  • [22] R. Ferzli and L. J. Karam (2009) A No-reference Objective Image Sharpness Metric based on the Notion of Just Noticeable Blur (JNB). IEEE Transactions on Image Processing 18 (4), pp. 717–728. Cited by: §4.2.
  • [23] X. Fu, P. Zhuang, Y. Huang, Y. Liao, X. Zhang, and X. Ding (2014) A Retinex-based Enhancing Approach for Single Underwater Image. In 2014 IEEE International Conference on Image Processing (ICIP), pp. 4572–4576. Cited by: §2.
  • [24] W. Gao, X. Zhang, L. Yang, and H. Liu (2010) An Improved Sobel Edge Detection. In International Conference on Computer Science and Information Technology, Vol. 5, pp. 67–71. Cited by: §4.2.
  • [25] Y. Girdhar, P. Giguere, and G. Dudek (2014) Autonomous Adaptive Exploration using Realtime Online Spatiotemporal Topic Modeling. International Journal of Robotics Research (IJRR) 33 (4), pp. 645–657. Cited by: §1.
  • [26] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative Adversarial Nets. In Advances in Neural Information Processing Systems (NIPS), pp. 2672–2680. Cited by: §2.
  • [27] K. He, J. Sun, and X. Tang (2010) Single Image Haze Removal using Dark Channel Prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (12), pp. 2341–2353. Cited by: §2.
  • [28] A. Hore and D. Ziou (2010) Image Quality Metrics: PSNR vs. SSIM. In International Conference on Pattern Recognition, pp. 2366–2369. Cited by: §5.2.
  • [29] D. Huang, Y. Wang, W. Song, J. Sequeira, and S. Mavromatis (2018) Shallow-water Image Enhancement using Relative Global Histogram Stretching Based on Adaptive Parameter Acquisition. In International Conference on Multimedia Modeling, pp. 453–465. Cited by: Figure 6, §5.2.
  • [30] Z. Hui, X. Wang, and X. Gao (2018) Fast and Accurate Single Image Super-resolution via Information Distillation Network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 723–731. Cited by: §1, §2.
  • [31] A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. Van Gool (2017) DSLR-quality Photos on Mobile Devices with Deep Convolutional Networks. In IEEE International Conference on Computer Vision (ICCV), pp. 3277–3285. Cited by: §4.2.
  • [32] S. Ioffe and C. Szegedy (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning (ICML), Vol. 37, pp. 448–456. Cited by: §4.1.
  • [33] K. Iqbal, M. Odetayo, A. James, R. A. Salam, and A. Z. H. Talib (2010) Enhancing the Low Quality Images using Unsupervised Colour Correction Method. In IEEE International Conference on Systems, Man and Cybernetics, pp. 1703–1709. Cited by: Figure 6, §5.2.
  • [34] M. J. Islam, S. S. Enan, P. Luo, and J. Sattar (2019) Underwater Image Super-Resolution using Deep Residual Multipliers. arXiv preprint arXiv:1909.09437. Cited by: §1, §1, §2, Figure 7, §4.2, §4.2, §5.3, 2nd item.
  • [35] M. J. Islam, Y. Xia, and J. Sattar (2019) Fast Underwater Image Enhancement for Improved Visual Perception. arXiv preprint arXiv:1903.09766. Cited by: §1, §1, §2, §3.2, §3.2, Figure 6, §5.2, 3rd item.
  • [36] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image Translation with Conditional Adversarial Networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1125–1134. Cited by: §2.
  • [37] D. J. Jobson, Z. Rahman, and G. A. Woodell (1997) A Multiscale Retinex for Bridging the Gap between Color Images and the Human Observation of Scenes. IEEE Transactions on Image processing 6 (7), pp. 965–976. Cited by: §2.
  • [38] J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual Losses for Real-time Style Transfer and Super-resolution. In European Conference on Computer Vision (ECCV), pp. 694–711. Cited by: §4.2.
  • [39] J. Kim, J. Kwon Lee, and K. Mu Lee (2016) Accurate Image Super-resolution using Very Deep Convolutional Networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1646–1654. Cited by: §2.
  • [40] J. Kim, J. Kwon Lee, and K. Mu Lee (2016) Deeply-recursive Convolutional Network for Image Super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1637–1645. Cited by: §2.
  • [41] D. P. Kingma and J. Ba (2015) Adam: A Method for Stochastic Optimization. In International Conference for Learning Representations (ICLR), Cited by: §5.1.
  • [42] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic Single Image Super-resolution using a Generative Adversarial Network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4681–4690. Cited by: §1, §2, Figure 7, §5.3.
  • [43] C. Li, C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao (2019) An Underwater Image Enhancement Benchmark Dataset and Beyond. In IEEE Transactions on Image Processing (TIP), pp. 1–1. Cited by: Figure 6, §5.2.
  • [44] H. Li, J. Li, and W. Wang (2019) A Fusion Adversarial Underwater Image Enhancement Network with a Public Test Dataset. arXiv preprint arXiv:1906.06819. Cited by: §1, Figure 6, §5.2.
  • [45] J. Li, K. A. Skinner, R. M. Eustice, and M. Johnson-Roberson (2018) WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images. IEEE Robotics and Automation Letters (RA-L) 3 (1), pp. 387–394. Cited by: §2.
  • [46] Z. Li, J. Yang, Z. Liu, X. Yang, G. Jeon, and W. Wu (2019) Feedback Network for Image Super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3867–3876. Cited by: §3.2, Table 4, §6.
  • [47] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced Deep Residual Networks for Single Image Super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) workshops, pp. 136–144. Cited by: §2.
  • [48] P. Liu, G. Wang, H. Qi, C. Zhang, H. Zheng, and Z. Yu (2019) Underwater Image Enhancement With a Deep Residual Framework. IEEE Access 7, pp. 94614–94629. Cited by: §1, §2.
  • [49] R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo (2020) Real-World Underwater Enhancement: Challenges, Benchmarks, and Solutions under Natural Light. IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1. External Links: Document, ISSN 1558-2205. Cited by: §4.2, §5.2.
  • [50] H. Lu, Y. Li, S. Nakashima, H. Kim, and S. Serikawa (2017) Underwater Image Super-resolution by Descattering and Fusion. IEEE Access 5, pp. 670–679. Cited by: §2.
  • [51] J. Lu, J. Yang, D. Batra, and D. Parikh (2016) Hierarchical Question-image co-attention for Visual question Answering. In Advances in Neural Information Processing Systems, pp. 289–297. Cited by: §2, §4.2.
  • [52] A. Maldonado-Ramírez and L. A. Torres-Méndez (2016) Robotic Visual Tracking of Relevant Cues in Underwater Environments with Poor Visibility Conditions. Journal of Sensors. Cited by: §2.
  • [53] X. Mao, Q. Li, H. Xie, R. YK. Lau, Z. Wang, and S. Paul Smolley (2017) Least Squares Generative Adversarial Networks. In IEEE International Conference on Computer Vision (ICCV), pp. 2794–2802. Cited by: §2.
  • [54] V. Nair and G. E. Hinton (2010) Rectified Linear Units Improve Restricted Boltzmann Machines. In International Conference on Machine Learning (ICML), pp. 807–814. Cited by: §4.1.
  • [55] K. Panetta, C. Gao, and S. Agaian (2016) Human-visual-system-inspired Underwater Image Quality Measures. IEEE Journal of Oceanic Engineering 41 (3), pp. 541–551. Cited by: §4.2, §5.2.
  • [56] E. Quevedo, E. Delory, GM. Callicó, F. Tobajas, and R. Sarmiento (2017) Underwater Video Enhancement using Multi-camera Super-resolution. Optics Communications 404, pp. 94–102. Cited by: §2.
  • [57] M. Roznere and A. Q. Li (2019) Real-time Model-based Image Color Correction for Underwater Robots. arXiv preprint arXiv:1904.06437. Cited by: §2.
  • [58] M. S. Sajjadi, B. Scholkopf, and M. Hirsch (2017) Enhancenet: Single Image Super-resolution through Automated Texture Synthesis. In IEEE International Conference on Computer Vision (ICCV), pp. 4491–4500. Cited by: §2.
  • [59] A. Shaus, S. Faigenbaum-Golovin, B. Sober, and E. Turkel (2017) Potential Contrast–A New Image Quality Measure. Electronic Imaging 2017 (12), pp. 52–58. Cited by: §4.2.
  • [60] C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár (2017) Amortised Map Inference for Image Super-resolution. In International Conference on Learning Representations (ICLR), Cited by: §2.
  • [61] L. Sun and J. Hays (2012) Super-resolution from Internet-scale Scene Matching. In IEEE International Conference on Computational Photography (ICCP), pp. 1–12. Cited by: Table 4, 4th item.
  • [62] T. Tong, G. Li, X. Liu, and Q. Gao (2017) Image Super-resolution using Dense Skip Connections. In IEEE International Conference on Computer Vision (ICCV), pp. 4799–4807. Cited by: §2.
  • [63] M. Trivedi, A. Jaiswal, and V. Bhateja (2013) A Novel HVS Based Image Contrast Measurement Index. In International Conference on Signal and Image Processing 2012 (ICSIP 2012), pp. 545–555. Cited by: §4.2.
  • [64] W. Wang and J. Shen (2017) Deep Visual Attention Prediction. IEEE Transactions on Image Processing 27 (5), pp. 2368–2378. Cited by: §1, §2, §4.2.
  • [65] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy (2018) ESRGAN: Enhanced Super-resolution Generative Adversarial Networks. In European Conference on Computer Vision (ECCV) workshops. Cited by: §1, §2.
  • [66] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al. (2004) Image Quality Assessment: from Error Visibility to Structural Similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §5.2.
  • [67] W. Yang, X. Zhang, Y. Tian, W. Wang, J. Xue, and Q. Liao (2019) Deep learning for Single Image Super-resolution: A Brief Review. IEEE Transactions on Multimedia. Cited by: §2.
  • [68] D. Yu, J. Fu, T. Mei, and Y. Rui (2017) Multi-level Attention Networks for Visual Question Answering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4709–4717. Cited by: §2.
  • [69] X. Yu, Y. Qu, and M. Hong (2018) Underwater-GAN: Underwater Image Restoration via Conditional Generative Adversarial Network. In International Conference on Pattern Recognition(ICPR), pp. 66–75. Cited by: §2.
  • [70] R. Zeyde, M. Elad, and M. Protter (2010) On Single Image Scale-up using Sparse-Representations. In International Conference on Curves and Surfaces, pp. 711–730. Cited by: Table 4, 4th item.
  • [71] L. Zhang, B. He, Y. Song, and T. Yan (2016) Underwater Image Feature Extraction and Matching based on Visual Saliency Detection. In OCEANS, pp. 1–4. Cited by: §2.
  • [72] S. Zhang, T. Wang, J. Dong, and H. Yu (2017) Underwater Image Enhancement via Extended Multi-scale Retinex. Neurocomputing 245, pp. 1–9. Cited by: §2, Figure 6, §5.2.
  • [73] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu (2018) Residual Dense Network for Image Super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2472–2481. Cited by: §1, §1, §2, 2(a), §4.1, Table 4, §6.

Appendix I: Dataset Information

Appendix II: Credits for Media Resources

  1. David Gantt. A Sea Fan at Karpata. 2012. (Flickr):

  2. Diver Over a Reef. 2015. (SeaPics):

  3. Vincent POMMEYROL. Underwater Photography. 2012. (Flickr):

  4. BelleDeesse. Watercraft. 2013. (WallpaperUP):

  5. Nick Shaw. Leptomithrax gaimardii (Great Spider Crab). 2018. (Atlas of Living Australia):

  6. David Piano. City of Grand Rapids Shipwreck watermarked-2. 2018. (Flickr):

  7. Koi Photos Underwater View. The Pond Experts. 2014. (PondExperts):

  8. Cat Trumpet. 2 Hours of Beautiful Coral Reef Fish, Relaxing Ocean Fish, 1080p HD. 2016. (YouTube):

  9. Nature Relaxation Films. 3 Hours of Stunning Underwater Footage, French Polynesia, Indonesia. 2018. (YouTube):

  10. Calm Cove Club - Relaxing Videos. 4K Beautiful Ocean Clown Fish Turtle Aquarium. 2017 (YouTube):

  11. 4K Underwater at Stuart Cove’s, 2014 (YouTube):

  12. 4.000 PIXELS. Beautiful Underwater Nature. 2017 (YouTube):

  13. Magnus Ryan Diving. SCUBA Diving Egypt Red Sea. 2017 (YouTube):

  14. TheSilentWatcher. 4K Coral World-Tropical Reef. 2018. (YouTube):

  15. Awesome Video. 4K- The Most Beautiful Coral Reefs and Undersea Creature on Earth. 2017. (YouTube):

  16. Earth Touch. Celebrating World Oceans Day in 4K. 2015. (YouTube):

  17. BBC Earth. Deep Ocean: Relaxing Oceanscapes. 2018. (YouTube):

  18. Alegra Chetti. Let’s Go Under the Sea I Underwater Shark Footage I Relaxing Underwater Scene. 2016. (YouTube):

  19. Underwater 3D Channel- Barry Chall Films. Planet Earth, The Undersea World (4K). 2018. (YouTube):

  20. Undersea Productions. “ReefScapes: Nature’s Aquarium” Ambient Underwater Relaxing Natural Coral Reefs and Ocean Nature. 2009. (YouTube):

  21. BBC Earth. The Coral Reef: 10 Hours of Relaxing Oceanscapes. 2018. (YouTube):

  22. Robby Michaelle. Scuba Diving the Great Barrier Reef Red Sea Egypt Tiran. 2014. (YouTube):

  23. Bubble Vision. Diving in Bali. 2012. (YouTube):

  24. Vic Stefanu - Amazing World Videos. EXPLORING The GREAT BARRIER REEF, fantastic UNDERWATER VIDEOS (Australia). 2015. (YouTube):

  25. Our Coral Reef. Breathtaking Dive in Raja Ampat, West Papua, Indonesia Coral Reef. 2018. (YouTube):

  26. GoPro. GoPro Awards: Great Barrier Reef with Fusion Overcapture in 4K. 2018. (YouTube):

  27. GoPro. GoPro: Freediving with Tiger Sharks in 4K. 2017. (YouTube):

  28. TFIL. SCUBA DIVING WITH SHARKS!. 2017. (YouTube):

  29. Vins and Annette Singh. Stunning salt Water Fishes in a Marine Aquarium. 2019. (YouTube):

  30. Akouris. H.M.Submarine Perseus. 2014. (YouTube):

  31. Gung Ho Vids. U.S. Navy Divers View An Underwater Wreck. 2014. (YouTube):

  32. Martcerv. Truk lagoon deep wrecks, GoPro black with SRP tray and lights. 2013. (YouTube):

  33. Dmireiy. Shipwreck Diving, Nassau Bahamas. 2012. (YouTube):

  34. Frank Lame. diving WWII Wrecks around Palau. 2010. (YouTube):

  35. Stevanurk. Wreck Dives Malta. 2014. (YouTube):

  36. Stevanurk. Diving Malta, Gozo and Comino 2015 Wrecks Caves. 2015. (YouTube):

  37. Octavio velazquez lozano. SHIPWRECK Scuba Diving BAHAMAS. 2017. (YouTube):

  38. Drew Kaplan. SCUBA Diving The Sunken Ancient Roman City Of Baiae, the Underwater Pompeii. 2018. (YouTube):

  39. Octavio velazquez lozano. LIBERTY SHIPWRECK scuba dive destin florida. 2017. (YouTube):

  40. Blue Robotics. BlueROV2 Dive: Hawaiian Open Water. 2016. (YouTube):

  41. JerryRigEverything. Exploring a Plane Wreck - UNDER WATER!. 2018. (YouTube):

  42. Rovrobotsubmariner. Home-built Underwater Robot ROV in Action!. 2010. (YouTube):

  43. Oded Ezra. Eca-Robotics H800 ROV. 2016. (YouTube):

  44. Geneinno Tech. Titan Diving Drone. 2019. (YouTube):

  45. Scubo. Scubo - Agile Multifunctional Underwater Robot - ETH Zurich. 2016. (YouTube):

  46. Learning with Shedd. Student-built Underwater Robot at Shedd ROV Club Event. 2017. (YouTube):

  47. HMU-CSRL. SQUIDBOT sea trials. 2015. (YouTube):

  48. MobileRobots. Aqua2 Underwater Robot Navigates in a Coral Reef - Barbados. 2012. (YouTube):

  49. Daniela Rus. underwater robot. 2015. (YouTube):

  50. JohnFardoulis. Sirius - Underwater Robot, Mapping. 2014. (YouTube):