Although technology allows us to map the surface of the Moon or even Mars, there are still large knowledge gaps for our own planet. More than half of Earth’s surface is covered by at least one kilometer of sea water, and virtually all of this area has never been seen by any human and has not been visually mapped. Sunlight penetrates only a few hundred meters into the ocean, and the visibility of underwater cameras is limited to tens of meters under ideal (or laboratory) conditions, and typically to less than 10m for deep diving robots in practice (even less in coastal waters, see fig. 1).
The lack of sunlight in the deep ocean requires robots to bring their own light sources, which creates two main problems: First, scattering of light can be viewed as a nuisance effect that makes images appear foggy when light source and camera are relatively close to each other jaffe-90-optimalUnderwaterImagingSystems (as the volume scattering function of different waters, e.g. measured in the seminal work of Petzold petzold1972volume , has strong contributions in the "backward" direction). Since a real deep ocean robot has to maneuver in a harsh environment and is deployed by a surface vessel, it has to obey physical limits on size and camera-light layout. Robots that fly very close to the seafloor suffer less from scattering, but a robot at 5 meter altitude can cover much more area per hour than a robot at one meter altitude (the area footprint of an image grows quadratically with altitude, and cameras at higher altitude can move faster before motion blur becomes visible). Unless one is willing to use a team of robots with distributed light sources and cameras, image material taken by a single robot for large area mapping will be degraded by scattering. Second, when that robot is moving, the light cones will also move through the total darkness and create changing illumination at the seafloor (see e.g. song2021deepsea ). Additionally, distance-dependent absorption makes surface points appear darker or brighter (or change their apparent color) when taking overlapping imagery of the seafloor. Even when traversing a flat seafloor at constant altitude, outer rays in a downward facing camera have traveled a longer way from the seafloor than the central ray. The actual amount of scattering and attenuation depends on the (local) composition of the water and varies with wavelength and distance.
Finally, rather than using an ideal point light source, a real robot uses real lamps, nowadays often multi-LED setups, that have an angular characteristic bryson16colorcorection ; song2021deepsea . Due to refraction effects, this characteristic can differ in air and in water. For energy budget reasons, and to achieve reasonably homogeneous illumination, often multiple light sources are mounted to the robot, and the pattern projected to the seafloor depends on altitude and the robot’s 3D orientation.
All these details make it really difficult to calibrate a lighting system and to determine all the physical parameters involved. However, the light and water effects significantly change the apparent color of a seafloor point when it is illuminated and observed multiple times from different viewpoints and distances, in particular for high-altitude mapping. These strong appearance changes impair image registration (both sparse and dense correspondence search) and light effects can dominate larger maps when not compensated.
Classical underwater light propagation models McGlamery-1975-LightColModel ; jaffe-90-optimalUnderwaterImagingSystems have been used in the literature to undo some of these effects bryson16colorcorection , but there are many parameters to estimate. On top of that, for scenarios where we need to compensate strong illumination effects before we can estimate the motion, parametric physical models lead to a chicken-and-egg problem, since the motion would be needed to compensate the light. When exploring never-before visited territories of our planet, methods that require huge amounts of training data are not appropriate either, and in general very little ground-truthed data exists for the deep sea, since it is practically impossible to see how the ocean floor would look without water. In this work, we therefore propose an empirical, parameter-free way of estimating illumination, attenuation and scattering, simply as multiplicative and additive terms that change the true surface color in the respective pixel. We show a robust but very simple way of estimating and compensating them; the overall method takes less than one second for a high resolution photo, allowing efficient processing of datasets of tens of thousands of high-resolution photos in a few hours, and much quicker at lower (preview or live) resolution.
2 Previous Work and Contribution
Underwater imaging has a long history (see jaffe15underwaterimaging for a recent overview). Tractable models for underwater lighting have been proposed by McGlamery McGlamery-1975-LightColModel and Jaffe jaffe-90-optimalUnderwaterImagingSystems , and the low-level physics are discussed in detail by Preisendorfer and Mobley preisendorfer1964physical ; mobley1994light . Garcia et al. garcia02lighting provide a general overview of lighting issues for robotics. In early work on post-processing after the dive, Pizarro and Singh pizarro03largeareamosaicing divide each photo by the mean image of an entire mission, which imposes strong assumptions on altitude and attitude. As a parameter-free approach it is still used in ocean science practice morris_2014-autosub , although it is not robust and does not account for scatter.
In general, two different types of approaches for tackling the underwater effects can be distinguished: Those that estimate the parameters of physical models and undo the effects are called restoration techniques (e.g. schechner_05-underwaterPolarization ; Trucco_2006-underwaterImgEnh ; Sing_2007-towardsimaging ; Treibitz_2009-PolarizationDescatter ; Nicosevici_2009-efficientmosaicing ; sedlazeck20093d ; Williams_2012-benthicMonitoringAUV ; Tsiotsios_2014_CVPR ; bryson16colorcorection ; Akkaynak19seethrough ). Though restoration methods have a solid physical interpretation, the models are often very complex and require perfect knowledge and calibration of many parameters, such as position, orientation and angular characteristic of every light source, camera calibration, refractive interfaces of all lights and cameras, water absorption and scattering parameters, or complete distance information for each pixel in every image, which can be infeasible in practice. Other, so-called enhancement techniques have been proposed that try to empirically improve image quality in fog, haze or underwater, e.g. by color histogram equalization, homomorphic filtering or using some assumptions about the scene (e.g. Bazeille_2006-imgPreProc ; Iqbal_07-imgEnhancement ; ancuti12enhancing ; galdran18dehazingmulti-exposurefusion ; santra2018transmittance ; KIM2013410 ; zhao19multiscale ), see also Raimondo_2010-ColCorrectionStateOfArt ; wang19reviewenhancement for an overview. In contrast to restoration approaches, enhancement techniques usually do not require precalibrating all the parameters, but single image enhancement methods in particular face a strong ambiguity when trying to separate water effects, light cones and surface texture. Predicting plausible heuristics for previously unvisited deep sea territories is challenging. Mixtures of purely empirical and strictly physical models have also been developed. These typically estimate a depth map from a single image (e.g. using the dark channel prior in air he11darkchannel ) and then use this approximate geometric layout to invert a parametric underwater imaging model ancuti16descattering ; peng15blurriness ; UnderwaterHazeLines . Learning water and illumination effects as for shallow water Li_2017 is difficult, because very little training data exists for how the ocean floor would look without water, and manually correcting images is infeasible for human annotators.
Despite the huge number of approaches listed above, almost all of them are designed for shallow water with sunlight, and most deal with variants of the fog model cozman1997depth . This is an important setting in coastal areas and for diver scenarios in the top few meters of the ocean, but the lighting regime is entirely different from the deep sea song2021deepsea . Only Sing_2007-towardsimaging and bryson16colorcorection consider light cones and the deep sea scenario, although neither of the two compensates additive backscatter. In an approach inspired by homomorphic filtering, Singh et al. Sing_2007-towardsimaging fit a 4th order polynomial to an image in log space in order to represent the multiplicative illumination. This can capture the effects of a single light cone, but will also vary depending on the seafloor structures, i.e. it is image dependent. On top of that, for multi-LED setups the degree of the fitted polynomial has to be adapted to the complexity of the illumination pattern. The work of Bryson et al. bryson16colorcorection on the other hand requires modeling and calibrating each of the light sources jointly with the camera in order to undo the lighting effects. While this is a desirable solution in theory, obtaining all light source, camera and water parameters for a heavy deep sea robot with 24 LEDs can be challenging. On top of that, as we argue in this paper, robot localization and mapping can benefit from prior image enhancement, which is however not possible if the enhancement itself already requires the results of robot localization and mapping (chicken-and-egg problem).
Consequently, in this paper we propose a new calibration-free method that robustifies the mean image idea of pizarro03largeareamosaicing , extends it to scenarios with significant backscatter and generalizes it to missions with varying altitude/attitude. At the same time, we analyse assumptions, applicability and breakdown point in detail. The novel contributions and desirable properties are as follows:
We show that the effects can be categorized as either additive or multiplicative in nature. We then perform automatic, robust estimates of the sum of the additive components and the product of the multiplicative components. These estimates do not suffer from floating particles or occasional bright or dark seafloor patches, and no user interaction is needed.
We give clear preconditions for the scenarios in which the algorithm will work (derived from the breakdown point of the robust estimator used to observe the dominant seafloor). The only steering parameter (filter size) is rigorously derived from seafloor properties (percentage of uniform seafloor).
Rather than assuming a fixed light pattern at the seafloor, we only assume the additive component (scatter) to be static during a mission and dynamically re-compute the multiplicative estimate per image, allowing us to cope with varying altitudes and changes in vehicle orientation.
The approach does not require calibration of physical parameters, and we do not require any knowledge about light orientation and distance, nor lens calibration, nor water properties, nor a 3D model of the scene, making the approach attractive even for old videos with unknown parameters.
The sliding-window techniques are compatible with a streaming architecture allowing for a (near) real-time implementation for estimation and compensation.
Since obtaining ground truth for deep waters is close to impossible, we propose a new objective metric for computing the restoration quality on real data without ground truth: Overlapping imagery should restore the same color for corresponding pixels. The proposed metric does not require the true color and avoids a bias towards dark or low contrast restorations.
We also explicitly re-sketch the artificial light and water effects for the deep sea scenario with co-moving light sources (see also jaffe-90-optimalUnderwaterImagingSystems ; song2021deepsea ). This is not a new derivation, but we believe that it is important for readers to distinguish this scenario from shallow water settings with sunlight, often approximated by the “fog model” cozman1997depth .
Once enhanced or restored versions of the original images have been created, usually some water or lighting effects remain. In order to create large maps, different mosaicing and blending strategies can be applied (see prados11blending for a discussion). Note however that the main goal of blending strategies is to make artefacts less prominent (e.g. by distributing intensity discrepancies over a larger area rather than creating a hard edge), or to create visually pleasant maps. Our goal is to improve the consistency of the input images, such that they can be used for visual mapping purposes (feature correspondences, dense reconstruction, loop detection and texturing).
3 Parameter-free Light Compensation
To illustrate the assumptions of the model used, we will look into figure 2 and inspect a particular ray that reaches a sensor pixel (the sensor pixel will integrate over a range of wavelengths and a range of spatial rays). Basically, the incoming light along a ray originates from a non-uniform point light source with angular characteristic $E(\varphi)$. Commonly, several or many light sources are used, and the light from each of them has to be considered, but for clarity we will just mention one light source here (since there is no interaction between light sources, the light received from a multi-light system is just the sum of all individual lights). The directional pattern of a light source might be quite complex, and so in this contribution we do not attempt to estimate it, but use the term $E(\varphi)$ just for illustration purposes. Following one of the directions from the source, a fraction of the light will hit the seafloor after having traveled the lightsource-seafloor distance $d_1$, and the light is attenuated along this distance. The seafloor reflects some amount of the incoming light according to its bidirectional reflectance distribution function $\rho$ towards the direction of the camera (for Lambertian surfaces, $\rho$ can be considered the color, or the albedo, of the surface, weighted by the cosine of the incident illumination). The reflected light then travels a distance of $d_2$ from the seafloor to the camera and is attenuated along the way, before some of this light reaches the image sensor. On top, non-desired light is also back-scattered into the optical path, and the sum $B$ of all scattering along the ray adds to the previously explained light component. So the overall intensity $I$ received at a camera pixel can be modeled as

$$I = E(\varphi) \, e^{-\eta d_1} \cdot \rho \cdot e^{-\eta d_2} + B \qquad (1)$$
where $\eta$ is a water-specific attenuation coefficient. Note that $E$, $\rho$, $\eta$ and $B$ are all wavelength dependent. Since each pixel sees a different seafloor point, $E(\varphi)$ and $\rho$ as well as the distances $d_1$ and $d_2$ will vary with the pixel position $x$, and also the backscatter $B$ "collected" along the respective line of sight depends on the pixel position. See fig. 3 for an example image.
If all parameters, including the camera/light pose dependent distances and angles, were perfectly known, one could try to use image restoration techniques to solve for the seafloor albedo $\rho$, although this is a challenging problem already in shallow water Akkaynak19seethrough . However, prior to optical localization and mapping, the distances and orientations of camera and lights to each seafloor point are not known exactly, or for old videos important parameters might be missing or detailed localization and mapping be infeasible. Even when planning a mission, geometric and radiometric calibration of a multi-light and camera system is a challenging task. Consequently, in the following we describe how the numerous individual parameters can be grouped into larger "combined effects" which can be obtained from the statistics of the acquired images, if there is a predominant seafloor color (sediment, sand, etc.). Essentially, eq. 1 can be rearranged into multiplicative and additive effects, and we will write the product of all factors as a function $f = E(\varphi) \, e^{-\eta (d_1 + d_2)}$:

$$I = f \cdot \rho + B \qquad (2)$$
Actually, $f$ and $B$ depend on the pixel position $x$ in the image and on the relative pose $p$ between the vehicle and the ground ($\eta$ only depends on the wavelength we are considering). To make this dependence more clear, we explicitly write

$$I(x) = f(x, p) \cdot \rho(x) + B(x, p) \qquad (3)$$
For an image taken at very high altitude (see fig. 4), $d_1$ and $d_2$ will be so large that $e^{-\eta (d_1 + d_2)} \approx 0$ will hold and we will see only backscatter:

$$I(x) \approx B(x, p) \qquad (4)$$
This backscatter actually depends only on the relative pose of the light source with respect to the camera (but not to the ground) and the angular characteristics of the light source. To distinguish the deep sea setting from the natural sunlight setting, in fig. 5 we applied the simulator from song2021deepsea to simulate the scattering that happens in the deep sea (or at night) at different distances to the camera when a light cone originates from a position 2m to the right of the camera in the open water. This is a simple motivational example that only considers single scattering between light source and camera; the volume scattering function is chosen from Petzold's measurement petzold1972volume (clear water), and all light is attenuated exponentially with the distance travelled in the water. It can be seen that most of the visible scattering happens close to the camera, because the scattered light is also attenuated, so that only little intensity from far-away scattering is observed. Only relatively little light scattered at 5m distance reaches the camera. In fig. 6 we plot the backscatter received by a hypothetical camera along a single viewing ray where the light is at 2m and 1m distance sideways from the camera. For this simulation we use Jerlov water type II and volume scattering according to Petzold petzold1972volume (offshore southern California). However, in our experience, this scenario also holds when operating with artificial illumination in murky water, simply with all distances reduced: The camera has to go closer to the seafloor in order to see it, and for practical reasons (keeping homogeneous seafloor illumination, avoiding drastic shadows, robot maneuverability close to the ground) we then use light sources that are closer to the camera (the deep sea also requires more massive, larger vehicles), resulting in a similar relative geometry between seafloor, camera and light.
Consequently, we now make the assumption that the majority of the scattering originates from the first few meters in front of the camera and that during the seafloor mapping we will always fly "high enough" to see most of the scattering, i.e. $B(x, p) \approx B(x)$. Substituting this back into the equation, we obtain:

$$I(x) = f(x, p) \cdot \rho(x) + B(x) \qquad (5)$$
i.e. the intensity observed in the camera consists of the seafloor albedo multiplied by a factor image $f$ that depends on the lighting configuration, the relative pose of the vehicle with respect to the seafloor and the water attenuation, plus a fixed scatter image $B$. For optical deep sea mapping missions in smooth terrain, AUVs usually follow a "fixed altitude" mission. In this case the pose $p$ is constant over time and the factor image just depends on the pixel position $x$ but does not vary over time. But even if the AUV varies the altitude, the motion change of heavy diving robots is usually so slow that $p$ can be considered almost constant over short periods of time. In order to infer the seafloor color $\rho(x)$ from an image $I(x)$, we will now perform robust estimates of the "factor image" $f(x, p)$ and the summand image $B(x)$.
3.1.1 Additive Term
Before reaching the working altitude above the seafloor, the robot should already capture a certain number $n$ of images that only show the water column, revealing information about the scattering. In practice, these images will also contain bright floating particles or dark parts very close to the camera which are not inside the light cone. These measurements have to be considered as outliers, and consequently a robust estimator (cf. robuststatistics ) is required to obtain $B(x)$ from multiple measurements at each pixel position. We focus on the case where much less than half of the image pixels show floating particles, and where the particle positions can be considered random (i.e. they do not stay at the same position over time). This means that at each pixel position, more than 50% of the time we can observe pure backscatter $B(x)$. Since the median has a breakdown point robuststatistics of 50%, we will perform a temporal median (across the measurements) at each pixel position to infer the ideal pure scatter image $B(x)$.
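A minimal sketch of this robust per-pixel estimate (illustrative NumPy code; the function name is ours, and writing B for the pure scatter image; this is not the authors' optimized implementation):

```python
import numpy as np

def estimate_scatter_image(water_frames):
    """Robustly estimate the additive scatter image B from frames that
    show only the water column. The per-pixel temporal median suppresses
    outliers such as bright floating particles, as long as each pixel
    shows pure backscatter in more than 50% of the frames."""
    frames = np.asarray(water_frames, dtype=np.float64)
    return np.median(frames, axis=0)  # median over the temporal axis
```

With seven water-column frames, for example, up to three frames per pixel may be corrupted by particles without affecting the estimate.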
3.1.2 Multiplicative Term
We now suggest to estimate the factor image $f$ in a similar way. For illustration, imagine first that an "all-seafloor image" $S$ would be given that shows only homogeneous sediment of known sediment color $\rho_s$. In such an ideal image of a homogeneous seafloor, we would typically still see the illumination pattern: The factor image would cause different observed pixel values depending on the pixel position, and each pixel in $f$ represents the factor for the light ray that belongs to the corresponding viewing ray. We denote the pose when this all-seafloor image was seen by $\tilde{p}$, such that we can also write this ideal all-seafloor image using multiplicative and additive terms:

$$S(x) = f(x, \tilde{p}) \cdot \rho_s + B(x) \qquad (6)$$
This can be solved linearly for $f(x, \tilde{p}) = (S(x) - B(x)) / \rho_s$ at each pixel position $x$, since all other components are given. Afterwards, both the additive and the multiplicative lighting effects are known.
Now, whenever the AUV has exactly this relative pose to the seafloor, we can compute the seafloor albedo at a particular image position by the simple division

$$\rho(x) = \frac{I(x) - B(x)}{f(x, \tilde{p})} = \rho_s \cdot \frac{I(x) - B(x)}{S(x) - B(x)} \qquad (7)$$

$\rho(x)$ is our restored image of the seafloor.
It can be seen that the seafloor color $\rho_s$ used in equation 6 plays the role of a white balance reference in normal photographs. When setting it to grey although the actual seafloor color is brown, all other colors change accordingly, but in a consistent linear fashion. $\rho(x)$ is still correct up to a global scale factor, i.e. we can later also correct all images up to one single global scale factor (per color channel resp. wavelength) if desired. For mapping and reconstruction this means that, without prior knowledge, the seafloor color can be chosen as grey and all images will be enhanced in a consistent way (allowing matching, SLAM and stereo reconstruction). This is similar to mapping on land with a camera that uses a fixed but unknown white balance.
Since usually a perfect all-seafloor image as needed by equation 6 is not available, we will now consider estimating it from survey data. For instance, in case multiple all-seafloor images exist, each perturbed by Gaussian noise, it is suggested to average them for the estimation of $S$. These images can even be captured at different locations, as long as the relative pose between the AUV and the seafloor ground plane stays the same (same altitude, pitch and roll). In the common case that the seafloor is not of uniform color, but "contaminated", e.g. by stones, fauna or other objects that cannot be considered Gaussian noise, a robust estimate of the all-seafloor image is suggested. As for the scatter image, for scenarios where the images show significantly more than 50% uncontaminated seafloor, with a uniform spatial distribution of the occurring objects, we suggest using the temporal median as a robust estimator at each pixel position, i.e. we compute the median intensity at each pixel position over many images. The majority will contain seafloor at this coordinate, and only in a few images will this pixel display a rock or other object. In the next section we will turn to the question of the minimum number of images over which we have to compute the median.
3.1.3 Determining the number of samples
Let $n$ be the number of images in which we inspect a certain image position and let $c$ be the general contamination rate of the images. The probability of obtaining $k$ contaminated seafloor samples from $n$ images is then described by a binomial distribution. With an increasing number of images $n$, the probability $P_{\text{bad}}$ that at least half of the samples are contaminated becomes smaller and smaller (as $c < 0.5$):

$$P_{\text{bad}} = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \, c^k \, (1-c)^{n-k} \qquad (8)$$
Consequently, the number $n$ of seafloor images to be used in the pixelwise median estimation depends on the contamination rate $c$, and can be chosen such that $P_{\text{bad}}$ from eq. (8) becomes almost zero. For instance, if 20% of the image is contaminated and a median over 7 images is used, only in about 3% of the cases more than 3 contaminated samples are expected, which would invalidate the median estimate at this position (since the median has a breakdown point of 50%). Still, 3% of a 10 megapixel image means 300,000 pixels for which median estimation does not work. For very special, high-frequency illumination patterns, the number of images must be increased in this case. However, typical illumination patterns vary smoothly, and so $f$ can be expected to be almost constant in a small neighborhood around a position $x$. Therefore, the number of samples can also be increased by including spatial neighbors of the pixel under consideration, and robust spatial averaging within the image (i.e. a spatial median) can be used to increase the number of uncontaminated seafloor samples available.
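The sample-count reasoning of eq. (8) can be checked numerically (a short sketch; the function name is ours):

```python
from math import ceil, comb

def breakdown_probability(n, c):
    """Probability that at least half of the n samples at a pixel are
    contaminated (per-pixel contamination rate c), i.e. that the
    pixelwise median estimate breaks down (eq. 8)."""
    return sum(comb(n, k) * c ** k * (1 - c) ** (n - k)
               for k in range(ceil(n / 2), n + 1))
```

For a contamination rate of 20% and a median over 7 images this yields about 3.3%, and the probability shrinks quickly as n grows.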
3.1.4 Varying Poses and Terrain
Unfortunately, because of terrain variations, and because of altitude and pose variations of the vehicle, the relative pose between the camera and the seafloor typically varies during a mission of several hours. Therefore it is not recommended to compute only one all-seafloor image for an entire image sequence, but rather an individual all-seafloor image in a sliding window fashion for each image, since for short periods of time the pose of the vehicle with respect to the seafloor is typically stable.
To summarize, for scenarios with seafloor contamination, we suggest computing a sliding window temporal median over the images before, at and after the current image under investigation. Then, each resulting all-seafloor image is used in eq. (7) for normalization of the respective original image after subtracting the additive scatter component.
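The summarized procedure might be sketched as follows (illustrative NumPy code under the assumptions above; function name, window size and the division guard are ours):

```python
import numpy as np

def sliding_window_restore(images, scatter, half_window=3, rho_s=0.5):
    """Restore each frame via eq. (7): rho = rho_s * (I - B) / (S - B).

    images:      sequence of raw frames I
    scatter:     additive scatter image B (e.g. from water-column frames)
    half_window: the all-seafloor image S is the per-pixel temporal
                 median over 2*half_window+1 neighboring frames
    rho_s:       assumed dominant seafloor color (grey if unknown)"""
    imgs = np.asarray(images, dtype=np.float64)
    restored = []
    for i in range(len(imgs)):
        lo, hi = max(0, i - half_window), min(len(imgs), i + half_window + 1)
        S = np.median(imgs[lo:hi], axis=0)        # all-seafloor estimate
        denom = np.clip(S - scatter, 1e-6, None)  # guard against /0
        restored.append(rho_s * (imgs[i] - scatter) / denom)
    return restored
```

The window must be large enough that, at every pixel, the majority of the frames show plain seafloor, following the sample-count analysis above.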
The goal of this work is an approach that can efficiently handle tens of thousands of photographs to enable large scale deep ocean mapping. The prototypical, unoptimized reference implementation on a single 3.7GHz Xeon CPU requires almost one minute of computation time for each 12 megapixel photo. However, most operations are suitable for parallel scheduling, and thus the algorithm was implemented in CUDA and parallelized on the pixel level. All images needed for the temporal median are uploaded to the GPU at the same time and organized as a ring buffer. Once the median on the central image is completed, the "oldest" image is replaced by a new image from the stream.
To reduce the number of temporal images required for reliable robust estimation, and since the illumination patterns are usually very smooth, by default we downsample the undistorted 12MP input images by a factor of 8 (after a spatial median) to perform the temporal median on 7 images (and upsample afterwards). The overall implementation provides estimates of $f$ and $B$ and produces 12 megapixel color-corrected images at 2Hz, which is twice as fast as we record photos during a mission.
4.1 3D Reconstruction
In fig. 8 we have run a commercial 3D reconstruction software (Agisoft Photoscan) on three sample images of the flat seafloor. The software first finds correspondences, performs robust matching and epipolar geometry estimation, triangulates correspondences and runs bundle adjustment and dense estimation, and finally produces a 3D model. The top part of the figure shows the raw images and two screenshots of a partially failed, blueish reconstruction. In the bottom part, we have run the same software with the same parameters on the same images, but this time on the enhanced versions of the images, and we obtain a detailed, consistent 3D model without visible seams. In fig. 9 we perform a similar experiment on about 100 images taken by a different AUV with a different camera and light system in the Baltic Sea, showing a world war torpedo sunken into the sediment (see figure 9f for a sample image in this very murky water). Here, too, we enhance the images prior to 3D reconstruction with our method and other approaches and show the results. Note that the torpedo example is taken in very turbid, greenish coastal waters where capturing has to take place from low altitude (ca. 1m), whereas the deep sea data is much clearer and captured from high altitude (ca. 5m). It should be clear that an entire 3D reconstruction pipeline depends on many parameters and design choices, such that this should just be seen as an example. However, the result is in agreement with our own finding that our enhanced images are a useful input for matching (both for sparse and dense correspondences).
4.2 Deep Sea Mosaic
In figures 10 and 11, we have registered the normalized versions of an image sequence taken at more than 4km water depth in polymetallic nodule fields at the seafloor of the Pacific Ocean greinert2017siar . The distance between the camera and a 24-LED flash (4kW electrical power) was about 2m, and the flying altitude was slightly less than 5m, taking one image per second at 1.5m/s speed with an across-track field of view of 90° (undistorted). As outlined, the seafloor color can be assumed to be grey (fig. 11, right) or can be set to the color of a seafloor sample if available (fig. 11, left), e.g. when taking samples greinert2015_SO242 . The micro-navigation of the deep sea robot has been obtained using structure from motion techniques on the enhanced images. This navigation information is then used to stitch the raw images (fig. 10) and also the enhanced versions (fig. 11). For stitching we use a simple two-band blending with a rectangular weight that goes from 1 (image center) to 0 (image boundary): The high frequency information is taken from the pixel with the highest weight, while the low frequency information is averaged among the images. It should be clear that the blending can be further tuned and improved, but the goal here is not to hide the problems, but rather to show them. The raw mosaic suffers from strong illumination effects at the right and left boundaries. If a mosaic is created from 10000 images, it will be dominated by the illumination patterns.
We also qualitatively compare our technique to fusion enhancement ancuti12enhancing , multi exposure fusion galdran18dehazingmulti-exposurefusion , optimized contrast enhancement KIM2013410 and backscatter removal zhang2016removing , using the implementations and default parameters provided in wang19reviewenhancement . Note that most of these approaches were not designed for deep sea scenarios, but we believe it is nevertheless interesting to qualitatively see the effects. The results are displayed in fig. 12, and it can be seen that none of these methods produces consistent results for the deep sea light cone setting. It would very likely be possible to improve the results by tuning parameters, but all approaches suffer from inconsistency between overlapping images. The key idea of our proposed solution is that we do not need to sit down after each mapping campaign to manually adjust parameters and retrain algorithms. This would be impractical, and thus we need a parameter-free approach. The only approach that produced reasonable images is Sing_2007-towardsimaging , which fits a 4th order polynomial in log space. However, that approach only considers multiplicative effects, which results in a loss of contrast, and the degree of the polynomial has to be adapted to the light cone: If the polynomial degree is too high, it will fit to (and remove) scene structures; if it is too low, it cannot cover the illumination pattern (e.g. for multi-LED setups).
Please note that our approach needs neither knowledge and calibration of light sources, water properties and camera parameters, nor a 3D reconstruction of the scene, and therefore approaches that require all these parameters cannot be compared against (e.g. bryson16colorcorection ).
For the deep sea mosaics in figures 11 and 12, we also numerically analyse how consistent the colors of corresponding seafloor areas are. Before blending, we compare the backward-mapped colors of a seafloor mosaic pixel from the different input images and compute their difference to the mosaic. This is averaged over all pixels that are seen in more than one input photo to obtain an RMSE. Finally, this number is divided by the standard deviation of all pixels produced by the respective normalization method, in order to avoid a bias towards methods that just make the image dark or of uniform color (which would be maximally consistent). Our method for deep sea lighting compensation outperforms all other methods (cf. fig. 13). Note that only our method and Sing_2007-towardsimaging do not produce a dark rim towards the image boundaries. The data set chosen for evaluation does not even fully expose this darkening, as it is just a 1D transect. For large 2D mosaics with multiple tracks next to each other, the consistency margin would be even higher. Consistency is important, for instance, for loop detection and for compensating drift, because places seen earlier appear less dissimilar if properly enhanced. It can also be seen that Sing_2007-towardsimaging , on the other hand, suffers from a loss in contrast, potentially because the model is just multiplicative, but maybe also because the polynomial can fit to local seafloor structures that are actually not lighting artefacts and then overcompensates.
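The consistency metric can be sketched as follows (illustrative code with a simplified data layout of our own: per overlap pixel, the mosaic color and the colors back-mapped from the contributing input images):

```python
import numpy as np

def consistency_score(overlap_pixels, restored_pixels):
    """Normalized consistency: RMSE between mosaic colors and back-mapped
    input colors, divided by the standard deviation of all restored
    pixels so that dark or uniformly colored restorations are not
    rewarded. Lower values mean more consistent restorations.

    overlap_pixels:  list of (mosaic_value, [backmapped_values...]) pairs
                     for mosaic pixels seen in more than one input photo
    restored_pixels: flat array of all pixel values produced by the
                     normalization method under evaluation"""
    sq_errs = [(v - m) ** 2 for m, vals in overlap_pixels for v in vals]
    rmse = np.sqrt(np.mean(sq_errs))
    return rmse / np.std(np.asarray(restored_pixels, dtype=np.float64))
```

A perfectly consistent restoration scores 0, while simply darkening or flattening the images inflates the denominator's share and does not improve the score.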
Finally, fig. 14 presents a cutout of the same object taken from 4 photos. The left column shows the raw image data and the right column the enhanced version. Despite the quite different raw image appearance, the enhanced result stays qualitatively stable.
When viewing the enhanced images, the illumination patterns are completely removed and objects photographed several times show very consistent colors. Of course, cast shadows and the shading of micro-relief remain, which still makes image matching challenging. Nevertheless, the method seems well suited as a preprocessing step to remove the largest nuisances, also from old or uncalibrated footage, and – as a fast enhancement method – does a quite good job even for the final map. However, once the enhanced images have helped to register the images, to recover the micronavigation and to reconstruct the surface, one can still run a full parametric image restoration to create maps that are related to a physical model (including measurement uncertainties).
Limitations and Failure Cases
The main assumption of the method is that the seafloor has a constant dominant color (covering more than 50% of the pixels), and the method enforces this. If the seafloor changes from dark brown to light brown, the algorithm will attribute this change to the water-light regime. Consequently, the colors of animals or objects at the seafloor would be altered relative to it, and this situation can only be detected by monitoring the multiplicative image. On the other hand, a change of apparent seafloor color could also be explained by a different water composition, leading to different attenuation behaviour. This is a generic problem for all approaches and can only be solved by extra knowledge (e.g. monitoring water properties).
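The role of the 50% assumption can be illustrated with a robust statistic: as long as more than half of the pixels in a neighborhood show the dominant seafloor color, a sliding median tracks the smooth lighting field rather than scene content, and dividing it out removes the multiplicative nuisance. The sketch below is our illustration of this principle, not the paper's exact estimator; the window size, the choice of the median, and the reference color `ref` are assumptions:

```python
import numpy as np

def enhance_multiplicative(img, win=15, ref=None, eps=1e-6):
    """Illustration of the dominant-color principle: a local median of
    a single channel estimates the smooth illumination field (valid only
    while >50% of window pixels show the dominant seafloor color), which
    is then divided out. `win` and `ref` are illustrative choices."""
    h, w = img.shape
    pad = win // 2
    padded = np.pad(img, pad, mode='reflect')
    light = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            # robust local estimate of dominant color times lighting
            light[i, j] = np.median(padded[i:i + win, j:j + win])
    ref = img.mean() if ref is None else ref  # assumed reference color
    return img / (light + eps) * ref
```

Once fewer than half of the pixels belong to the dominant color (e.g. the seafloor changes from dark to light brown within the window), the median jumps to the new material and the change is wrongly "corrected away" – exactly the failure case described above.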
Similarly, when factorizing the image into albedo and lighting, uneven illumination patterns will be removed, but an ambiguity about the absolute color remains unless the seafloor color is (approximately) known. This can be thought of as a constant white balance of the entire mosaic (which can be adapted in post-processing); the effect can be seen in fig. 10. For most mapping applications, the absolute color of the seafloor will be less important than having a consistent map.
Moving light sources, together with attenuation and scattering effects, impair mapping in the deep sea, i.e. finding correspondences, 3D reconstruction and also producing maps free of water effects. The nuisances can be decomposed into additive and multiplicative terms, and we have presented a robust and automatic method to estimate these terms when the seafloor is predominantly homogeneous and flat. The key observation is that in more than 50% of the pixels we expect to see a dominant seafloor color, which is also the clear prerequisite for using the algorithm. The method requires neither calibration, nor physical water parameters, nor knowledge of the light and camera configuration, and therefore has the potential to also improve old seafloor footage or uncalibrated videos from the web. The enhanced images can improve SLAM and 3D reconstruction and can serve as a basis for large scale maps, which traditionally suffer from lighting artefacts that obscure the actual seafloor patterns. An efficient implementation allows the algorithm to run in (near) real-time, in contrast to other, partially very expensive, correction algorithms.
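The additive/multiplicative image model underlying this decomposition can be written compactly: observed = additive term (backscatter) + multiplicative term (light cone, attenuation) × albedo. The inversion below assumes both nuisance fields have already been estimated; estimating them robustly is the actual contribution of the method and is not reproduced in this sketch (field and parameter names are illustrative):

```python
import numpy as np

def decompose_and_enhance(img, add_field, mul_field, ref=1.0, eps=1e-6):
    """Invert the additive/multiplicative nuisance model from the text:
        img = add_field + mul_field * albedo.
    Given (externally estimated) per-pixel fields, the albedo follows by
    subtraction and division; `ref` is an assumed reference scale that
    resolves the global white-balance ambiguity discussed above."""
    return np.clip((img - add_field) / (mul_field + eps) * ref, 0.0, None)
```

With exact fields this recovers the albedo up to the global scale; with estimated fields, residual errors end up as the white-balance ambiguity discussed in the limitations.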
- (1) J. S. Jaffe, Computer modeling and the design of optimal underwater imaging systems, IEEE Journal of Oceanic Engineering 15 (2) (1990) 101–111. doi:10.1109/48.50695.
- (2) T. J. Petzold, Volume scattering functions for selected ocean waters, Tech. rep., Scripps Institution of Oceanography La Jolla Ca Visibility Lab (1972).
- (3) Y. Song, D. Nakath, M. She, F. Elibol, K. Köser, Deep sea robotic imaging simulator, in: Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021, Springer, 2021, pp. 375–389. doi:10.1007/978-3-030-68790-8_29.
- (4) M. Bryson, M. Johnson-Roberson, O. Pizarro, S. B. Williams, True color correction of autonomous underwater vehicle imagery, Journal of Field Robotics 33 (6) (2016) 853–874.
- (5) B. L. McGlamery, Computer analysis and simulation of underwater camera system performance, Tech. rep., Visibility Laboratory, Scripps Institution of Oceanography, University of California in San Diego (1975).
- (6) J. S. Jaffe, Underwater optical imaging: The past, the present, and the prospects, IEEE Journal of Oceanic Engineering 40 (3) (2015) 683–700. doi:10.1109/JOE.2014.2350751.
- (7) R. Preisendorfer, Physical aspects of light in the sea, Univ. Hawaii Press, Honolulu, Hawaii, pp. 51–60.
- (8) C. D. Mobley, Light and water: radiative transfer in natural waters, Academic press, 1994.
- (9) R. Garcia, T. Nicosevici, X. Cufi, On the way to solve lighting problems in underwater imaging, in: OCEANS ’02 MTS/IEEE, Vol. 2, 2002, pp. 1018–1024 vol.2. doi:10.1109/OCEANS.2002.1192107.
- (10) O. Pizarro, H. Singh, Toward large-area mosaicing for underwater scientific applications, IEEE Journal of Oceanic Engineering 28 (4) (2003) 651–672. doi:10.1109/JOE.2003.819154.
- (11) K. J. Morris, B. J. Bett, J. M. Durden, V. A. I. Huvenne, R. Milligan, D. O. B. Jones, S. McPhail, K. Robert, D. M. Bailey, H. A. Ruhl, A new method for ecological surveying of the abyss using autonomous underwater vehicle photography, Limnology and Oceanography: Methods 12 (11) (2014).
- (12) Y. Y. Schechner, N. Karpel, Recovery of underwater visibility and structure by polarization analysis, IEEE Journal of Oceanic Engineering 30 (3) (2005) 570–587. doi:10.1109/JOE.2005.850871.
- (13) E. Trucco, A. T. Olmos-Antillon, Self-tuning underwater image restoration, IEEE Journal of Oceanic Engineering 31 (2) (2006) 511–519. doi:10.1109/JOE.2004.836395.
- (14) H. Singh, C. Roman, O. Pizarro, R. Eustice, A. Can, Towards high-resolution imaging from underwater vehicles, The International Journal of Robotics Research 26 (1) (2007) 55–74.
- (15) T. Treibitz, Y. Y. Schechner, Active polarization descattering, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2008) 385–399. doi:10.1109/TPAMI.2008.85.
- (16) T. Nicosevici, N. Gracias, S. Negahdaripour, R. Garcia, Efficient three-dimensional scene modeling and mosaicing, Journal of Field Robotics 26 (10) (2009) 759–788. doi:10.1002/rob.20305.
- (17) A. Sedlazeck, K. Köser, R. Koch, 3D reconstruction based on underwater video from ROV Kiel 6000 considering underwater imaging conditions, in: OCEANS 2009-EUROPE, IEEE, 2009, pp. 1–10.
- (18) S. B. Williams, O. R. Pizarro, M. V. Jakuba, C. R. Johnson, N. S. Barrett, R. C. Babcock, G. A. Kendrick, P. D. Steinberg, A. J. Heyward, P. J. Doherty, I. Mahon, M. Johnson-Roberson, D. Steinberg, A. Friedman, Monitoring of benthic reference sites: Using an autonomous underwater vehicle, Robotics Automation Magazine, IEEE 19 (1) (2012) 73–84. doi:10.1109/MRA.2011.2181772.
- (19) C. Tsiotsios, M. E. Angelopoulou, T.-K. Kim, A. J. Davison, Backscatter compensated photometric stereo with 3 sources, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
- (20) D. Akkaynak, T. Treibitz, Sea-thru: A method for removing water from underwater images, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- (21) S. Bazeille, I. Quidu, L. Jaulin, J.-P. Malkasse, Automatic underwater image pre-processing, in: Proceedings of the Caractérisation du Milieu Marin (CMM 06), 2006, pp. 16–19.
- (22) K. Iqbal, R. A. Salam, A. Osman, A. Z. Talib, Underwater image enhancement using an integrated colour model, IAENG International Journal of Computer Science 34 (2).
- (23) C. Ancuti, C. O. Ancuti, T. Haber, P. Bekaert, Enhancing underwater images and videos by fusion, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 81–88.
- (24) A. Galdran, Image dehazing by artificial multiple-exposure image fusion, Signal Processing 149 (2018) 135–147.
- (25) S. Santra, R. Mondal, P. Panda, N. Mohanty, S. Bhuyan, Image dehazing via joint estimation of transmittance map and environmental illumination (2018). arXiv:1812.01273.
- (26) J.-H. Kim, W.-D. Jang, J.-Y. Sim, C.-S. Kim, Optimized contrast enhancement for real-time image and video dehazing, Journal of Visual Communication and Image Representation 24 (3) (2013) 410–425.
- (27) D. Zhao, L. Xu, Y. Yan, J. Chen, L.-Y. Duan, Multi-scale optimal fusion model for single image dehazing, Signal Processing: Image Communication 74 (2019) 253–265.
- (28) R. Schettini, S. Corchs, Underwater image processing: State of the art of restoration and image enhancement methods, EURASIP Journal on Advances in Signal Processing 2010.
- (29) Y. Wang, W. Song, G. Fortino, Z. Qi, W. Zhang, A. Liotta, An experimental-based review of image enhancement and image restoration methods for underwater imaging, IEEE Access (2019). doi:10.1109/ACCESS.2019.2932130.
- (30) K. He, J. Sun, X. Tang, Single image haze removal using dark channel prior, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (12) (2011) 2341–2353.
- (31) C. Ancuti, C. O. Ancuti, C. De Vleeschouwer, R. Garcia, A. C. Bovik, Multi-scale underwater descattering, in: 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 4202–4207. doi:10.1109/ICPR.2016.7900293.
- (32) Y. Peng, X. Zhao, P. C. Cosman, Single underwater image enhancement using depth estimation based on blurriness, in: 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 4952–4956. doi:10.1109/ICIP.2015.7351749.
- (33) D. Berman, T. Treibitz, S. Avidan, Diving into haze-lines: Color restoration of underwater images, in: Proceedings of the British Machine Vision Conference, BMVA Press, 2017.
- (34) J. Li, K. A. Skinner, R. M. Eustice, M. Johnson-Roberson, WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images, IEEE Robotics and Automation Letters (2017) 1–1. doi:10.1109/lra.2017.2730363.
- (35) F. Cozman, E. Krotkov, Depth from scattering, in: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 1997, pp. 801–806.
- (36) R. Prados, R. Garcia, J. Escartín, L. Neumann, Challenges of close-range underwater optical mapping, in: OCEANS 2011 IEEE - Spain, 2011, pp. 1–10. doi:10.1109/Oceans-Spain.2011.6003501.
- (37) P. J. Huber, E. M. Ronchetti, Robust Statistics, Second Edition, Wiley, 2009. doi:10.1002/9780470434697.
- (38) J. Greinert, T. Schoening, K. Köser, M. Rothenbeck, Seafloor images and raw context data along AUV tracks during SONNE cruises SO239 and SO242/1.
- (39) RV SONNE Fahrtbericht / Cruise Report SO242-1 [SO242/1]: JPI OCEANS Ecological Aspects of Deep-Sea Mining, DISCOL Revisited, Guayaquil - Guayaquil (Ecuador) (2015). doi:10.3289/GEOMAR_REP_NS_26_2015.
- (40) H. Zhang, Removing backscatter to enhance the visibility of underwater object, M.Sc. thesis, NTU.