1 Introduction
Scattering media degrade images. Studies aimed at enhancing visibility focus on single-image dehazing [8, 12], or on methods that modulate properties of the illumination, such as its spatiotemporal structure [7, 9, 10, 20, 13] or polarization [27, 29]. However, none of these methods exploits an important degree of freedom: the dynamic pose of the camera.
Pose dynamics is important, because most imaging platforms move anyway. Even without a participating medium, a camera must move around to view large areas and zones behind objects and concavities [2]. Platform motion, however, needs to be efficient, covering the surface domain at the highest quality, in the shortest time. The camera needs to move so that object regions that have not been well observed will be efficiently recovered next. This is the next best view (NBV) concept in robot vision. Prior NBV designs assumed no participating medium, being ruled solely by object occlusions. However, not only occlusions but also a scattering medium disrupts visibility. This affects drones overflying wide hazy scenes, autonomous underwater vehicles that scan the sea floor and inspect submerged infrastructure, and firefighting rovers that operate in smoke. Despite their motion and need to overcome scatter, existing systems scan scenes [3] while ignoring scattering.
This work generalizes NBV to scattering media. We achieve 3D descattering in large areas and around occlusions, through sequential changes of pose. The obvious need to move the platform in large areas and around occlusions is exploited for optimized dehazing, i.e., estimation of surface albedo. On the other hand, scattering by the medium influences the optimal changes of pose. The challenge is exacerbated when lighting must be brought in, as in deep underwater operations, tissue and indoor smoky scenes. Scattering affects object irradiance and volumetric backscatter [10, 14], as a function of the lighting pose, not only the camera pose (Fig. 1). Usually both the camera and lighting (C&L) are mounted on the same rig. However, visibility can potentially be enhanced using separate platforms [15]. Therefore, the next best underwater view (NBUV) optimizes the next joint poses of C&L. The optimization criterion is information gain, taken from information theory. We exploit the existence of a prior rough 3D model, since underwater such a model is routinely obtained using active sonar. We demonstrate this principle in scaled-down experimentation.
2 Theoretical Background
2.1 Imaging in a medium
Consider Fig. 2b. At time $t$, the pose of light source L has a vector of location and orientation parameters, $\mathbf{p}_t^{\rm L}$. The source irradiates submerged surface patch $X$ from distance $R_{\rm LX}$. The medium has extinction coefficient $\beta$. The surface irradiance [10] at $X$ is
$$E(X) = E^{\rm dir}(X) + E^{\rm amb}(X). \quad (1)$$
The component $E^{\rm dir}$ is due to direct transmission from L to $X$, while $E^{\rm amb}$ is due to ambient indirect surface illumination. The latter is mainly created by off-axis scattering of the illumination beam. The surface illumination decreases exponentially with $R_{\rm LX}$:
$$E^{\rm dir}(X) = \frac{I^{\rm L}}{R_{\rm LX}^2}\exp(-\beta R_{\rm LX}), \quad (2)$$
$$E^{\rm amb}(X) = \kappa\,\frac{I^{\rm L}}{R_{\rm LX}^2}\exp(-\beta R_{\rm LX}), \quad (3)$$
where $I^{\rm L}$ is the intensity of L and $\kappa$ is a factor set by the medium's scattering properties. The object signal is $S = \rho(X) E(X) a(X)$, where $\rho(X)$ is the albedo at $X$ and
$$a(X) = \exp(-\beta R_{\rm XC}). \quad (4)$$
Here $R_{\rm XC}$ is the distance from $X$ to camera C. At time $t$, the pose of C is represented by a vector of parameters $\mathbf{p}_t^{\rm C}$. The line of sight from C to patch $X$ includes backscatter $B$, which increases [14, 29] with $R_{\rm XC}$.
The imaged radiance [6] is
$$I = S + B + n, \quad (5)$$
where $n$ is noise. Note that in this paper, all the radiometric terms ($I$, $S$, $B$, etc.) are in photoelectron units [e]. The noise has two components [23]: photon noise and read noise. The variance of photon noise is $S + B$. Readout noise is assumed to be signal-independent, with variance $\sigma_r^2$. The probability distribution function (PDF) of $I$ is approximately Gaussian, with variance:
$$\sigma_I^2 = S + B + \sigma_r^2. \quad (6)$$
The patch's signal-to-noise ratio (SNR) is therefore:
$${\rm SNR} = \frac{S}{\sqrt{S + B + \sigma_r^2}}. \quad (7)$$
In a clear medium, backscatter is negligible ($B \approx 0$). Under sufficient lighting, $S \gg \sigma_r^2$, hence ${\rm SNR} \approx \sqrt{S}$ (see Eq. 7). Thus, for the best SNR, $S$ is maximized. This is achieved by avoiding shadows [25], i.e., placing L very close to C (Fig. 1).
Underwater, placing L very close to C results in significant backscatter $B$, which reduces the SNR in (7). To reduce backscatter, L is usually separated from C. Such a separation may result in shadows. In a shadow, $E^{\rm dir}(X) = 0$, while the extinction of light (Eqs. 2-4) compounds this effect. Thus the optimal setting of L underwater is non-trivial.
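The trade-off above can be illustrated numerically. The following sketch evaluates the SNR of Eq. (7) for two hypothetical operating points, one with L close to C (strong irradiance, heavy backscatter) and one with L separated (weaker irradiance, little backscatter). All numeric values are illustrative assumptions, not measurements from the paper.

```python
import math

def snr(rho, E, a, B, sigma_r):
    """SNR of Eq. (7): signal S = rho*E*a over the noise std of Eq. (6).
    All radiometric quantities are in photoelectron units [e]."""
    S = rho * E * a
    return S / math.sqrt(S + B + sigma_r**2)

sigma_r = 5.0  # illustrative read-noise std [e]
# L close to C: strong irradiance but heavy backscatter on the line of sight.
close = snr(rho=0.5, E=4000.0, a=0.6, B=3000.0, sigma_r=sigma_r)
# L separated from C: weaker irradiance, but little backscatter.
separated = snr(rho=0.5, E=2500.0, a=0.6, B=100.0, sigma_r=sigma_r)
print(close, separated)
```

With these (assumed) numbers, the separated configuration yields the higher SNR despite its weaker surface irradiance, illustrating why the optimal placement of L underwater is non-trivial.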
2.2 Next Best View
The NBV task is generally formulated as follows. Let $O$ represent a property of the object, e.g., the spatially varying albedo or topography. A computer vision system estimates this representation, $\hat{O}$, using sequential measurements. By time $t$, the camera has already accumulated image data $I_{t'}$, $t' = 1, \ldots, t$. All this preceding data is processed to yield $\hat{O}_t$. Let $\mathcal{P}$ be the set of all possible camera poses. A next view is planned for time $t+1$, where the camera may be posed at $\mathbf{p}_{t+1} \in \mathcal{P}$, yielding new data. The new data helps to obtain an improved estimate $\hat{O}_{t+1}$. The NBV question is: out of all possible views in $\mathcal{P}$, what is the best $\mathbf{p}_{t+1}$, such that $\hat{O}_{t+1}$ has the best quality,
$$\hat{\mathbf{p}}_{t+1} = \arg\max_{\mathbf{p}_{t+1} \in \mathcal{P}} {\rm Quality}\big[\hat{O}_{t+1}(\mathbf{p}_{t+1})\big]\,? \quad (8)$$
Formulating this task mathematically depends on the quality criterion, prior knowledge about $O$, and the type of camera, e.g., passive or active 3D scanner. Different studies have looked at different aspects of the NBV task [4, 22, 30]. Nevertheless, they were all designed for imaging in clear media.

2.3 Information Gain
Consider a random variable $v$. Let $p(v)$ be its PDF. The differential entropy [1] of $v$ is then
$$h(v) = -\int p(v) \ln p(v)\,{\rm d}v. \quad (9)$$
At time $t$, the variable has entropy $h_t(v)$. Then, at time $t+1$, new data decreases the uncertainty of $v$; consequently, the PDF of $v$ is narrowed and its differential entropy decreases, $h_{t+1}(v) \leq h_t(v)$. The information gain due to the new data is then [19] defined by
$${\rm IG} = h_t(v) - h_{t+1}(v). \quad (10)$$
Suppose $v$ is normally distributed, with variances $\sigma_t^2$ and $\sigma_{t+1}^2$ at $t$ and $t+1$, respectively. Then Eqs. (9,10) yield
$$h_t(v) = \tfrac{1}{2}\ln\!\big(2\pi e\,\sigma_t^2\big), \quad (11)$$
$${\rm IG} = \tfrac{1}{2}\ln\!\big(\sigma_t^2 / \sigma_{t+1}^2\big). \quad (12)$$
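The Gaussian case of Eqs. (11,12) can be sketched directly (a minimal illustration; the function names are ours):

```python
import math

def differential_entropy(var):
    # Eq. (11): differential entropy of a Gaussian with variance `var`,
    # in nats (natural logarithm)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def info_gain(var_before, var_after):
    # Eq. (12): IG = h_t - h_{t+1} = 0.5 * ln(var_before / var_after)
    return 0.5 * math.log(var_before / var_after)

# Halving the variance gains 0.5*ln(2) nats, regardless of the absolute scale.
ig = info_gain(4.0, 2.0)
print(ig)
```

Note that the entropy difference depends only on the variance ratio, which is what makes Eq. (12) convenient for pose optimization.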
3 Least Noisy Descattered Reflectivity
Before underwater optical inspection [3, 5, 32], bathymetry (depth mapping) is routinely done using sonar, which penetrates water to great distances. Hence, in relevant applications, the surface topography is roughly available [3]. At close distance, optical imaging and descattering seek the spatial distribution of the surface albedo $\rho$, to notice sediments, defects in submerged pipes, parasitic colonies in various environments, etc.¹ Beyond removal of bias by backscatter and attenuation, descattered results need to have low noise variance, so that fine details [28] can be detected. This is our goal.
¹In addition, visual data can be integrated to further enhance the topography estimation [3].
The C&L pose parameters are concatenated into a vector $\mathbf{p}_t = [\mathbf{p}_t^{\rm C}, \mathbf{p}_t^{\rm L}]$. This vector is approximately known during operation, using established localization sensors [17, 21, 32]. Moreover, the water scattering and extinction characteristics are global parameters that can be measured in-situ. Consequently, $E$, $a$ and $B$ can be pre-assessed for each $\mathbf{p}_t$ and surface patch index $X$.
Using Eq. (5), descattering based on an image $I_t$ at time $t$ is
$$\hat{\rho}_t = \frac{I_t - B_t}{E_t a_t}. \quad (13)$$
Due to noise in $I_t$, the variance of $\hat{\rho}_t$ is:
$${\rm var}(\hat{\rho}_t) = \frac{\sigma_{I,t}^2}{(E_t a_t)^2}. \quad (14)$$
Note that $\sigma_{I,t}^2$ is unknown, since Eqs. (5,6) depend on the unknown $\rho$. Nevertheless, it is possible to define an operating point value for $\sigma_{I,t}^2$, by a typical albedo value denoted $\tilde{\rho}$. The reason is that, per application, the typical albedos encountered are familiar: typical soil in the known region, anti-corrosive paints on a known, familiar bridge support, etc. The value of $\tilde{\rho}$ is rough, but provides a guideline. Consequently,
$$\tilde{\sigma}_{I,t}^2 = \tilde{\rho}\,E_t a_t + B_t + \sigma_r^2, \quad (15)$$
$$V_t = \frac{\tilde{\sigma}_{I,t}^2}{(E_t a_t)^2}. \quad (16)$$
Here the defined $V_t$ is a local quality measure, pre-calculated for every pose $\mathbf{p}_t$ and patch $X$.
Using Eqs. (13,16), in the frame of C:
$$\hat{\rho}_t(x) \sim \mathcal{N}\big(\rho(x),\,V_t(x)\big), \quad (17)$$
where $x$ is a pixel. Components $\hat{\rho}_t$ and $V_t$ are calculated directly using Eqs. (13,16) and $I_t$, $E_t a_t$, and $B_t$.
3.1 Multi-Frame Most-Likely Descattering
As described in Sec. 2.2, by discrete time $t$, the system has already accumulated image data $I_1, \ldots, I_t$. The measurements have independent noise. Hence, the joint likelihood of the data is equivalent to the product of probability densities $\prod_{t'=1}^{t} p(I_{t'} | \rho)$. Consequently, the log-likelihood is
$$\mathcal{L} = \sum_{t'=1}^{t} \ln p(I_{t'} | \rho) = {\rm const} - \sum_{t'=1}^{t} \frac{[\hat{\rho}_{t'} - \rho]^2}{2 V_{t'}}. \quad (18)$$
Differentiating Eq. (18) with respect to $\rho$, the maximum likelihood (ML) estimator of the descattered $\rho$, using all accumulated data, is
$$\hat{\rho}_t^{\rm ML} = \frac{\sum_{t'=1}^{t} \hat{\rho}_{t'} / V_{t'}}{\sum_{t'=1}^{t} 1 / V_{t'}}, \quad (19)$$
where $\hat{\rho}_{t'}, V_{t'}$ are derived in Eqs. (13,16). The variance of this estimator is
$$V_t^{\rm ML} = \left(\sum_{t'=1}^{t} \frac{1}{V_{t'}}\right)^{-1}. \quad (20)$$
From Eqs. (16,20), the quality of the ML descattered reflectivity is
$$\frac{1}{V_{t+1}^{\rm ML}} = \frac{1}{V_t^{\rm ML}} + \frac{1}{V_{t+1}}. \quad (21)$$
Eq. (21) expresses how the variance of $\hat{\rho}^{\rm ML}$ can be updated using new data.
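The ML fusion above amounts to an inverse-variance weighted average with a recursive precision update. A minimal sketch (function names ours):

```python
def fuse_ml(estimates, variances):
    """Eqs. (19)-(20): inverse-variance weighted fusion of per-frame
    descattered estimates, returning the ML estimate and its variance."""
    w = [1.0 / v for v in variances]
    rho_ml = sum(r * wi for r, wi in zip(estimates, w)) / sum(w)
    var_ml = 1.0 / sum(w)
    return rho_ml, var_ml

def update_variance(var_ml_t, var_new):
    # Eq. (21): recursive precision update with one new measurement
    return 1.0 / (1.0 / var_ml_t + 1.0 / var_new)

# Two frames of the same patch; the low-variance frame dominates the fusion.
rho, var = fuse_ml([0.4, 0.6], [0.01, 0.04])
```

As expected, fusing two frames always lowers the variance, so every extra view has a non-negative information gain in Eq. (12).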
4 Next Best Underwater View
After time $t$, the next view yields information gain ${\rm IG}_{t+1}$. Let $\mathcal{P}$ be the set of all possible (or permissible) camera-lighting poses for time $t+1$. The next underwater view and lighting poses are selected from $\mathcal{P}$, to maximize the information gain measure:
$$\hat{\mathbf{p}}_{t+1} = \arg\max_{\mathbf{p}_{t+1} \in \mathcal{P}} {\rm IG}_{t+1}(\mathbf{p}_{t+1}). \quad (22)$$
We now derive ${\rm IG}_{t+1}$ in our case. Information is an additive quantity for independent measurements. Hence, the information gained by enhanced estimation of $\rho$ over $N$ independent surface patches is
$${\rm IG}_{t+1} = \sum_{X=1}^{N} {\rm IG}_{t+1}(X). \quad (23)$$
From Eq. (12),
$${\rm IG}_{t+1}(X) = \frac{1}{2}\ln\!\left[\frac{V_t^{\rm ML}(X)}{V_{t+1}^{\rm ML}(X)}\right], \quad (24)$$
where, following Eq. (21),
$$\frac{1}{V_{t+1}^{\rm ML}(X)} = \frac{1}{V_t^{\rm ML}(X)} + \frac{1}{V_{t+1}(X)}. \quad (25)$$
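The NBUV selection of Eqs. (22)-(25) can be sketched as an exhaustive search, with each candidate pose abstracted by the per-patch variances it would contribute (in practice these come from the pre-assessed quantities of Eq. (16); the toy numbers here are assumptions):

```python
import math

def info_gain(v_ml_t, v_next):
    """Eqs. (23)-(25): total IG of a candidate pose, summed over patches.
    v_ml_t[X] - current ML variance of patch X (Eq. 20)
    v_next[X] - variance the candidate view would contribute for X (Eq. 16)"""
    ig = 0.0
    for v0, v1 in zip(v_ml_t, v_next):
        v_ml_next = 1.0 / (1.0 / v0 + 1.0 / v1)  # Eq. (25)
        ig += 0.5 * math.log(v0 / v_ml_next)      # Eqs. (12),(24)
    return ig

def next_best_view(v_ml_t, candidates):
    # Eq. (22): exhaustive search over the permissible C&L pose set
    return max(candidates, key=lambda pose: info_gain(v_ml_t, candidates[pose]))

# Toy example: pose 'b' improves the poorly observed second patch,
# so it yields the larger information gain and is selected.
current = [0.01, 0.5]
candidates = {'a': [0.005, 0.5], 'b': [0.01, 0.01]}
print(next_best_view(current, candidates))
```

This mirrors the behavior described in the simulations: the criterion naturally steers C&L toward patches that are still poorly observed.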
5 Path Planning
Our formalism until now has focused on optimization of the next best view, underwater. What about the next best sequence of views? Indeed, the formalism can be extended to path planning, beyond a single next view. The information gain from $t$ to $t+1$ is given by Eqs. (23,25). Similarly, the information gain of patch $X$ due to a path from $t+1$ to $t+K$ is
$${\rm IG}_{t+1:t+K}(X) = \frac{1}{2}\ln\!\left[\frac{V_t^{\rm ML}(X)}{V_{t+K}^{\rm ML}(X)}\right]. \quad (26)$$
Thus
$${\rm IG}_{t+1:t+K} = \sum_{X=1}^{N} {\rm IG}_{t+1:t+K}(X). \quad (27)$$
A path of C&L is $\Pi = \{\mathbf{p}_{t+1}, \ldots, \mathbf{p}_{t+K}\}$. Then, in terms of information gain, an optimal path satisfies
$$\hat{\Pi} = \arg\max_{\Pi} {\rm IG}_{t+1:t+K}(\Pi). \quad (28)$$
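The path objective of Eqs. (26)-(28) chains the precision updates of Eq. (21) over all views in the candidate path. A brute-force sketch (for real pose spaces the paper uses direct search [16]; the toy variances below are assumptions):

```python
import math
from itertools import product

def path_info_gain(v_ml, path_variances):
    """Eqs. (26)-(27): IG of a candidate path, accumulated per patch by
    chaining the precision update of Eq. (21) over all views in the path."""
    total = 0.0
    for X, v0 in enumerate(v_ml):
        v = v0
        for v_view in path_variances:        # each view adds precision
            v = 1.0 / (1.0 / v + 1.0 / v_view[X])
        total += 0.5 * math.log(v0 / v)      # Eq. (26), summed as Eq. (27)
    return total

def best_path(v_ml, view_options, length):
    # Eq. (28): exhaustive search over all pose sequences of a given length
    return max(product(view_options, repeat=length),
               key=lambda path: path_info_gain(v_ml, path))

# Two patches, two complementary view options: the best 2-view path
# covers both patches rather than revisiting the same one.
best = best_path([1.0, 1.0], [[0.1, 10.0], [10.0, 0.1]], 2)
```

Exhaustive enumeration is exponential in the path length, which is why the simulations resort to direct-search optimization over the path's degrees of freedom.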
6 Variable Resolution
Optimal scanning should strive to provide at least the desired spatial resolution, denoted $r^*$. Over a flat terrain, this requirement is easily met by constraining C to be under a specific altitude. Maintaining this altitude maximizes efficiency, since then each image captures the maximal surface area within this constraint. In a complex terrain, the trajectory altitude and the projected patch resolution vary. This section describes how the calculations are affected.
At time $t$, patch $X$ is projected to an image segment at resolution $r_t(X)$. Define $c_t(X) = r_t(X) / r^*$, the ratio between this resolution and the desired one, $r^*$. If $c_t(X) < 1$, then patch $X$ appears too small in terms of pixels, which means that camera C may be too far from $X$. This may be a problem for patch $X$, but of benefit to other patches, which are observed better. We optimize the overall information gain, accounting for all patches. To keep the optimization framework unconstrained, we take the following step. When $c_t(X) < 1$, the patch's variance is penalized by:
$$V_t(X) \leftarrow \xi\,V_t(X), \quad (29)$$
where $\xi$ is a constant parameter, which we set to 10. We found that this penalty keeps C from distancing itself from the surface, and provided good results.
When $c_t(X) \geq 1$, patch $X$ occupies more pixels than the minimum. Pixel redundancy enables digital spatial averaging, which lowers the variance of patch $X$. The image of $X$ occupies a set of pixels $\Omega_t(X)$. Hence, when $c_t(X) \geq 1$, spatial averaging sets
$$V_t(X) = \frac{V_t^{\rm pix}(X)}{|\Omega_t(X)|}, \quad (30)$$
where $V_t^{\rm pix}$ is the modeled single-pixel variance (17).
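The two resolution regimes can be sketched as a single variance adjustment (a sketch only: the exact penalty form of Eq. (29) is an assumption, with $\xi = 10$ as in the text):

```python
def adjust_variance(v_pix, c, n_pixels, xi=10.0):
    """Resolution handling of Sec. 6.
    v_pix    - modeled single-pixel variance (Eq. 17)
    c        - r_t(X)/r*: patch resolution relative to the required one
    n_pixels - number of pixels covering the patch"""
    if c < 1.0:
        return xi * v_pix        # Eq. (29): under-resolved patch is penalized
    return v_pix / n_pixels      # Eq. (30): pixel redundancy is averaged down
```

The penalty discourages poses that leave patches under-resolved, while the averaging rewards poses that image patches with pixel redundancy.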
7 Discrete Domain Solution
This section discusses how the NBUV method is applied using standard discrete 3D object representations common in computer graphics.
Sonar usually results in a 3D mesh representation of the surface [3]. A 3D surface is parameterized by a triangulated mesh $\mathcal{M} = (\mathcal{F}, \mathcal{E})$, where $\mathcal{F}$ are the mesh's faces and $\mathcal{E}$ are the edges. To reduce complexity, the surface is divided into non-uniform segments $s_k$, where $k$ is the index of the segment. Each segment contains several patches within its area. Let $N_k$ be the number of patches in the segment. In most cases, it is convenient to use the faces of $\mathcal{M}$ as the segments, i.e. $s_k = f_k$, where $f_k$ is the $k$'th face. At time $t$, segment $s_k$ is projected to a set of pixels $\Omega_t(s_k)$, as illustrated in Fig. 4b,c. For a triangulated mesh representation, the support of $s_k$ on the image plane is a triangle. The size of $\Omega_t(s_k)$ is its number of pixels, $|\Omega_t(s_k)|$.
7.1 Fusing Measurements
The goal of the estimation process of $\rho$ is to produce a texture map for the mesh. Each mesh triangle is linearly mapped onto a corresponding triangle in a 'texture' image, $T$ (Fig. 4e). If a segment $s_k$ is imaged once, at time $t$, no fusing is required. Then, its imaged triangle is mapped to $T$ (Fig. 3). However, if $s_k$ is imaged more than once, all its imaged segments are fused. Fusing is done as described in Sec. 3. The fusing process weighs the image formation noise as well as the scanning resolution. For a detailed description of the fusing process, see the Appendix.
7.2 Information Gain Calculation
Evaluating the information gain over the surface requires storing the uncertainty of all patches. To avoid this, the information gain is calculated per segment. For $K$ segments, the information gain over the entire surface is calculated using:
$${\rm IG}_{t+1} = \sum_{k=1}^{K} N_k\,{\rm IG}_{t+1}(s_k). \quad (31)$$
Eq. (31) is derived from Eq. (25), except that now the quality measure is attributed to the segments:
$${\rm IG}_{t+1}(s_k) = \frac{1}{2}\ln\!\left[\frac{V_t^{\rm ML}(s_k)}{V_{t+1}^{\rm ML}(s_k)}\right], \quad (32)$$
where $V_{t+1}(s_k)$ is the mean uncertainty of the pixels belonging to $s_k$ in $I_{t+1}$ (Eq. 17):
$$V_{t+1}(s_k) = \frac{1}{|\Omega_{t+1}(s_k)|}\sum_{x \in \Omega_{t+1}(s_k)} V_{t+1}(x). \quad (33)$$
As in Eq. (29), the uncertainty is factored by $s_k$'s scale at time $t+1$, $c_{t+1}(s_k) = r_{t+1}(s_k)/r^*$, where $r_{t+1}(s_k)$ is the effective resolution of $s_k$ at time $t+1$. Finally, we calculate the information gain on the entire surface using Eq. (25), where $1/V_{t+1}^{\rm ML}(s_k) = 1/V_t^{\rm ML}(s_k) + 1/V_{t+1}(s_k)$, following Eq. (21). Eq. (31) assumes that the noise variance is nearly uniform across the patches in each segment. This assumption improves with a finer parametrization of the surface.
8 Simulations
We use the scattering model of [10], in a homogeneous medium. This model renders the imaged radiance in Eq. (5). The surface is Lambertian; the validity of a Lambertian assumption increases underwater [26, 31]. We set the read noise (in [e]) and a max well of 24,000 [e] in a perspective camera C, based on Canon 60D camera data, while L is a spot light with no lateral falloff.
Fig. 5 illustrates a simple scenario. A straight path 40cm above a surface is set. The medium's extinction and scattering coefficients are set, along with the anisotropy parameter of the Henyey-Greenstein phase function [10, 11]. C&L start at the beginning of the path. The initial L-C baseline is 2cm. Such a baseline would be fine in a clear medium. Underwater, however, this baseline results in significant backscatter (Fig. 5c). Hence, a 12cm baseline is used. The scan consists of 8 views, spread uniformly across the path.
In a traditional path, the baseline is fixed, while L points to the center of C's field of view on the surface. To the best of our knowledge, prior dehazing methods ignore SNR variability in image sequences. Thus, simple averaging is used for fusing in a traditional process. When we ran our optimization, the pose of L is selected out of a set of 32 possible locations with different radii around C. In addition, in each location there are 9 orientations of L: facing nadir, or off-nadir toward each lateral direction. Then, the NBUV is chosen out of a total of $32 \times 9 = 288$ candidate states by exhaustive search.
Fig. 5b shows the two C&L trajectories. Looking at the simple geometry under scan, it is clear that the illumination should come from the side opposite the hills when the camera passes above them. This is evident in Fig. 5d,e, where the view chosen by the NBUV method is lit better than the fixed-baseline one.
Fig. 6 illustrates how NBUV is used to determine an optimal scanning path. A cube having a 28cm edge length is placed on a flat surface in a scattering medium. A trivial scanning path 84cm above the surface is set over the scene. The path consists of 6 uniformly distributed views. To avoid backscatter, a wide fixed baseline is used. Let $\Pi$ denote the scanning path. Optimization was initialized with the trivial path, and performed using 20 iterations of Matlab's direct search function [16] over $\Pi$'s 60 degrees of freedom.² In the initial trivial scan, the left and right faces (see Fig. 6, red arrow) are occluded as C passes over the cube. In addition, shadow from the cube heavily degrades the left side of the surface. Applying our method, C and L are moved to cover the occluded regions (Fig. 6, views 3-4). Note that the front and back faces of the cube (blue arrow in Fig. 6) are scanned at better resolution. This is owed to views 2 and 5 (Fig. 6) that changed perspective to scan these faces. In total, our method reduced the total estimation uncertainty by 30% over the trivial setup.
²Rotation about the optical axis was excluded for both C and L.
9 Experiment
We built a model having an arbitrary, non-trivial topography, submerged in water (Fig. 7). Emulating a sonar scan, the surface model was pre-scanned using a depth camera in a clear medium, to produce a 3D mesh. Mixing milk in the water then produced single-scattering conditions that fit our image formation model [18]. A machine vision camera was submerged in a watertight housing. The submerged surface was illuminated using a Mouser Electronics 'Warm White' 3000K LED. The intrinsic parameters of the camera and the illumination angle of the LED were both calibrated underwater, to account for the water's refractive index. A robotic 2D plotter was used to move the light and camera to their approximate locations in space.
The medium's parameters (the extinction coefficient $\beta$ and the scattering coefficient $b$) were extracted in-situ. The camera and LED light were placed in a known state above a flat white sheet ($\rho \approx 1$). Optimization was used to extract $\beta$ and $b$, namely:
$$(\hat{\beta}, \hat{b}) = \arg\min_{\beta, b}\big\| I^{\rm measured} - I^{\rm model}(\beta, b) \big\|^2. \quad (34)$$
9.1 Numerical Conditioning
Using Eq. (13) directly to recover $\rho$ may lead to unstable results. In shadowed image regions, where the mesh poorly approximates the topography, the denominator $E a$ may be lower than desired. Such low values stem from deviations from the imaging model, e.g. the scattering order in the rendering. Since $E a$ is the denominator of Eq. (13), $\hat{\rho}$ may exceed 1 in these regions. Therefore, $\hat{\rho}$ is stabilized by:
$$\hat{\rho}^{\rm stable} = \alpha\,\hat{\rho} + (1-\alpha)\,\frac{G_1 * (I - B)}{G_2 * (E a)}, \quad (35)$$
where $*$ denotes convolution, $G_1$ and $G_2$ are Gaussian convolution kernels, and $\alpha$ is an alpha mask: $\alpha = 0$ whenever $\hat{\rho} > 1$. An example of the recovery result can be seen in Fig. 8.
9.2 Results
In a standard fixed-baseline scheme, the camera and light are initialized in a certain state above the model (Fig. 9a). Ten uniformly-spaced camera locations are used. Due to mechanical limitations, we allow the camera and light to move only horizontally. The elevation of the camera and light is set to 20cm above the object, and both face directly down.
In the fixed-baseline configuration, the light is placed at a fixed distance from the camera, so as to avoid significant backscatter. In our NBUV setup, per camera location, we allow the light to be placed in 40 fixed states around the location of the camera. We use our method to simulate the expected IG from each optional camera-light state, which in this case is chosen out of these 40 candidates. Both configurations are initialized at the same state.
A total of 11 views of the surface are taken in both the fixed-baseline and NBUV configurations. The recovered albedo images were mapped to the 3D mesh of the surface (Fig. 9b,c). Our method provides a better overall surface estimation than the fixed-baseline configuration. For the particular surface we used, that meant lighting dark spots on the left and center of the surface, where shadows were cast by the fixed light due to the topography. Fig. 9d A-C shows areas where the expected estimation noise is significantly lower when using NBUV. The total estimation uncertainty over the entire surface is lower. However, not all surface patches benefit (Fig. 9d D).
10 Conclusions
The paper defines the next-best-view task, as well as optimized path planning, taking into account scattering effects. The method optimizes viewpoints so that the descattered albedo is least noisy, allowing resolution of fine details. It generalizes dehazing to scanning multi-view platforms. We believe this approach can make drone imaging flights and underwater robotic imaging significantly more efficient when operating in poor visibility conditions due to scattering. Further work can use more comprehensive scattering models, image-statistics priors, and path-length penalties. Moreover, the principle we proposed can benefit from more efficient optimization algorithms, as the number of degrees of freedom increases. The work here can also be generalized to multiple agents (cameras) cooperatively scanning the scene.
Appendix A: Discrete Domain Fusing
The estimation process produces a texture map for the mesh. Each mesh triangle is linearly mapped onto a corresponding triangle in a 'texture' image, $T$ (Fig. 4e). If a segment $s_k$ is imaged once, at time $t$, then its pixels $\Omega_t(s_k)$ are mapped directly to $T$. On the other hand, if $s_k$ is imaged more than once, e.g. at times $t_1$ and $t_2$, all imaged segments (triangles) in the corresponding images are fused. Each segment may be imaged at different resolutions. Then, $s_k$ is allocated in a new image at the finer resolution. The supports of $s_k$ in both images are scaled accordingly and transformed to align with each other (see Fig. 4d). Once aligned, the measurements are fused using Eq. (19). The variance $V$ used in Eq. (19) is factored according to the scale $c$, as described in Sec. 6.
References
 [1] N. A. Ahmed and D. Gokhale. Entropy expressions and their estimators for multivariate distributions. IEEE Trans. on IT, 35:688–692, 1989.
 [2] D. Anguelov, C. Dulong, D. Filip, C. Frueh, S. Lafon, R. Lyon, A. Ogale, L. Vincent, and J. Weaver. Google street view: Capturing the world at street level. Computer, (6):32–38, 2010.
 [3] R. Campos, R. Garcia, P. Alliez, and M. Yvinec. A surface reconstruction method for in-detail underwater 3D optical mapping. Int. J. Robot. Res., 34:64–89, 2014.
 [4] S. Chen and Y. Li. Vision sensor planning for 3d model acquisition. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Trans. on, 35(5):894–904, 2005.
 [5] E. Coiras and J. Groen. Simulation and 3D reconstruction of side-looking sonar images. INTECH Open Access Publisher, 2009.
 [6] C. K. Cowan, B. Modayur, and J. L. DeCurtins. Automatic lightsource placement for detecting object features. In Applications in Optical Science and Engineering, pages 397–408. SPIE, 1992.
 [7] F. R. Dalgleish, A. K. Vuorenkoski, and B. Ouyang. Extendedrange undersea laser imaging: Current research status and a glimpse at future technologies. Marine Technology Society Journal, 47(5):128–147, 2013.
 [8] R. Fattal. Single image dehazing. In ACM TOG, volume 27, page 72. ACM, 2008.
 [9] I. Gkioulekas, A. Levin, F. Durand, and T. Zickler. Micron-scale light transport decomposition using interferometry. ACM TOG, 34(4):37, 2015.

 [10] M. Gupta, S. G. Narasimhan, and Y. Y. Schechner. On controlling light transport in poor visibility environments. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
 [11] V. I. Haltrin. One-parameter two-term Henyey-Greenstein phase function for light scattering in seawater. Applied Optics, 41(6):1022–1028, 2002.
 [12] K. He, J. Sun, and X. Tang. Single image haze removal using dark channel prior. IEEE T.PAMI, 33(12):2341–2353, 2011.
 [13] F. Heide, L. Xiao, A. Kolb, M. B. Hullin, and W. Heidrich. Imaging in scattering media using correlation image sensors and sparse convolutional coding. Optics express, 22(21):26338–26350, 2014.
 [14] J. S. Jaffe. Computer modeling and the design of optimal underwater imaging systems. IEEE J. Oceanic Eng., 15(2):101–111, 1990.
 [15] J. S. Jaffe. Multi autonomous underwater vehicle optical imaging for extended performance. In OCEANS 2007 - Europe, pages 1–4. IEEE, 2007.
 [16] T. G. Kolda, R. M. Lewis, and V. Torczon. A generating set direct search augmented lagrangian algorithm for optimization with a combination of general and linear constraints. Sandia National Laboratories, 2006.
 [17] F. Maurelli, S. Krupiński, Y. Petillot, and J. Salvi. A particle filter approach for AUV localization. In OCEANS 2008, pages 1–7. IEEE, 2008.
 [18] S. G. Narasimhan, M. Gupta, C. Donner, R. Ramamoorthi, S. K. Nayar, and H. W. Jensen. Acquiring scattering properties of participating media by dilution. ACM TOG, 25(3):1003–1012, 2006.
 [19] K. H. Norwich. Information, sensation, and perception. Academic Press San Diego, 1993.
 [20] M. O’Toole, R. Raskar, and K. N. Kutulakos. Primaldual coding to probe light transport. ACM TOG, 31(4):39, 2012.
 [21] L. Paull, S. Saeedi, M. Seto, and H. Li. AUV navigation and localization: A review. IEEE J. Oceanic Eng., 39(1):131–149, 2014.
 [22] R. Pito. A solution to the next best view problem for automated surface acquisition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 21(10):1016–1030, 1999.
 [23] N. Ratner and Y. Y. Schechner. Illumination multiplexing within fundamental limits. In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, pages 1–8. IEEE, 2007.
 [24] C. Roman and H. Singh. A selfconsistent bathymetric mapping algorithm. J. of Field Robotics, 24(12):23–50, 2007.
 [25] S. Sakane and T. Sato. Automatic planning of light source and camera placement for an active photometric stereo system. In Robotics and Automation, 1991. Proceedings., 1991 IEEE International Conference on, pages 1080–1087. IEEE, 1991.
 [26] Y. Y. Schechner, D. J. Diner, and J. V. Martonchik. Spaceborne underwater imaging. In Proc. IEEE ICCP, pages 1–8. 2011.
 [27] T. Treibitz and Y. Y. Schechner. Active polarization descattering. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(3):385–399, 2009.
 [28] T. Treibitz and Y. Y. Schechner. Recovery limits in pointwise degradation. In Computational Photography (ICCP), 2009 IEEE Inter. Conf. on, pages 1–8. IEEE, 2009.
 [29] T. Treibitz and Y. Y. Schechner. Turbid scene enhancement using multidirectional illumination fusion. Image Processing, IEEE Transactions on, 21(11):4662–4667, 2012.
 [30] S. Wenhardt, B. Deutsch, J. Hornegger, H. Niemann, and J. Denzler. An information theoretic approach for next best view planning in 3d reconstruction. In Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, volume 1, pages 103–106. IEEE, 2006.
 [31] H. Zhang and K. J. Voss. Bidirectional reflectance study on dry, wet, and submerged particulate layers: effects of pore liquid refractive index and translucent particle concentrations. Applied optics, 45(34):8753–8763, 2006.
 [32] B. Englot and F. S. Hover. Threedimensional coverage planning for an underwater inspection robot. The International Journal of Robotics Research, 32(910):1048–1073, 2013.