Waterdrop Stereo

by Shaodi You, et al.

This paper introduces depth estimation from water drops. The key idea is that a single water drop adhered to window glass is totally transparent and convex, and thus optically acts like a fisheye lens. If we have more than one water drop in a single image, then through each of them we can see the environment from a different viewpoint, similar to stereo. To realize this idea, we need to rectify every water-drop image so that radially distorted planar surfaces look flat. For this rectification, we consider two physical properties of water drops: (1) A static water drop has constant volume, and its convex geometric shape is determined by the balance between the tension force and gravity. This implies that the 3D geometric shape can be obtained by minimizing the overall potential energy, which is the sum of the tension energy and the gravitational potential energy. (2) The imagery inside a water drop is determined by the water-drop 3D shape and total reflection at the boundary. This total reflection generates a dark band commonly observed in adherent water drops. Hence, once the 3D shape of the water drops is recovered, we can rectify the water-drop images through backward raytracing. Subsequently, we can compute depth using stereo. In addition to depth estimation, we can also apply image refocusing. Experiments on real images and a quantitative evaluation show the effectiveness of our proposed method. To the best of our knowledge, adherent water drops have never before been used to estimate depth.








1 Introduction

Depth from real images is crucial information for many applications in computer graphics. Numerous methods have attempted to extract depth using various cues (e.g., shading [Ikeuchi and Horn 1981], multiple views [Hartley and Zisserman 2004], defocus [Favaro and Soatto 2005]). In contrast to all existing methods, in this paper we explore a new possibility: using water drops adhered to window glass or a lens to estimate depth.

Water drops adhered to glass are transparent and convex, and thus each of them acts like a fisheye lens. As shown in Fig. Waterdrop Stereo.a, water drops’ locations are normally scattered in various regions in an image. If we zoom in, each of the water drops displays the same environment from its own unique point of view. Due to the proximity to each other, some have similar visual content, but some can be relatively different, particularly when the water drops are apart in the image. Therefore, if we can rectify each of the water drops, we will have a set of images of the environment from relatively different perspectives, opening up the possibility of extracting the depth from the water drops, which is the goal of this paper.

To achieve this goal, we need to rectify each water drop so that planar surfaces look flat. Rectifying water drops, however, is problematic. In contrast to existing work in catadioptric imaging, which assumes the geometry of the sphere is known a priori, water-drop shapes can vary considerably and are highly non-axial. To resolve this problem, we need to examine two physical properties of water drops. First, a static water drop has constant volume, and its geometric 3D shape is determined by the balance between the tension force and gravity. Because the water drop is in balance, it minimizes the overall potential energy, which is the sum of the tension energy and the gravitational potential energy. Based on this property, we introduce an iterative method to form the water-drop geometric shape. However, from a single 2D image, the volume cannot be directly obtained, since we do not know the thickness of the water drop. To solve this, we use the second physical property, i.e., the appearance of a water drop depends on its geometric shape and also on total reflection. Total reflection occurs near the water-drop boundary and produces a dark band. We found that a water drop with a greater volume has a wider dark band. Thus, we introduce a volume-varying-iteration framework that estimates the volume that best fits the appearance. Knowing the complete 3D shape of the water drops, we perform multiple-view stereo through backward raytracing and triangulation. Finally, we rectify the warped images. With the depth and rectified images obtained, one of our applications is image refocusing. Figure Waterdrop Stereo shows the general pipeline of our proposed method.


In this paper, we introduce a new way to recover depth using water drops from a single image. We also propose a novel method to reconstruct the 3D geometry of water drops by utilizing the minimum surface energy and total reflection. Aside from estimating depth, we also apply image refocusing using the information provided by water drops. Furthermore, the proposed non-parametric, non-axial algorithm can be applied more generally to catadioptric imaging systems.

The rest of the paper is organized as follows. Section 2 discusses related work in depth estimation, water modeling and shape from transparent objects. Section 3 explains the theory behind the water-drop physical properties. Section 4 introduces the methodology of the 3D shape estimation, stereo, as well as water-drop image rectification. Section 5 shows the three applications on stereo, image refocusing and image stitching. Section 6 shows the experimental results and evaluation. Section 7 concludes this paper.

2 Related Work

Three-dimensional reconstruction of opaque objects from a single image has been explored for decades: shape from shading [Ikeuchi and Horn 1981], shape from texture [Malik and Rosenholtz 1997], shape from defocus [Favaro and Soatto 2005] and piece-wise planarity [Horry et al. 1997]. A few approaches using silhouettes have been proposed to reconstruct a bounded smooth surface [Terzopoulos et al. 1988, Hassner and Basri 2006, Prasad and Fitzgibbon 2006, Joshi and Carr 2008, Oswald et al. 2012, Vicente and Agapito 2013]. The method of [Prasad and Fitzgibbon 2006] reconstructs a surface with minimum area, and [Oswald et al. 2012] proposes a faster version of it. However, none of these methods directly aims to model water or other transparent liquids from a single image.

The methods of [Garg and Nayar 2007, Roser and Geiger 2009, You et al. 2013, You et al. 2015] introduce airborne and adherent raindrop modeling. Their goal is to detect and remove raindrops, not to reconstruct 3D structure from raindrops. [Roser et al. 2010] exploits water-drop surface fitting using B-splines and silhouettes using 1D splines. [Morris 2004, Tian and Narasimhan 2009, Oreifej et al. 2011, Kanaev et al. 2012] address underwater imaging. They assume water surfaces are dynamic and dominated by transitions of waves, which does not suit our specific problem.

Stereo and light field imaging using perspective cameras with extra mirrors and lenses have also been explored. [Baker and Nayar 1999, Taguchi et al. 2010] propose algorithms using spherical mirrors. [Levoy et al. 2004] introduces arrays of planar mirrors. [Swaminathan et al. 2001, Ramalingam et al. 2006] address the use of axial cameras, and [Taguchi et al. 2010] extends the work to axial-camera arrays. Later, [Agrawal and Ramalingam 2013] proposes methods to automatically calibrate the system. All of these methods, however, assume radial or planar symmetry of the media (mirror/lens), which is not satisfied in the case of water drops, since water drops are highly non-axial.

3 Modeling

Theoretical background and modeling of water drops are discussed in this section. We first explain briefly the image formation, showing the correlations between the environment, water-drops and the camera. Subsequently, we model the raindrop 3D geometry, particularly the concept of minimum energy surface. Based on the image formation and the raindrop geometry, we study the total reflection inside water drops, which is necessary to determine the water-drop’s volume. All these aim at water-drop image rectification.

3.1 Image Formation

Fig. 1 illustrates our image formation. Rays reflected from the environment pass through two water drops before hitting the image plane. Unlike in conventional image formation, the passing rays are refracted by the water drops, where each water drop acts like a fisheye lens that warps the image. Assuming we have a few water drops that are apart from each other, the images seen through the water drops will be slightly different from each other, even though the environment is identical, as shown in Fig. 1.

From the illustration in Fig. 1, we can conclude that the image captured by the camera through water drops is determined by three interrelated factors: (1) the depth of the environment, (2) the 3D shape of the water drops, which determines how light rays emitted from the environment are refracted, and (3) the camera intrinsic parameters, which are assumed to be known. Therefore, to be able to recover the depth of the environment, we need to obtain the 3D shape of the water drops.

3.2 Minimum Energy Surface

Figure 2: Parameters of a water drop.

(a) In the global coordinates, the geometric shape is determined by gravity. (b) The parameters in the camera coordinates. Point A is a two-phase point (water-air), where the tension force balances the pressure. Point B is a three-phase point (water-air-material).

For the purpose of exploring the minimum energy surface to estimate the 3D shape of a water drop, we introduce a local coordinate system of the camera, which is illustrated in Fig. 1.a and d. In the coordinates, water drop 3D shape can be parameterized as:


where indicates the raindrop area attached to glass. is any point in the raindrop area and is the height.

A static water drop has a constant volume, and its 3D shape minimizes the overall potential energy , which can be written as:


where is the tension energy, and is the gravitational potential energy, is the volume. Therefore, to solve the geometry of a raindrop, we need to find the surface .

Figure 2 illustrates the 3D shape of a water drop. Point A is a two-phase (water-air) balanced point, where surface tension balances pressure . Point B is a three-phase point (water-air-material), where the tension is from both water and adhesion surface . These two types of tension balance the gravity, .

With the parameterized surface, we can write the surface tension energy as:


where is the surface tension index for water, denotes a unit surface area and is the gradient [Feynman et al. 2013]. As we can see, the tension energy is proportional to the area of the surface.

The gravitational potential energy can be expressed as:


where , and denote the angles between the coordinates and the gravity correspondingly. is the gravity and is the density of water, which are generally known. Moreover, we can add a constraint that:


Therefore, the parameterized surface is estimated by minimizing the overall potential energy determined by Eq. (2), (3) and (4) with the constraints of constant volume in Eq. (5). Figure 3 shows some examples of the surface estimated by using the technique. We will discuss the algorithm in detail in Section 4.
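As a hedged summary of the quantities described in this subsection, one plausible form of the constrained minimization is sketched below. The symbols are notation assumed here for illustration: $S(x,y)$ is the height field over the attached area $\Omega$, $\sigma$ the surface tension coefficient, $\rho$ the density of water, $g$ the gravitational acceleration, $\alpha, \beta, \gamma$ the angles between the coordinate axes and gravity, and $V$ the volume. The sign convention of the gravitational term is one possible choice and depends on how the axes are oriented.

```latex
\begin{aligned}
E(S) &= E_T(S) + E_G(S), \\
E_T(S) &= \sigma \iint_{\Omega} \sqrt{1 + \lVert \nabla S \rVert^{2}} \; dx\, dy, \\
E_G(S) &= -\rho g \iint_{\Omega} \Big[ (x\cos\alpha + y\cos\beta)\, S + \tfrac{1}{2} S^{2} \cos\gamma \Big] \, dx\, dy, \\
\text{subject to} \quad & \iint_{\Omega} S(x,y)\, dx\, dy = V .
\end{aligned}
```

The tension term is the surface tension coefficient times the surface area of the height field, which matches the statement that the tension energy is proportional to the area of the surface. The gravitational term follows from integrating the potential of each water column from the glass up to $S(x,y)$.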

Figure 3: Minimum energy surfaces given the area and volume.

(b) The minimum energy surface when the volume coefficient , defined in Eq. (16). (c) . Assuming the gravity is along axis.

Note that, to uniquely determine the geometry of a water drop, we need to know both the 2D area where the water drop is attached to the glass, , and the volume . While the former can be directly inferred from the image, the latter is not straightforward to obtain. The subsequent section discusses how the volume can be determined.

3.3 Water-Drop Volume from Dark Band

As we can see in Fig. 1.c, the basic idea of our volume estimation is based on the dark band at the boundary of a water drop. We found that the wider the dark band the larger the volume of the water. This section discusses this idea further.

Figure 4: Refraction by a water drop.

(a) A ray coming from the environment is refracted twice before reaching the camera . (b) Backward ray tracing, where a virtual ray emitted from the camera passes through the same path as in (a). For simplification, we remove the refraction between flat surfaces by moving the camera position to . (c) When is greater than the critical angle, light will not be transmitted but reflected inside. (d) Two polarized components, and , of the incidence ray.

Refraction model

Fig. 4.a illustrates how a ray coming from the environment is refracted twice before reaching the camera. Since we are only interested in the rays that can reach the camera, we can use backward raytracing to find the paths of the rays. Moreover, we assume that the glass is so thin that we can ignore the refraction due to the glass. Because we are mostly interested in the refraction on the curved surface, to further simplify the model, we remove the refraction at the flat surface by moving the camera from position to , as shown in Fig. 4.b. As an approximation, when the incident angle is small, we can consider that the perpendicular distance from the camera to the refraction plane, denoted as , is changed to , where and are the refractive indices of water and air, respectively. A detailed derivation of the position of is given in Appendix A.
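The refraction step used in backward raytracing can be illustrated with the vector form of Snell's law. This is a minimal sketch, not the authors' implementation: the function name `refract`, the refractive index 1.33 for water, and the convention that the unit normal faces the incoming ray are all assumptions made for illustration.

```python
import math


def refract(incident, normal, n1, n2):
    """Refract a unit direction at an interface using the vector form of Snell's law.

    incident: unit direction of the ray, travelling in the medium with index n1.
    normal:   unit surface normal pointing back toward the incoming ray.
    Returns the unit refracted direction in the medium with index n2,
    or None when total reflection occurs.
    """
    eta = n1 / n2
    cos_i = -sum(a * b for a, b in zip(incident, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        # Beyond the critical angle: no transmitted ray (dark band region).
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * a + (eta * cos_i - cos_t) * b
                 for a, b in zip(incident, normal))


# A backward-traced ray leaving water (n = 1.33, assumed) into air at 30 degrees.
ray_30 = (0.5, 0.0, math.sqrt(3.0) / 2.0)
flat_normal = (0.0, 0.0, -1.0)
out_30 = refract(ray_30, flat_normal, 1.33, 1.0)

# The same interface hit at 60 degrees exceeds the critical angle.
ray_60 = (math.sqrt(3.0) / 2.0, 0.0, 0.5)
out_60 = refract(ray_60, flat_normal, 1.33, 1.0)
```

In this sketch `out_60` is `None`, which is the case that produces the dark band discussed next.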

Dark Band and Total Reflection

The dark band at the boundary of a water drop is caused by light coming from the environment reflected back inside the water, instead of being transmitted to the camera. This phenomenon is known as the total reflection, and applies to all light rays whose relative angles to the water’s surface normal are larger than the critical angle, denoted as .

To analyze the correlation between the critical angle with the water-drop 3D shape , we refer to Snell’s law, which indicates the critical angle:


As indicated in Fig. 4.c, we denote the surface normal as , which can be derived from as: where, and denotes the norm.

The angle between the surface normal and the -axis denoted as is the sum of the incidence angle of water, , and the angle between the incidence ray and -axis :


where is determined by the position of the camera and the position of the refraction. Considering the component of the normal , also defined as: , we know that when , the corresponding water drop area is totally dark. For instance, when is 0, and is approximately , we have


where is the notation for the critical value. Figure 5 shows some examples of synthetically generated dark bands. As we can observe, a greater volume of the water drop indicates a wider dark band. Therefore, it is possible to infer the water-drop volume from the dark band.
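The critical-angle reasoning can be checked numerically. The sketch below assumes refractive indices of 1.33 for water and 1.0 for air, and treats the special case in which the backward-traced ray travels along the z-axis, so that the incidence angle equals the angle between the surface normal and the z-axis. The names `critical_angle` and `in_dark_band` are hypothetical helpers, not part of the described method.

```python
import math

N_WATER = 1.33  # assumed refractive index of water
N_AIR = 1.0     # assumed refractive index of air


def critical_angle(n_inside=N_WATER, n_outside=N_AIR):
    """Critical angle (radians) for a ray going from water into air."""
    return math.asin(n_outside / n_inside)


def in_dark_band(normal, view=(0.0, 0.0, 1.0)):
    """True if a ray along `view` inside the water is totally reflected
    at a surface point with the given (not necessarily unit) normal."""
    norm_n = math.sqrt(sum(c * c for c in normal))
    norm_v = math.sqrt(sum(c * c for c in view))
    cos_inc = abs(sum(a * b for a, b in zip(normal, view))) / (norm_n * norm_v)
    return cos_inc < math.cos(critical_angle())


theta_c = critical_angle()
# For a ray along the z-axis, the critical z-component of the unit normal
# is the cosine of the critical angle.
n_z_critical = math.cos(theta_c)
```

Under these assumptions the critical angle is about 48.8 degrees and the critical z-component of the unit normal is about 0.66. Surface points whose normals tilt further away from the viewing ray fall inside the dark band.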

Figure 5: Dark bands.

(a) Water drop 3D geometry. (b) The component of surface normal. (c) The dark bands. Second row: A greater water drop volume, wider dark band.

Figure 6: Selecting water drops from a single image.
Figure 7: Iteration of water drop 3D shape with a fixed volume.

Dark Band and Fresnel Equation

While the dark band can be theoretically inferred from the water-drop geometry, detecting it in an image is nontrivial. Due to sensor noise and light leakage (back light from the camera side that is reflected by the glass plate into the camera, and interreflection inside a water drop), dark bands are not totally dark. Moreover, there are textures in the environment that can be darker than the dark bands. To resolve this problem, we employ the Fresnel equations and formulate the brightness values near the critical angle.

The refraction coefficients, denoted as and , for two orthogonal polarized components for the light rays traveling from air to water are written as:


where and are depicted in Fig. 4.d. In our case, we assume the light from the environment is not polarized, and thus the overall refraction coefficient is


Concerning the dark bands, we are interested in two critical conditions. First, when the incidence angle is close to 0. In such a condition, , , and consequently:


Substituting the value for water gives us , and thus we have .

Second, when incidence angle is close to , (the locations near the dark band). Hence, , , as a result:


Similar to the first condition, substituting the value , we obtain:


Considering is connected with by Snell’s law, and the connection between and (Eq. (7)), we establish the connection between the component of surface normal and the refraction coefficient:


This relation between the image brightness and the surface normals gives us a constraint on the contrast between the dark band and the other parts of the water-drop image. Using this, the water-drop volume can be inferred from the brightness close to the dark band region. Details are provided in Sec. 4.1.
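The two limiting cases above can be illustrated with the standard Fresnel equations for unpolarized light. This is a sketch under stated assumptions: it computes the power transmittance (which may differ from the exact coefficient used in the method), it assumes refractive indices of 1.0 for air and 1.33 for water, and `transmittance_unpolarized` is a hypothetical name.

```python
import math


def transmittance_unpolarized(theta_i, n1=1.0, n2=1.33):
    """Power transmittance for unpolarized light entering medium n2 from n1.

    theta_i is the incidence angle in radians, measured from the normal.
    """
    sin_t = n1 * math.sin(theta_i) / n2
    if sin_t >= 1.0:
        return 0.0  # total reflection (cannot happen for n1 < n2)
    cos_i = math.cos(theta_i)
    cos_t = math.sqrt(1.0 - sin_t * sin_t)
    # Reflectance of the two orthogonal polarization components.
    r_s = ((n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)) ** 2
    r_p = ((n1 * cos_t - n2 * cos_i) / (n1 * cos_t + n2 * cos_i)) ** 2
    # Unpolarized light: average the two components.
    return 1.0 - 0.5 * (r_s + r_p)


t_normal = transmittance_unpolarized(0.0)
t_mid = transmittance_unpolarized(math.radians(60.0))
t_grazing = transmittance_unpolarized(math.radians(89.9))
```

Under these assumptions the transmittance is about 0.98 at normal incidence and falls toward zero at grazing incidence, which corresponds to the locations next to the dark band. This falloff is what makes the brightness near the band informative about the surface normal.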

4 Methodology

In this section, we introduce the detailed algorithm for rectifying images of water drops and estimating depth. As illustrated in Fig. Waterdrop Stereo, it has three main steps: (1) water-drop 3D shape reconstruction by minimizing the surface energy, (2) multi-view stereo, and (3) image rectification.

Water Drop Detection

The appearance of water drops is highly dependent on the environment, and thus detecting them is not trivial. Fortunately, in our case, we can assume the water drops are in focus, and thus the environment image is rather blurred. Hence, we can utilize edge detection to locate water drops, as illustrated in Fig. 6. Having located the water drops, we select those that are sufficiently large (e.g., with a diameter greater than 300 pixels). This is to ensure that the rectified images are not too small.

4.1 Water Drop 3D Shape Reconstruction

Mesh representation and Initialization

To reconstruct the 3D shape of water drops, we first represent the water surface using a parameterized mesh. Referring to Eq. (1), we can describe a surface as: where are the location of a pixel in the water drop area. Accordingly, the area of is defined as: where 1 is the unit for a pixel’s area.

At this initialization, we do not know the volume of the water drop, and make an initial guess based on:


where is the volume coefficient, set to 0.30 by default. According to this equation, with fixed, the volume grows cubically when the area grows quadratically. This means that under a uniform scale change of the water-drop surface, keeps the same value. Figure 3 gives some examples of how is related to the reconstructed surface.

We initialize the mesh as a cylinder by defining:


Figure 7 shows an example of the initial surface.
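The initialization can be sketched as follows. The relation between volume and area used here, volume = coefficient × area^(3/2), is inferred from the scaling statement above (area growing quadratically, volume cubically) and is an assumption, as are the function name `initialize_cylinder` and the boolean-mask representation of the attached area.

```python
def initialize_cylinder(mask, coeff=0.30):
    """Build a flat-topped (cylinder) initial height field.

    mask:  2D list of booleans, True where the pixel belongs to the water drop.
    coeff: volume coefficient (0.30 is the default stated in the text).
    Returns (surface, area, volume), with one pixel as the unit of area.
    """
    area = sum(1 for row in mask for inside in row if inside)
    # Assumed relation: volume scales with area to the power 3/2.
    volume = coeff * area ** 1.5
    height = volume / area  # a cylinder has constant height over the area
    surface = [[height if inside else 0.0 for inside in row] for row in mask]
    return surface, area, volume


small_mask = [[True] * 4 for _ in range(4)]
surface0, area0, volume0 = initialize_cylinder(small_mask)
```

Because the height is the volume divided by the area, the initial mesh already satisfies the volume constraint, so the subsequent iterations only need to reshape it.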

Iteration with fixed volume

We solve the constrained minimum energy surface using iterative gradient descent. For iteration we update the mesh in three steps: tension energy update, gravity update, and volume update. This strategy is an extension of the smooth surface reconstruction proposed by [Oswald et al. 2012].

  • Step 1: Tension energy update. It attempts to make the surface as smooth as possible:


    where controls the update speed with as default, is the tension coefficient in physics. We define:


    where is the divergence.

    In our settings, a water drop has a size of around or approximately 500 pixels, and thus the size of a unit pixel is about . The tension coefficient for water at room temperature is .

  • Step 2: Gravity update. It intends to increase the height for the mesh points that lower the potential energy:


    where is the geometry centroid of the water drop and defined as:


    Substituting and , we found that when the tilt of the adherent surface is small, the water-drop geometry is mainly dominated by the tension energy.

  • Step 3: Volume update. Having updated the tension and gravity in the previous two steps, this step checks the current volume and compares it with the targeted volume V, and then re-adjusts the volume by adding the same value to all the mesh points:


After each iteration, we check the absolute change of volume: , and set the convergence threshold to as default, where is the targeted volume. We run the iterations up to 4000 times. Figure 7 shows the progress of the estimated volume.
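The structure of the fixed-volume iteration can be sketched on a small grid. This is a simplified illustration, not the described algorithm: the tension step is replaced by plain Laplacian smoothing (a small-slope approximation of area minimization), the gravity step is omitted (the text notes that tension dominates when the tilt is small), and the names `disk_mask`, `fixed_volume_iteration`, `steps` and `tau` are assumptions.

```python
def disk_mask(n):
    """Boolean n x n mask of a disk, standing in for a detected water drop."""
    c = (n - 1) / 2.0
    r = n / 2.0 - 1.0
    return [[(x - c) ** 2 + (y - c) ** 2 <= r * r for x in range(n)]
            for y in range(n)]


def fixed_volume_iteration(mask, volume, steps=500, tau=0.2):
    """Alternate a smoothing step and a volume-correction step."""
    n = len(mask)
    area = sum(1 for row in mask for inside in row if inside)
    # Cylinder initialization: constant height over the attached area.
    h = [[volume / area if mask[y][x] else 0.0 for x in range(n)]
         for y in range(n)]
    for _ in range(steps):
        new = [row[:] for row in h]
        # Simplified tension update: explicit Laplacian smoothing, with
        # zero height outside the attached area.
        for y in range(n):
            for x in range(n):
                if not mask[y][x]:
                    continue
                nb = 0.0
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < n and 0 <= xx < n and mask[yy][xx]:
                        nb += h[yy][xx]
                new[y][x] = h[y][x] + tau * (nb - 4.0 * h[y][x])
        # Volume update: add the same value to every mesh point so that
        # the total volume returns to the target.
        current = sum(new[y][x] for y in range(n) for x in range(n)
                      if mask[y][x])
        shift = (volume - current) / area
        for y in range(n):
            for x in range(n):
                if mask[y][x]:
                    new[y][x] += shift
        h = new
    return h


mask = disk_mask(21)
heights = fixed_volume_iteration(mask, volume=500.0)
total = sum(heights[y][x] for y in range(21) for x in range(21) if mask[y][x])
```

Starting from the flat cylinder, the smoothing step lowers the rim while the volume step lifts the whole mesh, so the surface settles into a dome whose volume matches the target. The explicit step size `tau` must stay at or below 0.25 for this simple scheme to remain stable.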

Iteration with varying volume

Figure 8: Estimating the width of a dark band

Left: Underestimated volume, where the estimated dark band (shown in blue) mostly covers dark pixels. Middle: Correctly estimated volume, where the dark band covers about the same proportion of dark and bright pixels. Right: Overestimated volume, where the dark band mostly covers bright pixels.

Having estimated the surface with fixed volume, we can obtain the surface normals and evaluate the brightness values near the dark band and gradually adjust the volume.

In Section 3.3, we have built the relation between the surface normal and the luminance (Eq. (15)). According to Eq. (8), when the component of surface normal is smaller than the critical value , the pixel brightness should be close to 0. Yet, when is slightly greater than , the pixel brightness should follow Eq. (15). Thus, if we compute the average refraction coefficient, denoted as , for pixels whose normals are within the range , we can do local linear expansion of Eq. (14) and obtain:


Specifically, we set , and thus . Consequently, the average brightness of the band, , is:


where is the average brightness of the non water-drop areas.

In Fig. 8, we sample the brightness of the estimated band. As shown in Fig. 8.a, when the volume is underestimated, the real dark band is wider than the estimated one, so the sampled band contains fewer bright pixels. Conversely, when the volume is overestimated, the real dark band is narrower than the estimated one, so the sampled band contains more bright pixels. With the above analysis, we update the volume every 400 iterations (by default) of the fixed-volume algorithm introduced previously:


where is the sampled brightness, is the targeted brightness value, and is a weighting coefficient which controls the updating speed and is set to 0.5 as default. We demonstrate the accuracy of the estimation by experiments in Sec. 5.1.
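The direction of the volume update can be illustrated with a small sketch. The multiplicative form used here is only one plausible choice and is an assumption, as is the name `update_volume`. What the sketch takes from the text is the sign of the correction and the default weight of 0.5.

```python
def update_volume(volume, sampled, target, weight=0.5):
    """Adjust the volume estimate from the brightness sampled in the band.

    sampled: average brightness measured inside the estimated band.
    target:  brightness expected in the band for a correct volume.
    weight:  update speed (0.5 is the default stated in the text).
    The relative, multiplicative form is an illustrative assumption.
    """
    return volume * (1.0 + weight * (target - sampled) / target)


# Band darker than expected: the volume was underestimated, so it grows.
grown = update_volume(100.0, sampled=0.2, target=0.4)
# Band brighter than expected: the volume was overestimated, so it shrinks.
shrunk = update_volume(100.0, sampled=0.6, target=0.4)
# Band matches the target: the estimate is left unchanged.
unchanged = update_volume(100.0, sampled=0.4, target=0.4)
```

Interleaving this update with the fixed-volume iterations lets the surface re-converge after each adjustment before the band is sampled again.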

Figure 9: Illuminance compensation of water drop images.

4.2 Waterdrop Stereo

Once the geometry of each water drop is obtained, we perform multi-view stereo to estimate depth. In multiple-view stereo based on perspective or radial catadioptric cameras, the projection of each camera can be modeled using a few parameters. Water drops, in contrast, are non-parametric and non-axial. To overcome this problem, we propose a raytracing-based triangulation. (More details about the implementation can be found in Appendix B.)

Figure 10: Ray-tracing based triangulation.

For a set of corresponding points , the backward raytracing finds the set of rays in the space. The position of the point in space is where the sum of Euclidean distance to all the rays is minimized.

As illustrated in Fig. 10, a point has its corresponding points on other water drops, denoted as , where is the index of water drops. By knowing the geometry of each water drop, we can find the location where the refraction happens in each water drop, denoted as . Specifically, to obtain the location of , we need: (1) The position of the camera , which is known a priori, and thus can be converted to equivalent position , as illustrated in Fig. 4.b. (2) The water drop geometry, which is already estimated.

At each refraction location, the incident angle is obtained using . The surface normal is known through geometry estimation. Through Snell’s law, we can obtain the outbound angle . The outbound ray is formulated as:


Hence, we can now perform the classical triangulation, as illustrated in Fig. 10. Given a set of corresponding points on each water drop , we can obtain the outbound ray of refraction for each of them, where is the index of water drops. The triangulation aims to find the position of point , which minimizes the Euclidean distance to all the rays:


For the detailed derivation, we refer to [Szeliski 2010]. The depth of each point is its component. Figure 13 shows an example of a depth map estimated from water-drop images.
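The ray-based triangulation can be sketched as a small least-squares problem. This illustration minimizes the sum of squared distances to the rays, which has a closed-form solution; it is an assumption that this matches the exact objective used. The function names and the example rays are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a p = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are parallel; the point is not determined")
    solution = []
    for k in range(3):
        m = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(m) / d)
    return solution


def triangulate(origins, directions):
    """Point minimizing the sum of squared distances to a set of rays.

    Each ray is given by an origin (the refraction location) and a
    direction (the outbound ray). Solves sum(I - u u^T) p = sum(I - u u^T) o.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(c * c for c in d))
        u = [c / norm for c in d]
        for i in range(3):
            for j in range(3):
                proj = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += proj
                b[i] += proj * o[j]
    return solve3(a, b)


# Two hypothetical outbound rays that both pass through (1, 2, 10).
point = triangulate(origins=[(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
                    directions=[(1.0, 2.0, 10.0), (-2.0, 2.0, 10.0)])
```

With noisy correspondences the rays no longer intersect exactly, and the same solve returns the point closest to all of them in the least-squares sense. Adding rays from more water drops only adds terms to the two sums.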

4.3 Rectification of Water Drop Image

Having estimated the depth map on each water drop, we unwarp the distorted water-drop image. Referring to the pinhole camera model (Fig. 1), for each water-drop image, a space point with projection at is projected to . Figure 11 and Figure 12 show results of the rectified water-drop images.

According to Eq. (9) and Eq. (10), with the water drop geometry obtained, we can compensate the brightness values according to the refractive coefficient . Figure 9 shows an example of the brightness compensation.

Figure 11: Quantitative evaluation of water surface reconstruction and rectification.
Figure 12: Rectification of real water images.

The first two rows are water-drop images taken by ourselves. The first three columns in Rows 3 and 4 are downloaded from the Internet. The last three images are taken with spherical mirrors by [Taguchi et al. 2010]. In some data there is a slant between the background and the water drop; hence the rectified images are not necessarily rectangular.

Figure 13: Stereo using two dewarped water drop images.
Figure 14: Stereo using real waterdrop images.

The stereo uses 2-4 water drops; however, only two of the dewarped results are shown.

5 Experiments and Analysis

We conduct experiments using both synthetic data and real data to examine and analyze the performance of our method. In the experiment, we evaluate the estimated 3D shape of the water drops, and the depth estimation. Without loss of generality, our method can also be used to handle axial mirror/lens models [Taguchi et al. 2010].

5.1 3D Shape Reconstruction and Image Rectification

To evaluate the accuracy of the 3D shape of water drops, we utilize synthetic data. We cannot use real data, since automatic 3D acquisition systems, such as a laser range finder, cannot be used to measure the 3D shape of water. We use real images for the evaluation of the image rectification. Some of the real images are taken by ourselves and some are downloaded from the Internet.

Figure 11 shows the generated synthetic water drops with a variety of boundaries. A quantitative evaluation is performed by comparing our estimation with the ground truth 3D shape. The error is normalized to the percentage of the scale of the water drops. As one can observe, the reconstruction error is less than 3% even for the most irregular water drops.

Figure 12 shows a collection of the rectified water drop images from real data. The input image is cropped for better visualization, yet the camera center is not at the cropped image center.

Our proposed method can also be used for axial models, which can be considered a special case in which the water drop is exactly radially symmetric. The last two rows of Fig. 12 show the results on spherical mirrors [Taguchi et al. 2010]. Because dark band estimation is not applicable to mirrors, we specify the volume parameter for the mirrors manually, and refraction is replaced by reflection in the raytracing.

We implemented our method in Matlab and measured the computational time without parallelization. For water-drop 3D shape estimation, the time varies depending on the water-drop volume and the mesh resolution. Table 1 shows the computation time for varying volume and fixed mesh resolution, and Table 2 shows the time for varying mesh resolution. In a typical case, the mesh resolution is set to 200×200 and the reconstruction time is about 10 s. Note that, because each water-drop reconstruction is performed separately, the tasks can be parallelized in a straightforward way. Thus, given enough parallel workers, the overall computation time need not increase with the number of water drops.

The mesh resolution is fixed to 200×200.

Table 1: Computation time for water drop 3D reconstruction with varying volume.

The volume is fixed to .

Table 2: Computation time for water drop 3D reconstruction with varying mesh resolution.

5.2 Depth Estimation

We use both synthetic and real water-drop data to demonstrate our stereo method. Furthermore, our non-parametric, non-axial method can be applied to axial mirror/lens models as well.

Figure 13 shows synthetic data generated from the Middlebury data set. As can be seen, the depth estimation result closely resembles the ground truth, with errors occurring only at object boundaries.

The results on real water-drop images are shown in the first four rows of Fig. 14. For the first row, the data is taken using a micro-lens, and the resolution of each water drop is more than 600 pixels. The second and third data are taken with a normal commercial lens, and the resolution of each water drop is about 200-300 pixels. As shown, our proposed method can generally recover the depth structure. However, we find that the bottleneck of our method is in finding the corresponding points between water-drop images: when the camera is zoomed in to focus on the details of water drops, the sensor noise and the dust on the plate are no longer negligible, which adversely affects the accuracy and stability of both sparse and dense correspondence methods.

As mentioned previously, our non-parametric, non-axial method can be applied to axial mirror/lens models. Fig. 15 shows our stereo estimation results. Since the images have sufficiently high resolution and less noise, the dense matching is considerably more stable, which enables us to estimate the depth more accurately.

Figure 15: Stereo using axial mirror images.

Our non-parametric, non-axial method can be applied to axial data as well. The first column is the result using the original data provided by [Taguchi et al. 2010]; the second and third columns are based on low-resolution data, because the original data is not provided.

6 Discussion and Conclusion

In this paper, we have explored depth reconstruction from water drops. In our pipeline, there are three key steps: water-drop 3D shape reconstruction, depth estimation using stereo, and water-drop image rectification. All of these are done using a single image. We evaluated our method, and the results show that it works effectively for both synthetic and real images. Nevertheless, there are still some limitations. One limitation is that a common perspective camera and lens cannot obtain high-resolution images of water drops. This degrades the overall performance of the depth estimation; specifically, the quality of the sparse/dense correspondences suffers because of the low-resolution images. For future work, we are considering improving the image quality. Furthermore, we will consider simultaneous estimation of the water-drop geometry and depth.


  • [Agrawal and Ramalingam 2013] Agrawal, A., and Ramalingam, S. 2013. Single image calibration of multi-axial imaging systems. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, IEEE, 1399–1406.
  • [Baker and Nayar 1999] Baker, S., and Nayar, S. K. 1999. A theory of single-viewpoint catadioptric image formation. International Journal of Computer Vision 35, 2, 175–196.
  • [Favaro and Soatto 2005] Favaro, P., and Soatto, S. 2005. A geometric approach to shape from defocus. Pattern Analysis and Machine Intelligence, IEEE Transactions on 27, 3, 406–417.
  • [Feynman et al. 2013] Feynman, R. P., Leighton, R. B., and Sands, M. 2013. The Feynman Lectures on Physics, Desktop Edition Volume I, vol. 1. Basic Books.
  • [Garg and Nayar 2007] Garg, K., and Nayar, S. 2007. Vision and rain. International Journal of Computer Vision 75, 1, 3–27.
  • [Hartley and Zisserman 2004] Hartley, R. I., and Zisserman, A. 2004. Multiple View Geometry in Computer Vision, second ed. Cambridge University Press, ISBN: 0521540518.
  • [Hassner and Basri 2006] Hassner, T., and Basri, R. 2006. Example based 3d reconstruction from single 2d images. In Computer Vision and Pattern Recognition Workshop, 2006. CVPRW’06. Conference on, IEEE, 15–15.
  • [Horry et al. 1997] Horry, Y., Anjyo, K.-I., and Arai, K. 1997. Tour into the picture: using a spidery mesh interface to make animation from a single image. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co., 225–232.
  • [Ikeuchi and Horn 1981] Ikeuchi, K., and Horn, B. K. 1981. Numerical shape from shading and occluding boundaries. Artificial intelligence 17, 1, 141–184.
  • [Joshi and Carr 2008] Joshi, P., and Carr, N. A. 2008. Repoussé: automatic inflation of 2d artwork. In Proceedings of the Fifth Eurographics conference on Sketch-Based Interfaces and Modeling, Eurographics Association, 49–55.
  • [Kanaev et al. 2012] Kanaev, A. V., Hou, W., Woods, S., and Smith, L. N. 2012. Restoration of turbulence degraded underwater images. Optical Engineering 51, 5, 057007–1.
  • [Levoy et al. 2004] Levoy, M., Chen, B., Vaish, V., Horowitz, M., McDowall, I., and Bolas, M. 2004. Synthetic aperture confocal imaging. In ACM Transactions on Graphics (TOG), vol. 23, ACM, 825–834.
  • [Malik and Rosenholtz 1997] Malik, J., and Rosenholtz, R. 1997. Computing local surface orientation and shape from texture for curved surfaces. International journal of computer vision 23, 2, 149–168.
  • [Morris 2004] Morris, N. J. W. 2004. Image-based water surface reconstruction with refractive stereo. PhD thesis, University of Toronto.
  • [Oreifej et al. 2011] Oreifej, O., Shu, G., Pace, T., and Shah, M. 2011. A two-stage reconstruction approach for seeing through water. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, IEEE, 1153–1160.
  • [Oswald et al. 2012] Oswald, M. R., Toppe, E., and Cremers, D. 2012. Fast and globally optimal single view reconstruction of curved objects. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE, 534–541.
  • [Prasad and Fitzgibbon 2006] Prasad, M., and Fitzgibbon, A. 2006. Single view reconstruction of curved surfaces. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2, IEEE, 1345–1354.
  • [Ramalingam et al. 2006] Ramalingam, S., Sturm, P., and Lodha, S. K. 2006. Theory and calibration for axial cameras. In Computer Vision–ACCV 2006. Springer, 704–713.
  • [Roser and Geiger 2009] Roser, M., and Geiger, A. 2009. Video-based raindrop detection for improved image registration. IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops).
  • [Roser et al. 2010] Roser, M., Kurz, J., and Geiger, A. 2010. Realistic modeling of water droplets for monocular adherent raindrop recognition using bezier curves. Asian Conference on Computer Vision.
  • [Swaminathan et al. 2001] Swaminathan, R., Grossberg, M. D., and Nayar, S. K. 2001. Caustics of catadioptric cameras. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2, IEEE, 2–9.
  • [Szeliski 2010] Szeliski, R. 2010. Computer vision: algorithms and applications. Springer Science & Business Media.
  • [Taguchi et al. 2010] Taguchi, Y., Agrawal, A., Veeraraghavan, A., Ramalingam, S., and Raskar, R. 2010. Axial-cones: modeling spherical catadioptric cameras for wide-angle light field rendering. ACM Transactions on Graphics-TOG 29, 6, 172.
  • [Terzopoulos et al. 1988] Terzopoulos, D., Witkin, A., and Kass, M. 1988. Symmetry-seeking models and 3d object reconstruction. International Journal of Computer Vision 1, 3, 211–221.
  • [Tian and Narasimhan 2009] Tian, Y., and Narasimhan, S. G. 2009. Seeing through water: Image restoration using model-based tracking. In Computer Vision, 2009 IEEE 12th International Conference on, IEEE, 2303–2310.
  • [Vicente and Agapito 2013] Vicente, S., and Agapito, L. 2013. Balloon shapes: reconstructing and deforming objects with volume from images. In 3D Vision-3DV 2013, 2013 International Conference on, IEEE, 223–230.
  • [Xu et al. 2012] Xu, L., Jia, J., and Matsushita, Y. 2012. Motion detail preserving optical flow estimation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 34, 9, 1744–1757.
  • [You et al. 2013] You, S., Tan, R. T., Kawakami, R., and Ikeuchi, K. 2013. Adherent raindrop detection and removal in video. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
  • [You et al. 2015] You, S., Tan, R., Kawakami, R., Mukaigawa, Y., and Ikeuchi, K. 2015. Adherent raindrop modeling, detection and removal in video. Pattern Analysis and Machine Intelligence, IEEE Transactions on.
  • [Zorich and Cooke 2004] Zorich, V., and Cooke, R. 2004. Mathematical analysis. Springer.

Appendix A: Equivalent Camera Position for Flat Air-Water Refraction

Here we prove that we can further remove the refraction from water to air in Fig. 4.a by moving the camera to its approximated equivalent place at , as illustrated in Fig. 4.b.

We assume the camera position is and the flat plate is . Given a point on the plate where the refraction happens, the orientation of the incidence angle is:


The parallel and orthogonal components to the surface normal are:


The orientation of angle of refraction, denoted as , can be obtained according to Snell’s law:


Considering is normalized: , and thus:


Hence, the equivalent camera position is: when the incidence angle is close to the optic axis of the camera, i.e., , we can have the approximation that the equivalent camera position .
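As a rough numerical check of this kind of approximation, the sketch below traces one ray from a camera to a flat interface, refracts it with Snell's law, and extends the refracted ray backwards to the optical axis. The setup is an assumption made for illustration, not the paper's notation: the camera sits in air at distance `h` from the interface, the far side is water with index 1.33, and the function name `virtual_camera_distance` is hypothetical.

```python
import math

def virtual_camera_distance(h, x, n_cam=1.0, n_other=1.33):
    """Distance from the flat interface at which the refracted ray,
    extended backwards, crosses the optical axis.

    h        distance of the real camera from the interface (assumed > 0)
    x        lateral offset of the refraction point from the optical axis
    n_cam    refractive index on the camera side (assumed: air)
    n_other  refractive index on the far side (assumed: water)
    """
    if x == 0.0:
        # Paraxial limit of the expression below.
        return h * n_other / n_cam
    theta_i = math.atan2(abs(x), h)                # angle of incidence
    sin_t = n_cam / n_other * math.sin(theta_i)    # Snell's law
    if sin_t >= 1.0:
        return None                                # no refracted ray
    theta_t = math.asin(sin_t)                     # angle of refraction
    return abs(x) / math.tan(theta_t)

for offset in (0.01, 0.1, 0.5):
    print(offset, virtual_camera_distance(1.0, offset))
```

Under these assumptions, rays close to the optical axis all appear to come from roughly the same point, at a distance scaled by the ratio of the refractive indices, while rays at larger offsets drift away from it. This is why a single equivalent camera position is only an approximation that holds near the optic axis.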

Appendix B: Detailed Implementation of Water-drop Multiple View Stereo

Once the geometry of each water drop is obtained, we perform multiple view stereo to estimate depth. We propose a raytracing-based triangulation. As illustrated in Fig. 10, for two corresponding points and and their rays of refraction, the triangulation aims to find the position of point which minimizes the Euclidean distance to all the rays. This idea can be directly extended to more than two water drops.

There are three main steps in our multiple view stereo: (1) inverse raytracing, (2) finding corresponding points between different water drops, and (3) triangulation.

Inverse Raytracing

The goal of inverse raytracing is to find the orientation of the ray of refraction. We call it inverse raytracing because we assume that the ray originates from the camera, is refracted by the water drop, and arrives at the object.

As illustrated in Fig. 10, we show the inverse raytracing on the left water drop. Without loss of generality, we assume the camera position is and the flat plate is , and the image-plane has corresponding pixels with the flat plane using rotation and scaling. According to Appendix A, the equivalent camera position is .

For a pixel on the image plane, with a corresponding point on the flat plate , we can find the refraction location, denoted as , by using the constraints:


In practice, because finding the intersection between a flat plane and a line is easier than finding the intersection between a line and a curved surface, we specify and find the corresponding pixel .

At point , the angle of incidence is:


The surface normal is obtained according to the water drop geometry:



Then, the orientation of ray of refraction is obtained according to Snell’s law:
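As a rough illustration of this refraction step, the sketch below computes a surface normal from a height field by central differences and applies the standard vector form of Snell's law. Everything specific here is an assumption for illustration rather than the paper's formulation: the drop is a hypothetical hemisphere `cap_height` sitting on the plate z = 0, the refractive indices are 1.33 and 1.0, and the ray travels straight up through the water.

```python
import math

N_WATER, N_AIR = 1.33, 1.0  # assumed refractive indices

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cap_height(x, y, radius=1.0):
    """Hypothetical drop shape: a hemisphere resting on the plate z = 0."""
    return math.sqrt(max(radius * radius - x * x - y * y, 0.0))

def surface_normal(height, x, y, eps=1e-5):
    """Outward unit normal of the surface z = height(x, y), by central differences."""
    hx = (height(x + eps, y) - height(x - eps, y)) / (2 * eps)
    hy = (height(x, y + eps) - height(x, y - eps)) / (2 * eps)
    return normalize((-hx, -hy, 1.0))

def refract(d, n, n_in, n_out):
    """Vector form of Snell's law.

    Returns the unit refracted direction, or None when the ray is
    totally reflected at the surface.
    """
    d, n = normalize(d), normalize(n)
    cos_i = -dot(d, n)
    if cos_i < 0.0:                      # make the normal oppose the incoming ray
        n = tuple(-c for c in n)
        cos_i = -cos_i
    eta = n_in / n_out
    k = 1.0 - eta * eta * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None                      # total reflection
    return tuple(eta * dc + (eta * cos_i - math.sqrt(k)) * nc
                 for dc, nc in zip(d, n))

# A ray travelling straight up through the water, leaving the drop above (x, 0).
for x in (0.0, 0.3, 0.9):
    normal = surface_normal(cap_height, x, 0.0)
    print(x, refract((0.0, 0.0, 1.0), normal, N_WATER, N_AIR))
```

With this assumed shape, rays that leave near the rim of the drop meet the surface beyond the critical angle and `refract` returns `None`, which is consistent with the dark band caused by total reflection near the boundary of adherent water drops.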


Finding Correspondence Between Water Drops

Finding corresponding pixels between different warped water-drop images is a challenging task. Compared with normal cameras, the distortion between water-drop images is significantly worse. Moreover, unlike spherical mirrors/lenses, which all share the same distortion, each water drop has its own distortion.

To solve this problem, we find the corresponding pixels on the angular-dewarped images. Note that, since the depth of the image has not yet been obtained, we cannot accurately dewarp the images. Thus, we dewarp the images solely according to the angle of refraction. Nevertheless, we find that angular dewarping can significantly recover the images.

As introduced in the inverse ray-tracing, for a pixel , with ray of refraction , we project the pixel to


Because the water-drop surface is smooth and convex, the Jacobian on the surface is always positive, which means the angular mapping is one-to-one [Zorich and Cooke 2004]. Thus, we can map the corresponding pixels found on the dewarped images back to the warped images.

The third and fourth columns of Figure 13 show examples of the angular dewarping results and the dense correspondence on the dewarped images. Specifically, we use [Xu et al. 2012] for the dense correspondence estimation.


Triangulation

Now we can perform the classical triangulation as illustrated in Fig. 10. Given a set of corresponding pixels on each water drop , we can obtain its outbound ray of refraction:


The triangulation’s goal is to find the position of point , which minimizes the Euclidean distance to all the rays [Hartley and Zisserman 2004]:


The depth is the component of .
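The triangulation step described above, finding the point with the smallest total squared distance to a set of rays, has a closed-form least-squares solution. The sketch below is one minimal way to compute it, not necessarily the authors' implementation. It assumes each ray is given by an origin (the point where it leaves a water drop) and a direction; the names `triangulate` and `solve3` are hypothetical.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; the point is not determined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x

def triangulate(origins, directions):
    """Point minimizing the sum of squared distances to all rays.

    For a ray with origin o and unit direction d, the squared distance
    from p is |(I - d d^T)(p - o)|^2. Setting the gradient to zero gives
    the normal equations  sum(I - d d^T) p = sum((I - d d^T) o).
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = normalize(d)
        for i in range(3):
            for j in range(3):
                p_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p_ij
                b[i] += p_ij * o[j]
    return solve3(A, b)

# Three rays leaving different (made-up) drop positions towards one scene point.
target = (0.2, -0.1, 5.0)
origins = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
directions = [tuple(t - o for t, o in zip(target, org)) for org in origins]
print(triangulate(origins, directions))
```

The same function handles two or more rays, so it extends directly to more than two water drops. If the coordinate frame is chosen so that the plate lies in the plane z = 0, which is an assumption of this sketch, the depth would be read from the third coordinate of the returned point.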