3D reconstruction of scenes and objects is the process of capturing their shape and appearance using methods such as stereo, structure from motion, shape from shading, and many more. Reconstruction is applicable in many fields because it makes it possible to understand 3D scenes and objects on the basis of 2D images. Applications range from robotics and automated industrial quality inspection, through human-machine interaction (for example action, gesture and face recognition) and satellite 3D data analysis, to film and architecture. The method is also commonly used to analyse the surfaces of celestial objects, such as the Moon.
Photometric stereo (PS) is a well-established technique for 3D surface reconstruction. The approach is based on analysing the appearance of a 3D object in its 2D images: from the intensity information, it attempts to infer the shape of the depicted object. It estimates the shape and recovers the surface normals of a scene from several intensity images obtained under varying lighting conditions with an identical viewpoint [41, 16]. By default, PS assumes Lambertian surface reflectance, a standard reflectance model which defines a linear dependency between the normal vectors and the image intensities. This model can then be used to recover 3D information from the image. However, a single Lambertian image is not sufficient to determine the surface shape correctly. PS therefore uses several images in which corresponding pixels depict the same point on the object, which makes it possible to recover surface normals and albedos.
Light interacts with objects in complicated ways, resulting in both direct and indirect illumination, as shown in figure 1. However, classical PS naively assumes that a scene is illuminated only directly by the emitting source. In the presence of indirect illumination, it produces erroneous results with reduced reconstruction accuracy. For example, indirect illumination such as inter-reflections makes concave objects appear shallower.
In this paper, we present an iterative 3D reconstruction method that considers inter-reflections due to both concavities and the environment. We propose a novel method that accounts for inter-reflections in a calibrated photometric stereo environment. The approach uses a reverted Monte Carlo ray tracing method to extract the environmental colour, aiming to minimise the inter-reflections within the images used for photometric stereo. It not only accommodates concave surfaces but also applies to any object in a scene with inter-reflections. The proposed method, Iterative Ray Tracing Photometric Stereo (IRT-PS), iteratively applies photometric stereo (PS) and a reverted ray tracing algorithm based on a Monte Carlo implementation to reconstruct the observed surfaces with higher accuracy. It iteratively reconstructs the surface and separates the indirect from the direct lighting, also taking into account the environment around the object. Moreover, the proposed IRT-PS method can be integrated with any PS technique, removing the effects of inter-reflections and improving the overall reconstruction accuracy.
Our approach is extensively evaluated on three datasets and the overall results demonstrate improvement over the classic approaches. The main contributions of our work are:
- a reverted Monte Carlo ray tracing algorithm to estimate the indirect lighting from both the environment and the object's concavities;
- an iterative surface reconstruction method that is utilised by the reverted Monte Carlo ray tracing;
- a methodology that allows IRT-PS to be combined with any other PS algorithm, improving the overall performance.
The paper is organised as follows: Section II provides background material on photometric stereo, followed by inverse light transport, their properties and related work. In Section III, we introduce the mathematical definitions of the necessary terms. In Section IV, we propose a novel iterative PS method and discuss the suggested reverted Monte Carlo ray tracing algorithm. The performance of this approach is investigated in Section V, with Section VI concluding the work.
II Photometric Stereo
Photometric stereo (PS) is an approach to estimating the surface normals and reflectance (i.e. albedo) of an object from three or more intensity images taken from a fixed viewpoint under varying lighting conditions. A number of solutions have been proposed to address this problem. Woodham was the first to introduce the PS method. His approach is simple and effective; however, it considers only Lambertian surfaces and suffers from noise. In his method, assuming that the surface albedo is known a priori for each point on the surface, the surface gradient can be obtained using three point light sources. Onn and Bruckstein developed a two-image PS method, based on the assumption that the objects are smooth and that no self-shadows are present. PS was further extended by Coleman and Jain, whose method uses four light sources, discards the specular reflections and estimates the surface shape by averaging the diffuse reflections under the Lambertian reflection model. Nayar et al. proposed a PS method which uses a linear combination of an impulse specular component and the Lambertian model to recover the shape and reflectance of a surface. Similarly, Barsky and Petrou proposed an algorithm for estimating the local surface gradient and real albedo from four sources in the presence of highlights and shadows. Chandraker et al. proposed an algorithm that requires at least four light sources and images to reconstruct a surface in the presence of shadows. It is also worth mentioning the related work presented in [28, 11, 1, 34], which follows similar architectures and approaches.
Furthermore, in recent years, methods have been proposed that consider images produced under more general lighting conditions that are not known a priori. Basri et al. proposed a PS method that requires no prior knowledge of the light sources or their type; however, the emitting source should be distant or unconstrained. They used low-order spherical harmonics, optimised in a low-dimensional space, to represent Lambertian objects. Likewise, Shi et al. proposed a self-calibrating PS method that uses colour and intensity profiles obtained from registered pixels across images. They automatically determined a radiometric response function and resolved the generalised bas-relief ambiguity to estimate surface normals and albedos. While the lighting conditions can be unknown, a fixed viewpoint is required.
Nevertheless, most of these methods and models, while working well with matte objects, under-perform when the reconstructed objects are specular or transparent, or when inter-reflections are present. Non-Lambertian reflection, and inter-reflection in particular, can be difficult to handle in photometric stereo. Solomon and Ikeuchi developed a method that uses four lights to extract the surface shape and roughness of an object with a specular lobe. They used a simplified version of the Torrance-Sparrow reflectance model to determine the surface roughness. Bajcsy et al. presented an algorithm for detecting diffuse and specular interface reflections and some inter-reflections. They used brightness, hue and saturation values instead of RGB, pointing out that these values correspond directly to body colours and to diffuse and specular reflection, shading, shadows and inter-reflections. However, the algorithm requires uniformly coloured dielectric surfaces under singly coloured scene illumination. Tozza et al. proposed a PS method that is independent of the albedo values and uses an image ratio formulation. However, their method requires an initial separation of the diffuse and specular components.
In addition, because of the nature of light, inter-reflection is unavoidable even in a controlled environment. Its magnitude varies with the environment itself and with the structure and material of the object. Moreover, it may not be uniform over the whole surface. As a result, the images are locally blurred in shaded regions. Most photometric methods do not consider inter-reflection from the environment and from concave surfaces, and those that do consider only one of the two cues. One of the first attempts at scene recovery under inter-reflection was proposed by Nayar et al. They presented an iterative algorithm which recovers the shape of a concave surface: it first estimates the shape from intensity data; this shape is then used as input, and the radiosity method is applied to estimate a corrected image intensity distribution without inter-reflections. These steps are repeated until convergence. Nevertheless, the algorithm only examines inter-reflection in concave shapes with Lambertian reflectance and does not take into account the colour of the inter-reflected light. Funt and Drew proposed an algorithm based on singular value decomposition of the colour for a convex surface. They proposed a "one-bounce" model which measures inter-reflection between two matte convex surfaces of uniform colour, where the illumination can vary spatially in its intensity but not in its spectral composition. Again, the algorithm is specific to convex surfaces of uniform colour under illumination that varies only spatially. Langer studied shadows that become inter-reflections and proposed a method for inferring surface colour in a uni-chromatic scene, based on the relative contrast of the scene in different colour channels. Again, the method is highly specific and only deals with inter-reflection related to shadows.
Most existing shape-from-intensity techniques account only for the direct component of light transport. Nayar et al. proposed using high-frequency illumination patterns to separate direct and indirect illumination in more general scenes. Gupta et al. studied the relation between illumination defocus and global light transport. Chen et al. used modulated structured light with high-frequency patterns to mitigate the effects of indirect illumination. Lamond et al. used high-frequency light patterns to separate the diffuse and specular components of the BRDF. Holroyd et al. constructed a high-accuracy imaging system for measuring surface shape and BRDF. All these techniques are either active methods or assume that the indirect illumination in each acquired image is caused by a single source. In contrast, we separate the indirect component by simulating the inter-reflections and removing them from the source images.
III Forward Light Propagation
An image captured by a camera is the result of a complex sequence of reflections and inter-reflections. When light is emitted from the source, it bounces off the surfaces of the scene one or more times before reaching the camera.
In theory, every image can be expressed as an infinite sum, $I = \sum_{n=1}^{\infty} I_n$, where $I_n$ denotes the total contribution of light that bounces $n$ times before reaching the camera, as shown in figure 1. For example, $I_1$ is the image that would be captured if it were possible to prevent all indirect illumination from reaching the camera sensor, while the infinite sum $\sum_{n=2}^{\infty} I_n$ describes the total contribution of indirect illumination. Although we can capture the final image with a camera, the individual "n-bounce" images are not directly measurable in a real-world scenario.
Nevertheless, techniques for simulating inter-reflections and other light transport effects are not new in computer vision and graphics. Forward light transport was formalised by Kajiya in what is known as the rendering equation. The rendering equation is an integral equation in which the radiance leaving a point is given as the sum of emitted plus reflected radiance under a geometric optics approximation:

$$I(x, x') = g(x, x') \left[ \epsilon(x, x') + \int_S \rho(x, x', x'')\, I(x', x'')\, dx'' \right] \tag{1}$$

where $I(x, x')$ is related to the intensity of light passing from point $x'$ to point $x$, $g(x, x')$ is a "geometry" term, $\epsilon(x, x')$ is related to the intensity of light emitted from $x'$ to $x$, and $\rho(x, x', x'')$ is related to the intensity of light scattered from $x''$ to $x$ by a patch of surface at $x'$.
Algorithms such as ray tracing solve equation 1 using Monte Carlo methods, whereas radiosity uses the finite element method; both produce near-realistic images.
For a Lambertian object illuminated by a light source of parallel rays, the observed image intensity $i$ at each pixel is given by the product of the albedo $\rho$ and the cosine of the incidence angle $\theta_i$ (the angle between the direction of the incident light and the surface normal). The cosine of the incidence angle can be expressed as the dot product of two unit vectors, the light direction $\mathbf{l}$ and the surface normal $\mathbf{n}$:

$$i = \rho \cos\theta_i = \rho\,(\mathbf{l} \cdot \mathbf{n})$$

Let us now consider a Lambertian surface patch with albedo $\rho$ and normal $\mathbf{n}$, illuminated in turn by several fixed and known illumination sources with directions $\mathbf{l}^1$, $\mathbf{l}^2$, …, $\mathbf{l}^Q$. In this case we can express the intensities of the obtained pixels as:

$$i^k = \rho\,(\mathbf{l}^k \cdot \mathbf{n}), \quad k = 1, 2, \ldots, Q \tag{2}$$

We stack the pixel intensities to obtain the pixel intensity vector $\mathbf{i} = (i^1, i^2, \ldots, i^Q)^T$. The illumination vectors are likewise stacked row-wise to form the illumination matrix $\mathbf{L} = (\mathbf{l}^1, \mathbf{l}^2, \ldots, \mathbf{l}^Q)^T$. Equation (2) can then be rewritten in matrix form:

$$\mathbf{i} = \rho\,\mathbf{L}\,\mathbf{n} \tag{3}$$

If there are at least three illumination vectors which are not coplanar, we can calculate $\rho$ and $\mathbf{n}$ using the least squares error technique, which uses the transpose of $\mathbf{L}$ because $\mathbf{L}$ is in general not a square matrix:

$$\rho\,\mathbf{n} = (\mathbf{L}^T \mathbf{L})^{-1} \mathbf{L}^T\, \mathbf{i} \tag{4}$$

Since $\mathbf{n}$ has unit length, we can estimate both the surface normal (as the direction of the obtained vector) and the albedo (as its length). Extra images allow one to recover the surface parameters more robustly.
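The least-squares recovery of albedo and normal described above can be sketched for a single pixel as follows. This is a minimal illustration in pure Python, not the authors' implementation: the helper names `solve3` and `photometric_stereo_pixel`, the four light directions and the test normal are all assumptions made for the example, and shadows and noise are ignored.

```python
import math

def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("light directions are coplanar; normal not recoverable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x

def photometric_stereo_pixel(lights, intensities):
    """Return (albedo, unit normal) for one pixel observed under Q >= 3 lights."""
    # Normal equations (L^T L) b = L^T i, where b = albedo * normal.
    ltl = [[sum(l[r] * l[c] for l in lights) for c in range(3)] for r in range(3)]
    lti = [sum(l[r] * i for l, i in zip(lights, intensities)) for r in range(3)]
    b = solve3(ltl, lti)
    albedo = math.sqrt(sum(v * v for v in b))
    if albedo == 0.0:
        return 0.0, [0.0, 0.0, 0.0]  # completely dark pixel: normal undefined
    return albedo, [v / albedo for v in b]

# Synthetic example (assumed values): four unit light directions, one Lambertian pixel.
lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8), (-0.6, 0.0, 0.8)]
true_albedo, true_normal = 0.5, (0.0, 0.6, 0.8)
intensities = [true_albedo * sum(a * b for a, b in zip(l, true_normal)) for l in lights]
albedo, normal = photometric_stereo_pixel(lights, intensities)
```

With more than three lights the system is over-determined, so the same routine averages out part of the measurement noise, which is why extra images make the estimate more robust.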
IV Proposed Iterative Ray Tracing Photometric Stereo Method (IRT-PS)
In nature, when we illuminate a surface, light is reflected not only towards the viewer but also among all the surfaces in the environment. This is always true, with the exception of scenes that consist of only a single convex surface. In general, scenes include concave surfaces whose points reflect light onto one another. Furthermore, inter-reflections can occur due to the environment and can appreciably alter a scene's appearance. In figure 2, a sphere is placed within the Cornell box to simulate and highlight the inter-reflections, i.e. the sphere receives colours from its environment.
Existing computer vision algorithms often do not account for the effects of inter-reflections and hence produce erroneous results. The algorithms most directly affected are the shape-from-intensity algorithms, including photometric stereo. Because they commonly assume single surface reflections (direct illumination) and disregard higher-order reflections (inter-reflections, a subset of global illumination), photometric methods produce erroneous results when applied to open scenes.
The first stage of this approach (stage 0) is performed only once and involves the acquisition of the initial input images. It is assumed that inter-reflections are present and that the captured surface lies within a known environment, in our case a Cornell Box.
In the following stage, PS is applied to the images acquired at stage 0, using equation 4, to obtain the initial albedo and normals. A 3D surface is then obtained by integrating the normals using the M-estimator technique. This initial surface, which is affected by the presence of inter-reflections, becomes the input to the following stage, which involves the proposed reverted ray tracing algorithm.
As the environment information is known prior to reconstruction, we can model the environment explicitly. In stage 3, the Cornell Box is set up as the environment. More realistic textures can be used for the walls without affecting the proposed methodology.
In stage 4, we simulate the environment, assuming that the Cornell box is given or estimated. The approach can be extended to other realistic environmental projections, such as hemispherical dome projection, without affecting the proposed methodology. We then place the generated surface within this environment.
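In the following stage, based on equation 7, the reverted ray tracing algorithm is applied. Since we are only interested in inter-reflections, only the indirect illumination is calculated. To implement the ray tracer for Lambertian surfaces, we solve the rendering equation with a Monte Carlo estimator:

$$L_o(x, \omega_o) = \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, d\omega_i \tag{5}$$

where $L_o(x, \omega_o)$ is the total outgoing radiance reflected at $x$ along the $\omega_o$ direction, and $L_i(x, \omega_i)$ is the radiance incident at $x$ along the $\omega_i$ direction. $f_r(x, \omega_i, \omega_o)$ determines how much radiance is reflected at $x$ in direction $\omega_o$ due to irradiance incident at $x$ along the $\omega_i$ direction. $(\mathbf{n} \cdot \omega_i)$ comes from Lambert's cosine law: diffuse reflection is directly proportional to the cosine of the angle between the normal and the incident illumination ($\cos\theta = \mathbf{n} \cdot \omega_i$). Finally, $\int_{\Omega}$ is an integral over a given hemisphere.

Monte Carlo approximation is a method for approximating the expectation of a random variable using samples:

$$\bar{X}_N = \frac{1}{N} \sum_{k=1}^{N} x_k \tag{6}$$

where $\bar{X}_N$ is an approximation of the average value of the random variable $X$, $x_k$ are samples of $X$, and $N$ is the sample size. Applying this estimator to equation 5, we solve the rendering equation:

$$L_o(x, \omega_o) \approx \frac{1}{N} \sum_{k=1}^{N} \frac{f_r(x, \omega_k, \omega_o)\, L_i(x, \omega_k)\, (\mathbf{n} \cdot \omega_k)}{p(\omega_k)} \tag{7}$$

where the directions $\omega_k$ are sampled over the hemisphere with probability density $p$.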
However, because the Monte Carlo estimator is affected by noise, the ray tracing algorithm inherits the same problem. For example, to halve the noise in an image rendered by ray tracing, we need to quadruple the number of samples.
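The noise behaviour of the Monte Carlo estimator can be illustrated with a small sketch. It assumes a Lambertian patch with normal (0, 0, 1), BRDF albedo/π, uniform hemisphere sampling (density 1/(2π)) and constant unit incident radiance, for which the exact reflected radiance equals the albedo; all function names are hypothetical and the setup is a simplification, not the paper's ray tracer.

```python
import math
import random

def sample_hemisphere(rng):
    """Uniformly sample a unit direction on the hemisphere around the normal (0, 0, 1)."""
    z = rng.random()                      # cos(theta) is uniform in [0, 1)
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)

def estimate_outgoing_radiance(albedo, incident_radiance, n_samples, rng):
    """Average f_r * L_i * cos(theta) / pdf over n_samples random directions."""
    brdf = albedo / math.pi               # Lambertian BRDF
    pdf = 1.0 / (2.0 * math.pi)           # density of uniform hemisphere sampling
    total = 0.0
    for _ in range(n_samples):
        w = sample_hemisphere(rng)
        total += brdf * incident_radiance(w) * w[2] / pdf
    return total / n_samples

def rms_error(albedo, n_samples, n_runs, rng):
    """Root-mean-square error of the estimator under constant unit incident radiance."""
    sq = 0.0
    for _ in range(n_runs):
        est = estimate_outgoing_radiance(albedo, lambda w: 1.0, n_samples, rng)
        sq += (est - albedo) ** 2         # exact answer is albedo for L_i = 1
    return math.sqrt(sq / n_runs)

rng = random.Random(0)
estimate = estimate_outgoing_radiance(0.8, lambda w: 1.0, 20000, rng)
rms_100 = rms_error(0.8, 100, 200, rng)
rms_400 = rms_error(0.8, 400, 200, rng)   # four times the samples
```

Because the standard error of a sample mean falls as one over the square root of the sample count, `rms_400` should come out at roughly half of `rms_100`, matching the quadrupling rule stated above.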
To estimate the environmental colour, we first cast rays from each pixel onto the surface and then, using techniques such as hemisphere sampling, randomly reflect the rays toward the environment. As a result, images of the environment are captured for the various levels (depths) of ray reflection. In this study, we use up to three reflection rays (1 to 3) with a single sample each, as shown in figure 5. Because we do not calculate all the ray reflections within the environment, some pixel locations have no intensity value; an example can be seen in figure 6. We therefore use a non-uniform interpolation algorithm to approximate the missing values in the obtained environmental intensity images, one for each number of ray reflections.
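Filling the pixels that received no ray can be sketched as scattered-data interpolation. The example below uses inverse-distance weighting as a simple stand-in; it is not necessarily the non-uniform interpolation algorithm used in this work, and the function name `fill_missing`, the `None` marker for missing pixels and the `power` parameter are assumptions for illustration.

```python
def fill_missing(image, power=2.0):
    """Fill None entries of a 2D list by inverse-distance weighting of known pixels."""
    known = [(r, c, v) for r, row in enumerate(image)
             for c, v in enumerate(row) if v is not None]
    if not known:
        raise ValueError("no known pixels to interpolate from")
    filled = []
    for r, row in enumerate(image):
        out_row = []
        for c, v in enumerate(row):
            if v is not None:
                out_row.append(v)         # keep traced values untouched
                continue
            num = den = 0.0
            for kr, kc, kv in known:
                # Squared distance is never zero: a missing pixel is not in `known`.
                dist_sq = (r - kr) ** 2 + (c - kc) ** 2
                w = 1.0 / dist_sq ** (power / 2.0)
                num += w * kv
                den += w
            out_row.append(num / den)
        filled.append(out_row)
    return filled

# Toy environmental intensity image with two pixels that no ray reached.
sparse = [[1.0, None, 3.0],
          [None, 2.0, 4.0]]
dense = fill_missing(sparse)
```

The cost is proportional to the number of missing pixels times the number of known ones, so for full-resolution images a library routine for scattered-data interpolation would be the more practical choice.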
In figure 6, we see that the more times a ray is reflected, the darker the pixels become. This follows from the ray tracing algorithm itself: the first ray has more influence on the final pixel intensity than later rays. Therefore, as the number of ray reflections increases, the intensity of the pixels needs to be reduced accordingly.
In stage 5, we generate the new input images by subtracting the environmental intensity from the original input images, which reduces the inter-reflections. This yields three different sets of images, one for each ray reflection: R1, R2 and R3.
Finally, the obtained images, which contain fewer inter-reflections (an example difference image is shown in figure 7), are used as input to photometric stereo, generating a new surface. The whole process can be applied iteratively for a fixed number of iterations or until the difference between the new 3D surface and the previous one is less than a given threshold.
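The overall loop described in this section can be sketched as follows. The two callables are hypothetical placeholders for the reconstruction step and for the reverted ray tracing step; the function name `irt_ps_loop`, the flat list representation of images and surfaces, and the toy stand-ins in the usage lines are assumptions chosen only to make the control flow concrete.

```python
def irt_ps_loop(images, photometric_stereo, render_indirect, max_iters=10, tol=1e-3):
    """Alternate reconstruction and indirect-light removal until the surface settles.

    photometric_stereo(images) -> surface heights as a flat list of floats
    render_indirect(surface)   -> one indirect-light image per input image
    """
    surface = photometric_stereo(images)          # initial surface from raw images
    iterations_run = 0
    for _ in range(max_iters):
        iterations_run += 1
        indirect = render_indirect(surface)
        # Always subtract from the ORIGINAL images; clamp so intensities stay >= 0.
        corrected = [[max(0.0, p - e) for p, e in zip(img, env)]
                     for img, env in zip(images, indirect)]
        new_surface = photometric_stereo(corrected)
        change = max(abs(a - b) for a, b in zip(new_surface, surface))
        surface = new_surface
        if change < tol:                          # stop once the surface stabilises
            break
    return surface, iterations_run

# Toy stand-ins: the "surface" is the per-pixel mean image, and the indirect
# light is taken to be 10% of the current surface estimate.
def toy_ps(imgs):
    return [sum(px) / len(px) for px in zip(*imgs)]

def toy_indirect(surface):
    return [[0.1 * s for s in surface] for _ in range(2)]

images = [[1.0, 2.0], [1.0, 2.0]]
surface, iterations = irt_ps_loop(images, toy_ps, toy_indirect, tol=5e-4)
```

In this toy setting the loop converges to the fixed point where the surface equals the input minus 10% of itself, which shows why a few iterations can be enough when the indirect component is small relative to the direct one.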
V Experiments and Results
In our comparative evaluation, three different datasets with ground truth were used: scan data from the Harvard PS dataset, a dataset with faces, and synthetic data generated from simulated objects (see figures 8 and 11).
We used photometric stereo to reconstruct the acquired surfaces, with and without inter-reflection removal, considering different numbers (1 to 3) of ray reflections in the proposed reverted Monte Carlo ray tracing algorithm. We then estimated the height, albedo and normal errors in comparison with the classic PS method, using the available ground truth.
The height error is the mean error between the height values of the ground-truth surface and the height values of the reconstructed surface.

The albedo error is the mean of the errors of the individual colour channels: red, green and blue.

Likewise, the normal error is the mean over all axes of the mean errors for the X, Y and Z components of the normals of the reconstructed surface with respect to the ground truth.
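As an illustration of how such mean errors can be computed, the sketch below assumes the mean absolute difference as the per-pixel error; this choice, the function names and the toy values are assumptions, and the exact error definitions used in the evaluation may differ.

```python
def mean_abs_error(truth, estimate):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)

def height_error(gt_heights, rec_heights):
    """Mean height error between ground-truth and reconstructed height values."""
    return mean_abs_error(gt_heights, rec_heights)

def per_component_error(gt_vectors, rec_vectors):
    """Average of the per-component mean errors (R, G, B for albedo; X, Y, Z for normals)."""
    n_components = len(gt_vectors[0])
    errors = [mean_abs_error([v[k] for v in gt_vectors],
                             [v[k] for v in rec_vectors])
              for k in range(n_components)]
    return sum(errors) / n_components

# Toy values (assumed): three height samples and two surface normals.
h_err = height_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
n_err = per_component_error([(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)],
                            [(0.0, 0.3, 0.9), (0.3, 0.0, 1.0)])
```

The same `per_component_error` helper serves for both albedo (three colour channels) and normals (three axes), since both errors are described as the mean of three per-component means.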
From table 1 and the charts in figure 11, we can see that the mean height, albedo and normal errors are overall lower with our approach than with classic photometric stereo. In table 1, the text highlighted in red gives the average overall results of the classic photometric stereo method, whereas the best results from our IRT-PS approach are highlighted in green. From the charts in figure 11, we can see the general trend of the height error: results improve with each additional ray, and the best result is achieved by Ray 3. The best results for albedo and normals are given by Ray 2. The indirect illumination of the environment captured by rays R3 and R2 was able to reduce the inter-reflection effect in the original images. Furthermore, looking at the overall table and comparing with classic PS, we again see that our method improves all the estimates. The greatest improvement can be seen in the height error, followed by the normal error and finally the albedo error. This suggests that improving the captured indirect illumination should result in more accurate and detailed reconstructed surfaces.
In this work, a novel iterative method considering inter-reflections both due to concavities and the environment was proposed. The IRT-PS approach iteratively applies Photometric Stereo and a reverted Monte-Carlo ray tracing algorithm, reconstructing the observed surface and separating the indirect from direct lighting. A comparative study was performed evaluating the reconstruction accuracy of the proposed solution on three different datasets and the overall results demonstrate improvement over the classic approaches that do not consider environmental inter-reflections.
This work is co-funded by the NATO within the WITNESS project under grant agreement number G5437. The Titan X Pascal used for this research was donated by the NVIDIA Corporation.
-  V. Argyriou and M. Petrou. Photometric stereo: an overview. Adv. Imaging Electron. Phys., 156:1–54, 2009.
-  R. Bajcsy, S. W. Lee, and A. Leonardis. Detection of diffuse and specular interface reflections and inter-reflections by color image segmentation. International Journal of Computer Vision, 17:241–272, 1996.
-  S. Barsky and M. Petrou. Shadows and highlights detection in 4-source colour photometric stereo. ICIP, 3:967–970, October 2001.
-  R. Basri, D. Jacobs, and I. Kemelmacher. Photometric stereo with general, unknown lighting. IJCV, 72(3):239–257, 2007.
-  P. N. Belhumeur and D. J. Kriegman. What is the set of images of an object under all possible lighting conditions? In CVPR, 1996.
-  V. Bloom, D. Makris, and V. Argyriou. Clustered spatio-temporal manifolds for online action recognition. 2014 22nd International Conference on Pattern Recognition, pages 3963–3968, Aug 2014.
-  M. Chandraker, S. Agarwal, and D. Kriegman. Shadowcuts: Photometric stereo with shadows. CVPR, June 2007.
-  T. Chen, H.-P. Seidel, and H. P. A. Lensch. Modulated phase-shifting for 3d scanning. 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.
-  E. Coleman and R. Jain. Obtaining 3-dimensional shape of textured and specular surfaces using four-source photometry. CVGIP, 18:309, 1982.
-  C. H. Esteban, G. Vogiatzis, and R. Cipolla. Multiview photometric stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:548–554, 2008.
-  G. Finlayson, M. Drew, and C. Lu. Intrinsic images by entropy minimisation. In Proc. ECCV, pages 582–595, 2004.
-  J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes. Computer graphics - principles and practice, 2nd edition. 1990.
-  R. T. Frankot and R. Chellappa. A method for enforcing integrability in shape from shading algorithms. IEEE Tran PAMI, 10(4):439–451, Jul 1988.
-  B. V. Funt and M. S. Drew. Color space analysis of mutual illumination. IEEE Trans. Pattern Anal. Mach. Intell., 15:1319–1326, 1993.
-  M. Gupta, Y. Tian, S. G. Narasimhan, and L. Zhang. (de) focusing on global light transport for active scene recovery. 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 2969–2976, 2009.
-  H. Hayakawa. Photometric stereo under a light source with arbitrary motion. 2002.
-  S. Herbort and C. Wöhler. An introduction to image-based 3d surface reconstruction and a survey of photometric stereo methods. 2011.
-  M. Hicks, B. J. Buratti, J. W. Nettles, M. Staid, J. Sunshine, C. Pieters, S. Besse, and J. M. Boardman. A photometric function for analysis of lunar images in the visual and infrared based on moon mineralogy mapper observations. 2011.
-  M. Holroyd, J. Lawrence, and T. E. Zickler. A coaxial optical scanner for synchronous acquisition of 3d geometry and surface reflectance. ACM Trans. Graph., 29:99:1–99:12, 2010.
-  B. Horn. Understanding image intensities. Artificial Intelligence, 8(11):201–231, 1977.
-  K. Ikeuchi. Determining surface orientations of specular surfaces by using the photometric stereo method. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-3:661–669, 1981.
-  D. S. Immel, M. F. Cohen, and D. P. Greenberg. A radiosity method for non-diffuse environments. In SIGGRAPH, 1986.
-  W. Jarosz, H. W. Jensen, and C. Donner. Advanced global illumination using photon mapping. In SIGGRAPH ’08, 2008.
-  J. T. Kajiya. The rendering equation. In SIGGRAPH, 1986.
-  D. Konstantinidis, T. Stathaki, V. Argyriou, and N. Grammalidis. Building detection using enhanced hog–lbp features and region refinement processes. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(3):888–905, March 2017.
-  B. Lamond, P. Peers, A. Ghosh, and P. Debevec. Image-based separation of diffuse and specular reflection using environmental structural illumination. 2009 IEEE International Conference on Computational Photography.
-  M. S. Langer. When shadows become interreflections. International Journal of Computer Vision, 34:193–204, 1999.
-  M. Levine and J. Bhattacharyya. Removing shadows. Pattern Recognition Letters, 26(3):251–265, 2005.
-  S. Nayar, K. Ikeuchi, and T. Kanade. Determining shape and reflectance of hybrid surfaces by photometric sampling. IEEE T. RA, 6(4):418–431, 1990.
-  S. K. Nayar, K. Ikeuchi, and T. Kanade. Shape from interreflections. International Journal of Computer Vision, 6:173–195, 1990.
-  S. K. Nayar, G. Krishnan, M. D. Grossberg, and R. Raskar. Fast separation of direct and global components of a scene using high frequency illumination. ACM Trans. Graph., 25:935–944, 2006.
-  S. Niedenthal. Learning from the cornell box. Leonardo, 35:249–254, 2002.
-  R. Onn and A. M. Bruckstein. Integrability disambiguates surface recovery in two-image photometric stereo. International Journal of Computer Vision, 5:105–113, 1990.
-  H. Ragheb and E. Hancock. A probabilistic framework for specular shape-from-shading. Pattern Recognition, 36(2):407–427, 2003.
-  F. Remondino. Image-based 3d modelling: a review. 2006.
-  B. Shi, Y. Matsushita, Y. Wei, C. Xu, and P. Tan. Self-calibrating photometric stereo. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1118–1125, 2010.
-  F. Solomon and K. Ikeuchi. Extracting the shape and roughness of specular lobe objects using four light photometric stereo. IEEE Trans PAMI, 18(4):449–454, 1996.
-  J. Sun, M. Smith, L. Smith, S. Midha, and J. Bamber. Object surface recovery using a multi-light photometric stereo technique for non-lambertian surfaces subject to shadows and specularities. Image and Vision Computing, 25(7):1050–1057, July 2007.
-  P. B. Swinburne. Spherical mirror : A new approach to hemispherical dome projection. 2005.
-  P. Tan, S. Lin, and L. Quan. Subpixel photometric stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:1460–1471, 2008.
-  A. Tankus and N. Kiryati. Photometric stereo under perspective projection. Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, 1:611–616 Vol. 1, 2005.
-  P. Thévenaz, T. Blu, and M. Unser. Image interpolation and resampling. 1999.
-  S. Tozza, R. Mecca, M. Duocastella, and A. D. Bue. Direct differential photometric stereo shape recovery of diffuse and specular surfaces. Journal of Mathematical Imaging and Vision, 56:57–76, 2016.
-  V. Argyriou and M. Petrou. Recursive photometric stereo when multiple shadows and highlights are present. Proceedings of CVPR, 2008.
-  R. Woodham. Photometric stereo: A reflectance map technique for determining surface orientation from image intensity. SPIE, 155:136–143, 1978.