Recovering the 3D geometry of an object remains an open challenge in computer vision, as most techniques provide good results only under specific conditions. In particular, two well-known approaches, multi-view stereo and photometric stereo, produce strong results under complementary assumptions. While multi-view stereo is expected to provide rough 3D volumetric reconstructions of textured objects, photometric stereo is expected to retrieve highly detailed surfaces from a single view. Achieving high-quality volumetric reconstruction by refining a coarse multi-view reconstruction [10, 32] with shading information [12, 37, 34] is a long-standing way of merging this complementary information.
Multi-View Photometric Stereo (MVPS) approaches have been developed to overcome the constraints of both sides, in order to deal with specular highlights [14, 1], dynamic scenes, visibility and occlusions, and the mapping of the photometric stereo views onto the coarse volume [31, 29].
Since the implicit parameterisation of volumes was developed using level-set approaches [22, 28], recent advances in parameterising volumes with signed distance functions (SDFs) [45, 26] have made the multi-view approach well suited to being merged with a differential formulation of the irradiance equation, which provides shading information. On the other hand, recent photometric stereo approaches have moved towards more realistic assumptions by considering point light sources [24, 30], which simplify the acquisition process by using LEDs in a calibrated setting.
In this work we propose a novel method based on the following three contributions:
- A differential parameterisation of the volume based on the signed distance function that allows irradiance equation ratios to deal with near-field photometric stereo modeling.
- A variational optimisation that fuses information from multiple viewpoints into a single system.
- An octree implementation capable of retrieving highly accurate volumetric reconstructions in scenes with multiple discrete objects.
2 Related Works
Reconstructing accurate 3D geometry of a volume has long been a challenging problem in computer vision. Most research on this problem merges multi-view methods for coarse reconstruction with techniques based on shading information, which provide high-frequency details of the surface [40, 25, 4] rather than topological evolution of the surface. Regarding the refinement, however, several methods take inspiration from Shape from Shading to extract 3D geometry from a single image (MVSfS) and consider shape refinement coming from single shading cues [42, 41, 3]. To improve the quality of the details and make the reconstruction more robust to outliers, multiple shading images from a single viewpoint are considered. A number of MVPS approaches have been presented [7, 29, 44].
Merging shading information with multi-view images becomes more complicated when considering specular surfaces. Drastic changes in shading under different lighting and viewpoints modify the appearance of the 3D geometry, so specific approaches have been developed to deal with irradiance equations with a non-negligible specular component. Jin et al. exploit a rank constraint on the radiance tensor field of the surface in space with the aim of fitting the Ward reflectance model. Other approaches instead reconstruct an unknown object by using a radiance basis inferred from reference objects [35, 1]. Zhou et al. developed a camera and handheld moving light system that first captures sparse 3D points and then refines the depth along iso-depth contours. A similar handheld system was developed by Higo et al., where multi-view images were acquired under varying illumination by a hand-held camera with a single movable LED point light source for reconstructing a static scene.
To make the MVPS problem solvable, additional assumptions have been considered. In particular, to compute the camera positions so that the photometric stereo views can be mapped accurately, the relative motion of the camera and the object can be constrained. Hernandez et al. captured multi-view images of a moving object under varying illumination, combining shading and silhouettes and assuming circular motion in order to compute the visual hull. Zhang et al. generalised optical flow, photometric stereo, multi-view stereo and structure from motion techniques assuming rigid motion of the object under orthographic viewing geometry and directional lighting. Furthermore, shadows, occlusions and inter-reflections are not considered.
When photometric stereo (as well as SfS) has to be integrated with multi-view techniques, finding the correspondence between pixels carrying shading information and the 3D surface is crucial. Geometric distortions produced by changes in pose have to be combined with varying illumination. One way to do so is region tracking that accounts for brightness variations using parametric models of geometry and illumination, or outlier rejection. Okatani and Deguchi proposed a photometric method for estimating the second derivatives of the surface shape of an object when only inaccurate knowledge of the surface reflectance and illumination is given, assuming this knowledge is represented in a probabilistic fashion.
Other approaches instead align the shading images with the coarse 3D shape in order to map the photometric stereo data onto it [19, 15]. Delaunoy and Prados use a gradient flow approach, whereas Sabzevari et al. first compute a 3D mesh with structure from motion with a low percentage of missing points and then reproject the mesh onto a plane using a mapping scheme. Recently, Park et al. proposed a refinement method that computes an optical displacement map in the same 2D planar domain as the photometric stereo images. To do so, they transformed the coarse 3D mesh into a parameterised 2D space using a distortion parameterisation technique.
In this work, to avoid the mapping procedure, we present a differential approach for MVPS. Inspired by the signed distance function parameterisation used by Maier et al. for the MVSfS problem, we derive a volumetric parameterisation handling the differential irradiance equation ratio presented in  for near-field photometric stereo. Instead of projecting 2D images onto a rough 3D shape estimate, as adopted in the state-of-the-art MVPS method , we build an octree implementation that allows fast ray-tracing. This accelerates the computation of shadows and occlusions from different views and enables a high level of refinement starting from a rough initial estimate of the 3D volume.
3 Signed Distance Function Parameterisation
To provide a suitable mathematical characterisation of a collection of solid objects, we consider the implicit surface parameterisation in terms of the SDF $d(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^3$. This parameterisation suits our aim because it offers a practical way of describing the outgoing normal vector to a surface. In fact, the SDF allows us to describe the volumetric surface as the zeroth level set of $d$, i.e. $d(\mathbf{x}) = 0$. The essence of our differential approach is the observation that the surface normal equals the gradient of the SDF, as follows
$$\mathbf{n}(\mathbf{x}) = \nabla d(\mathbf{x}). \qquad (1)$$
Similarly to  that used the SDF for single image shading refinement, we consider the SDF for the irradiance equation to derive a differential multi-view photometric stereo formulation where we assume to have images (i.e. light sources) for each known camera position (that is , ).
To exploit the monocular aspect of the photometric stereo problem, we consider image ratios for the Lambertian shading model  assuming calibrated nearby LED light sources
where is the image-plane projection of the 3D point and indicates the albedo. Note that as we are following a volumetric approach, the irradiance equation can be considered for each 3D point . The bar over a vector means that it is normalized (i.e. ). We model point light sources by considering the following from , where is the known position of the point light source with respect to the global coordinate system. We model the light attenuation considering the following non linear radial model of dissipation
where is the intrinsic brightness of the light source, is the principal direction (i.e. the orientation of the LED point light source) and is an angular dissipation factor.
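As an illustration of the light model just described, the following minimal sketch evaluates an attenuation term for a single LED. The exact functional form is an assumption here (intrinsic brightness times the cosine of the off-axis angle raised to the dissipation factor, divided by the squared distance), and the names `led_attenuation`, `phi`, `mu`, `p` and `s` are hypothetical labels for the quantities named in the text.

```python
import numpy as np

def normalize(v):
    """Return v scaled to unit length (the 'bar' notation in the text)."""
    return v / np.linalg.norm(v)

def led_attenuation(x, p, s, phi, mu):
    """Attenuation at 3D point x of an LED at position p with principal
    direction s, intrinsic brightness phi and angular dissipation factor mu.

    Assumed form: phi * cos(off-axis angle)^mu / distance^2.
    """
    to_point = x - p                       # vector from the LED to the point
    dist2 = float(to_point @ to_point)     # squared distance (radial fall-off)
    cos_angle = float(normalize(to_point) @ normalize(s))
    cos_angle = max(cos_angle, 0.0)        # no light emitted backwards
    return phi * cos_angle ** mu / dist2

# LED at the origin pointing along +z.
p = np.array([0.0, 0.0, 0.0])
s = np.array([0.0, 0.0, 1.0])

# A point on the principal axis at distance 2.
on_axis = led_attenuation(np.array([0.0, 0.0, 2.0]), p, s, phi=1.0, mu=1.0)

# A point at the same distance, 60 degrees off the principal axis.
angle = np.pi / 3
off_axis_point = 2.0 * np.array([np.sin(angle), 0.0, np.cos(angle)])
off_axis = led_attenuation(off_axis_point, p, s, phi=1.0, mu=1.0)

print(on_axis, off_axis)
```

With `mu = 1` the off-axis value is half the on-axis one at equal distance; larger `mu` would model a narrower LED beam.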
Modeling with image ratios
As in , we follow the ratio method that significantly simplifies the PS problem by eliminating the dependence on the albedo as well as the non-linear normalisation of the normal.
Indeed, dividing equations for images and (from the same point of view ) as in (2), we have
which leads to
By substituting the parametrisation of the normal from (1), we get the following albedo independent, homogeneous linear PDE
The geometrical meaning of (6) is the extension to 3D volumetric reconstruction of the PDE approach presented in . In fact, the photometric stereo model still consists of a homogeneous linear PDE whose vector field is tangential to the surface, which is by definition the zeroth level set of the SDF. However, an important difference with  is that the vector field does not depend on the unknown SDF (i.e. (6) is linear and not quasi-linear as proposed in ), due to the fact that the relevant quantities are expressed in a global coordinate system independent of the existence of a surface. An interesting observation is that Equation 6 is conceptually similar to the iso-depth curves in the work of . Nonetheless, the SDF formulation is a more natural 'object centered' depth and this allows for a unified optimisation as we describe in the next section.
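The ratio construction can be checked numerically on synthetic data. The sketch below assumes a Lambertian image formation of the form i = albedo × attenuation × (unit normal · unit light direction), with the light direction taken from the surface point towards the LED (a sign convention chosen here for convenience). Cross-multiplying two such equations gives a vector field `b` that is orthogonal to the SDF gradient regardless of albedo. All function names and numeric values are hypothetical.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def shading_terms(x, p, s, phi, mu):
    """Return (attenuation, unit light direction) at point x for one LED."""
    to_light = p - x                                # from surface point to LED
    l_bar = unit(to_light)
    cos_angle = max(float(-l_bar @ unit(s)), 0.0)   # angle to LED principal axis
    a = phi * cos_angle ** mu / float(to_light @ to_light)
    return a, l_bar

def ratio_field(i_h, i_k, a_h, l_h, a_k, l_k):
    """Vector field b with b . grad(d) = 0, obtained by cross-multiplying
    the two Lambertian equations i = rho * a * (n . l_bar)."""
    return i_h * a_k * l_k - i_k * a_h * l_h

# Synthetic check on a sphere of radius 10 centred at the origin,
# whose SDF is d(x) = |x| - 10 with gradient x / |x|.
x = 10.0 * unit(np.array([0.2, -0.1, 1.0]))
grad_d = unit(x)          # equals the outward unit normal on the surface
rho = 0.7                 # albedo; it cancels in the ratio

leds = [  # (position, principal direction)
    (np.array([15.0, 0.0, 40.0]), np.array([0.0, 0.0, -1.0])),
    (np.array([-10.0, 12.0, 35.0]), np.array([0.0, 0.0, -1.0])),
]
terms = [shading_terms(x, p, s, phi=1.0, mu=0.5) for p, s in leds]
images = [rho * a * float(grad_d @ l_bar) for a, l_bar in terms]

(a_h, l_h), (a_k, l_k) = terms
b = ratio_field(images[0], images[1], a_h, l_h, a_k, l_k)
residual = float(b @ grad_d)   # vanishes up to rounding
print(residual)
```

Changing `rho` leaves the direction of `b` unchanged, which is the albedo independence the text refers to.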
In order to simplify the notation, we will rename the pair as and we will call the set of all the combination of pairs of images (with no repetition).
MVPS as a weighted least squares problem
To bring photometric stereo images from different views into a single mathematical framework, we stack the following weighted version of (6) into a single system
where and denotes the viewing vector on the volume for the camera position . This weight term is essentially a measure of visibility. The resulting system then counts equations as shown in (9) :
To solve it as a least-squares problem, we consider the normal equations:
The matrix of the normal equations (10) is now a positive semi-definite 3x3 matrix.
The geometrical constraint coming from (6) ensures that all the vector fields span the same two-dimensional space of the volume, as they define the level set of the SDF. This means that, under ideal circumstances, the rank of the matrix in (10) should be exactly 2. However, due to numerical approximations this is never exactly true; we enforce this constraint by computing the eigenvalue decomposition of the matrix and setting its smallest eigenvalue to zero.
We note that this rank correction is a sanity check step. Indeed, if the matrix had full rank, the only solution of (10) would be a vanishing gradient of the SDF, which can never be true because the gradient of an SDF has unit norm (Eikonal equation), and so the solution could not be the SDF of any real surface.
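The rank correction can be sketched with NumPy on synthetic data. The example assumes that the normal-equation matrix is the weighted sum of outer products of the ratio vector fields; the noise level, weights and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth normal; ideal ratio fields b are tangent to the surface.
n_true = np.array([0.2, -0.3, 0.9])
n_true /= np.linalg.norm(n_true)

num_eq = 20
raw = rng.normal(size=(num_eq, 3))
tangent = raw - np.outer(raw @ n_true, n_true)       # remove normal component
b = tangent + 1e-3 * rng.normal(size=(num_eq, 3))    # small numerical noise
w = rng.uniform(0.1, 1.0, size=num_eq)               # visibility weights

# Normal equations: B = sum_p (w_p b_p)(w_p b_p)^T, symmetric PSD 3x3.
wb = w[:, None] * b
B = wb.T @ wb

# Eigen-decomposition (eigh returns eigenvalues in ascending order).
eigvals, Q = np.linalg.eigh(B)
eigvals_fixed = eigvals.copy()
eigvals_fixed[0] = 0.0                               # enforce rank 2
B_rank2 = Q @ np.diag(eigvals_fixed) @ Q.T

# The discarded eigenvector estimates the normal direction (up to sign).
null_dir = Q[:, 0]
print(eigvals, abs(float(null_dir @ n_true)))
```

After the correction, the null space of `B_rank2` is one-dimensional and aligned with the direction the SDF gradient is allowed to take.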
4 Variational resolution
In this section, we describe how we build the variational solver to compute the signed distance field based parameterisation introduced in the previous section.
First of all, we note that as (10) is rank deficient, a closed-form computation of the volume is not possible. We follow the standard of most modern variational approaches (e.g. ) and adopt a Tikhonov regulariser that penalises the deviation from an initial estimate of the SDF, obtained from the distance transform of an initial surface estimate. Thus the regularised problem becomes:
To avoid excessive computation, we note that the photometric stereo equations do not need to be computed in the whole volume but only on a subset of voxels close to the surface. In fact, (1) is only true in the vicinity of the surface. We discretise the variational problem (12) using first-order forward finite differences, with a sparse kernel matrix describing the voxel connectivity. The resulting linear system is solved with the conjugate gradients method.
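A minimal sketch of such a discretised solve is given below, on a tiny dense grid rather than an octree. It assumes the energy is the sum over voxels of the quadratic form of the SDF gradient with the per-voxel 3x3 matrix, plus a Tikhonov term weighted by a hypothetical value `lam`; the conjugate gradients routine is hand-written so the block needs only NumPy. Dense matrices are used purely for brevity.

```python
import numpy as np

def forward_diff(n):
    """1D forward difference matrix (unit spacing, zero row at the far boundary)."""
    D = -np.eye(n) + np.eye(n, k=1)
    D[-1, :] = 0.0
    return D

def conjugate_gradients(A, rhs, x0, tol=1e-10, max_iter=2000):
    """Plain conjugate gradients for a symmetric positive definite A."""
    x = x0.copy()
    r = rhs - A @ x
    p = r.copy()
    rs = float(r @ r)
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / float(p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 6                                    # tiny grid with n^3 voxels
I = np.eye(n)
D = forward_diff(n)
# Gradient operators along x, y, z for voxels flattened in C order.
G = [np.kron(np.kron(D, I), I), np.kron(np.kron(I, D), I), np.kron(np.kron(I, I), D)]

# Per-voxel data matrix; here the same rank-2 projector everywhere,
# as would arise for a plane with unit normal nrm.
nrm = np.array([0.0, 0.6, 0.8])
B = np.eye(3) - np.outer(nrm, nrm)

A_data = sum(B[a, c] * (G[a].T @ G[c]) for a in range(3) for c in range(3))
lam = 0.1                                # Tikhonov weight (hypothetical value)
A = A_data + lam * np.eye(n ** 3)

# Initial SDF estimate: exact plane SDF plus noise.
rng = np.random.default_rng(1)
grid = np.meshgrid(np.arange(n), np.arange(n), np.arange(n), indexing="ij")
coords = np.stack(grid, axis=-1).reshape(-1, 3).astype(float)
d0 = coords @ nrm - 2.0 + 0.05 * rng.normal(size=n ** 3)

# Setting the energy gradient to zero gives (A_data + lam I) d = lam d0.
d = conjugate_gradients(A, lam * d0, d0)
d_direct = np.linalg.solve(A, lam * d0)
print(float(np.max(np.abs(d - d_direct))))
```

In a realistic setting the operators would be stored as sparse matrices restricted to the voxels near the surface, which is what makes the conjugate gradients method attractive.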
[Table 1. Columns: Experiment, Triangle Number, Visual Hull.]
4.1 Octree Implementation
To manage the required set of voxels described above we use an octree structure. The SDF is defined at the leaves of the tree, and voxel neighbors for computing finite differences are found by bottom-up traversal of the tree.
We perform an iterative procedure of solving (12) on the leaves of the tree and then subdividing those leaves where the absolute value of the SDF is smaller than 2 voxel sizes. At each iteration, the weights in (9) depend on the current surface normal, which is approximated with the previous estimate of the geometry. During the iterations the octree evolves, allowing bigger changes to the volume at the beginning, which decrease in size as the voxels are subdivided. The procedure repeats until the voxels are small enough that their projection on the image planes is smaller than the pixel size, and thus the maximum obtainable resolution has been reached. As a result, only a small fraction of the volume is considered for calculations and the hierarchy of voxels is densely packed around the surface. Finally, the reconstructed surface is computed with the Marching cubes variant of .
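The subdivision rule can be sketched as follows. This is a simplified illustration, not the implementation used in the experiments: leaves are kept in a flat list, the SDF is an analytic sphere standing in for the solution of (12), and `target_size` is a hypothetical stand-in for the pixel-footprint stopping criterion.

```python
import numpy as np

def sphere_sdf(c, radius=0.3):
    """Stand-in SDF: signed distance from point c to a sphere at the origin."""
    return np.linalg.norm(c) - radius

def refine(leaves, sdf, band=2.0):
    """Split every leaf whose |SDF| at its centre is below `band` voxel sizes."""
    out = []
    for center, size in leaves:
        if abs(sdf(center)) < band * size:
            half = size / 2.0
            for dx in (-1, 1):
                for dy in (-1, 1):
                    for dz in (-1, 1):
                        child = center + 0.5 * half * np.array([dx, dy, dz])
                        out.append((child, half))
        else:
            out.append((center, size))       # far from the surface: keep coarse
    return out

leaves = [(np.zeros(3), 1.0)]                # root voxel: unit cube at the origin
target_size = 1.0 / 32.0                     # hypothetical finest voxel size
while min(size for _, size in leaves) > target_size:
    # In the full method, (12) would be re-solved on the current leaves here.
    leaves = refine(leaves, sphere_sdf)

finest = [(c, s) for c, s in leaves if s == target_size]
print(len(leaves), len(finest), 8 ** 5)
```

The leaf count stays below that of a uniform grid at the finest resolution (`8 ** 5` cells), since only voxels near the zero level set keep being subdivided.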
To deal with scenes with complex geometry and potentially multiple objects, occlusions need to be addressed. This is performed by ray-tracing lines from each voxel to each light source and camera and using the current estimate of the geometry to check for cast shadows and occlusions. The octree structure allows for very quick visibility checks, and whenever an occlusion or shadow is detected, the relevant weight in (9) is set to 0.
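A visibility check of this kind can be illustrated with sphere tracing on an analytic SDF. Note that this swaps in a different technique from the octree ray-tracing described above, and the weight form max(0, normal · viewing direction) is an assumption; the scene, function names and tolerances are hypothetical.

```python
import numpy as np

def scene_sdf(x):
    """Two spheres: A (radius 1 at the origin) and B (radius 1 at (0, 0, 4))."""
    d_a = np.linalg.norm(x) - 1.0
    d_b = np.linalg.norm(x - np.array([0.0, 0.0, 4.0])) - 1.0
    return min(d_a, d_b)

def is_occluded(x, target, sdf, eps=1e-3, max_steps=256):
    """Sphere-trace from x towards target; True if geometry blocks the segment."""
    direction = target - x
    length = np.linalg.norm(direction)
    direction = direction / length
    t = 10.0 * eps                       # start slightly off the surface
    for _ in range(max_steps):
        if t >= length:
            return False                 # reached the light or camera
        dist = sdf(x + t * direction)
        if dist < eps:
            return True                  # hit a surface before the target
        t += dist                        # safe step: the SDF bounds the free distance
    return False

def visibility_weight(normal, x, camera, sdf):
    """max(0, n . v_bar), set to 0 if the ray to the camera is blocked."""
    v = camera - x
    v_bar = v / np.linalg.norm(v)
    w = max(0.0, float(normal @ v_bar))
    return 0.0 if is_occluded(x, camera, sdf) else w

x = np.array([0.0, 0.0, 1.0])            # top of sphere A
normal = np.array([0.0, 0.0, 1.0])
behind_b = np.array([0.0, 0.0, 8.0])     # sphere B sits in between
to_side = np.array([6.0, 0.0, 3.0])      # unobstructed view

w_blocked = visibility_weight(normal, x, behind_b, scene_sdf)
w_clear = visibility_weight(normal, x, to_side, scene_sdf)
print(w_blocked, w_clear)
```

The same `is_occluded` test applied with a light position instead of a camera position would flag cast shadows.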
4.2 World Scale estimation
Similar to standard near-field calibrated PS methods, the proposed approach requires an initialisation of the correct world scale in order to parameterise light propagation and attenuation (6). For example, [23, 30] initialise the optimisation with a flat plane at a rough mean depth obtained from a ruler measurement. As explained in , a mean distance that is too small flattens the reconstruction, while one that is too large stretches it non-linearly. On the other hand, the initial estimate obtained with MVS is consistent, but only up to scale, i.e. with the unknown scale and the MVS estimate. Then, is a function of and also using Equation 3. Inspired by , we compute the scale by minimising the image re-projection error namely (by first computing from the initial geometry estimate) over all pixels. This objective function is highly non-linear and non-convex (see Equation 3) but can be minimised using non-linear simplex (we used Matlab's fminsearch with default parameters), starting from a reasonable estimate obtained with a ruler measurement.
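The scale search can be sketched on a toy problem using SciPy's Nelder-Mead simplex method, which plays the role of Matlab's fminsearch here. The setup is entirely hypothetical: a spherical cap known up to scale, four calibrated LEDs, a Lambertian model with inverse-square fall-off only, and a per-point albedo estimated by least squares for each candidate scale (an assumption about what is computed from the initial geometry).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Up-to-scale geometry: points on a unit spherical cap, normals equal to points.
xy = rng.uniform(-0.5, 0.5, size=(200, 2))
z = np.sqrt(1.0 - np.sum(xy ** 2, axis=1))
u = np.column_stack([xy, z])
normals = u.copy()

# Calibrated LED positions in metric units (hypothetical values, in mm).
leds = np.array([[15.0, 15.0, 60.0], [-15.0, 15.0, 60.0],
                 [15.0, -15.0, 60.0], [-15.0, -15.0, 60.0]])

def shading(scale):
    """Albedo-free Lambertian shading with inverse-square fall-off, shape (P, L)."""
    x = scale * u                                   # metric 3D points
    l = leds[None, :, :] - x[:, None, :]            # point-to-LED vectors
    dist = np.linalg.norm(l, axis=2)
    cos = np.einsum("pk,plk->pl", normals, l) / dist
    return 1e3 * cos / dist ** 2                    # 1e3: arbitrary brightness

true_scale = 20.0
albedo = rng.uniform(0.3, 1.0, size=(200, 1))
images = albedo * shading(true_scale)               # synthetic observations

def reprojection_error(params):
    m = shading(params[0])
    # Per-point albedo by least squares for the candidate scale, then residual.
    rho = np.sum(images * m, axis=1, keepdims=True) / np.sum(m * m, axis=1, keepdims=True)
    return float(np.sum((images - rho * m) ** 2))

# Start from a rough guess, as a ruler measurement would provide.
result = minimize(reprojection_error, x0=[15.0], method="Nelder-Mead")
estimated_scale = float(result.x[0])
print(estimated_scale)
```

Because the objective is non-convex in general, the quality of the starting guess matters; the toy problem above is benign in that respect.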
5 Experimental Part
To demonstrate the capability of our approach to reconstruct 3D volumetric scenes, we consider both synthetic and real data. We compare against  using the code from their website. It is worth mentioning that, unlike our method, their state-of-the-art approach for MVPS is based on a fully un-calibrated PS model.
For the synthetic case we use the Armadillo model from the Stanford 3D Scanning Repository (http://graphics.stanford.edu/data/3Dscanrep/). The virtual object was scaled to have an approximate radius of 20mm and the virtual camera, of focal length 6mm, was placed at several locations on a sphere of 45mm around the object. We rendered 12 views with 8 images each, at a resolution of 1200x800 with 24 bits per pixel (see Figure 1).
To quantify how the accuracy of the volumetric reconstruction depends on the initial estimate, we subsampled the initial mesh (using the quadric edge collapse decimation function of Meshlab) to 5 different meshes with the number of triangles ranging from 250 to 30K (the original mesh had 150K triangles). For each of these meshes we added Gaussian noise to the vertex coordinates with a standard deviation of 0, 5 and 10% of the average triangle size. Finally, we calculated the object's visual hull with naive voxel carving for a final experiment.
The evaluation metric is the RMS Hausdorff distance to the ground truth (computed with Meshlab). Results are shown in Figures 2 and 4 and Table 1. The proposed approach outperforms  in all experiments.
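For readers unfamiliar with this metric, the sketch below computes an RMS closest-point distance between two point sets. It is only a point-sampled approximation written for illustration: Meshlab measures distances from samples to the reference surface, whereas this version uses nearest reference points, and the function name is hypothetical.

```python
import numpy as np

def rms_closest_point_distance(samples, reference):
    """RMS over `samples` of the distance to the nearest point in `reference`."""
    diff = samples[:, None, :] - reference[None, :, :]
    nearest = np.sqrt(np.min(np.sum(diff ** 2, axis=2), axis=1))
    return float(np.sqrt(np.mean(nearest ** 2)))

# Reference: integer grid on the plane z = 0.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
reference = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(25)])

# "Reconstruction": the same grid shifted by 0.1 along x.
reconstruction = reference + np.array([0.1, 0.0, 0.0])

rms = rms_closest_point_distance(reconstruction, reference)
print(rms)
```

The brute-force pairwise distance matrix is fine for small sets; a spatial index would be needed for meshes with many vertices.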
5.1 Real Data
To acquire real-world data we used an active light system (see Figure 7) consisting of a FLIR camera BFS-U3-32S4C-C surrounded by OSRAM ultra bright LEDs for capturing data in the near field. The images were acquired while moving the object on a turntable, but without exploiting the rotational movement of the object to enhance the reconstruction. The multi-view data were processed using VisualSFM [38, 39] and PMVS to obtain the camera rotation and translation between the photometric stereo views, as well as a low-quality reconstruction to use as the initial estimate. In addition, a few more images were captured in between the photometric stereo sequences (with neutral illumination) to make SFM more robust to a too small overlap between images. To reduce the noise in the models obtained through MVS, we removed some noisy regions and background points far away from the scenes of interest. Then, we performed Poisson reconstruction with a low level setting so that the initial estimate contains continuous surfaces (and not point clouds). As Table 1 suggests, our method does not need a very accurate initial estimate. Finally, the initial SDF is computed as the distance transform of the initial surface.
Our real datasets include a marble Buddha statue, a plaster bust of Queen Elisabeth (see Figure 5) and a combined scene with a swede next to a porcelain cup containing a small tree branch (see Figure 8). The reconstruction time using non-optimised Matlab code was 15-20 minutes on a modern i7 CPU, with a peak memory consumption of 20-25GB. The SDF was computed on around 4-6M voxels (depending on the dataset) of approximate size 0.2mm.
The proposed approach outperforms  in all three datasets (Figures 6 and 9) and is able to recover more detailed surfaces. One of the main reasons for the inability of  to recover very detailed reconstructions is that it is strongly limited by the very low quality of the initial estimates: they parameterise the initial surface into a 2D domain and then perform a single refinement step. In fact, triangulation artifacts can be seen in their reconstruction of the forehead of the queen (Figure 6 top right). The reason for performing a single-step reconstruction is that their 2D parameterisation and visibility estimation become very computationally expensive as the resolution of the mesh increases. In contrast, our volumetric approach naturally handles multiscale estimation through arranging the voxels into an octree structure, so it is easy to repeatedly refine the volume estimate until the voxels have 1-1 correspondence with image pixels. Finally, we note that the datasets presented in  have much higher quality initial estimates (their 'AccordionMan' starts from 140k triangles vs the 8k triangles initial estimate for our 'Queen' dataset).
6 Conclusion
We presented the first volumetric parameterisation based on the signed distance function for the MVPS problem. Very high accuracy is achieved by using an octree implementation for processing and ray-tracing the volume on a tree. While considering photometric stereo images, our fully differential formulation is albedo independent, as it uses the irradiance equation ratio approach for near-field photometric stereo presented in . One limitation of our approach comes from the irradiance modeling, which takes into account Lambertian reflection only.
The main limitation of the proposed approach is the inability to cope with big missing portions of the scene (this is also true for most competing approaches, e.g. [29, 44, 42]). For example, if the initial reconstruction is missing the hands of the Armadillo, they will not be recovered. This can potentially lead to a degradation of quality for the rest of the body as well, as the missing parts will lead to sub-optimal estimates of visibility. The theoretical justification for this is that our core assumption is only exactly true for points on the true surface, i.e. . Assuming continuity, we get if so we can perform our differential approach under the assumption that the set of voxels is relatively close to the true surface.
The main drawback of our method compared to mesh parameterisation techniques (e.g. ) is the elevated memory requirements. Even though the octree implementation minimises the number of voxels required, a few voxels are inevitably needed for each potential surface point (as the surface is the zero crossing of the SDF, at least a pair of oppositely signed values is required per surface point). In addition, the variational optimisation is also memory expensive, as the matrix encoding the neighbouring information about voxels needs to be stored in memory as well.
As future work, the image ratio based modeling can be extended to handle specular highlights using the model presented in . This requires enhancing the variational solver with the inclusion of a shininess parameter as an additional unknown per voxel.
References
- [1] J. Ackermann, F. Langguth, S. Fuhrmann, A. Kuijper, and M. Goesele. Multi-view photometric stereo by example. In 3DV, 2014.
- [2] N. G. Alldrin and D. J. Kriegman. Toward Reconstructing Surfaces With Arbitrary Isotropic Reflectance: A Stratified Photometric Stereo Approach. In ICCV, 2007.
- [3] J. T. Barron and J. Malik. Shape, illumination, and reflectance from shading. PAMI, 2015.
- [4] M. Beljan, J. Ackermann, and M. Goesele. Consensus multi-view photometric stereo. DAGM Pattern Recognition, 2012.
- [5] A. Blake, A. Zisserman, and G. Knowles. Surface descriptions from stereo and shading. Image Vision Comput., 1985.
- [6] A. Delaunoy and E. Prados. Gradient flows for optimizing triangular mesh-based surfaces: Applications to 3d reconstruction problems dealing with visibility. IJCV, 2011.
- [7] C. H. Esteban, G. Vogiatzis, and R. Cipolla. Multiview photometric stereo. PAMI, 2008.
- [8] Y. Furukawa and J. Ponce. Accurate, dense, and robust multiview stereopsis. PAMI, 32(8):1362–1376, 2010.
- [9] G. D. Hager and P. N. Belhumeur. Efficient region tracking with parametric models of geometry and illumination. PAMI, 1998.
- [10] R. Hartley and A. Zisserman. Multiple view geometry in computer vision (2. ed.). Cambridge University Press, 2006.
- [11] T. Higo, Y. Matsushita, N. Joshi, and K. Ikeuchi. A hand-held photometric stereo camera for 3-d modeling. In ICCV, 2009.
- [12] B. K. P. Horn. Obtaining shape from shading information. The Psychology of Computer Vision, Winston, P. H. (Ed.), pages 115–155, 1975.
- [13] H. Jin, P. Favaro, and S. Soatto. Real-time feature tracking and outlier rejection with changes in illumination. In ICCV, 2001.
- [14] H. Jin, S. Soatto, and A. J. Yezzi. Multi-view stereo beyond lambert. In CVPR, 2003.
- [15] N. Joshi and D. J. Kriegman. Shape from varying illumination and viewpoint. In ICCV, 2007.
- [16] M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In Eurographics symposium on Geometry processing. Eurographics Association, 2006.
- [17] M. Kazhdan, A. Klein, K. Dalal, and H. Hoppe. Unconstrained isosurface extraction on arbitrary octrees. In ESGP, 2007.
- [18] J. Lambert. Photometria sive De mensura et gradibus luminis, colorum et umbrae. Sumptibus viduae Eberhardi Klett, typis Christophori Petri Detleffsen, 1760.
- [19] J. Lim, J. Ho, M. Yang, and D. J. Kriegman. Passive photometric stereo from motion. In ICCV, 2005.
- [20] L. Liu, L. Zhang, Y. Xu, C. Gotsman, and S. J. Gortler. A local/global approach to mesh parameterization. Comput. Graph. Forum, 2008.
- [21] R. Maier, K. Kim, D. Cremers, J. Kautz, and M. Niessner. Intrinsic3d: High-quality 3d reconstruction by joint appearance and geometry optimization with spatially-varying lighting. In ICCV, 2017.
- [22] R. Malladi, J. A. Sethian, and B. C. Vemuri. Shape modeling with front propagation: A level set approach. PAMI, 17(2):158–175, 1995.
- [23] R. Mecca, Y. Quéau, F. Logothetis, and R. Cipolla. A single lobe photometric stereo approach for heterogeneous material. SIAM Imaging Sciences, 2016.
- [24] R. Mecca, A. Wetzler, A. Bruckstein, and R. Kimmel. Near Field Photometric Stereo with Point Light Sources. SIAM Imaging Sciences, 7(4):2732–2770, 2014.
- [25] D. Nehab, S. Rusinkiewicz, J. Davis, and R. Ramamoorthi. Efficiently combining positions and normals for precise 3d geometry. ACM, 2005.
- [26] M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger. Real-time 3d reconstruction at scale using voxel hashing. ACM, 2013.
- [27] T. Okatani and K. Deguchi. Optimal integration of photometric and geometric surface measurements using inaccurate reflectance/illumination knowledge. In CVPR, 2012.
- [28] S. Osher and R. Fedkiw. Level set methods and dynamic implicit surfaces, volume 153 of Applied mathematical sciences. Springer, 2003.
- [29] J. Park, S. N. Sinha, Y. Matsushita, Y. W. Tai, and I. S. Kweon. Robust multiview photometric stereo using planar mesh parameterization. PAMI, 2017.
- [30] Y. Quéau, B. Durix, T. Wu, D. Cremers, F. Lauze, and J. Durou. Led-based photometric stereo: Modeling, calibration and numerical solution. Journal of Mathematical Imaging and Vision, 2018.
- [31] R. Sabzevari, A. D. Bue, and V. Murino. Multi-view photometric stereo using semi-isometric mappings. In 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization Transmission, 2012.
- [32] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR, 2006.
- [33] A. Sheffer, E. Praun, and K. Rose. Mesh parameterization methods and their applications. Foundations and Trends in Computer Graphics and Vision, 2006.
- [34] B. Shi, Z. Mo, Z. Wu, D. Duan, S. K. Yeung, and P. Tan. A benchmark dataset and evaluation for non-lambertian and uncalibrated photometric stereo. PAMI, 2018.
- [35] A. Treuille, A. Hertzmann, and S. M. Seitz. Example-based stereo with general brdfs. In ECCV, 2004.
- [36] D. Vlasic, P. Peers, I. Baran, P. E. Debevec, J. Popovic, S. Rusinkiewicz, and W. Matusik. Dynamic shape capture using multi-view photometric stereo. ACM, 28(5), 2009.
- [37] R. J. Woodham. Photometric method for determining surface orientation from multiple images. Optical Engineering, 19(1):134–144, 1980.
- [38] C. Wu. Towards linear-time incremental structure from motion. In 3D Vision-3DV 2013, 2013 International conference on, pages 127–134. IEEE, 2013.
- [39] C. Wu, S. Agarwal, B. Curless, and S. M. Seitz. Multicore bundle adjustment. In CVPR, pages 3057–3064. IEEE, 2011.
- [40] C. Wu, Y. Liu, Q. Dai, and B. Wilburn. Fusing multiview and photometric stereo for 3d reconstruction under uncalibrated illumination. IEEE Trans. Vis. Comput. Graph., 2011.
- [41] C. Wu, K. Varanasi, Y. Liu, H. Seidel, and C. Theobalt. Shading-based dynamic shape refinement from multi-view video under general illumination. In ICCV, 2011.
- [42] C. Wu, M. Zollhöfer, M. Nießner, M. Stamminger, S. Izadi, and C. Theobalt. Real-time shading-based refinement for consumer depth cameras. ACM, 2014.
- [43] L. Zhang, B. Curless, A. Hertzmann, and S. M. Seitz. Shape and motion under varying illumination: unifying structure from motion, photometric stereo, and multiview stereo. In ICCV, 2003.
- [44] Z. Zhou, Z. Wu, and P. Tan. Multi-view photometric stereo with spatially varying isotropic materials. In CVPR, 2013.
- [45] M. Zollhöfer, A. Dai, M. Innmann, C. Wu, M. Stamminger, C. Theobalt, and M. Nießner. Shading-based refinement on volumetric signed distance functions. ACM, 2015.