 # LED-based Photometric Stereo: Modeling, Calibration and Numerical Solution

We conduct a thorough study of photometric stereo under nearby point light source illumination, from modeling to numerical solution, through calibration. In the classical formulation of photometric stereo, the luminous fluxes are assumed to be directional, which is very difficult to achieve in practice. Rather, we use light-emitting diodes (LEDs) to illuminate the scene to reconstruct. Such point light sources are very convenient to use, yet they yield a more complex photometric stereo model which is arduous to solve. We first derive in a physically sound manner this model, and show how to calibrate its parameters. Then, we discuss two state-of-the-art numerical solutions. The first one alternatingly estimates the albedo and the normals, and then integrates the normals into a depth map. It is shown empirically to be independent from the initialization, but convergence of this sequential approach is not established. The second one directly recovers the depth, by formulating photometric stereo as a system of PDEs which are partially linearized using image ratios. Although the sequential approach is avoided, initialization matters a lot and convergence is not established either. Therefore, we introduce a provably convergent alternating reweighted least-squares scheme for solving the original system of PDEs, without resorting to image ratios for linearization. Finally, we extend this study to the case of RGB images.


## 1 Introduction

3D-reconstruction is one of the most important goals of computer vision. Among the many techniques which can be used to accomplish this task, shape-from-shading Horn1989a and photometric stereo Woodham1980a are photometric techniques, as they use the relationship between the gray or color levels of the image, the shape of the scene (supposedly opaque), its reflectance and the luminous flux that illuminates it.

Let us first introduce some notations that will be used throughout this paper. We describe a point on the scene surface by its coordinates x = [x,y,z]^⊤ in a frame originating from the optical center of the camera, such that the xy-plane is parallel to the image plane and the z-axis coincides with the optical axis and faces the scene (cf. Fig. 1). The coordinates p = [u,v]^⊤ of a point in the image (pixel) are relative to a frame whose origin is the principal point, and whose axes u and v are parallel to x and y, respectively. If f refers to the focal length, the conjugation relationship between x and p is written, in perspective projection:

 x = (z/f) u,   y = (z/f) v.   (1.1)

Figure 1: Schematic representation of the geometric setup. A point x = [x,y,z]^⊤ ∈ R³ on the scene surface and a pixel p = [u,v]^⊤ ∈ R² in the image plane are conjugated according to Eq. (1.1). Eq. (2.1) states that, when the scene is illuminated by a LED located in x_s ∈ R³, the gray level I(p) of the pixel p conjugated to x is a function of the angle between the lighting vector s(x) and the normal n(x) to the surface in x (illuminance), of the angle θ between the principal direction n_s of the LED and s(x) (anisotropy), of the distance ‖x − x_s‖ between the surface point and the light source location (inverse-of-square falloff), and of the albedo in x (Lambertian reflectance).

The 3D-reconstruction problem consists in estimating, in each pixel p of a part Ω of the image domain, its conjugate point x in 3D-space. Eq. (1.1) shows that it suffices to find the depth z to determine x from p. The only unknown of the problem is thus the depth map z, which is defined as follows:

 z : Ω ⊂ R² → R₊,   p = [u,v]^⊤ ↦ z(p).   (1.2)
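As an illustration, the conjugation relationship (1.1) is a one-liner; the sketch below is ours (function name and values are for illustration only):

```python
import numpy as np

def backproject(p, z, f):
    """Back-project a pixel p = [u, v] at depth z into 3D space,
    following Eq. (1.1): x = (z / f) u and y = (z / f) v."""
    u, v = p
    return np.array([z * u / f, z * v / f, z])
```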

We are interested in this article in the 3D-reconstruction of Lambertian surfaces by photometric stereo. The reflectance in a point x of such a surface is completely characterized by a coefficient ρ(x) ∈ [0,1], called albedo, which is 0 if the point is black and 1 if it is white. Photometric stereo is nothing else than an extension of shape-from-shading: instead of a single image, the former uses m ≥ 3 shots I¹, …, Iᵐ, taken from the same angle, but under varying lighting. Considering multiple images allows one to circumvent the difficulties of shape-from-shading: photometric stereo techniques are able to unambiguously estimate the 3D-shape as well as the albedo, i.e., without resorting to any prior.

A parallel and uniform illumination can be characterized by a vector s ∈ R³ oriented towards the light source, whose norm is equal to the luminous flux density. We call s the lighting vector. For a Lambertian surface, the classical modeling of photometric stereo is written, in each pixel p ∈ Ω, as the following system (the equalities (1.3) are in fact proportionality relationships: see the expression (2.12) of I(p)):

 I^i(p) = ρ(x) s^i · n(x),   i ∈ {1, …, m},   (1.3)

where I^i(p) denotes the gray level of p under a parallel and uniform illumination characterized by the lighting vector s^i, ρ(x) denotes the albedo in the point x conjugate to p, and n(x) denotes the unit-length outgoing normal to the surface in this point. Since there is a one-to-one correspondence between the points x and the pixels p, we write for convenience ρ(p) and n(p), in lieu of ρ(x) and n(x). Introducing the notation m(p) = ρ(p) n(p), System (1.3) can be rewritten in matrix form:

 I(p)=Sm(p), (1.4)

where vector I(p) ∈ Rᵐ and matrix S ∈ R^{m×3} are defined as follows:

 I(p) = [I¹(p), …, Iᵐ(p)]^⊤   and   S = [s¹, …, sᵐ]^⊤.   (1.5)

As soon as m ≥ 3 non-coplanar lighting vectors are used, matrix S has rank 3. The (unique) least-squares solution of System (1.4) is then given by

 m(p)=S†I(p), (1.6)

where S† is the pseudo-inverse of S. From this solution, we easily deduce the albedo and the normal:

 ρ(p) = ‖m(p)‖   and   n(p) = m(p)/‖m(p)‖.   (1.7)

The normal field estimated in such a way must eventually be integrated so as to obtain the depth map, knowing that the boundary conditions, the shape of domain Ω as well as depth discontinuities significantly complicate this task Durou2016 .
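In code, the classical pipeline (1.4)-(1.7) boils down to a pseudo-inverse followed by a normalization. The sketch below is our own transcription, with array shapes of our choosing:

```python
import numpy as np

def classical_photometric_stereo(I, S):
    """Solve I(p) = S m(p) in the least-squares sense, Eq. (1.6),
    then split m(p) = rho(p) n(p) into albedo and unit normal, Eq. (1.7).
    I: (m, n_pix) stacked gray levels, S: (m, 3) lighting matrix of rank 3."""
    M = np.linalg.pinv(S) @ I            # m(p) = S^dagger I(p), shape (3, n_pix)
    rho = np.linalg.norm(M, axis=0)      # albedo rho(p) = ||m(p)||
    N = M / rho                          # unit normal n(p) = m(p) / ||m(p)||
    return rho, N
```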

To ensure lighting directionality, as is required by Model (1.3), it is necessary to achieve a complex optical setup Moreno2006 . It is much easier to use light-emitting diodes (LEDs) as light sources, but with this type of light source, we should expect significant changes in the modeling, and therefore in the numerical solution. The aim of our work is to conduct a comprehensive and detailed study of photometric stereo under point light source illumination such as LEDs.

#### Related works.

Modeling the luminous flux emitted by a LED is a well-studied problem, see for instance Moreno2008 . One model which is frequently considered in computer vision is that of nearby point light source. This model involves an inverse-of-square law for describing the attenuation of lighting intensity with respect to distance, which has long been identified as a key feature for solving shape-from-shading Iwahori1990 and photometric stereo Clark1992 . Attenuation with respect to the deviation from the principal direction of the source (anisotropy) has also been considered Bennahmias2007 .

If the surface to reconstruct lies in the vicinity of a plane, it is possible to capture a map of this attenuation using a white planar reference object. Conventional photometric stereo Woodham1980a can then be applied to the images compensated by the attenuation maps Angelopoulou2014a ; McGunnigle2003 ; Sun2013 . Otherwise, it is necessary to include the attenuation coefficients in the photometric stereo model, which yields a nonlinear inverse problem to be solved.

This is easier to achieve if the parameters of the illumination model have been calibrated beforehand. Lots of methods exist for estimating a source location Ackermann2013 ; Aoto2012 ; Ciortan2016 ; Giachetti2015 ; Hara2005 ; Powell2001 ; Shen2011 ; Takai2009 . Such methods triangulate this location during a calibration procedure, by resorting to specular spheres. This can also be achieved online, by introducing spheres in the scene to reconstruct Liao2016 . Calibrating anisotropy is a more challenging problem, which was tackled recently in Nie2016 ; Xie2015b by using images of a planar surface. Some photometric stereo methods also circumvent calibration by (partly or completely) automatically inferring lighting during the 3D-reconstruction process Koppal2007 ; Liao2016 ; Logothetis2017 ; Migita2008 ; Papadhimitri2014b ; SSVM2017 .

Still, even in the calibrated case, designing numerical schemes for solving photometric stereo under nearby point light sources remains difficult. When only two images are considered, the photometric stereo model can be simplified using image ratios. This yields a quasilinear PDE Mecca2015 ; Mecca2014b which can be solved by provably convergent front propagation techniques, provided that a boundary condition is known. To improve robustness, this strategy has been adapted to the multi-images case in Logothetis2017 ; Logothetis_2016 ; Mecca2016 ; CVPR2016 , using variational methods. However, convergence guarantees are lost. Instead of considering such a differential approach, another class of methods Ahmad2014 ; Bony2013 ; Collins2012 ; Huang2015 ; Kolagani1992 ; Nie2016b ; Papadhimitri2014b ; Yeh2016 rather modifies the classical photometric stereo framework Woodham1980a , by alternatingly estimating the normals and the albedo, integrating the normals into a depth map, and updating the lighting based on the current depth. Yet, no convergence guarantee exists. A method based on mesh deformation has also been proposed in Xie2015 , but convergence is not established either.

#### Contributions.

In contrast to existing works which focus either on modeling, calibrating or solving photometric stereo with near point light sources such as LEDs, the objective of this article is to propose a comprehensive study of all these aspects of the problem. Building upon our previous conference papers CVPR2016 ; SSVM2017 ; CVPR2017 , we introduce the following innovations:

• We present in Section 2 an accurate model for photometric stereo under point light source illumination. As in recent works Logothetis2017 ; Logothetis_2016 ; Mecca2015 ; Mecca2014b ; Mecca2016 ; Nie2016b ; Nie2016 ; Xie2015b , this model takes into account the nonlinearities due to distance and to the anisotropy of the LEDs. Yet, it also clarifies the notions of albedo and of source intensity, which are shown to be relative to a reference albedo and to several parameters of the camera, respectively. This section also introduces a practical calibration procedure for the location, the orientation and the relative intensity of the LEDs.

• Section 3 reviews and improves two state-of-the-art numerical solutions in several manners. We first modify the alternating method Ahmad2014 ; Bony2013 ; Collins2012 ; Huang2015 ; Kolagani1992 ; Nie2016b ; Papadhimitri2014b ; Yeh2016 by introducing an estimation of the shape scale, in order to recover the absolute depth without any prior. We then study the PDE-based approach which employs image ratios for eliminating the nonlinearities Logothetis2017 ; Logothetis_2016 ; Mecca2016 ; CVPR2016 , and empirically show that local minima can be avoided by employing an augmented Lagrangian strategy. Nevertheless, neither of these state-of-the-art methods is provably convergent.

• Therefore, we introduce in Section 4 a new, provably convergent method, inspired by the one recently proposed in SSVM2017 . It is based on a tailored alternating reweighted least-squares scheme for approximately solving the non-linearized system of PDEs. Following CVPR2017 , we further show that this method is easily extended in order to address shadows and specularities.

• In Section 5, we build upon the analysis conducted in CVPR2016 in order to tackle the case of RGB-valued images, before concluding and suggesting several future research directions in Section 6.

## 2 Photometric Stereo under Point Light Source Illumination

Conventional photometric stereo Woodham1980a assumes that the primary luminous fluxes are parallel and uniform, which is difficult to guarantee. It is much easier to illuminate a scene with LEDs.

Keeping this in mind, we have developed a photometric stereo-based setup for 3D-reconstruction of faces, which includes LEDs222We use white LUXEON Rebel LEDs: http://www.luxeonstar.com/luxeon-rebel-leds. located at about from the scene surface (see Fig. 2-a). The face is photographed by a Canon EOS 7D camera with focal length . Triggering the shutter in burst mode, while synchronically lighting the LEDs, provides us with images such as those of Figs. 2-b, 2-c and 2-d. In this section, we aim at modeling the formation of such images, by establishing the following result:

If the LEDs are modeled as anisotropic (imperfect Lambertian) point light sources, if the surface is Lambertian and if all the automatic settings of the camera are deactivated, then the formation of the images can be modeled as follows, for i ∈ {1, …, m}:

 I^i(p) = Ψ^i ρ̄(p) [ n_s^i · (x − x_s^i) / ‖x − x_s^i‖ ]^{μ_i} {(x_s^i − x) · n(p)}₊ / ‖x_s^i − x‖³,   (2.1)

where:

• I^i(p) is the “corrected gray level” at pixel p conjugate to a point x located on the surface (cf. Eq. (2.12));

• Ψ^i is the intensity of the i-th source multiplied by an unknown factor, which is common to all the sources and depends on several camera parameters and on the albedo of a Lambertian planar calibration pattern (cf. Eq. (2.14));

• ρ̄(p) is the albedo of the surface point conjugate to pixel p, relative to that of the calibration pattern (cf. Eq. (2.22));

• n_s^i is the (unit-length) principal direction of the i-th source, x_s^i its location (cf. Fig. 2), and μ_i its anisotropy parameter (cf. Fig. 3 and Eq. (2.5));

• {·}₊ is the positive part operator, which accounts for self-shadows:

 {x}+=max{x,0}. (2.2)

In Eq. (2.1), the anisotropy parameters μ_i are (indirectly) provided by the manufacturer (cf. Eq. (2.6)), and the other LED parameters x_s^i, n_s^i and Ψ^i can be calibrated thanks to the procedure described in Section 2.2. The only unknowns in System (2.1) are thus the depth z(p) of the 3D-point x conjugate to p, its (relative) albedo ρ̄(p) and its normal n(p). The estimation of these unknowns will be discussed in Sections 3 and 4. Before that, let us show step-by-step how to derive Eq. (2.1).

Figure 2: (a) Our photometric stereo-based experimental setup for 3D-reconstruction of faces using a Canon EOS 7D camera (highlighted in red) and m=8 LEDs (highlighted in blue). The walls are painted in black in order to avoid the reflections between the scene and the environment. (b-c-d) Three out of the m=8 images obtained by this setup.

### 2.1 Modeling the Luminous Flux Emitted by a LED

For the LEDs we use, the characteristic illuminating volume is of the order of one cubic millimeter. Therefore, in comparison with the scale of a face, each LED can be seen as a point light source located at x_s. At any point x, the lighting vector s(x) is necessarily radial, i.e., collinear with the unit-length vector u_r pointing from x_s towards x. Using spherical coordinates (r, θ, φ) of x in a frame having x_s as origin, it is written

 s(x) = −(Φ(θ,φ)/r²) u_r,   (2.3)

where Φ(θ,φ) denotes the intensity of the source (expressed in lumen per steradian, i.e., in candela), and the 1/r² attenuation is a consequence of the conservation of luminous energy in a non-absorbing medium. Vector s(x) is purposely oriented in the opposite direction from that of the light, in order to simplify the writing of the Lambertian model.

Model (2.3) is very general. We could project the intensity Φ on the spherical harmonics basis, which allowed Basri et al. to model the luminous flux in the case of uncalibrated photometric stereo Basri2003 . We could also sample s(x) in the vicinity of a plane, using a plane with known reflectance Angelopoulou2014a ; McGunnigle2003 ; Sun2013 .

Using the specific characteristics of LEDs may lead to a more accurate model. Indeed, most LEDs emit a luminous flux which is invariant by rotation around a principal direction indicated by a unit-length vector n_s Moreno2008 . If θ is defined relatively to n_s, this means that Φ is independent from φ. The lighting vector in x induced by a LED located in x_s is thus written

 s(x) = (Φ(θ)/‖x_s − x‖²) (x_s − x)/‖x_s − x‖.   (2.4)

The dependency of the intensity Φ on the angle θ characterizes the anisotropy of the LED. The function Φ is generally decreasing (cf. Fig. 3).

Figure 3: Intensity patterns of the LEDs used (source: http://www.lumileds.com/uploads/28/DS64-pdf). (a) Anisotropy function Φ(θ)/Φ₀ as a function of θ. (b) Polar representation. These diagrams show us that θ_{1/2} = π/3, which corresponds to μ = 1 according to Eq. (2.6) (Lambertian source).

An anisotropy model satisfying this constraint is that of “imperfect Lambertian source”:

 Φ(θ) = Φ₀ cos^μ θ,   (2.5)

which contains two parameters Φ₀ and μ ≥ 0, and models both isotropic sources (μ = 0) and Lambertian sources (μ = 1). Model (2.5) is empirical, and more elaborate models are sometimes considered Moreno2008 , yet it has already been used in photometric stereo Logothetis2017 ; Logothetis_2016 ; Mecca2016 ; Mecca2015 ; Nie2016b ; Nie2016 ; SSVM2017 ; Xie2015b , including the case where all the LEDs are arranged on a plane parallel to the image plane, in such a way that their principal directions coincide with the optical axis Mecca2014b . Model (2.5) has proven itself and, moreover, LED manufacturers provide the angle θ_{1/2} such that Φ(θ_{1/2}) = Φ₀/2, from which we deduce, using (2.5), the value of μ:

 μ = −log(2) / log(cos θ_{1/2}).   (2.6)
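For instance, Eq. (2.6) is a one-liner (the function name below is ours):

```python
import numpy as np

def mu_from_half_angle(theta_half):
    """Anisotropy parameter of Eq. (2.6): mu = -log(2) / log(cos(theta_1/2))."""
    return -np.log(2.0) / np.log(np.cos(theta_half))
```

Since cos(π/3) = 1/2, a half-angle of π/3 yields μ = 1, i.e., a Lambertian source.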

As shown in Fig. 3, the angle θ_{1/2} is equal to π/3 for the LEDs we use. From Eq. (2.6), we deduce that μ = 1, which means that these LEDs are Lambertian. Plugging the expression (2.5) of Φ into (2.4), we obtain

 s(x) = Φ₀ cos^μ θ (x_s − x)/‖x_s − x‖³,   (2.7)

where we explicitly keep μ to address the most general case. Model (2.7) thus includes seven parameters: three for the coordinates of x_s, two for the unit vector n_s, plus Φ₀ and μ. Note that n_s appears in this model through the angle θ.

In its uncalibrated version, photometric stereo allows the 3D-reconstruction of a scene surface without knowing the lighting. Uncalibrated photometric stereo has been widely studied, including the case of nearby point light sources Huang2015 ; Koppal2007 ; Migita2008 ; Papadhimitri2014b ; Yeh2016 , but whenever possible, the lighting should rather be calibrated (it is also necessary to calibrate the camera, since the 3D-frame is attached to it; we assume that this has been done beforehand).

### 2.2 Calibrating the Luminous Flux Emitted by a LED

Most calibration methods of a point light source Ackermann2013 ; Aoto2012 ; Ciortan2016 ; Giachetti2015 ; Hara2005 ; Powell2001 ; Shen2011 ; Takai2009 do not take into account the attenuation of the luminous flux density as a function of the distance to the source, nor the possible anisotropy of the source, which may lead to relatively imprecise results. To our knowledge, there are few calibration procedures taking into account these phenomena. In Xie2015b , Xie et al. use a single pattern, which is partially specular and partially Lambertian, to calibrate a LED. We intend to improve this procedure using two patterns, one specular and the other Lambertian. The specular one will be used to determine the location of the LEDs by triangulation, and the Lambertian one to determine some other parameters by minimizing the reprojection error, as recently proposed by Pintus et al. in Pintus2016 .

#### Specular Spherical Calibration Pattern.

The location x_s of a LED can be determined by triangulation. In Powell2001 , Powell et al. advocate the use of a spherical mirror. To estimate the locations of the LEDs of our setup, we use a billiard ball. Under perspective projection, the edge of the silhouette of a sphere is an ellipse, which we detect using a dedicated algorithm Patraucean2012 . It is then easy to determine the 3D-coordinates of any point on the surface, as well as its normal, since the radius of the billiard ball is known. For each pose of the billiard ball, detecting the reflection of the LED allows us to determine, by reflecting the line of sight on the spherical mirror, a line in 3D-space passing through x_s. In theory, two poses of the billiard ball are enough to estimate x_s, even if two lines in 3D-space do not necessarily intersect, but the use of ten poses improves the robustness of the estimation.
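The triangulation step amounts to finding the point closest, in the least-squares sense, to a bundle of 3D lines. A minimal sketch (our own formulation, not the authors' code) solves the normal equations built from the orthogonal projectors of the lines:

```python
import numpy as np

def nearest_point_to_lines(origins, directions):
    """Least-squares point minimizing the sum of squared distances to a set
    of 3D lines, each given by an origin a_i and a direction d_i. This is the
    kind of estimate used to locate x_s from the reflected lines of sight."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for a, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the line
        A += P
        b += P @ a
    return np.linalg.solve(A, b)
```

With noisy lines, the returned point is the minimizer of the summed squared point-to-line distances, which is exactly what averaging ten poses provides.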

#### Lambertian Model.

To estimate the principal direction n_s and the intensity Φ₀ in Model (2.7), we use a Lambertian calibration pattern. A surface is Lambertian if the apparent clarity of any point x located on it is independent from the viewing angle. The luminance L(x), which is equal to the luminous flux emitted per unit of solid angle and per unit of apparent surface, is independent from the direction of emission. However, the luminance is not characteristic of the surface, as it depends on the illuminance E(x) (the notation E comes from the French “éclairement”), that is to say on the luminous flux per unit area received by the surface in x. The relationship between luminance and illuminance (a luminance is expressed in cd·m⁻², an illuminance in lm·m⁻², i.e., in lux) is written, for a Lambertian surface:

 L(x) = (ρ(x)/π) E(x),   (2.8)

where the albedo ρ(x) ∈ [0,1] is defined as the proportion of luminous energy which is reemitted, i.e., ρ(x) = 1 if x is white, and ρ(x) = 0 if it is black.

The parameter ρ is enough to characterize the reflectance of a Lambertian surface (the reflectance is generally referred to as the bidirectional reflectance distribution function, or BRDF). In addition, the illuminance at a point x of a (not necessarily Lambertian) surface with normal n(x), lit by the lighting vector s(x), is written (negative values in the right hand side of Eq. (2.9) are clamped to zero in order to account for self-shadows):

 E(x)={s(x)⋅n(x)}+. (2.9)

Focusing the camera on a point x of the scene surface, the illuminance ε(p) of the image plane, at pixel p conjugate to x, is related to the luminance L(x) by the following “almost linear” relationship Horn1986 :

 ε(p) = β cos⁴α(p) L(x),   (2.10)

where β is a proportionality coefficient characterizing the clarity of the image, which depends on several factors such as the lens aperture, the magnification, etc. Regarding the factor cos⁴α(p), where α(p) is the angle between the line of sight and the optical axis, it is responsible for darkening at the periphery of the image. This effect should not be confused with vignetting, since it occurs even with ideal lenses Gardner1947 .

With current photosensitive receptors, the gray level J(p) at pixel p is almost proportional (provided that the RAW image format is used) to its illuminance ε(p), except of course in case of saturation. Denoting γ this coefficient of quasi-proportionality, and combining equalities (2.8), (2.9) and (2.10), we get the following expression of the gray level in a pixel p conjugate to a point x located on a Lambertian surface:

 J(p) = γβ cos⁴α(p) (ρ(x)/π) {s(x)·n(x)}₊.   (2.11)

We have already mentioned that there is a one-to-one correspondence between a point x and its conjugate pixel p, which allows us to denote ρ(p) and n(p) instead of ρ(x) and n(x). As the factor cos⁴α(p) is easy to calculate in each pixel p of the photosensitive receptor, since cos α(p) = f/‖[u,v,f]^⊤‖, we can very easily compensate for this source of darkening and will manipulate from now on the “corrected gray level”:

 I(p) = J(p)/cos⁴α(p) = γβ (ρ(p)/π) {s(x)·n(p)}₊.   (2.12)
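Compensating the cos⁴α darkening is a purely geometric, per-pixel operation. A minimal sketch, assuming the principal point lies at the image center and the focal length f is expressed in pixels:

```python
import numpy as np

def correct_cos4_falloff(J, f):
    """Corrected gray level of Eq. (2.12): I = J / cos^4(alpha), with
    cos(alpha) = f / ||[u, v, f]|| and (u, v) centered on the principal point."""
    h, w = J.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    u -= (w - 1) / 2.0   # assumption: principal point at the image center
    v -= (h - 1) / 2.0
    cos_alpha = f / np.sqrt(u**2 + v**2 + f**2)
    return J / cos_alpha**4
```

The correction factor is 1 at the principal point and grows towards the image corners, as expected.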

#### Lambertian Planar Calibration Pattern.

To estimate the parameters n_s and Ψ in Model (2.7), i.e., to achieve photometric calibration, we use a second calibration pattern consisting of a checkerboard printed on a white paper sheet, which is itself stuck on a plane (cf. Fig. 4), with the hope that the unavoidable outliers to the Lambertian model will not influence the accuracy of the estimates too much.

Figure 4: Two out of the q poses of the Lambertian planar calibration pattern used for the photometric calibration of the LEDs. The parts of the white cells which are used for estimating the LEDs principal directions and intensities are highlighted in red.

The use of a convex calibration pattern (planar, in this case) has a significant advantage: the lighting vector at any point of the surface is purely primary, i.e., it is only due to the light source, without “bouncing” on other parts of the surface of the target, provided that the walls and surrounding objects are covered in black (see Fig. 2-a). Thanks to this observation, we can replace the lighting vector s(x) in Eq. (2.12) by the expression (2.7) which models the luminous flux emitted by a LED. From (2.7) and (2.12), we deduce the gray level I(p) of the image of a point x located on this calibration pattern, illuminated by a LED:

 I(p) = γβ (ρ(p)/π) Φ₀ cos^μ θ {(x_s − x)·n(p)}₊ / ‖x_s − x‖³.   (2.13)

If q poses of the checkerboard are used, numerous algorithms exist for unambiguously estimating the coordinates x_j of the points of the pattern, for the different poses j ∈ {1, …, q}. These algorithms also allow the estimation of the normals n_j (we omit the dependency in x of the normal, since the pattern is planar), and the intrinsic parameters of the camera (to perform these operations, we use the Computer Vision toolbox from Matlab). As for the albedo, if the use of white paper does not guarantee that ρ ≡ 1, it still seems reasonable to assume that ρ is constant, equal to some value ρ₀, i.e., to assume a uniform albedo in the white cells. We can then group all the multiplicative coefficients of the right hand side of Eq. (2.13) into one coefficient

 Ψ = γβ (ρ₀/π) Φ₀.   (2.14)

With this definition, and knowing that θ is the angle between vectors n_s and x − x_s, Eq. (2.13) can be rewritten, in a pixel p of the set Ω_j containing the white pixels of the checkerboard in the pose j (these pixels are highlighted in red in the images of Fig. 4):

 I_j(p) = Ψ [ n_s · (x_j − x_s) / ‖x_j − x_s‖ ]^μ {(x_s − x_j)·n_j}₊ / ‖x_s − x_j‖³.   (2.15)

To be sure that, in Eq. (2.15), Ψ is independent from the pose j, we must deactivate all automatic settings of the camera, in order to make γ and β constant.

Since x_s is already estimated, and the value of μ is known, the only unknowns in Eq. (2.15) are Ψ and n_s. Two cases may occur:

• If the LED to calibrate is isotropic, i.e., if μ = 0, then it is useless to estimate n_s, and Ψ can be estimated in a least-squares sense, by solving

 min_Ψ Σ_{j=1}^q Σ_{p∈Ω_j} [ I_j(p) − Ψ {(x_s − x_j)·n_j}₊ / ‖x_s − x_j‖³ ]²,   (2.16)

whose solution is given by

 Ψ = [ Σ_{j=1}^q Σ_{p∈Ω_j} I_j(p) {(x_s − x_j)·n_j}₊ / ‖x_s − x_j‖³ ] / [ Σ_{j=1}^q Σ_{p∈Ω_j} ( {(x_s − x_j)·n_j}₊ / ‖x_s − x_j‖³ )² ].   (2.17)
• Otherwise (if μ > 0), Eq. (2.15) can be rewritten

 Ψ1μnsms⋅(xj−xs)=[Ij(p)∥xs−xj∥3+μ{(xs−xj)⋅nj}+]1μ. (2.18)

The least-squares estimation of vector m_s defined in (2.18) is thus written

 min_{m_s} Σ_{j=1}^q Σ_{p∈Ω_j} [ m_s · (x_j − x_s) − [ I_j(p) ‖x_s − x_j‖^{3+μ} / {(x_s − x_j)·n_j}₊ ]^{1/μ} ]².   (2.19)

This linear least-squares problem can be solved using the pseudo-inverse. From this estimate, we easily deduce those of parameters n_s and Ψ:

 n_s = m_s/‖m_s‖   and   Ψ = ‖m_s‖^μ.   (2.20)

In both cases, it is impossible to deduce from the estimate of Ψ that of Φ₀, because in the definition (2.14) of Ψ, the product γβρ₀ is unknown. However, since this product is the same for all LEDs (deactivating all automatic settings of the camera makes γ and β constant), all the intensities Φ₀^i, i ∈ {1, …, m}, are estimated up to a common factor.
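The two calibration cases above can be prototyped in a few lines. The sketch below is our own condensed reading of Eqs. (2.17)-(2.20), with one observation per row and hypothetical variable names; shadowed observations (zero dot product) are assumed to have been discarded beforehand in the anisotropic case:

```python
import numpy as np

def calibrate_led(I, X, N, xs, mu):
    """Photometric calibration of one LED from k observations on the pattern.
    I: (k,) corrected gray levels, X: (k,3) pattern points, N: (k,3) pattern
    normals, xs: LED location, mu: anisotropy parameter.
    Returns (Psi, ns); ns is None in the isotropic case."""
    D = xs - X                                    # vectors x_s - x_j
    r = np.linalg.norm(D, axis=1)
    dot = np.maximum(np.sum(D * N, axis=1), 0.0)  # {(x_s - x_j) . n_j}_+
    shading = dot / r**3
    if mu == 0:
        # closed form (2.17)
        return np.sum(I * shading) / np.sum(shading**2), None
    # right hand side of Eq. (2.18), then linear least squares (2.19)
    rhs = (I * r**(3 + mu) / dot)**(1.0 / mu)
    ms, *_ = np.linalg.lstsq(X - xs, rhs, rcond=None)  # rows are x_j - x_s
    # Eq. (2.20): split m_s into direction and intensity
    return np.linalg.norm(ms)**mu, ms / np.linalg.norm(ms)
```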

Fig. 5 shows a schematic representation of the experimental setup of Fig. 2-a, where the LEDs parameters were estimated using our calibration procedure.

Figure 5: Two views of a schematic representation of the experimental setup of Fig. 2-a. The camera center is located in (0,0,0). A black marker characterizes the location x_s of each LED (unit: mm), the orientation of a blue arrow its principal direction n_s, and the length of this arrow its intensity Φ₀ (up to a common factor).

### 2.3 Modeling Photometric Stereo with Point Light Sources

If the luminous flux emitted by a LED is described by Model (2.7), then we obtain from (2.13) and (2.14) the following equation for the gray level I(p) at pixel p:

 I(p) = Ψ (ρ(p)/ρ₀) [ n_s · (x − x_s) / ‖x − x_s‖ ]^μ {(x_s − x)·n(p)}₊ / ‖x_s − x‖³.   (2.21)

Let us introduce a new definition of the albedo, relative to the albedo ρ₀ of the Lambertian planar calibration pattern:

 ρ̄(p) = ρ(p)/ρ₀.   (2.22)

By writing Eq. (2.21) with respect to each LED, and by using Eq. (2.22), we obtain, in each pixel p ∈ Ω, the system of equations (2.1), for i ∈ {1, …, m}.

To solve this system, the introduction of the auxiliary variable m̄(p) = ρ̄(p) n(p) may seem relevant, since this vector is not constrained to have unit length, but we will see that this trick loses part of its interest. Defining the following vectors, for i ∈ {1, …, m}:

 t^i(x) = Ψ^i [ n_s^i · (x − x_s^i) / ‖x − x_s^i‖ ]^{μ_i} (x_s^i − x) / ‖x_s^i − x‖³,   (2.23)

and neglecting self-shadows (i.e., replacing the operator {·}₊ by the identity), then System (2.1) is rewritten in matrix form:

 I(p) = T(x) m̄(p),   (2.24)

where I(p) has been defined in (1.5) and T(x) ∈ R^{m×3} is defined as follows:

 T(x) = [t¹(x), …, tᵐ(x)]^⊤.   (2.25)

Eq. (2.24) is similar to (1.4). Knowing the matrix field T would allow us to estimate its field of pseudo-inverses in order to solve (2.24), just as calculating the pseudo-inverse of S allows us to solve (1.4). However, the matrix field T depends on x, and thus on the unknown depth. This simple difference induces major changes when it comes to the numerical solution, as discussed in the next two sections.
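To make the dependence on the geometry explicit, here is a minimal construction of the rows t^i(x) of T(x) from Eq. (2.23); array shapes and names are ours:

```python
import numpy as np

def T_matrix(x, xs, ns, mu, Psi):
    """Build T(x) of Eq. (2.25), one row t_i(x)^T per LED.
    x: surface point, xs: (m,3) LED locations, ns: (m,3) unit principal
    directions, mu: (m,) anisotropy parameters, Psi: (m,) intensities."""
    D = xs - x                                   # (m, 3), from x to each LED
    r = np.linalg.norm(D, axis=1)
    aniso = (np.sum(ns * (-D), axis=1) / r)**mu  # cos^mu(theta_i)
    return (Psi * aniso / r**3)[:, None] * D
```

Given a depth hypothesis (hence x known), Eq. (2.24) is then solved pixelwise exactly like (1.4), e.g., `m_bar = np.linalg.pinv(T_matrix(x, xs, ns, mu, Psi)) @ I_p`.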

## 3 A Review of Two Variational Approaches for Solving Photometric Stereo under Point Light Source Illumination, with New Insights

In this section, we study two variational approaches from the literature for solving photometric stereo under point light source illumination.

The first one inverts the nonlinear image formation model by recasting it as a sequence of simpler subproblems Ahmad2014 ; Bony2013 ; Collins2012 ; Huang2015 ; Kolagani1992 ; Nie2016b ; Papadhimitri2014b ; Yeh2016 . It consists in estimating the normals and the albedo, assuming that the depth map is fixed, then integrating the normals into a new depth map, and iterating. We show in Section 3.1 how to improve this standard method in order to estimate absolute depth, without resorting to any prior.

The second approach first linearizes the image formation model by resorting to image ratios, then directly estimates the depth by solving the resulting system of PDEs in an approximate manner Logothetis2017 ; Logothetis_2016 ; Mecca2016 ; CVPR2016 . We show in Section 3.2 that state-of-the-art solutions, which resort to fixed point iterations, may be trapped in local minima. This shortcoming can be avoided by rather using an augmented Lagrangian algorithm.

As in these state-of-the-art methods, self-shadows will be neglected throughout this section, i.e., we abusively replace the operator {·}₊ by the identity. To enforce robustness, we simply follow the approach advocated in Bringier2012 , which systematically eliminates, in each pixel, the highest gray level, which may come from a specular highlight, as well as the two lowest ones, which may correspond to shadows. More elaborate methods for ensuring robustness will be discussed in Section 4.

Apart from robustness issues, we will see that the state-of-the-art methods studied in this section remain unsatisfactory, because their convergence is not established.

### 3.1 Scheme Inspired by the Classical Numerical Solution of Photometric Stereo

For solving Problem (2.24), it seems quite natural to adapt the solution (1.6) of the linear model (1.4). To linearize (2.24), we have to assume that matrix T(x) is known. If we proceed iteratively, this can be made possible by replacing, at iteration k, T(x) by T(x^{(k)}). This very simple idea has led to several numerical solutions Ahmad2014 ; Bony2013 ; Collins2012 ; Huang2015 ; Kolagani1992 ; Nie2016b ; Papadhimitri2014b ; Yeh2016 , which all require some kind of a priori knowledge of the depth. On the contrary, the scheme we propose here requires none, which constitutes a significant improvement. This new scheme consists in the following algorithm:

###### Algorithm 1 (alternating approach)
1:  Initialize x^{(0)}. Set k = 0.
2:  loop
3:     Solve Problem (2.24) in the least-squares sense in each p ∈ Ω, replacing T(x) by T(x^{(k)}), which provides a new estimation of m̄(p):
 m̄^{(k+1)}(p) = T(x^{(k)})† I(p). (3.1)
4:     Deduce a new estimation of the normal n(p):
 n^{(k+1)}(p) = m̄^{(k+1)}(p)/‖m̄^{(k+1)}(p)‖. (3.2)
5:     Integrate the new normal field n^{(k+1)} into an updated 3D-shape x^{(k+1)}, up to a scale factor.
6:     Estimate this scale factor by nonlinear optimization.
7:     Set k = k+1, as long as convergence is not reached.
8:  end loop

For this scheme to be completely specified, we need to set the initial 3D-shape x^{(0)}. We use as initial guess a fronto-parallel plane at distance z₀ from the camera, z₀ being a rough estimate of the mean distance from the camera to the scene surface.

#### Integration of Normals.

Stages 3 and 4 of the scheme above are trivial and can be achieved pixelwise, but Stages 5 and 6 are trickier. From the equalities in (1.1), and by denoting ∇z(p) = [∂_u z(p), ∂_v z(p)]^⊤ the gradient of z in p, it is easy to deduce that the (non-unit-length) vector

 ¯¯¯¯n(p)=⎡⎢⎣f∂uz(p)f∂vz(p)−z(p)−p⋅∇z(p)⎤⎥⎦ (3.4)

is normal to the surface. Expression (3.4) shows that integrating the (unit-length) normal field allows to estimate the depth only up to a scale factor , since:

 n(p)∝⎡⎢⎣f∂uz(p)f∂vz(p)−z(p)−p⋅∇z(p)⎤⎥⎦∝⎡⎢⎣f∂u(κz)(p)f∂v(κz)(p)−(κz)(p)−p⋅∇(κz)(p)⎤⎥⎦. (3.5)

The collinearity of $\bar{n}(p)$ and $n(p)$ leads to the system

$$\begin{cases} \left[u\,n_1(p) + f\,n_3(p)\right]\partial_u z(p) + v\,n_1(p)\,\partial_v z(p) = -n_1(p)\,z(p), \\ u\,n_2(p)\,\partial_u z(p) + \left[v\,n_2(p) + f\,n_3(p)\right]\partial_v z(p) = -n_2(p)\,z(p), \end{cases} \tag{3.6}$$

which is homogeneous in $z$. Introducing the change of variable $\tilde z = \log z$, which is valid since $z > 0$, (3.6) is rewritten

$$\begin{cases} \left[u\,n_1(p) + f\,n_3(p)\right]\partial_u \tilde z(p) + v\,n_1(p)\,\partial_v \tilde z(p) = -n_1(p), \\ u\,n_2(p)\,\partial_u \tilde z(p) + \left[v\,n_2(p) + f\,n_3(p)\right]\partial_v \tilde z(p) = -n_2(p). \end{cases} \tag{3.7}$$

The determinant of this system is equal to

$$f\,n_3(p)\left[u\,n_1(p) + v\,n_2(p) + f\,n_3(p)\right] = f\,n_3(p)\left[\bar{p}\cdot n(p)\right], \tag{3.8}$$

if we denote

$$\bar{p} = [u, v, f]^\top. \tag{3.9}$$

It is then easy to deduce the solution of (3.7):

$$\nabla\tilde z(p) = \frac{-1}{\bar{p}\cdot n(p)}\begin{bmatrix} n_1(p) \\ n_2(p) \end{bmatrix}. \tag{3.10}$$
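Eq. (3.10) is cheap to evaluate pixelwise. The following minimal sketch, with a hypothetical focal length, checks it on a fronto-parallel plane, whose normal $n = (0, 0, -1)$ must yield a zero log-depth gradient (constant depth):

```python
import numpy as np

# Log-depth gradient at pixel p = (u, v) from the unit normal n = (n1, n2, n3),
# per Eq. (3.10); f is the focal length (hypothetical value below).
def log_depth_gradient(u, v, n, f):
    p_bar = np.array([u, v, f])    # Eq. (3.9)
    denom = p_bar @ n              # \bar{p} . n(p), cf. Eq. (3.8)
    return -n[:2] / denom          # Eq. (3.10)

g = log_depth_gradient(10.0, -5.0, np.array([0.0, 0.0, -1.0]), f=600.0)
assert np.allclose(g, 0.0)         # fronto-parallel plane: constant log-depth
```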

Let us now come back to Stages 5 and 6 of Algorithm 1. The new normal field is $n^{(k+1)}$, from which we can deduce the gradient $\nabla\tilde z^{(k+1)}$ thanks to Eq. (3.10). By integrating this gradient between a pixel $p_0$, chosen arbitrarily inside $\Omega$, and any pixel $p$, and knowing that $z = \exp(\tilde z)$, we obtain:

$$z^{(k+1)}(p) = z^{(k+1)}(p_0)\,\exp\left\{\int_{p_0}^{p} \nabla\tilde z^{(k+1)}(q)\cdot \mathrm{d}q\right\}. \tag{3.11}$$

This integral can be calculated along one single path inside $\Omega$ going from $p_0$ to $p$, but since the gradient field is never rigorously integrable in practice, this computation usually depends on the choice of the path Wu1988. The most common remedy to this well-known problem consists in resorting to a variational approach, see for instance Durou2016 for some discussion.

Expression (3.11) confirms that the depth can only be calculated, from $n^{(k+1)}$, up to a scale factor equal to $z^{(k+1)}(p_0)$. Let us determine this scale factor by minimizing the reprojection error of Model (2.24) over the entire domain $\Omega$. Knowing that, from (1.1) and (3.9), we get $x = \frac{z(p)}{f}\,\bar{p}$, this comes down to solving the following nonlinear least-squares problem:

$$z^{(k+1)}(p_0) = \underset{w\,\in\,\mathbb{R}^+}{\operatorname{argmin}}\; E_{\mathrm{alt}}(w) := \sum_{p\in\Omega} \left\| I(p) - T\!\left(\frac{w}{f}\,\exp\left\{\int_{p_0}^{p}\nabla\tilde z^{(k+1)}(q)\cdot \mathrm{d}q\right\}\bar{p}\right)\bar{m}^{(k+1)}(p)\right\|^2, \tag{3.12}$$

which allows us to eventually write the 3D-shape update (Stages 5 and 6):

$$x^{(k+1)} = \frac{z^{(k+1)}(p_0)}{f}\,\exp\left\{\int_{p_0}^{p}\nabla\tilde z^{(k+1)}(q)\cdot \mathrm{d}q\right\}\bar{p}. \tag{3.13}$$
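On a regular pixel grid, the path integral of Eq. (3.11) can be sketched with cumulative sums along a single, arbitrarily chosen path (down the first column, then along each row). This is a minimal illustration, not the variational integration used in practice; grid size, step and scale $z_0 = z(p_0)$ are hypothetical.

```python
import numpy as np

# Integrate the log-depth gradient field (gu, gv) from pixel p0 = (0, 0),
# then exponentiate and fix the scale with z0 = z(p0), per Eq. (3.11).
def integrate_log_depth(gu, gv, z0):
    h, w = gu.shape
    z_tilde = np.zeros((h, w))
    # One path per pixel; with noisy (non-integrable) data the result would
    # depend on the path, hence the variational methods mentioned in the text.
    z_tilde[1:, 0] = np.cumsum(gv[1:, 0])                          # first column
    z_tilde[:, 1:] = z_tilde[:, [0]] + np.cumsum(gu[:, 1:], axis=1)  # rows
    return z0 * np.exp(z_tilde)

# On an integrable field, e.g. z(u, v) = z0 * exp(0.01 u), depth is exact:
u = np.arange(16)[None, :] * np.ones((8, 1))
gu = 0.01 * np.ones((8, 16))     # d(log z)/du
gv = np.zeros((8, 16))           # d(log z)/dv
z = integrate_log_depth(gu, gv, z0=2.0)
assert np.allclose(z, 2.0 * np.exp(0.01 * u))
```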

#### Experimental Validation.

Despite the lack of theoretical guarantees, convergence of this scheme is empirically observed, provided that the initial 3D-shape is not too distant from the scene surface. For the curves in Fig. 6, several fronto-parallel planes with equation $z \equiv z_0$ were tested as initial guess. Unsurprisingly, the fastest convergence is observed when $z_0$ is close to the mean distance from the camera to the scene. Besides, this graph also shows that under-estimating the initial scale quite a lot is not a problem, whereas over-estimating it severely slows down the process.

Figure 6: Evolution of the energy $E_{\mathrm{alt}}$ of the alternating approach, defined in (3.12), as a function of the iterations, when the initial 3D-shape is a fronto-parallel plane with equation $z \equiv z_0$. The data used are the $m = 8$ images of the plaster statuette of Fig. 2. The proposed scheme consists in alternating normal estimation, normal integration and scale estimation (cf. Algorithm 1). It converges towards the same solution (at different speeds), for the five tested values of $z_0$.

Fig. 7 allows the 3D-shape obtained by photometric stereo, from sub-images at full resolution (bounding box of the statuette), to be compared with the ground truth obtained by laser scanning. The point density is thus almost the same on the front of the statuette, since we did not reconstruct its back. However, our result is obtained in less than ten seconds (five iterations of a Matlab code on a recent i7 processor), instead of several hours for the ground truth, while we also estimate the albedo.

Figure 7: (a) 3D-reconstruction and (b) albedo obtained with Algorithm 1. (c) Ground truth 3D-shape obtained by laser scanning. Photometric stereo not only provides a 3D-shape qualitatively similar to the laser scan, but also provides the albedo.

Figure 8: (a) Histogram of point-to-point distances between the alternating 3D-reconstruction and the ground truth (cf. Fig. 7). The median value is 1.3 mm. (b) Spatial distribution of these distances. The histogram peak is not located in zero. As we will see in Section 3.2, this bias can be avoided by resorting to a differential approach based on PDEs.

Fig. 8-a shows the histogram of point-to-point distances between our result (Fig. 7-a) and the ground truth (Fig. 7-c). The median value is 1.3 mm. The spatial distribution of these distances (Fig. 8-b) shows that the largest distances are observed on the highest slopes of the surface. This clearly comes from the facts that, even for a diffuse material such as plaster, the Lambertian model is not valid under grazing lighting, and that self-shadows were neglected.

More realistic reflectance models, such as the one proposed by Oren and Nayar in Oren1995, would perhaps improve the accuracy of the 3D-reconstruction at such points, and we will see in Section 4 how to handle self-shadows. But, as we shall see now, bias also comes from normal integration. In the next section, we describe a different formulation of photometric stereo which makes it possible to avoid integration, by solving a system of PDEs in the log-depth $\tilde z$.

### 3.2 Direct Depth Estimation using Image Ratios

The scheme proposed in Section 3.1 suffers from several defects. It requires integrating the gradient at each iteration. This is not achieved by the naive formulation (3.12), but using more sophisticated methods which make it possible to overcome the problem of non-integrability Durou2009. Still, bias due to inaccurate normal estimation should not have to be corrected during integration. Instead, it seems more justified to directly estimate the depth map, without resorting to intermediate normal estimation. This can be achieved by recasting photometric stereo as a system of quasi-linear PDEs.

#### Differential Reformulation of Problem (2.24).

Let us recall (cf. Eq. (1.1)) that the coordinates of the 3D-point $x$ conjugate to a pixel $p$ are completely characterized by the depth $z(p)$:

$$x = \frac{z(p)}{f}\begin{bmatrix} p \\ f \end{bmatrix}. \tag{3.14}$$

The vectors $t_i$ defined in (2.23) thus depend on the unknown depth values $z(p)$. Using once again the change of variable $\tilde z = \log z$ (without this change of variable, one would obtain a system of homogeneous PDEs in lieu of (3.23), which would need regularization to be solved, see CVPR2016), we consider from now on each $t_i$, $i \in \{1, \dots, m\}$, as a vector field depending on the unknown map $\tilde z$:

$$t_i(\tilde z):\ \Omega \to \mathbb{R}^3,\quad p \mapsto t_i(\tilde z)(p) = \Psi_i\left[\frac{-n_s^i\cdot v_i(\tilde z)(p)}{\left\|v_i(\tilde z)(p)\right\|}\right]^{\mu_i}\frac{v_i(\tilde z)(p)}{\left\|v_i(\tilde z)(p)\right\|^{3}}, \tag{3.15}$$

where each field $t_i(\tilde z)$ depends in a nonlinear way on the unknown (log-)depth map $\tilde z$, through the following vector field:

$$v_i(\tilde z):\ \Omega \to \mathbb{R}^3,\quad p \mapsto v_i(\tilde z)(p) = x_s^i - \frac{\exp(\tilde z(p))}{f}\begin{bmatrix} p \\ f \end{bmatrix}. \tag{3.16}$$
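The fields (3.15)–(3.16) are evaluated pixelwise. The following sketch does so at a single pixel, reading (3.15) as an intensity $\Psi_i$ times an emission-angle cosine raised to the anisotropy power $\mu_i$; all numerical values (focal length, LED position $x_s^i$ and orientation $n_s^i$, intensity, anisotropy) are hypothetical placeholders.

```python
import numpy as np

# Evaluate t_i (Eq. 3.15) and v_i (Eq. 3.16) at pixel p = (u, v), given the
# log-depth z_tilde and the LED parameters (x_s, n_s, Psi, mu).
def t_field(u, v, z_tilde, f, x_s, n_s, Psi, mu):
    x = (np.exp(z_tilde) / f) * np.array([u, v, f])  # 3D point, Eq. (3.14)
    v_i = x_s - x                                    # Eq. (3.16)
    r = np.linalg.norm(v_i)
    cos_theta = (-n_s @ v_i) / r                     # emission-angle cosine
    return Psi * cos_theta**mu * v_i / r**3          # Eq. (3.15)

# An LED at the optical center, facing the scene, seen from the principal point:
t = t_field(0.0, 0.0, np.log(700.0), f=600.0,
            x_s=np.zeros(3), n_s=np.array([0.0, 0.0, 1.0]),
            Psi=1e5, mu=1.0)
# The flux at that pixel is then purely axial, pointing towards the camera.
assert np.allclose(t[:2], 0.0) and t[2] < 0
```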

Knowing that the (non-unit-length) vector $\bar{n}(p)$ defined in (3.4), divided by $z(p)$, is normal to the surface, and still neglecting self-shadows, we can rewrite System (2.1), in each pixel $p \in \Omega$:

$$I^i(p) = \frac{\bar\rho(p)}{d(\tilde z)(p)}\; t_i(\tilde z)(p)\cdot\begin{bmatrix} f\,\nabla\tilde z(p) \\ -1 - p\cdot\nabla\tilde z(p)\end{bmatrix},\quad i \in \{1,\dots,m\}, \tag{3.17}$$

with

$$d(\tilde z)(p) = \sqrt{f^2\left\|\nabla\tilde z(p)\right\|^2 + \left(-1 - p\cdot\nabla\tilde z(p)\right)^2}. \tag{3.18}$$

#### Partial Linearization of (3.17) using Image Ratios.

In comparison with Eqs. (2.1), the PDEs (3.17) explicitly depend on the unknown map $\tilde z$, and thus remove the need for alternating normal estimation and integration. However, these equations present two difficulties: they are nonlinear and cannot be solved locally. We can eliminate the nonlinearity due to the normalization coefficient $d(\tilde z)(p)$. Indeed, neither the relative albedo $\bar\rho(p)$, nor this coefficient, depends on the index $i$ of the LED. We deduce from any pair $\{i, j\}$, $i \neq j$, of equations from (3.17), the following equalities:

$$\frac{\bar\rho(p)}{d(\tilde z)(p)} = \frac{I^i(p)}{a_i(\tilde z)(p)\cdot\nabla\tilde z(p) - b_i(\tilde z)(p)} = \frac{I^j(p)}{a_j(\tilde z)(p)\cdot\nabla\tilde z(p) - b_j(\tilde z)(p)}, \tag{3.19}$$

with the following definitions of $a_i(\tilde z)(p) \in \mathbb{R}^2$ and $b_i(\tilde z)(p) \in \mathbb{R}$, denoting $t_i = [t_{i1}, t_{i2}, t_{i3}]^\top$:

$$a_i(\tilde z)(p) = f\begin{bmatrix} t_{i1}(\tilde z)(p) \\ t_{i2}(\tilde z)(p)\end{bmatrix} - t_{i3}(\tilde z)(p)\,p, \tag{3.20}$$

$$b_i(\tilde z)(p) = t_{i3}(\tilde z)(p). \tag{3.21}$$

From the equalities (3.19), we obtain:

$$\underbrace{\left[I^i(p)\,a_j(\tilde z)(p) - I^j(p)\,a_i(\tilde z)(p)\right]}_{a_{i,j}(\tilde z)(p)}\cdot\nabla\tilde z(p) = \underbrace{\left[I^i(p)\,b_j(\tilde z)(p) - I^j(p)\,b_i(\tilde z)(p)\right]}_{b_{i,j}(\tilde z)(p)}. \tag{3.22}$$
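At a given pixel, the ratio linearization can be illustrated by stacking one equation (3.22) per pair $(i, j)$ and solving for the log-depth gradient in the least-squares sense. This is a sketch on synthetic, noise-free data (random fields $a_i$, $b_i$, hypothetical gradient), where the stacked system is consistent:

```python
import numpy as np

# Build the stacked ratio system (3.22) from the m per-LED fields a_i in R^2,
# b_i in R (Eqs. 3.20-3.21) and the m gray levels I_i, at one pixel.
def ratio_system(a, b, I):
    m = len(I)
    rows, rhs = [], []
    for i in range(m):
        for j in range(i + 1, m):
            rows.append(I[i] * a[j] - I[j] * a[i])   # a_{i,j}, Eq. (3.22)
            rhs.append(I[i] * b[j] - I[j] * b[i])    # b_{i,j}, Eq. (3.22)
    return np.array(rows), np.array(rhs)

# Synthetic noise-free check: the scale rho/d cancels in the ratios, so it is
# set to 1 when simulating the gray levels via Eq. (3.19).
rng = np.random.default_rng(1)
a = rng.standard_normal((4, 2))
b = rng.standard_normal(4)
g_true = np.array([0.02, -0.01])              # grad(z_tilde)(p), hypothetical
I = a @ g_true - b
A, rhs = ratio_system(a, b, I)
g = np.linalg.lstsq(A, rhs, rcond=None)[0]
assert np.allclose(g, g_true)                 # consistent system: exact recovery
```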

The fields $a_{i,j}(\tilde z)$ and $b_{i,j}(\tilde z)$ defined in (3.22) depend on $\tilde z$ but not on $\nabla\tilde z$: Eq. (3.22) is thus a quasi-linear PDE in $\tilde z$ over $\Omega$. It could be solved by the characteristic strips expansion method Mecca2015; Mecca2014b if we were dealing with two images only, but using a larger number of images is necessary in order to design a robust 3D-reconstruction method. Since we are provided with $m$ images, we follow Gotardo2015; Logothetis2017; Logothetis_2016; Mecca2016; CVPR2016; Smith2016 and write the PDEs such as (3.22) formed by the pairs $\{i, j\}$, $i < j$. Forming the matrix field $A(\tilde z)$ by concatenation of the row vectors $a_{i,j}(\tilde z)^\top$, and the vector field $b(\tilde z)$ by concatenation of the scalar values $b_{i,j}(\tilde z)$, the system of PDEs to solve is written:

$$A(\tilde z)\,\nabla\tilde z = b(\tilde z)\quad\text{over }\Omega. \tag{3.23}$$

This new differential formulation of photometric stereo seems simpler than the original differential formulation (3.17), since the main source of nonlinearity, due to the denominator $d(\tilde z)(p)$, has been eliminated. However, it still presents two difficulties. First, the PDEs (3.23) are generally incompatible and hence do not admit an exact solution. It is thus necessary to estimate an approximate one, by resorting to a variational approach. Assuming that each of the equalities in System (3.23) is satisfied up to an additive, zero-mean, Gaussian noise (in fact, any noise assumption should be formulated on the images, and not on Model (3.23), which was obtained by considering ratios of gray levels: if the noise on gray levels is Gaussian, then that on ratios is Cauchy-distributed Hinkley1969; hence, the least-squares solution (3.24) is the best linear unbiased estimator, but it is not the optimal solution), one should estimate such a solution by solving the following variational problem:

$$\min_{\tilde z:\,\Omega\to\mathbb{R}}\ E_{\mathrm{rat}}(\tilde z) := \left\|A(\tilde z)\,\nabla\tilde z - b(\tilde z)\right\|^2_{L^2(\Omega)}. \tag{3.24}$$

Second, the PDEs (3.22) do not allow the scale of the scene to be estimated. Indeed, when all the depth values simultaneously tend to infinity, both members of (3.22) tend to zero (because the coordinates of $t_i(\tilde z)$ do so, cf. (3.15)). Thus, a large, distant 3D-shape will always “better” fit these PDEs (in the sense of the criterion $E_{\mathrm{rat}}$ defined in Eq. (3.24)) than a small, nearby one (cf. Figs. 10 and 11). A “locally optimal” solution close to a very good initial estimate should thus be sought.

#### Fixed Point Iterations for Solving (3.24).

It has been proposed in Logothetis2017; Logothetis_2016; Mecca2016; CVPR2016 to iteratively estimate a solution of Problem (3.24), by uncoupling the (linear) estimation of $\tilde z$ from the (nonlinear) estimations of $A(\tilde z)$ and of $b(\tilde z)$. This can be achieved by rewriting (3.24) as the following constrained optimization problem:

$$\min_{\tilde z:\,\Omega\to\mathbb{R}}\ \left\|A\,\nabla\tilde z - b\right\|^2_{L^2(\Omega)}\quad\text{s.t.}\quad A = A(\tilde z),\ \ b = b(\tilde z), \tag{3.25}$$

and resorting to a fixed point iterative scheme:

$$\tilde z^{(k+1)} = \underset{\tilde z:\,\Omega\to\mathbb{R}}{\operatorname{argmin}}\ \left\|A^{(k)}\,\nabla\tilde z - b^{(k)}\right\|^2_{L^2(\Omega)}, \tag{3.26}$$

$$A^{(k+1)} = A\big(\tilde z^{(k+1)}\big), \tag{3.27}$$

$$b^{(k+1)} = b\big(\tilde z^{(k+1)}\big). \tag{3.28}$$
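A toy one-dimensional discretization illustrates the scheme (3.26)–(3.28): freeze the nonlinear fields at the current iterate, solve the resulting linear least-squares problem with one pinned depth value (since the solution of (3.26) is defined only up to an additive constant), and repeat. This is a hypothetical stand-in, not the actual 2-D implementation: the discrete gradient $D$ replaces $\nabla$, and `assemble` is an invented callback standing for the construction of $A(\tilde z)$, $b(\tilde z)$.

```python
import numpy as np

n = 50
D = np.diff(np.eye(n), axis=0)            # forward differences, (n-1) x n
g = np.full(n - 1, 0.02)                   # target gradient of z_tilde

def assemble(z):
    # Hypothetical nonlinear dependence of the fields on the current iterate;
    # the toy "PDE" is w(z) * z' = w(z) * g, whose solution satisfies z' = g.
    w = 1.0 + 0.5 * np.cos(z[:-1])
    return np.diag(w), w * g

z = np.zeros(n)                            # initial guess
for _ in range(5):                         # fixed point loop (3.26)-(3.28)
    A, b = assemble(z)                     # freeze nonlinearities
    M = np.vstack([A @ D, np.eye(n)[0]])   # pin z[0] = 0 (additive ambiguity)
    rhs = np.concatenate([b, [0.0]])
    z = np.linalg.lstsq(M, rhs, rcond=None)[0]

assert np.allclose(D @ z, g)               # the recovered gradient matches g
```

Pinning one value makes the discrete system full-rank; without it, the normal equations are only positive semi-definite, which is precisely the rank-1 deficiency discussed next.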

In the linear least-squares variational problem (3.26), the solution can be computed only up to an additive constant. Therefore, the matrix of the system arising from the normal equations associated with the discretized problem is symmetric and positive, but rank-1 deficient, and thus only semi-definite. Fig. 9 shows that this may cause the fixed point scheme not to decrease the energy at each iteration. This issue can be resolved by resorting to the alternating direction method of multipliers (ADMM algorithm), a standard procedure which dates back to the 1970s