Variational Uncalibrated Photometric Stereo under General Lighting

04/08/2019 ∙ by Bjoern Haefner, et al. ∙ Technische Universität München

Photometric stereo (PS) techniques still remain constrained to an ideal laboratory setup where modeling and calibration of lighting are amenable. This work aims to eliminate such restrictions. To this end, we introduce an efficient principled variational approach to uncalibrated PS under general illumination, which is approximated through a second-order spherical harmonic expansion. The joint recovery of shape, reflectance and illumination is formulated as a variational problem where shape estimation is carried out directly in terms of the underlying perspective depth map, thus implicitly ensuring integrability and bypassing the need for a subsequent normal integration. We provide a tailored numerical scheme to solve the resulting nonconvex problem efficiently and robustly. On a variety of evaluations, our method consistently reduces the mean angular error by a factor of 2-3 compared to the state-of-the-art.




1 Introduction

Photometric stereo techniques aim at acquiring both the shape and the reflectance of a scene. To this end, multiple images are acquired under the same viewing angle but varying lighting, and a physics-based image formation model is inverted. However, the classic way to solve this inverse problem requires lighting to be highly controlled, which restricts practical photometric stereo applications to laboratory setups where careful calibration of lighting must be carried out prior to any 3D-reconstruction.



Reflectance Shape
Figure 1: We present an efficient variational scheme to solve uncalibrated photometric stereo under general lighting. Given a set of input RGB images captured from the same viewing angle but under unknown, varying general illumination (top, images were acquired in an office under daylight, while freely moving a hand-held LED light source), fine-detailed reflectance and shape (bottom, we show the estimated albedo and perspective depth maps) are recovered by an end-to-end variational approach.

The objective of this research work is to simplify the overall photometric stereo pipeline by providing an efficient solution to uncalibrated photometric stereo under general lighting, as illustrated in Figure 1 (we plan to make the code publicly available). In comparison with existing efforts in the same direction, the proposed approach has the following advantages:

  • The joint estimation of shape, reflectance and general lighting is formulated as an end-to-end, mathematically transparent variational problem;

  • A real 3D-surface represented as a depth map is recovered, rather than possibly non-integrable normals;

  • It is robust, due to the use of Cauchy’s robust M-estimator and Huber-TV albedo regularization;

  • It is computationally efficient, thanks to a tailored lagged block coordinate descent scheme initialized using a simple balloon-like shape.

After reviewing related works in Section 2, we discuss in Section 3 the spherical harmonics-based image formation model considered in this work. It is inverted through a variational approach for the joint recovery of depth, albedo and lighting, which is discussed in Section 4. A dedicated numerical solution is then introduced in Section 5 and empirically evaluated in Section 6. Section 7 eventually concludes this work.

2 Related Work

3D-models of scenes are essential in many applications such as visual inspection [Farooq2005] or computer-aided surgery using augmented reality [Collins2012]. A 3D-model consists of geometric (position, orientation, etc.) and photometric (color, texture, etc.) properties. Given a set of photographs, the aim of 3D scanning is to invert the image formation process in order to recover these geometric and photometric properties of the observed scene. This notion thus covers both 3D-reconstruction (geometry) and reflectance estimation (photometry).

Many approaches to the problem of 3D-reconstruction from photographs have been studied, and they are grouped under the generic name “shape-from-X”, where X stands for the clue being used (shadows [Shafer1983], contours [Brady1984], texture [Witkin1981], template [Bartoli2015], structured light [Geng2011], motion [Moons2008], focus [Nayar1994], silhouettes [Hernandez2004], etc.). Geometric shape-from-X techniques are based on the identification and analysis of feature points or areas in the image. In contrast, photometric techniques build upon the analysis of the quantity of light received by each photosite of the camera’s sensor. Among photometric techniques, shape-from-shading is probably the most famous one. This technique, developed in the 70s by Horn et al. [Horn1970], consists in 3D-reconstruction from a single image of a shaded scene. It is a classic ill-posed inverse problem whose numerical solving usually requires the surface’s reflectance to be known [Durou2008]. In order both to limit the ambiguities of shape-from-shading and to allow for automatic reflectance estimation, it has been suggested to consider not just one image of the scene, but several ones acquired from the same viewing angle but under varying lighting. This variant, which was introduced in the late 70s by Woodham [Woodham1978], is known as photometric stereo.

Among the various shape-from-X techniques mentioned above, photometric stereo is the only full 3D-scanning technique, i.e., the only one able to achieve both 3D-reconstruction and reflectance estimation. However, early photometric stereo approaches strongly rely on the control of lighting. The latter is usually assumed for simplicity to be directional, although the case of nearby point light sources has recently regained attention [logothetis2017semi, mecca2014near]. More importantly, lighting is assumed to be calibrated. Indeed, the uncalibrated problem is ill-posed: the underlying normal map can be estimated only up to a linear ambiguity [Hayakawa1994], which reduces to a generalized bas-relief ambiguity if integrability is enforced [Belhumeur1999]. To resolve the latter ambiguity, some prior on the scene’s surface or geometry must be introduced, see [Shi2018] for a recent survey. A natural way to enforce integrability consists in following a differential approach to photometric stereo [Chand13, mecca14], i.e., directly estimating the 3D-surface as a depth map instead of first estimating the surface normals and then integrating them. Such a differential approach can be coupled with variational methods in order to iteratively refine depth, reflectance and lighting in a robust manner [CVPR2017]. In addition to the theoretical interest of enforcing integrability in order to limit ambiguities, differential approaches to photometric stereo have the advantage of bypassing the problem of integrating the estimated normal field, which is a non-trivial problem by itself [Queau_Survey]. Besides, any error in the estimated normal field might propagate during integration, and thus robustness to specularities or shadows must be enforced during normal estimation, see again [Shi2018] for some discussion.

All the research works mentioned in the previous paragraph assume that lighting is induced by a single light source. Nevertheless, many studies have rather considered the case of more general illumination conditions, which find a natural application in outdoor conditions [Sato1995]. For instance, the apparent motion of the sun within a day induces changes in the illumination direction which, in theory, allow photometric stereo-based 3D-reconstruction. However, this apparent motion is close to being planar, and thus the rank of the set of illumination vectors is equal or close to 2 [Shen2014] (see also [Holdgeoffroy2015] for additional discussion on the stability of single-day photometric stereo). This situation is thus similar to the two-image case, which has been known to be ill-posed since the early 90s [Kozera1993b, Onn1990, Yang1992], although it is still an active research area [Kozera2018]. In order to limit the instabilities due to this issue, one possibility is to consider images acquired over many seasons as in [Abrams2012, Ackermann2012], or to resort to deep neural networks [Holdgeoffroy2018]. Another one is to consider a non-directional illumination model to represent natural illumination, as for instance in [Jung2019]. Modeling natural illumination is a promising track, since such a model would not be restricted to sunny days, and images acquired under cloudy days are known to yield more accurate 3D-reconstructions [Holdgeoffroy2015].

However, the previous approaches to photometric stereo under natural illumination assume calibrated lighting, where calibration is deduced from time and GPS coordinates or from a calibration target. The case of both general and uncalibrated lighting is much more challenging and has scarcely been explored, apart from studies restricted to sparse 3D-reconstructions [Shen2009] or relying on prior knowledge of a rough geometry [Ackermann2012b, Peng2017, Shi2014]. Uncalibrated photometric stereo under natural illumination has been revisited recently in [Mo2018], using a spatially-varying equivalent directional lighting model. However, results were limited to the recovery of possibly non-integrable surface normals. Instead, the method proposed in the present paper directly recovers the underlying surface represented as a depth map. Following the seminal work of Basri and Jacobs [Basri2007], it considers the spherical harmonics representation of general lighting in lieu of the equivalent directional approximation, as discussed in the next section.

3 Image Formation Model

In photometric stereo (PS), we are given a number of observations $I_i$, $i \in \{1,\dots,n\}$, each representing a multi-channel image $I_i : \Omega \to \mathbb{R}^C$ (i.e., $C = 3$ for RGB images) over a masked pixel domain $\Omega \subset \mathbb{R}^2$. Assuming that the object being pictured is Lambertian, the surface’s reflectance is represented by the albedo $\rho : \Omega \to \mathbb{R}^C$, and the general image formation model is as follows, for all $i \in \{1,\dots,n\}$, $c \in \{1,\dots,C\}$ and $x \in \Omega$:

$$I_i^c(x) = \rho^c(x) \int_{\mathbb{S}^2} s_i^c(\omega)\, \max\{\omega^\top n(x),\, 0\}\, \mathrm{d}\omega. \tag{1}$$

Here $\mathbb{S}^2$ is the unit sphere in $\mathbb{R}^3$, $s_i^c : \mathbb{S}^2 \to \mathbb{R}_+$ represents the channel-wise intensity of the incident light, and $\rho^c(x)$ and $n(x) \in \mathbb{S}^2$ are the channel-wise albedos and the unit-length surface normals, respectively, at the surface point conjugate to pixel $x$. The $\max\{\cdot,\,0\}$ operation in (1) encodes self-shadows. The overall integral collects elementary luminance contributions arising from all incident lighting directions $\omega$. In the setup of uncalibrated PS, the quantities $\rho^c$ and $n$, in addition to the lighting $s_i^c$, are unknown.

Equivalent directional lighting [holdgeoffroy-15] approximates (1) via

$$I_i^c(x) \approx \rho^c(x)\, \bar{s}_i^c(x)^\top n(x), \tag{2}$$

where $\bar{s}_i^c(x) \in \mathbb{R}^3$ represents the mean lighting over the hemisphere visible at $x$. The field $\bar{s}_i^c$ is spatially variant but can be approximated by directional lighting over small local patches. Over each patch, one is thus faced with the ambiguities of directional uncalibrated PS [Hayakawa1994]. State-of-the-art patch-wise methods [Mo2018] first solve this problem over each patch, then connect the patches to form a complete normal field up to rotation, and eventually estimate the rotation which best satisfies the integrability constraint. Errors may however get propagated during the sequence, resulting in a possibly non-integrable normal field.

Instead of such an equivalent directional lighting model, we rather consider a spherical harmonic approximation (SHA) of general lighting [Basri2003, Basri2007]. By defining the half-cosine kernel $k$ as

$$k(\omega, n) := \max\{\omega^\top n,\, 0\}, \tag{3}$$

we can view (1) as an analog of a convolution:

$$I_i^c(x) = \rho^c(x) \int_{\mathbb{S}^2} s_i^c(\omega)\, k(\omega, n(x))\, \mathrm{d}\omega =: \rho^c(x)\, (s_i^c \ast k)(n(x)). \tag{4}$$

Invoking the Funk–Hecke theorem, we obtain the following harmonic expansion analogous to Fourier series:

$$(s_i^c \ast k)(n) = \sum_{m=0}^{\infty} \sum_{l=-m}^{m} k_m\, \sigma_{i,ml}^c\, H_{ml}(n). \tag{5}$$

Here the spherical harmonics $\{H_{ml}\}$ form an orthonormal basis on $\mathbb{S}^2$, and $k_m$ and $\sigma_{i,ml}^c$ are the expansion coefficients of $k$ and $s_i^c$ with respect to $\{H_{ml}\}$. Since most energy in the expansion (5) concentrates on low-order terms [Basri2003], we obtain the second-order SHA by truncating the series up to the first nine terms (i.e., $m \leq 2$):

$$(s_i^c \ast k)(n) \approx \sum_{m=0}^{2} \sum_{l=-m}^{m} k_m\, \sigma_{i,ml}^c\, H_{ml}(n). \tag{6}$$

The first-order SHA refers to the truncation up to the first four terms (i.e., $m \leq 1$). It is shown in [Basri2003] that, for distant lighting, at least 87.5% of the resulting irradiance is captured by the first-order SHA, and 99.2% by the second-order SHA (cf. Figure 2 for a visualization).
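These two fractions can be recovered numerically from the squared expansion coefficients of the half-cosine kernel per order (a quick sanity-check sketch; the coefficient values $k_0^2 = \pi/4$, $k_1^2 = \pi/3$, $k_2^2 = 5\pi/64$ are taken from Basri and Jacobs):

```python
import math

# Squared spherical-harmonic coefficients of the half-cosine kernel per order m
# (Basri & Jacobs): k_0^2 = pi/4, k_1^2 = pi/3, k_2^2 = 5*pi/64.
energy = [math.pi / 4, math.pi / 3, 5 * math.pi / 64]
total = 2 * math.pi / 3  # total energy of the kernel: integral of k^2 over S^2

first_order = sum(energy[:2]) / total   # fraction captured up to order 1
second_order = sum(energy) / total      # fraction captured up to order 2
print(first_order, second_order)        # 0.875 and 0.9921875
```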

Model (1) Model (7) (1st order) Model (7) (2nd order)
Figure 2: Illustration of RGB ($C = 3$) environment lighting $s_i$, and the resulting images (assuming white albedos and a spherical shape) under the image formation model (1) and its approximation by spherical harmonics. The approximation by the second-order spherical harmonics is nearly perfect.

Plugging (6) and specifics of spherical harmonics [Basri2003] into (4), we finalize our image formation model as:

$$I_i^c(x) = \rho^c(x)\, (l_i^c)^\top h(n(x)), \quad h(n) = \left[1,\; n_1,\; n_2,\; n_3,\; n_1 n_2,\; n_1 n_3,\; n_2 n_3,\; n_1^2 - n_2^2,\; 3 n_3^2 - 1\right]^\top. \tag{7}$$

Here $h(n) \in \mathbb{R}^9$ represents the second-order harmonic images, and $l_i^c \in \mathbb{R}^9$ represents the harmonic lighting vector whose entries have absorbed the coefficients $k_m$, $\sigma_{i,ml}^c$ and the constant factors of the harmonics $H_{ml}$. A key advantage of the SHA (7) over the equivalent directional lighting model (2) lies in the spatial invariance of the lighting vectors $l_i^c$, which yields a less ill-posed inverse problem [Basri2007]. The counterpart is the nonlinear dependency upon the normal components, which we will handle in Section 5 using a tailored numerical solution. In the next section, we build upon the key observations that integrability [Belhumeur1999] and perspective projection [papadhimitri2013new] both largely reduce the ambiguities of uncalibrated PS to derive a variational approach to inverting the SHA (7).
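For illustration, the forward model (7) can be sketched in a few lines of NumPy (function and variable names are our own; the harmonic images are written up to the constant factors absorbed into the lighting vector):

```python
import numpy as np

def harmonic_images(n):
    """Second-order harmonic images h(n) in R^9 for unit normals n (... x 3)."""
    n1, n2, n3 = n[..., 0], n[..., 1], n[..., 2]
    return np.stack([np.ones_like(n1), n1, n2, n3,
                     n1 * n2, n1 * n3, n2 * n3,
                     n1 ** 2 - n2 ** 2, 3 * n3 ** 2 - 1], axis=-1)

# Render one channel of one image: I = rho * <l, h(n)>
rng = np.random.default_rng(0)
n = rng.normal(size=(128, 3))
n /= np.linalg.norm(n, axis=-1, keepdims=True)   # unit-length normals
rho = rng.uniform(0.2, 1.0, size=128)            # albedo per pixel
l = rng.normal(size=9)                           # harmonic lighting vector
I = rho * (harmonic_images(n) @ l)               # rendered intensities
```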

4 Variational Uncalibrated PS

In this section, we shall propose a joint variational model for uncalibrated PS. To this end, let a 3D-frame $(O; \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$ be attached to the camera, with $O$ the optical center and the $\mathbf{e}_3$-axis aligned with the optical axis, such that $z > 0$ for any 3D point in front of the camera. Further let a 2D-frame be attached to the focal plane, which is parallel to the $(\mathbf{e}_1, \mathbf{e}_2)$-plane and contains the masked pixel domain $\Omega$. Under perspective projection, the surface geometry is modeled as a map $\mathbf{x} : \Omega \to \mathbb{R}^3$ given by

$$\mathbf{x}(u, v) = z(u, v)\, K^{-1} \left[u,\, v,\, 1\right]^\top, \tag{9}$$

with $z : \Omega \to \mathbb{R}_+$ the depth map and

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{10}$$

the calibrated camera’s intrinsics matrix. In the following we denote for convenience $\tilde{u} := u - c_x$ and $\tilde{v} := v - c_y$.

Assuming that $z$ is differentiable, the surface normal at point $\mathbf{x}(u,v)$ is the unit vector $n \propto \partial_u \mathbf{x} \times \partial_v \mathbf{x}$ oriented towards the camera, which yields the following parameterization of the normal by the depth:

$$n = \frac{\tilde{n}}{|\tilde{n}|}, \qquad \tilde{n} = \begin{bmatrix} f_x\, \partial_u z \\ f_y\, \partial_v z \\ -z - \tilde{u}\, \partial_u z - \tilde{v}\, \partial_v z \end{bmatrix}. \tag{11}$$

Note that the dependence of $\tilde{n}$ on $z$ is linear.
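To make the parameterization concrete, here is a small NumPy sketch (our own code, using forward differences and simplified boundary handling) that evaluates (11) on a depth map:

```python
import numpy as np

def perspective_normals(z, fx, fy, cx, cy):
    """Unit normals n = n_tilde / |n_tilde| from a perspective depth map z (H x W)."""
    H, W = z.shape
    zu = np.diff(z, axis=1, append=z[:, -1:])  # d z / d u (forward differences)
    zv = np.diff(z, axis=0, append=z[-1:, :])  # d z / d v
    u, v = np.meshgrid(np.arange(W) - cx, np.arange(H) - cy)
    n_tilde = np.stack([fx * zu,
                        fy * zv,
                        -z - u * zu - v * zv], axis=-1)
    return n_tilde / np.linalg.norm(n_tilde, axis=-1, keepdims=True)
```

For a fronto-parallel plane (constant depth) this yields the expected normal $[0, 0, -1]^\top$ at every pixel.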

Based on the forward model (7) and the parameterization (11) of normals, we formulate the joint recovery of reflectance, lighting and geometry as the following variational problem:

$$\min_{\{\rho^c\},\, \{l_i^c\},\, z}\; \sum_{i=1}^{n} \sum_{c=1}^{C} \int_\Omega \phi_\lambda\!\left( \rho^c\, (l_i^c)^\top h\!\left(\tfrac{\tilde{n}(z)}{|\tilde{n}(z)|}\right) - I_i^c \right) \mathrm{d}x \;+\; \mu \sum_{c=1}^{C} \int_\Omega \phi_\gamma\!\left( |\nabla \rho^c| \right) \mathrm{d}x. \tag{13}$$

In the first term above, we use Cauchy’s M-estimator to penalize the data-fitting discrepancy:

$$\phi_\lambda(r) = \lambda^2 \log\!\left(1 + \frac{r^2}{\lambda^2}\right). \tag{14}$$

It is indeed well-known that Cauchy’s estimator, being non-convex, is robust against outliers; see for instance [CVPR2017] in the context of PS. The same scaling parameter $\lambda$ is used in all experiments.

The second term in (13) represents a Huber total-variation (TV) regularization on each albedo map $\rho^c$, with the Huber loss defined by

$$\phi_\gamma(s) = \begin{cases} \dfrac{s^2}{2\gamma}, & |s| \leq \gamma, \\[4pt] |s| - \dfrac{\gamma}{2}, & |s| > \gamma, \end{cases} \tag{15}$$

with $\gamma$ fixed in the experiments. It turns out that the Huber TV imposes desirable smoothness on the albedo maps and in turn improves the joint estimation overall. Eventually, $\mu > 0$ is a weight parameter which balances the data-fitting term and the Huber-TV term; its value was set empirically (see Section 6 for some discussion).
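The two penalizers, together with the reweighting factors they induce in the numerical scheme of Section 5, can be sketched as follows (our own helper names; the weights are the standard iteratively reweighted least-squares factors $\phi'(r)/(2r)$):

```python
import numpy as np

def cauchy(r, lam):
    """Cauchy M-estimator phi_lambda and its IRLS weight phi'(r) / (2 r)."""
    loss = lam ** 2 * np.log1p((r / lam) ** 2)
    weight = 1.0 / (1.0 + (r / lam) ** 2)
    return loss, weight

def huber(s, gamma):
    """Huber loss phi_gamma and its IRLS weight phi'(s) / (2 s)."""
    loss = np.where(np.abs(s) <= gamma,
                    s ** 2 / (2 * gamma),
                    np.abs(s) - gamma / 2)
    weight = 0.5 / np.maximum(gamma, np.abs(s))
    return loss, weight
```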

We remark that in (13), geometry is directly optimized in terms of the depth (rather than indirectly in terms of the normal ). This is typically known as a differential approach to PS in the literature [Chand13, logothetis2017semi, mecca14, mecca2014near, CVPR2017], with the benefit of both implicitly ensuring integrability (thus limiting ambiguities) and avoiding possibly error-prone integration of normals into depths as a post-processing step.

5 Solver and Implementation

To solve the variational problem (13) numerically, we follow a “discretize-then-optimize” approach. There, the domain $\Omega$ is replaced by a discrete set of $p$ pixels, $p$ being the number of pixels inside the mask, which yields discretized vectors $z, \rho^c \in \mathbb{R}^p$. To alleviate notational burden, we sometimes refer to a pixel by its index $j \in \{1,\dots,p\}$ and sometimes by its position $x$. The spatial gradient is discretized using a forward difference stencil.

We shall apply a lagged block coordinate descent (LBCD) method to find a local minimum of the objective function in (23). Due to the (highly) non-convex nature of (23), the initialization of the optimization variables has a strong influence on the final solution. In our implementation, we initialize the albedos $\rho^c$ and the lighting vectors $l_i^c$ to fixed constant values for all $c$ and $i$. Moreover, during the first eight iterations we freeze the second-order spherical harmonics coefficients, i.e., we reconstruct using only the first-order spherical harmonic approximation as a warm start.

Based on the observation that most real-world scenes are convex, we initialize the depth as a balloon-like surface, as discussed in the following.

5.1 Depth Initialization

It is readily seen that a trivial constant initialization of the depth $z$ yields uniform, vertically aligned normals and, hence, zero entries in the initial harmonic images $h(n)$. This would cause non-meaningful updates on the albedos $\rho^c$ and lighting vectors $l_i^c$; cf. Figure 3 for an illustration.

To solve this issue, our depth initialization proceeds in two phases:

1. Following [Oswald2012], we generate a balloon-like depth map under orthographic projection.

2. We then convert the orthographic depth to a perspective depth via normal integration [Queau_Survey].

Phase 1 is pursued via the following constrained convex optimization:

$$\min_{z_o} \int_\Omega \sqrt{1 + |\nabla z_o|^2}\, \mathrm{d}x \quad \text{s.t.} \quad \int_\Omega z_o\, \mathrm{d}x = V. \tag{16}$$

Intuitively, the model seeks an orthographic depth $z_o$ which has minimal surface area subject to a constant volume $V$. A global minimizer of this model can be efficiently computed by simple projected gradient iterations:

$$z_o^{k+1} = \Pi_V\!\left( z_o^{k} + \tau\, \nabla\!\cdot\!\left( \frac{\nabla z_o^{k}}{\sqrt{1 + |\nabla z_o^{k}|^2}} \right) \right), \tag{17}$$

where $\Pi_V$ denotes the projection onto the volume constraint and the step size is set to $\tau = 1/\|\nabla\|^2$, with $\|\nabla\|$ the spectral norm of the discrete gradient operator. The volume constant $V$ is a hyperparameter which is empirically chosen, see Section 6 for discussion.
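A compact NumPy sketch of these projected gradient iterations might look as follows (our own simplified version: a fixed step size, crude forward/backward differences, and a hard zero outside the mask):

```python
import numpy as np

def balloon(mask, volume, iters=200, tau=0.1):
    """Projected gradient descent: minimize surface area s.t. sum(z) = volume."""
    z = np.zeros(mask.shape)
    for _ in range(iters):
        # forward-difference gradient of z
        gu = np.diff(z, axis=1, append=z[:, -1:])
        gv = np.diff(z, axis=0, append=z[-1:, :])
        norm = np.sqrt(1.0 + gu ** 2 + gv ** 2)
        # backward-difference divergence of the normalized gradient field
        div = (np.diff(gu / norm, axis=1, prepend=(gu / norm)[:, :1])
               + np.diff(gv / norm, axis=0, prepend=(gv / norm)[:1, :]))
        z = z + tau * div                 # descent step on the surface area
        z[~mask] = 0.0                    # balloon is pinned outside the mask
        z[mask] += (volume - z[mask].sum()) / mask.sum()  # volume projection
    return z

# A dome-like initial shape over a square mask
mask = np.zeros((16, 16), dtype=bool)
mask[1:-1, 1:-1] = True
z0 = balloon(mask, volume=50.0)
```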

Next, we convert the orthographic depth $z_o$ to a perspective depth $z$. Note that $z_o$ complies with the orthographic projection, under which a 3D-point is represented by

$$\mathbf{x}_o(u, v) = \left[u,\, v,\, z_o(u, v)\right]^\top, \tag{19}$$

and the corresponding unit normal to the surface at the point conjugate to pixel $(u, v)$ is given by

$$n_o = \frac{1}{\sqrt{1 + |\nabla z_o|^2}} \begin{bmatrix} \nabla z_o \\ -1 \end{bmatrix}. \tag{20}$$

Since the normal is invariant to the projection model, Eq. (11) also implies that

$$n_o \propto \begin{bmatrix} f_x\, \partial_u \tilde{z} \\ f_y\, \partial_v \tilde{z} \\ -1 - \tilde{u}\, \partial_u \tilde{z} - \tilde{v}\, \partial_v \tilde{z} \end{bmatrix}, \tag{21}$$

where $\tilde{z} := \log z$ stands for the log-perspective depth. This further implies the formula for $\nabla \tilde{z}$:

$$\nabla \tilde{z} = \frac{1}{1 - \tilde{u}\, \partial_u z_o / f_x - \tilde{v}\, \partial_v z_o / f_y} \begin{bmatrix} \partial_u z_o / f_x \\ \partial_v z_o / f_y \end{bmatrix}, \tag{22}$$

which can be integrated to obtain $\tilde{z}$ (and hence $z = e^{\tilde{z}}$). The overall pipeline in Phase 2 is summarized as follows:

  1. Compute $n_o$ from $z_o$ by (20).

  2. Compute $\nabla \tilde{z}$ by (22).

  3. Perform integration [Queau_Survey] to obtain $\tilde{z}$. Return $z = e^{\tilde{z}}$ as the initialized (perspective) depth.
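The pointwise conversion in step 2, i.e. formula (22), can be sketched as follows (our own code with forward differences and simplified boundaries; the final integration step [Queau_Survey] is left out):

```python
import numpy as np

def log_perspective_gradient(z_o, fx, fy, cx, cy):
    """Gradient of the log-perspective depth from an orthographic depth map,
    following the pointwise formula (22)."""
    H, W = z_o.shape
    gu = np.diff(z_o, axis=1, append=z_o[:, -1:])  # d z_o / d u
    gv = np.diff(z_o, axis=0, append=z_o[-1:, :])  # d z_o / d v
    u, v = np.meshgrid(np.arange(W) - cx, np.arange(H) - cy)
    denom = 1.0 - u * gu / fx - v * gv / fy
    return gu / (fx * denom), gv / (fy * denom)
```

Afterwards, the returned gradient field would be integrated and exponentiated to recover the perspective depth; for a flat orthographic depth the gradient is identically zero, as expected.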

Trivial initialization and its result | Our initialization and its result
Figure 3: Impact of depth initialization: a trivial constant initialization on the left vs. our initialization on the right and its corresponding resulting geometry estimates.

5.2 Lagged Block Coordinate Descent

Even with a reasonable initialization, the numerical resolution of Problem (23) remains challenging. Due to the presence of the spherical harmonic approximation $h(\cdot)$ and of Cauchy’s M-estimator $\phi_\lambda$, the objective in (23) is highly nonlinear and nonconvex. To tackle these challenges, we present a lagged block coordinate descent (LBCD) method which performs efficiently in practice.

To derive LBCD, we introduce an auxiliary variable $\theta$ such that $\theta = |\tilde{n}(z)|$. This enables us to rewrite (11) as $n = \tilde{n}(z)/\theta$. Then we formulate the following constrained optimization problem:

$$\min_{\{\rho^c\},\, \{l_i^c\},\, z,\, \theta}\; \sum_{i=1}^{n} \sum_{c=1}^{C} \sum_{x \in \Omega} \phi_\lambda\!\left( r_{i,c,x}(\rho^c, l_i^c, z, \theta) \right) + \mu \sum_{c=1}^{C} \sum_{x \in \Omega} \phi_\gamma\!\left( |\nabla \rho^c(x)| \right) \quad \text{s.t.} \quad \theta = |\tilde{n}(z)|, \tag{23}$$

where $r_{i,c,x}$ is the residual function defined by:

$$r_{i,c,x}(\rho^c, l_i^c, z, \theta) = \rho^c(x)\, (l_i^c)^\top h\!\left( \frac{\tilde{n}(z)(x)}{\theta(x)} \right) - I_i^c(x). \tag{24}$$

Upon initialization, the proposed LBCD proceeds as follows. At iteration $k+1$, we lag $\theta$ one iteration behind, i.e.,

$$\theta^{k+1} = |\tilde{n}(z^{k})|,$$

which amounts to fixing the normals $n^{k} := \tilde{n}(z^{k})/\theta^{k+1}$, and then sequentially update each of the three blocks (namely $\{\rho^c\}$, $\{l_i^c\}$ and $z$). In each resulting subproblem, we solve (lagged) weighted least-squares problems as an approximation of the Cauchy loss and/or the Huber loss. This is detailed in the following:

  • (Update $\rho$): We evaluate the residual

    $$r_{i,c,x}^{k} = \rho^{c,k}(x)\, (l_i^{c,k})^\top h(n^{k}(x)) - I_i^c(x),$$

    and then set up the (lagged) weight factors for both the Cauchy loss and the Huber loss as

    $$w_{i,c,x}^{k} = \frac{1}{1 + (r_{i,c,x}^{k}/\lambda)^2}, \qquad \tilde{w}_{c,x}^{k} = \frac{1}{\max\{\gamma,\, |\nabla \rho^{c,k}(x)|\}}.$$

    The albedos are updated as the solution to the following linear weighted least-squares problem:

    $$\rho^{k+1} = \arg\min_{\{\rho^c\}}\; \sum_{i,c,x} w_{i,c,x}^{k} \left( \rho^c(x)\, (l_i^{c,k})^\top h(n^{k}(x)) - I_i^c(x) \right)^2 + \mu \sum_{c,x} \tilde{w}_{c,x}^{k}\, |\nabla \rho^c(x)|^2,$$

    which is carried out by conjugate gradient (CG).

  • (Update $l$): The lighting subproblem is similar to the one for albedos, except for the absence of the Huber-TV term. Upon re-evaluation of the residual $r_{i,c,x}^{k}$ and the weight factor $w_{i,c,x}^{k}$ with the updated albedos $\rho^{k+1}$, we update $l$ by solving the following linear weighted least-squares problem via CG:

    $$l^{k+1} = \arg\min_{\{l_i^c\}}\; \sum_{i,c,x} w_{i,c,x}^{k} \left( \rho^{c,k+1}(x)\, (l_i^c)^\top h(n^{k}(x)) - I_i^c(x) \right)^2.$$

  • (Update $z$): The depth subproblem requires additional efforts. With $r_{i,c,x}^{k}$ and $w_{i,c,x}^{k}$ re-evaluated after the $l$-update, we are faced with the following weighted least-squares problem:

    $$\min_{z}\; \sum_{i,c,x} w_{i,c,x}^{k} \left( \rho^{c,k+1}(x)\, (l_i^{c,k+1})^\top h\!\left( \tilde{n}(z)(x)/\theta^{k+1}(x) \right) - I_i^c(x) \right)^2,$$

    where the dependence of $h$ on $z$ is still nonlinear. Therefore, we further linearize $h$ with respect to $z$ and arrive at the following update:

    $$z^{k+1} = \arg\min_{z}\; \sum_{i,c,x} w_{i,c,x}^{k} \left( r_{i,c,x}^{k} + \rho^{c,k+1}(x)\, (l_i^{c,k+1})^\top J_h^{k}(x)\, (z - z^{k}) \right)^2,$$

    where $J_h^{k}(x)$ is the Jacobian of the map $z \mapsto h\!\left( \tilde{n}(z)(x)/\theta^{k+1}(x) \right)$ at $z^{k}$. The resulting linearized least-squares problem is again solved by CG. In our experiments, we additionally incorporate a backtracking line search in the $z$-update to ensure a monotonic decrease of the energy.
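To illustrate the lagged reweighting idea, here is a stripped-down version of the $\rho$-update (our own simplification: a single channel and no Huber-TV term, so the weighted least-squares problem decouples per pixel and needs no CG; the value of `lam` is arbitrary):

```python
import numpy as np

def update_albedo(I, S, lam=0.1, iters=10):
    """Lagged IRLS update of a single-channel albedo without the TV term.

    I, S: (n_images, n_pixels) arrays of observations I_i(x) and shading
    factors l_i^T h(n(x)); minimizes sum_i phi_lam(rho(x) S_i(x) - I_i(x))
    independently at each pixel."""
    rho = np.ones(I.shape[1])
    for _ in range(iters):
        r = rho * S - I                                 # residuals, lagged rho
        w = 1.0 / (1.0 + (r / lam) ** 2)                # Cauchy IRLS weights
        rho = (w * S * I).sum(0) / (w * S ** 2).sum(0)  # weighted least squares
    return rho
```

On outlier-free synthetic data this recovers the true albedo exactly; the Cauchy weights only start to matter once some observations are corrupted by shadows or specularities.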

6 Experimental Validation

3D Shapes

Armadillo Joyful Yell Lucy Thai Statue Exemplary environment maps


Bars Constant Ebsd Hippie Lena Pattern Rectcircle Voronoi White
Figure 4: The four 3D-shapes and nine albedo maps we used to create 36 (3D-shape, albedo) datasets. For each dataset, images were rendered using different environment maps such as those shown on the top right.

This section is concerned with the evaluation of the proposed nonconvex variational approach to uncalibrated photometric stereo under general lighting.

6.1 Synthetic Experiments

To validate the impact of the initial volume $V$ in (16), the tunable hyper-parameter $\mu$, and the number of input images $n$ in (13), we consider 36 challenging synthetic datasets. We use four different depth maps (“Joyful Yell” [Bendansie2015], “Lucy” [Levoy2005data], “Armadillo” [Levoy2005data] and “Thai Statue” [Levoy2005data]) and nine different albedo maps, and each of those 36 combinations is rendered as described in (1) using 25 different environment maps, cf. Figure 4. The resulting 25 RGB images per dataset are used as input, along with the intrinsic camera parameters $K$ and a binary mask $\Omega$. A quantitative evaluation of the triplet $(V, \mu, n)$ is carried out on four randomly chosen datasets (Armadillo & White albedo, Joyful Yell & Ebsd albedo, Lucy & Hippie albedo, and Thai Statue & Voronoi albedo), comparing the impact of each parameter value on the resulting mean angular error (MAE) between ground-truth and estimated normals.

First, we validate the choice of the input volume $V$ using initially fixed values of $\mu$ and $n$. As a sensible volume depends on the size of the mask, we consider a linear parametrization of $V$ in the number of mask pixels and evaluate a range of ratios. Figure 5 (left) indicates that the optimal value of $V$ is dataset-dependent. For synthetic datasets we always selected this optimal value, yet for real-world data no such evaluation is possible and $V$ must be tuned manually. Since the ballooning-based depth initialization can be carried out in real time (the implementation is parallelized in CUDA), the user has immediate feedback on the initial depth, and thus a plausible initial shape is easily drawn. Humans excel at estimating the size and shape of objects [Baldwin2016], and the real-world experiments will show that a manual choice of $V$ can result in appealing geometries.

Next, we evaluate the impact of $\mu$, cf. Figure 5 (right). As can be seen, the depth estimate deteriorates for too small and too large values of $\mu$, whereas an intermediate value provides good depth estimates across all albedo maps. We therefore fix this value of $\mu$ for all upcoming experimental evaluations.

Unsurprisingly, the MAE decreases as the number of input images $n$ grows, but the runtime increases (linearly) with $n$, cf. Figure 6. We found that a moderate value of $n$ represents a good trade-off between runtime and accuracy, and fix it for all further experiments. Our Matlab implementation then runs in a matter of minutes per dataset on a computer with an Intel processor.

Figure 5: Impact of the initial volume $V$ as well as of $\mu$ on the accuracy of the estimated depth. Based on these experiments we fix the values of $V$ (for synthetic data), $\mu$, $\lambda$ and $\gamma$ for all experiments.
Figure 6: Impact of the number of images $n$ on the mean angular error (MAE) and the runtime. Based on these insights we fix $n$ for our experiments.

Having fixed the choice of $(V, \mu, n)$, we can now evaluate our approach against other state-of-the-art methods. We compare our results against those obtained by an uncalibrated photometric stereo approach assuming directional lighting [Favaro2012], and another one assuming general (first-order spherical harmonics) illumination yet relying on an input shape prior (e.g., from an RGB-D sensor) [Peng2017]. As this limiting assumption on the access to a sensor-based depth prior is not always given, and to make the comparison fair, we input as depth prior to this method the ballooning initialization described in Section 5.1. Furthermore, we compare against another uncalibrated photometric stereo work under natural illumination [Mo2018] (code associated with [Favaro2012] and [Peng2017] can be found online, and the results obtained by [Mo2018] were provided by the authors), which resorts to the equivalent directional lighting instead of spherical harmonics, cf. Section 3. Table 1 shows the median and mean MAEs over all 36 datasets (a more detailed table can be found in the supplementary material). On these datasets, it can be seen that our method quantitatively outperforms the current state-of-the-art by a factor of 2 to 3. This gain is also illustrated qualitatively in Figure 7, which shows a selection of two results.

Approach [Favaro2012] [Peng2017] [Mo2018] Ours
Median 27.16 21.14 34.06 9.17
Mean 34.15 21.18 35.53 10.72
Table 1: Median and mean of the mean angular errors (MAE) over all 36 datasets. The proposed approach outperforms the state-of-the-art by a factor of 2 to 3.
[Favaro2012] [Peng2017] [Mo2018] Ours GT

Joyful Yell & Lena


Thai Statue & White

Figure 7: Results of state-of-the-art approaches and our approach on two out of the 36 synthetic datasets. Numbers show the mean angular error (MAE) in degrees.

6.2 Real-World Experiments

For real-world data we use the publicly available dataset of [Haefner2018b]. It offers eight challenging real-world datasets of objects with complex geometry and albedo, captured under daylight and a freely moving LED, along with the intrinsics matrix $K$ and masks $\Omega$. Results are presented in Figure 8. Despite relying on a directional lighting model, the approach of [Favaro2012] produces reasonable results on some datasets (Face1, Ovenmitt or Shirt), but it fails on others. As [Peng2017] assumes a reliable depth prior in order to perform a photometric refinement, this approach is biased towards its initialization; a meaningful geometry is thus recovered only when the depth prior is already close to the object’s rough shape (Ovenmitt, Shirt, Tabletcase, Vase). The approach of [Mo2018] estimates only a possibly non-integrable normal field, and it can be seen that after integration the depth map may not be satisfactory. As our approach optimizes over the depth directly, such issues do not arise and we are able to recover fine-scale geometric details throughout all tests.

[Favaro2012] [Peng2017] [Mo2018] Ours

Face 1

Face 2







Figure 8: Results of state-of-the-art approaches and our approach on challenging real-world datasets. While the competing approaches fail on some datasets, our approach consistently yields satisfactory results.

7 Conclusion

We proposed a variational approach to uncalibrated photometric stereo (PS) under general lighting. Assuming a perspective camera setup, our method jointly estimates shape, reflectance and lighting in a robust manner. The possible non-integrability of normals is bypassed by the direct estimation of the underlying depth map, and robustness is ensured by resorting to Cauchy’s M-estimator and Huber-TV albedo regularization. Although the problem is nonconvex, and thus numerically challenging and initialization-dependent, we tackled it efficiently through a tailored lagged block coordinate descent algorithm and a ballooning-based depth initialization. Over a series of evaluations on synthetic and real data, we demonstrated that our method outperforms existing methods in terms of MAE by a factor of 2 to 3 and provides highly detailed reconstructions even in challenging real-world settings.

In future research, a more automated balloon-like depth initialization is desirable. Exploring the theoretical foundations (uniqueness of a solution) of differential perspective uncalibrated PS under spherical harmonic lighting and analyzing the convergence properties of the proposed numerical scheme constitute two other promising perspectives.

Appendix A Further Details on Synthetic Experiments

To provide further insights into the synthetic experiments (in Section 6.1), we visualize the environment lighting $s_i$, $i \in \{1,\dots,25\}$, used to render each image. Figure 9 shows all environment maps.

Figure 9: All environment maps used throughout the synthetic evaluation.

The impact of each incident lighting $s_i$, $i \in \{1,\dots,25\}$, is illustrated in Figure 10, showing the Joyful Yell with a White albedo. Thus, color changes in the images are caused by the lighting only, as described by models (1) and (7) in the main paper.

Figure 10: Illustration of the input data. The Joyful Yell dataset with White albedo to show the impact of the different environment maps used throughout the synthetic experimental validation.

Table 2 shows the mean angular error (MAE) of each dataset for the state-of-the-art approaches [Favaro2012, Mo2018, Peng2017] and our proposed methodology. It can be seen that our approach consistently outperforms [Favaro2012, Mo2018, Peng2017] by a factor of 2 to 3. Only the Pattern albedo seems to bias the resulting depth negatively, yet even in this case our approach estimates the geometry more faithfully than the current state-of-the-art.

Dataset [Favaro2012] [Peng2017] [Mo2018] Ours
Shape Albedo

Armadillo
Bars 26.22 27.84 36.91 16.78
Constant 25.84 26.64 36.87 13.97
Ebsd 25.34 26.88 27.80 14.26
Hippie 28.21 27.30 25.82 14.52
Lena 27.07 27.33 28.36 14.78
Pattern 45.87 26.82 24.01 19.06
Rectcircle 26.97 26.71 36.23 14.06
Voronoi 25.62 26.91 50.70 14.07
White 26.19 26.64 52.04 14.13

Joyful Yell

Bars 21.84 16.26 31.80 8.69
Constant 23.95 14.93 33.47 5.96
Ebsd 26.08 15.63 15.91 7.28
Hippie 28.67 16.23 22.96 7.49
Lena 21.33 16.33 19.70 9.21
Pattern 26.07 18.76 26.67 16.97
Rectcircle 35.27 15.19 52.41 7.34
Voronoi 22.27 16.42 45.74 6.57
White 27.12 14.32 33.06 6.20


Lucy

Bars 49.13 21.90 36.51 8.16
Constant 54.98 19.89 36.57 8.71
Ebsd 62.33 20.81 23.56 9.61
Hippie 58.61 21.29 32.38 7.87
Lena 64.01 22.24 30.93 9.56
Pattern 48.83 22.25 32.68 17.78
Rectcircle 24.68 20.99 43.13 8.98
Voronoi 61.53 22.10 48.14 7.59
White 64.43 19.33 44.76 8.76

Thai Statue

Bars 25.53 21.91 66.17 8.55
Constant 27.20 18.91 38.47 9.58
Ebsd 27.85 20.22 34.11 9.47
Hippie 21.91 21.86 30.62 8.83
Lena 33.53 19.66 34.00 9.19
Pattern 26.77 22.06 28.81 15.27
Rectcircle 29.36 19.92 43.86 8.84
Voronoi 30.65 21.56 36.58 8.69
White 28.02 18.64 37.31 9.16
Median 27.16 21.14 34.06 9.17
Mean 34.15 21.18 35.53 10.72
Table 2: Quantitative comparison between our method and other state-of-the-art methods on multiple challenging synthetic datasets.

Two more qualitative results on synthetic data are shown in Figure 11. While [Favaro2012] gives more meaningful results on Armadillo with Constant albedo, the depth deteriorates strongly on Lucy with Hippie albedo. The methods of [Mo2018, Peng2017] both result in rather flattened shapes (cf. Lucy). The most accurate results are achieved by the proposed method, which recovers fine geometric details as well as non-flattened depth estimates.

[Favaro2012] [Peng2017] [Mo2018] Ours GT

Armadillo & Constant


Lucy & Hippie

Figure 11: Results of state-of-the-art approaches and our approach on two out of the 36 synthetic datasets. Numbers show the mean angular error (MAE) in degrees.

Appendix B Further Details on Real-World Results

Supplementary to the real-world experiments (in Section 6.2), Figure 12 shows alternative viewpoints of the real-world results. The estimated albedos, which are mapped onto the surfaces, appear satisfactory.

Face 1 Face 2 Rucksack Backpack
Ovenmitt Shirt Tabletcase Vase
Figure 12: Estimated results (with albedo mapped onto the surface) from real-world data under alternative viewpoints.