3D Face Reconstruction Using Color Photometric Stereo with Uncalibrated Near Point Lights

by Zhang Chen, et al.

We present a new color photometric stereo (CPS) method that can recover high-quality, detailed 3D face geometry in a single shot. Our system uses three uncalibrated near point lights of different colors and a single camera. We first utilize a 3D morphable model (3DMM) and semantic segmentation of facial parts to achieve robust self-calibration of light sources. We then address the spectral ambiguity problem by incorporating albedo consensus, albedo similarity, and proxy prior into a unified framework. We avoid the need for spatial constancy of albedo and use a new measure for albedo similarity that is based on the albedo norm profile. Experiments show that our new approach produces state-of-the-art results from a single image, with high-fidelity geometry that includes details such as wrinkles.




1 Introduction

State-of-the-art photometric stereo solutions for 3D face reconstruction [3, 2, 1, 37] are capable of producing movie-quality, photo-realistic results. However, these systems tend to be bulky and expensive and generally require taking multiple shots. Even with elaborate time-multiplexing, it is difficult to capture fine facial geometry movements unless using an ultra-fast speed camera coupled with high precision synchronized light sources. The light sources and cameras also require accurate calibration to avoid distortions in the final reconstruction.

In this paper, we present a novel one-shot solution based on an uncalibrated color photometric stereo method that simply uses a camera and three near point light sources of different colors. Our approach eliminates the need for time multiplexing, and therefore can be used to recover dynamic facial motions.

For objects with non-gray albedo, color photometric stereo is inherently under-determined due to spectral inconsistencies of surface reflectance: albedo is not identical under different spectra and therefore there are more unknown variables than there are equations. We address the spectral ambiguity problem by incorporating albedo consensus, albedo similarity, and proxy prior into a unified framework, without enforcing spatial constancy of albedo. We also present a new measure for albedo similarity based on albedo norm profile. The proposed albedo similarity and proxy prior effectively correct distortions caused by incorrect albedo consensus in prior art. Experiments show that our new approach can produce state-of-the-art results in one-shot with high-fidelity geometry that includes details such as wrinkles.

Our technical contributions are:

  • A self-calibration method utilizing 3DMM proxy face for color photometric stereo with near point lights.

  • A per-pixel formulation for solving normal and albedo from color photometric stereo.

  • A framework that incorporates albedo similarity and proxy prior with albedo consensus to produce accurate 3D reconstruction.

2 Related Work

Structured light [61, 62] and multi-view stereo [16] have been used to reconstruct faces. While they can accurately reconstruct coarse shapes, they are less successful in recovering high-frequency details such as wrinkles. On the other hand, photometric stereo [56] is capable of extracting such high-frequency details. Techniques that combine stereo and photometric stereo exist [37, 19, 20], but at the expense of a complicated hardware setup.

Photometric Stereo (PS).

Traditional PS [56] uses 3 or more distant lights (with the same color) and sequentially creates different directional illumination by turning on only one light at a time. A sequence of images is captured, each with a different light source. The surface orientation map can then be inferred from image intensities via an over-determined linear system. Normal integration is then applied to obtain a 2.5D reconstruction. We refer readers to [22, 4] for a complete review of classical PS methods. The distant light requirement has since been relaxed; much work has been done using more practical near point light sources [12, 24, 48, 38, 55, 5, 59, 35, 33, 44, 34]. Notably, Liu et al. [34] use an LED ring with a radius of only 30 centered at the camera lens. Alternative self-calibrating methods [6, 50, 58, 36, 40] provide simpler and more flexible solutions under various assumptions [51]. It is also possible to use uncalibrated near point light sources [31, 41, 10], but these methods all require sequential capture.

Color Photometric Stereo (CPS).

CPS has the key benefit of acquiring only one image and hence can be directly used to reconstruct dynamic objects. Most existing approaches use red, green, and blue lights along with a color camera [15, 30, 57]. Hernández et al. [23] apply the technique to dynamic cloth reconstruction, where they use a planar board with a cloth sample fixed in the center to calibrate the coupled matrix containing reflectance, camera response, lighting spectrum, and lighting directions. Vogiatzis and Hernández [54] first construct a coarse 3D face using structure from motion and then impose a constant chromaticity constraint for shape refinement. Klaudiny et al. [29] use a specular sphere to estimate lighting directions. To ensure constant chromaticity, they apply uniform make-up to faces. Bringier et al. [8] explicitly calibrate the spectral response of the camera and assume gray color or known uniform color.

To eliminate the need for constant chromaticity, there are methods [14, 28] that combine spectral and time-multiplexing; optical flow is then used to align adjacent frames. Jankó et al. [26] make use of temporal constancy of surface reflectance to eliminate the need for time-multiplexing but still require the use of an image sequence. Gotardo et al. [20] simultaneously solve for color photometric stereo, optical flow, and stereo matching within each 3-frame time window but require 9 color lights. Rahman et al. [45] arrange complementary color lights on a ring, but their approach requires two images under complementary illuminations as input. Anderson et al. [7] assume piecewise constant chromaticity by segmenting a scene into different chromaticities. To calibrate chromaticities, they also require a stereo camera pair to obtain coarse geometry.

Fyffe et al. [18] extended the usual three color channels to six by using two RGB cameras and a pair of Dolby dichroic filters. An extension of their work [17] employed polarized color gradient illumination but requires a highly complex setup with 2040 LED light sources. Chakrabarti and Sunkavalli [11] observe that the reflectance and normal within a uniform color region can be uniquely recovered from a spectrally demultiplexed image by assuming piecewise constant albedo. Ozawa et al. [39] densely discretize albedo chromaticity and utilize consensus on albedo norms to reconstruct objects with spatially varying albedo. Most of these approaches assume directional lighting and require pre-calibrating the lights. It is possible to use near light sources [13], but this still requires pre-calibration. In contrast, our technique assumes unknown light positions and spatially varying albedo. The former enables more practical capture and the latter matches the physical properties of real faces.

Single Image Techniques.

There are methods for inferring face geometry from a single unconstrained image. We refer readers to [63] for an overview of state-of-the-art methods. However, their accuracy does not match that of multi-view stereo and photometric stereo. Piotraschke and Blanz [43] demonstrate the usefulness of semantic segmentation to improve reconstruction quality. In our work, we use the 3D morphable model [42] to obtain an initial proxy face for light source calibration.

Shape-from-shading and deep learning based approaches have also been adopted to recover details [9, 46, 49, 47, 53, 21, 32]. Jiang et al. [27] combined local corrective deformation fields with photometric consistency constraints. Yamaguchi et al. [60] use a large corpus of high-fidelity face captures from the USC Light Stage [19] to learn the mapping from texture to a highly detailed displacement map. These solutions can provide visually pleasing results, but their accuracy depends heavily on illumination.

3 Color Photometric Stereo with Near Point Lights

Traditional color photometric stereo uses three distant lights with different lighting directions and spectra (usually red, green, and blue) together with an RGB camera to spectrally multiplex different illumination in a single image. By assuming distant lights, each surface point is illuminated by three directional lights with direction $l_j$ and spectral distribution $E_j(\lambda)$, where $j \in \{1, 2, 3\}$ and $\lambda$ is the wavelength. We denote the normal and reflectance function at any pixel $p$ as $n(p)$ and $R(p, \lambda)$, respectively. Let $S_i(\lambda)$ with $i \in \{R, G, B\}$ be the spectral response of each camera color channel; for a Lambertian surface, the image pixel intensity can be expressed as

$$I_i(p) = \sum_{j} l_j^\top n(p) \int E_j(\lambda)\, R(p, \lambda)\, S_i(\lambda)\, d\lambda. \tag{1}$$

We denote $A(p)$ as the albedo matrix whose element at the $i$th row and $j$th column is

$$a_{ij}(p) = \int E_j(\lambda)\, R(p, \lambda)\, S_i(\lambda)\, d\lambda. \tag{2}$$

Each element of $A(p)$ represents the albedo under one light-channel pair. Letting $I(p) = [I_R(p), I_G(p), I_B(p)]^\top$ and $L = [l_1, l_2, l_3]^\top$, we can rewrite Eq. 1 in matrix form as

$$I(p) = A(p)\, L\, n(p). \tag{3}$$

Note that for distant lights, $L$ is identical for all pixels. As a result, with initial coarse normal $n_0(p)$, one can self-calibrate the product of $A$ and $L$ by assuming constant albedo or constant chromaticity [54]. However, for near point lights, lighting direction is spatially varying. By further taking into account the inverse square illumination attenuation due to distance, we obtain

$$l_j(p) = \frac{P_j - X(p)}{\|P_j - X(p)\|^3}, \tag{4}$$

where $P_j$ is the 3D position of the $j$th light source and $X(p)$ is the corresponding 3D position for the pixel at $p$.
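The near point light model can be illustrated with a minimal NumPy sketch. It assumes a Lambertian surface and a single light-channel pair; the function names `near_point_light_vectors` and `render_channel` are hypothetical and are not taken from the paper. Dividing the light-to-point offset by the cubed distance combines direction normalization with inverse square attenuation.

```python
import numpy as np

def near_point_light_vectors(light_pos, points):
    """Scaled light vectors (P - X) / ||P - X||^3 for every surface point.

    light_pos: (3,) light position P; points: (N, 3) surface positions X.
    """
    d = light_pos[None, :] - points                     # offsets from points to light
    dist = np.linalg.norm(d, axis=1, keepdims=True)     # distance per point
    return d / dist**3                                  # unit direction / distance^2

def render_channel(light_pos, points, normals, albedo):
    """Lambertian intensity of one channel under one near point light."""
    l = near_point_light_vectors(light_pos, points)
    shading = np.sum(l * normals, axis=1)
    return albedo * np.clip(shading, 0.0, None)         # clamp attached shadows to zero

# A frontal patch at the origin, lit from distance 1 and from distance 2.
point = np.array([[0.0, 0.0, 0.0]])
normal = np.array([[0.0, 0.0, 1.0]])
near = render_channel(np.array([0.0, 0.0, 1.0]), point, normal, albedo=1.0)
far = render_channel(np.array([0.0, 0.0, 2.0]), point, normal, albedo=1.0)
```

Doubling the light distance in this toy setup reduces the intensity by a factor of four, which is the attenuation that a distant-light model ignores.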

Figure 1: Self-calibration of near point light positions using a proxy face. (a) Parameters involved in estimating . (b) Regions (gray) on the face used for RANSAC-based sampling of pixels.

4 Near Point Light Self-Calibration

In order to self-calibrate the light source positions, we require a coarse proxy mesh, from which we obtain initial rough estimates for normal $n_0(p)$ and position $X(p)$ at every pixel $p$. Unlike other methods that use multi-view stereo [54] or stereo matching [7] to obtain the proxy mesh, our approach makes use of the 3D morphable model (3DMM) [42] and needs only one image as input. To compensate for inaccuracies in the proxy mesh, we use RANSAC followed by hypothesis merging to robustly estimate light source positions. We provide details of our method in the following two sections.

4.1 Proxy Mesh Generation

3DMM is a deformable template for the mesh of a human face. It consists of Principal Component Analysis (PCA) linear bases along three dimensions: shape, expression, and albedo. Since we are concerned with only shape and expression associated with the proxy mesh, we omit the albedo dimension. 3DMM interprets the face mesh $M$ as a linear combination of shape and expression bases:

$$M = \Big(\mu_s + \sum_{i} \alpha_i\, b^s_i\Big) + \Big(\mu_e + \sum_{i} \beta_i\, b^e_i\Big), \tag{5}$$

where $\mu_s, \mu_e \in \mathbb{R}^{3N}$ are PCA means and $b^s_i, b^e_i$ are the $i$th PCA bases of shape and expression space, respectively. $N$ is the number of mesh vertices, and $\alpha_i, \beta_i$ are the $i$th coefficients for the linear combination of the bases. We adopt the Basel Face Model [42] for 3DMM, and use the iterative linear method from [25] to jointly solve for PCA coefficients and camera parameters (intrinsics and extrinsics). We then rasterize the generated proxy mesh to recover the initial normal and 3D position for each pixel. While the proxy mesh resembles a human face with a reasonable pose, its geometry is usually inaccurate.
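As a rough illustration of the linear 3DMM combination, the sketch below assembles a mesh from means, bases, and coefficients. The array layout (flattened `3N` vectors, bases stored as columns), the toy sizes, and the function name `morphable_mesh` are assumptions for illustration only; a real model such as the Basel Face Model ships its own file format and dimensions.

```python
import numpy as np

def morphable_mesh(mean_shape, shape_basis, alpha, mean_expr, expr_basis, beta):
    """Linear 3DMM: sum of shape and expression means plus weighted bases.

    mean_*: (3N,) flattened means; *_basis: (3N, K) with one basis per column;
    alpha, beta: (K,) coefficients. Returns (N, 3) vertex positions.
    """
    flat = mean_shape + shape_basis @ alpha + mean_expr + expr_basis @ beta
    return flat.reshape(-1, 3)

# Toy model with 4 vertices, 2 shape bases, and 2 expression bases.
rng = np.random.default_rng(0)
n_vertices = 4
mean_shape = rng.normal(size=3 * n_vertices)
mean_expr = np.zeros(3 * n_vertices)
shape_basis = rng.normal(size=(3 * n_vertices, 2))
expr_basis = rng.normal(size=(3 * n_vertices, 2))

neutral = morphable_mesh(mean_shape, shape_basis, np.zeros(2),
                         mean_expr, expr_basis, np.zeros(2))
deformed = morphable_mesh(mean_shape, shape_basis, np.array([0.5, -0.2]),
                          mean_expr, expr_basis, np.array([0.1, 0.0]))
```

With all coefficients at zero the mesh is simply the sum of the means; fitting then amounts to choosing the coefficients (and camera parameters) that best explain the image.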

4.2 Estimation of Light Source Positions

As with [11, 39], we assume that there is a bijection between light source and camera channel, i.e., the spectrum of each light source can only be observed in its corresponding camera channel. As a result, the albedo matrix $A(p)$ is diagonal; for simplicity, let $\rho(p)$ denote the vector of its diagonal elements. Eq. 3 becomes

$$I(p) = \rho(p) \circ \big(L(p)\, n(p)\big), \tag{6}$$

where $\circ$ is the Hadamard product operator. For two pixels $p, q$ with equal albedo in the $i$th channel, i.e., $\rho_i(p) = \rho_i(q)$, we have

$$\frac{I_i(p)}{L_i(p)\, n(p)} = \frac{I_i(q)}{L_i(q)\, n(q)}, \tag{7}$$

where $L_i$ is the $i$th row of $L$. Note that $L_i$ is equivalent to the lighting direction of the $i$th light source. Substituting Eq. 4 into Eq. 7 and moving all variables to the left hand side, we obtain the following equality:

$$\frac{I_i(p)\, \|P_i - X(p)\|^3}{(P_i - X(p))^\top n(p)} - \frac{I_i(q)\, \|P_i - X(q)\|^3}{(P_i - X(q))^\top n(q)} = 0. \tag{8}$$

Once $n(p)$ and $X(p)$ are extracted from the proxy mesh, we now recover $P_i$, which has 3 unknowns. We require at least 3 constraints, which means a minimum of 4 pixels with equal albedo in the $i$th channel. Since there is no correlation between different lights or channels in Eq. 8, we can estimate the positions of different lights independently. However, since the albedo is unknown, we cannot deterministically locate pixels with equal albedo. Our solution is to employ RANSAC to randomly sample quadruplets of pixels. Since we only require each sampled quadruplet to have equal albedo in one channel, there is still a high probability that at least one sampling provides a qualified quadruplet.
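The equal-albedo constraint can be checked numerically. The sketch below is not the paper's solver: it only evaluates, for a candidate light position, the albedo that each pixel would need in order to explain its intensity under the near point light model, using the helper name `implied_albedo` (hypothetical). For a quadruplet of pixels that truly share an albedo, these implied values agree at the correct light position and disagree elsewhere, which is the signal a RANSAC hypothesis exploits.

```python
import numpy as np

def implied_albedo(light_pos, points, normals, intensity):
    """Albedo each pixel would need under a hypothesized light position.

    Inverts I = albedo * (P - X).n / ||P - X||^3 for one channel.
    """
    d = light_pos[None, :] - points
    dist3 = np.linalg.norm(d, axis=1) ** 3
    shading = np.sum(d * normals, axis=1)
    return intensity * dist3 / shading

# Synthetic quadruplet of pixels that share albedo 0.7 in one channel.
points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                   [0.0, 0.1, 0.0], [0.1, 0.1, 0.02]])
normals = np.array([[0.0, 0.0, 1.0], [0.2, 0.0, 1.0],
                    [0.0, 0.2, 1.0], [-0.2, -0.2, 1.0]])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
true_light = np.array([0.3, 0.2, 1.0])

d_true = true_light[None, :] - points
intensity = 0.7 * np.sum(d_true * normals, axis=1) / np.linalg.norm(d_true, axis=1) ** 3

at_truth = implied_albedo(true_light, points, normals, intensity)
at_wrong = implied_albedo(np.array([-0.3, 0.4, 0.8]), points, normals, intensity)
spread_truth = np.ptp(at_truth)   # range of implied albedos at the true position
spread_wrong = np.ptp(at_wrong)   # range of implied albedos at a wrong position
```

A nonlinear least-squares solver would search for the light position that drives pairwise differences of such implied albedos to zero; the choice of residual scaling matters, as the following paragraph in the text discusses.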

Notice that in Eq. 8, the numerators have a higher order of distance between light source and surface point than those in the denominators. This biases the solution towards closer light positions. We instead use an unbiased form of Eq. 8 to measure the residual between two pixels :


For each quadruplet , a hypothesis of the light position is computed by solving


which is a squared sum of residuals between each pair of pixels in a quadruplet. We use the Levenberg-Marquardt algorithm to solve the nonlinear optimization.

In voting for a hypothesis, a pixel is considered an inlier if the squared sum of residuals between it and the pixels in satisfies


where is a threshold and set as in our experiments.

Instead of using all pixels for sampling and voting, we use only pixels on the left cheek, right cheek, and forehead. This avoids potentially highly non-Lambertian regions such as facial hair and shadow. The segmentation of these regions only needs to be done once on a 3DMM mean face, which can then be projected to different face images [10].

Unlike standard RANSAC, which chooses the hypothesis with the largest number of inliers as the final estimate, we perform an additional filtering and merging process on all the hypotheses. The reason is that the 3DMM-based proxy mesh is inaccurate even as low-frequency geometry. As a result, the initial normals deviate from the true normals at most pixels, making the consensus less concentrated and causing it to drift away from the correct hypothesis. Therefore, we take a set of hypotheses into account to produce a more robust estimate.

In the filtering step, we determine a plausible region for hypotheses and ignore all hypotheses outside this region. We first use the four-point algorithm in [54] to produce the calibration matrix, which is the product of dominant albedo and directional lighting directions. We then factor out the dominant albedo and extract lighting direction for each light by simply normalizing each row of the calibration matrix. Hypothesis (for the th light source position) is dropped if it does not satisfy


where is the mean 3D position of all pixels.

Eq. 12 forms a cone region with half-angle around ; all hypotheses outside this region are ignored. We use in our experiments. Subsequently, we merge remaining hypotheses with weighted linear combination to obtain final estimate for a light source position:


$$P_i = \frac{\sum_k w_k\, P_i^k}{\sum_k w_k}, \tag{13}$$

where $w_k$ is the number of inliers for hypothesis $P_i^k$.
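The filtering and merging steps can be sketched as follows. This is a minimal illustration under stated assumptions: the cone test is written as a cosine comparison, the weights are normalized inlier counts, and the half-angle of 20 degrees in the toy example is a placeholder, not the value used in the paper. The function name `filter_and_merge` is hypothetical.

```python
import numpy as np

def filter_and_merge(hypotheses, inlier_counts, center, direction, half_angle_deg):
    """Keep hypotheses inside a cone around `direction`, then average them
    with weights proportional to their inlier counts.

    hypotheses: (K, 3) candidate light positions; inlier_counts: (K,);
    center: (3,) mean 3D position of the pixels; direction: (3,) coarse
    lighting direction for this light.
    """
    offsets = hypotheses - center[None, :]
    cos_angle = (offsets @ direction) / (
        np.linalg.norm(offsets, axis=1) * np.linalg.norm(direction))
    keep = cos_angle >= np.cos(np.deg2rad(half_angle_deg))
    if not np.any(keep):
        raise ValueError("no hypothesis lies inside the plausible cone")
    weights = inlier_counts[keep].astype(float)
    return (weights[:, None] * hypotheses[keep]).sum(axis=0) / weights.sum()

# Two plausible hypotheses and one far outside the cone with many inliers.
hypotheses = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 3.0], [5.0, 0.0, 0.1]])
inlier_counts = np.array([30, 10, 100])
merged = filter_and_merge(hypotheses, inlier_counts,
                          center=np.zeros(3),
                          direction=np.array([0.0, 0.0, 1.0]),
                          half_angle_deg=20.0)
```

In this toy case the hypothesis with the most inliers is rejected because it lies outside the cone, and the remaining two are blended to `[0, 0, 1.5]`, showing why the cone filter matters when the proxy normals bias the vote.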

5 Face Reconstruction

Once the light source positions have been determined, we then estimate the per-pixel photometric normal. With the albedo unknown, this problem is pixel-wise underdetermined (from Eq. 6). This is because there are 5 degrees of freedom (3 for albedo and 2 for normal) but only 3 constraints. It has been shown [11, 39] that three pixels with equal albedo and linearly independent normals can uniquely determine the albedo and normals at these pixels. To exploit this property, Chakrabarti and Sunkavalli [11] model albedo as piecewise constant and use a polynomial model for surface depth. However, their method tends to produce overly smoothed results. On the other hand, Ozawa et al. [39] develop an iterative voting scheme based on albedo norms to simultaneously classify pixels into different albedos and compute their normals. Since their method assumes no spatial constancy on albedo, high-frequency details can be recovered. However, their method suffers from incorrect consensus, which leads to errors.

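By comparison, our method has a pixel-wise formulation which incorporates albedo consensus, albedo similarity between pixels, and the proxy mesh for high-quality reconstruction. From Eq. 6, we can decompose albedo $\rho(p)$ into albedo chromaticity $\hat{\rho}(p) = \rho(p) / \|\rho(p)\|$ and albedo norm $\|\rho(p)\|$:

$$\|\rho(p)\|\, n(p) = L(p)^{-1} \big(I(p) \oslash \hat{\rho}(p)\big), \tag{14}$$

where $\oslash$ is the Hadamard division operator, $I(p)$ is the pixel intensity vector, and $L(p)$ stacks the light vectors at $p$. Consequently, we only need to solve for albedo chromaticity because albedo norm and normal can then be trivially computed.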
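To make the last step concrete, here is a minimal sketch of recovering the albedo norm and the normal once a chromaticity is fixed. It assumes the diagonal-albedo model in which the intensity vector equals the albedo vector multiplied elementwise by the product of the light matrix and the normal; the function name `solve_norm_and_normal` and the toy light matrix are hypothetical.

```python
import numpy as np

def solve_norm_and_normal(intensity, light_matrix, chromaticity):
    """Given a unit albedo chromaticity, recover albedo norm and unit normal.

    intensity: (3,) RGB intensity; light_matrix: (3, 3) with one scaled light
    vector per row; chromaticity: (3,) strictly positive unit vector.
    """
    # Elementwise division removes the chromaticity; the solve removes lighting.
    scaled_normal = np.linalg.solve(light_matrix, intensity / chromaticity)
    albedo_norm = np.linalg.norm(scaled_normal)   # unit normal, so the length is the norm
    return albedo_norm, scaled_normal / albedo_norm

# Synthetic pixel with known ground truth.
light_matrix = np.array([[0.5, 0.0, 1.0], [-0.5, 0.3, 1.0], [0.0, -0.5, 1.0]])
true_normal = np.array([0.1, 0.2, 1.0])
true_normal /= np.linalg.norm(true_normal)
true_chroma = np.array([0.8, 0.5, 0.3])
true_chroma /= np.linalg.norm(true_chroma)
true_norm = 0.6
intensity = (true_norm * true_chroma) * (light_matrix @ true_normal)

est_norm, est_normal = solve_norm_and_normal(intensity, light_matrix, true_chroma)
```

Evaluating this function for every chromaticity candidate at a pixel yields one albedo norm per candidate, which is the quantity the consensus and similarity terms operate on.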

To make the problem more tractable, as with [11, 39], we discretize albedo chromaticity in the space of positive unit sphere into candidates . Then, for each pixel , we solve for its albedo chromaticity using


where is the albedo consensus term, the albedo similarity term, and the proxy prior term. and modulate the influence of similarity term and proxy term at different pixels. After solving for albedo chromaticity at each pixel, we can then compute the normal and use Poisson integration to obtain geometry. Compared with proxy mesh, our final reconstruction is more accurate for both macro- (shape, expression) and micro- (wrinkles, etc.) geometries.
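The discretization of albedo chromaticity on the positive part of the unit sphere can be sketched as below. The grid resolution and the choice to exclude the octant boundary (so that every component stays strictly positive and elementwise division remains well defined) are assumptions for illustration; the function name `chromaticity_candidates` is hypothetical.

```python
import numpy as np

def chromaticity_candidates(n_theta, n_phi):
    """Sample unit vectors with strictly positive components.

    Uses spherical coordinates with both angles in the open interval
    (0, pi/2), so no candidate has a zero component.
    """
    theta = np.linspace(0.0, np.pi / 2, n_theta + 2)[1:-1]   # polar angle
    phi = np.linspace(0.0, np.pi / 2, n_phi + 2)[1:-1]       # azimuth angle
    t, p = np.meshgrid(theta, phi, indexing="ij")
    candidates = np.stack([np.sin(t) * np.cos(p),
                           np.sin(t) * np.sin(p),
                           np.cos(t)], axis=-1)
    return candidates.reshape(-1, 3)

candidates = chromaticity_candidates(n_theta=8, n_phi=8)
```

Per pixel, the selection then reduces to evaluating the combined cost at each row of `candidates` and taking the minimizer, for example with `np.argmin` over a `(num_pixels, num_candidates)` cost array.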

5.1 Albedo Consensus

Figure 2: Effect of consensus term, illustrated on a face with ground truth. (a) Albedo distribution. (b) Two pixels that contribute to a consensus. The close-ups show the magnitude of negative consensus term in chromaticity space at the two pixels. The skin pixel is accurately estimated while the lip pixel is not. (c,d) Distribution of ground truth pixels that form consensus with the two pixels. (e) Normal error map for using only consensus term.

Albedo consensus measures the number of pixels that have similar albedo norm under an albedo chromaticity candidate [39]. Pixels with the same albedo must have the same albedo norm under correct albedo chromaticity; they may also have the same albedo norms under incorrect albedo chromaticities. In the extreme case where all the pixels share the same albedo, the correct albedo chromaticity can be estimated by finding the one that produces the strongest consensus on the albedo norm.

To compute the consensus term, for each albedo chromaticity candidate , we find the corresponding albedo norms of all pixels and build a histogram for it with the bin width being [39]. Let be the th bin under , its cardinality, and the index for the bin that contains the albedo norm of pixel under . We define


where is the total number of pixels.
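A minimal sketch of the histogram-based consensus computation follows. It assumes the albedo norms have already been computed for every pixel under every chromaticity candidate, and it reports the fraction of pixels that fall in the same bin as each pixel; the bin width of 0.05 and the function name `consensus_fraction` are placeholders, and the sign convention used when this quantity enters the overall cost is not fixed here.

```python
import numpy as np

def consensus_fraction(albedo_norms, bin_width):
    """Fraction of pixels sharing a histogram bin with each pixel.

    albedo_norms: (N, K) albedo norm of pixel p under candidate k.
    Returns an (N, K) array of bin cardinality divided by N.
    """
    n_pixels, n_candidates = albedo_norms.shape
    bins = np.floor(albedo_norms / bin_width).astype(int)
    fraction = np.empty((n_pixels, n_candidates))
    for k in range(n_candidates):
        # counts[inverse] maps each pixel to the size of its own bin.
        _, inverse, counts = np.unique(bins[:, k], return_inverse=True,
                                       return_counts=True)
        fraction[:, k] = counts[inverse.reshape(-1)] / n_pixels
    return fraction

# Four pixels, two candidates. Under candidate 0, three pixels agree.
albedo_norms = np.array([[0.51, 0.11],
                         [0.52, 0.41],
                         [0.53, 0.91],
                         [0.91, 0.12]])
fraction = consensus_fraction(albedo_norms, bin_width=0.05)
```

Here the first three pixels obtain a consensus of 0.75 under candidate 0, while the fourth pixel finds stronger support (0.5) under candidate 1, mirroring how a minority-albedo pixel can be drawn toward a different candidate.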

For a multi-colored surface, however, consensus may lead to incorrect estimation at some pixels. This is because a pixel can be interpreted by any albedo chromaticity and corresponding albedo norm. There may exist situations where, under the consensus albedo chromaticity, a pixel with a different albedo has an albedo norm similar to that of the consensus. For human faces, the albedo distribution tends to spread out instead of being of a single albedo, as shown in Fig. 2a. Consensus usually arrives at a reasonable estimation for major clusters because the number of inliers tends to be large, which improves robustness. On the other hand, for minor clusters, consensus tends to provide unreliable estimation as shown in Fig. 2b-d, where the lip pixels can be better interpreted by an incorrect albedo chromaticity. We propose using albedo similarity and proxy prior to handle this problem.

5.2 Albedo Similarity

Directly inferring albedo similarity from image intensity is error-prone, since the difference in image intensity can be caused by either albedo or shading or both. Instead, the albedo norms of a pixel under all albedo chromaticities form an albedo norm profile. We reason that if two pixels have similar albedo norm profile, then they are likely to have similar albedos. From Eq. 14, letting (where is the th column of ) and , we have


The albedo norm profile of a pixel is controlled by . Hence, we measure the similarity between two pixels as


where is the Frobenius norm. The similarity term is computed as


which is the mean similarity between a pixel and its same-bin pixels under the th albedo chromaticity candidate.

We further multiply a per-pixel weight to the similarity term to suppress its effect at pixels where the similarity term is large for all albedo chromaticity candidates. More specifically, we compute the weight as


5.3 Proxy Prior

The proxy albedo chromaticity map can be computed from the proxy mesh using Eq. 6 and is used to penalize implausible estimations produced by the consensus term. The proxy term is expressed as


where is the proxy albedo chromaticity at pixel . We apply this term only to pixels where the consensus term gives estimations largely deviated from proxy albedo chromaticity. Otherwise, it will bias reconstruction towards the proxy mesh. We multiply this term with a per-pixel weight:


where is the estimated albedo chromaticity at pixel using the consensus term alone.

Figure 3: Effect of changing light source distances. (a) Rendered images under the first 8 light source distances (distance increases from left to right and from top to bottom). Comparisons on (b) self-calibration and (c) reconstruction error at different light source distances, including against VH12 [54].
Figure 4: Comparison of reconstructed geometry with VH12 [54] at different light source distances. The colored numbers are the mean normal errors.

6 Experimental Results

In this section, we first report results on synthetic face images generated using a high-quality face dataset and synthetic lighting. We then show results for real data captured using our setup. To self-calibrate each light, we use 2,000 iterations for RANSAC. The reconstruction parameters are set as follows: . We discretize albedo chromaticity in spherical coordinates as .

Figure 5: Comparison with competing techniques (VH12 [54], CK16 [11], OS18 [39]) using data from the ICT-3DRFE dataset. GT is ground truth. The mean normal and geometry errors are listed in the odd and even rows, respectively. More results can be found in the supplementary file.

Figure 6: Error statistics on ICT-3DRFE dataset. (a) Self-calibration errors. (b) Reconstruction errors of VH12 [54], CK16 [11], OS18 [39], our “Consensus + Similarity” and our “Consensus + Similarity + Proxy”.

6.1 Synthetic Data

To evaluate our method objectively, we apply it to synthetic input images with known ground truth. The synthetic images are generated by rendering high-quality face data from the USC Light Stage [37] under near point lighting at different distances and orthographic projection. The synthetic image resolution is .

The synthetic lights are distributed with equal azimuth angles between neighboring lights, and at the same elevation angle of . The distance between each light and the face center is identical. The distance is specified in terms of the vertical span of the face and it ranges from 0.5 to 10, with an increment of 0.5. The rendered images for the first 8 distances are shown in Fig. 3a. During rendering, we retain self-shadow while ignoring other shadowing effects on the background. We also prevent saturation by scaling each image so that the maximum pixel intensity is 255.
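The synthetic light layout and the intensity scaling can be sketched as follows. The coordinate convention (elevation measured from the x-y plane toward +z), the 45 degree elevation in the example, and the function names `place_lights` and `scale_to_255` are assumptions for illustration, not values or code from the paper.

```python
import numpy as np

def place_lights(center, distance, elevation_deg, n_lights=3):
    """Place lights at equal azimuth spacing, common elevation and distance."""
    azimuth = 2.0 * np.pi * np.arange(n_lights) / n_lights
    elevation = np.deg2rad(elevation_deg)
    directions = np.stack([np.cos(elevation) * np.cos(azimuth),
                           np.cos(elevation) * np.sin(azimuth),
                           np.full(n_lights, np.sin(elevation))], axis=1)
    return center[None, :] + distance * directions

def scale_to_255(image):
    """Rescale so the brightest pixel is exactly 255, preventing saturation."""
    return image * (255.0 / image.max())

# Distance is expressed in units of the vertical span of the face.
lights = place_lights(center=np.zeros(3), distance=2.0, elevation_deg=45.0)
scaled = scale_to_255(np.array([[10.0, 400.0], [200.0, 50.0]]))
```

Sweeping `distance` from 0.5 to 10 in steps of 0.5 reproduces the range of light source distances described above.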

Fig. 3b compares the calibration errors for vanilla RANSAC and our method. We compute the relative position error as the Euclidean position error normalized by the light source distance. We can see that vanilla RANSAC is less accurate, with large fluctuations in error over distance. By comparison, our calibration results are more accurate and robust to changing light source distance, with the relative position error around 0.1 at most distances.
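The relative position error can be written in a few lines. This sketch assumes that the light source distance is measured from the face center to the ground truth light position; the function name `relative_position_error` is hypothetical.

```python
import numpy as np

def relative_position_error(estimated, ground_truth, face_center):
    """Euclidean position error divided by the true light-to-face distance."""
    position_error = np.linalg.norm(estimated - ground_truth)
    light_distance = np.linalg.norm(ground_truth - face_center)
    return position_error / light_distance

error = relative_position_error(estimated=np.array([0.2, 0.0, 2.0]),
                                ground_truth=np.array([0.0, 0.0, 2.0]),
                                face_center=np.zeros(3))
```

In this example a 0.2 offset at light distance 2 gives a relative error of 0.1, which is the level reported for the proposed calibration at most distances.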

We also compare the reconstruction accuracy of our method using our estimated light positions with [54] in Fig. 3c. [54] assumes directional lighting with a single albedo chromaticity, and uses the same proxy face for self-calibration. We can see that our method consistently performs better, even at distance 10 (where lighting is almost directional). There is considerable shape deformation for [54] across the different distances, while our method produces reasonable shapes starting from distance 1.5. At very close distances such as 0.5 and 1, neither method performs well due to significant self-shadowing.

Fig. 3c also shows comparisons with using ground truth light positions and mean albedo chromaticity (which are the conditions that should result in the best accuracy under the single chromaticity assumption). In this case, our method using ground truth light positions out-performs the others by a significant margin under almost all distances; this shows the importance of varying albedo chromaticity. The degraded accuracy at distance 0.5 is due to self-shadowing.

We further evaluate our method using the ICT-3DRFE dataset [52, 37], which contains highly-detailed albedo and geometry for 23 subjects (22 with 15 expressions each, and one with 12 expressions, with a total of 342 face inputs). The dataset has vastly different skin albedo as well as face geometry. We rendered images at light source distance 2.0 and added Gaussian noise () to simulate real images. Results are shown in Fig. 6.

As shown in Fig. 6a, our self-calibration method greatly improves over vanilla RANSAC. In Fig. 6b, we compare the accuracy of our face reconstruction method with those of [54, 11, 39]. Since [11] requires directional lighting directions as input, we compute approximated lighting directions as the rays from face center to ground truth light positions. [39] originally assumes directional lighting, but is adapted by us to work for near point lighting.

For our method, we show results for two variants (“Consensus + Similarity” and “Consensus + Similarity + Proxy”) to analyze the influence of each term. For [39] and the two variants of our method, we use the light positions estimated in our self-calibration. We compute geometry error as the depth error of the integrated geometry normalized by the depth span of the ground truth geometry. Methods using the near point light model outperform those using the directional light model in terms of normal error. Each of our proposed terms improves over using consensus only.
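The two reconstruction metrics can be sketched as below. Several details are assumptions rather than facts from the paper: the normal error is taken as the mean angle in degrees, the depth error is the mean absolute difference, and a constant depth offset is removed first because normal integration determines depth only up to an additive constant. The function names are hypothetical.

```python
import numpy as np

def mean_normal_error_deg(normals_est, normals_gt):
    """Mean angle in degrees between estimated and true unit normals."""
    cosine = np.clip(np.sum(normals_est * normals_gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cosine)).mean())

def geometry_error(depth_est, depth_gt):
    """Mean absolute depth error, after removing the mean offset,
    normalized by the depth span of the ground truth."""
    diff = depth_est - depth_gt
    diff = diff - diff.mean()                 # integration constant is arbitrary
    return float(np.abs(diff).mean() / (depth_gt.max() - depth_gt.min()))

normals_gt = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
normals_est = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
normal_err = mean_normal_error_deg(normals_est, normals_gt)

depth_gt = np.array([0.0, 1.0, 2.0])
depth_est = np.array([5.5, 6.0, 6.5])
depth_err = geometry_error(depth_est, depth_gt)
```

Normalizing by the depth span makes the geometry error comparable across subjects whose faces differ in absolute size.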

Figure 7: Reconstruction results of VH12 [54], CK16 [11], OS18 [39] and our method on real data. More examples are in the supplementary file.
Figure 8: Reconstruction results for a video clip of a face with changing expression. Each frame is processed independently.

Although [11] handles multi-chromaticity, it performs worse than [54]. A possible reason is that its polynomial model for depth is not suitable for complex geometry. While [54] has lower geometry error than using consensus only, our full method improves on this metric and yields the best accuracy. Fig. 5 shows two example comparisons. Note that our method works reasonably well in the lip and eyebrow regions. These regions are challenging because they contain non-dominant albedos, which cause incorrect consensus. Please see the supplementary material for more results.

6.2 Real Data

To collect real data, we built a color photometric capture system (see the supplementary file for a photo of the system). It consists of 3 LED (red, green, blue) near point lights and a PointGrey Flea3 FL3-U3-88S2C color camera (4096 × 2160). The distance between the light sources and the subject is roughly 70 cm. We mounted orthogonal linear polarizers in front of the light sources and camera to reduce specular reflection. The final mesh consists of about 3,000,000 vertices. The whole process takes about 12 minutes in MATLAB, with self-calibration taking 8 minutes and face reconstruction taking 4 minutes.

We captured faces of different people and expressions; Fig. 7 shows results for two examples (one of a woman, the other a man with an exaggerated expression). Results from competing techniques (VH12 [54], CK16 [11], and OS18 [39]) feature local and global geometric distortion as well as over-smoothing. These results also have issues at the lips because the albedo at the lips differs from that of the rest of the face. Please refer to the supplementary material for more results. We have also captured a video clip of a face with changing expressions and reconstructed each frame independently. Fig. 8 shows results for 5 representative frames. The mouth interior was not reconstructed well due to significant self-shadowing.

7 Concluding Remarks

We have presented a novel color photometric stereo method with only 3 uncalibrated near point lights. Our method is capable of reconstructing high-quality face geometry from a single image. Self-calibration of the near point lights relies on the geometric prior from the 3DMM proxy face. We apply RANSAC followed by hypothesis merging to robustly estimate light positions. We also propose a per-pixel formulation that incorporates albedo consensus, albedo similarity, and proxy prior to handle the ill-posedness of color photometric stereo. Synthetic and real experiments show that our method outperforms previous CPS methods that similarly use a single image as input.

In our work, we did not exploit the albedo prior of human faces, which may further improve the accuracy of self-calibration and face reconstruction. Another possible future work would be extending our method to general objects by harnessing information from depth sensors.


  • [1] https://www.eisko.com/.
  • [2] http://www.3dmd.com/.
  • [3] http://www.di4d.com/.
  • [4] J. Ackermann, M. Goesele, et al. A survey of photometric stereo techniques. Foundations and Trends® in Computer Graphics and Vision, 9(3-4):149–254, 2015.
  • [5] J. Ahmad, J. Sun, L. Smith, and M. Smith. An improved photometric stereo through distance estimation and light vector optimization from diffused maxima region. Pattern Recognition Letters, 50:15–22, 2014.
  • [6] N. G. Alldrin, S. P. Mallick, and D. J. Kriegman. Resolving the generalized bas-relief ambiguity by entropy minimization. In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, pages 1–7. IEEE, 2007.
  • [7] R. Anderson, B. Stenger, and R. Cipolla. Color photometric stereo for multicolored surfaces. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2182–2189. IEEE, 2011.
  • [8] B. Bringier, D. Helbert, and M. Khoudeir. Photometric reconstruction of a dynamic textured surface from just one color image acquisition. JOSA A, 25(3):566–574, 2008.
  • [9] C. Cao, D. Bradley, K. Zhou, and T. Beeler. Real-time high-fidelity facial performance capture. ACM Transactions on Graphics (ToG), 34(4):46, 2015.
  • [10] X. Cao, Z. Chen, A. Chen, X. Chen, S. Li, and J. Yu. Sparse photometric 3d face reconstruction guided by morphable models. In CVPR, 2018.
  • [11] A. Chakrabarti and K. Sunkavalli. Single-image rgb photometric stereo with spatially-varying albedo. In 3DV, 2016, pages 258–266. IEEE, 2016.
  • [12] J. J. Clark. Active photometric stereo. In Computer Vision and Pattern Recognition, 1992. Proceedings CVPR’92., 1992 IEEE Computer Society Conference on, pages 29–34. IEEE, 1992.
  • [13] T. Collins and A. Bartoli. 3d reconstruction in laparoscopy with close-range photometric stereo. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 634–642. Springer, 2012.
  • [14] B. De Decker, J. Kautz, T. Mertens, and P. Bekaert. Capturing multiple illumination conditions using time and color multiplexing. IEEE, 2009.
  • [15] M. S. Drew and L. L. Kontsevich. Closed-form attitude determination under spectrally varying illumination. In CVPR, 1994.
  • [16] Y. Furukawa and J. Ponce. Dense 3d motion capture for human faces. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1674–1681. IEEE, 2009.
  • [17] G. Fyffe and P. Debevec. Single-shot reflectance measurement from polarized color gradient illumination. In Computational Photography (ICCP), 2015 IEEE International Conference on, pages 1–10. IEEE, 2015.
  • [18] G. Fyffe, X. Yu, and P. Debevec. Single-shot photometric stereo by spectral multiplexing. In Computational Photography (ICCP), 2011 IEEE International Conference on, pages 1–6. IEEE, 2011.
  • [19] A. Ghosh, G. Fyffe, B. Tunwattanapong, J. Busch, X. Yu, and P. Debevec. Multiview face capture using polarized spherical gradient illumination. In ACM Transactions on Graphics (TOG), volume 30, page 129. ACM, 2011.
  • [20] P. F. Gotardo, T. Simon, Y. Sheikh, and I. Matthews. Photogeometric scene flow for high-detail dynamic 3d reconstruction. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 846–854. IEEE, 2015.
  • [21] Y. Guo, J. Zhang, J. Cai, B. Jiang, and J. Zheng. Cnn-based real-time dense face reconstruction with inverse-rendered photo-realistic face images. IEEE transactions on pattern analysis and machine intelligence, 2018.
  • [22] S. Herbort and C. Wöhler. An introduction to image-based 3d surface reconstruction and a survey of photometric stereo methods. 3D Research, 2(3):4, 2011.
  • [23] C. Hernández, G. Vogiatzis, G. J. Brostow, B. Stenger, and R. Cipolla. Non-rigid photometric stereo with colored lights. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007.
  • [24] T. Higo, Y. Matsushita, N. Joshi, and K. Ikeuchi. A hand-held photometric stereo camera for 3-d modeling. In Computer Vision, 2009 IEEE 12th International Conference on, pages 1234–1241. IEEE, 2009.
  • [25] P. Huber, G. Hu, J. R. Tena, P. Mortazavian, W. P. Koppen, W. J. Christmas, M. Rätsch, and J. Kittler. A multiresolution 3d morphable face model and fitting framework. In VISIGRAPP, 2016.
  • [26] Z. Jankó, A. Delaunoy, and E. Prados. Colour dynamic photometric stereo for textured surfaces. In Asian Conference on Computer Vision, pages 55–66. Springer, 2010.
  • [27] L. Jiang, J. Zhang, B. Deng, H. Li, and L. Liu. 3d face reconstruction with geometry details from a single image. IEEE Transactions on Image Processing, 27(10):4756–4770, 2018.
  • [28] H. Kim, B. Wilburn, and M. Ben-Ezra. Photometric stereo for dynamic surface orientations. In European Conference on Computer Vision, pages 59–72. Springer, 2010.
  • [29] M. Klaudiny, A. Hilton, and J. Edge. High-detail 3d capture of facial performance. In 3DPVT Conference, 2010.
  • [30] L. Kontsevich, A. Petrov, and I. Vergelskaya. Reconstruction of shape from shading in color images. JOSA A, 11(3):1047–1052, 1994.
  • [31] S. J. Koppal and S. G. Narasimhan. Novel depth cues from uncalibrated near-field lighting. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007.
  • [32] Y. Li, L. Ma, H. Fan, and K. Mitchell. Feature-preserving detailed 3d face reconstruction from a single image. In Proceedings of the 15th ACM SIGGRAPH European Conference on Visual Media Production, page 1. ACM, 2018.
  • [33] J. Liao, B. Buchholz, J.-M. Thiery, P. Bauszat, and E. Eisemann. Indoor scene reconstruction using near-light photometric stereo. IEEE Trans. Image Processing, 26(3):1089–1101, 2017.
  • [34] C. Liu, S. G. Narasimhan, and A. W. Dubrawski. Near-light photometric stereo using circularly placed point light sources. In 2018 IEEE International Conference on Computational Photography (ICCP), pages 1–10. IEEE, 2018.
  • [35] F. Logothetis, R. Mecca, and R. Cipolla. Semi-calibrated near field photometric stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, volume 3, page 8, 2017.
  • [36] F. Lu, Y. Matsushita, I. Sato, T. Okabe, and Y. Sato. Uncalibrated photometric stereo for unknown isotropic reflectances. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1490–1497, 2013.
  • [37] W.-C. Ma, T. Hawkins, P. Peers, C.-F. Chabert, M. Weiss, and P. Debevec. Rapid acquisition of specular and diffuse normal maps from polarized spherical gradient illumination. In Proceedings of the 18th Eurographics conference on Rendering Techniques, pages 183–194. Eurographics Association, 2007.
  • [38] R. Mecca, A. Wetzler, A. M. Bruckstein, and R. Kimmel. Near field photometric stereo with point light sources. SIAM Journal on Imaging Sciences, 7(4):2732–2770, 2014.
  • [39] K. Ozawa, I. Sato, and M. Yamaguchi. Single color image photometric stereo for multi-colored surfaces. Computer Vision and Image Understanding, 2018.
  • [40] T. Papadhimitri and P. Favaro. A closed-form, consistent and robust solution to uncalibrated photometric stereo via local diffuse reflectance maxima. International journal of computer vision, 107(2):139–154, 2014.
  • [41] T. Papadhimitri and P. Favaro. Uncalibrated near-light photometric stereo. 2014.
  • [42] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter.

    A 3d face model for pose and illumination invariant face recognition.

    In Advanced video and signal based surveillance, 2009. AVSS’09. Sixth IEEE International Conference on, pages 296–301. Ieee, 2009.
  • [43] M. Piotraschke and V. Blanz. Automated 3d face reconstruction from multiple images using quality measures. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3418–3427, 2016.
  • [44] Y. Quéau, B. Durix, T. Wu, D. Cremers, F. Lauze, and J.-D. Durou. Led-based photometric stereo: modeling, calibration and numerical solution. Journal of Mathematical Imaging and Vision, 60(3):313–340, 2018.
  • [45] S. Rahman, A. Lam, I. Sato, and A. Robles-Kelly. Color photometric stereo using a rainbow light for non-lambertian multicolored surfaces. In Asian Conference on Computer Vision, pages 335–350. Springer, 2014.
  • [46] E. Richardson, M. Sela, and R. Kimmel. 3d face reconstruction by learning from synthetic data. In 3DV, 2016, pages 460–469. IEEE, 2016.
  • [47] E. Richardson, M. Sela, R. Or-El, and R. Kimmel. Learning detailed face reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1259–1268, 2017.
  • [48] F. Sakaue and J. Sato. A new approach of photometric stereo from linear image representation under close lighting. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 759–766. IEEE, 2011.
  • [49] M. Sela, E. Richardson, and R. Kimmel.

    Unrestricted facial geometry reconstruction using image-to-image translation.

    In Proceedings of the IEEE International Conference on Computer Vision, pages 1576–1585, 2017.
  • [50] B. Shi, Y. Matsushita, Y. Wei, C. Xu, and P. Tan. Self-calibrating photometric stereo. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1118–1125. IEEE, 2010.
  • [51] B. Shi, Z. Mo, Z. Wu, D. Duan, S. K. Yeung, and P. Tan. A benchmark dataset and evaluation for non-lambertian and uncalibrated photometric stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–1, 2018.
  • [52] G. Stratou, A. Ghosh, P. Debevec, and L.-P. Morency. Effect of illumination on automatic expression recognition: a novel 3d relightable facial database. In Face and Gesture 2011, pages 611–618. IEEE, 2011.
  • [53] A. Tuấn Trần, T. Hassner, I. Masi, E. Paz, Y. Nirkin, and G. Medioni. Extreme 3d face reconstruction: Seeing through occlusions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3935–3944, 2018.
  • [54] G. Vogiatzis and C. Hernández. Self-calibrated, multi-spectral photometric stereo for 3d face capture. International Journal of Computer Vision, 97(1):91–103, 2012.
  • [55] A. Wetzler, R. Kimmel, A. M. Bruckstein, and R. Mecca. Close-range photometric stereo with point light sources. In 3DV, 2014, volume 1, pages 115–122. IEEE, 2014.
  • [56] R. J. Woodham. Photometric method for determining surface orientation from multiple images. Optical engineering, 19(1):191139, 1980.
  • [57] R. J. Woodham. Gradient and curvature from the photometric-stereo method, including local confidence estimation. JOSA A, 11(11):3050–3068, 1994.
  • [58] Z. Wu and P. Tan. Calibrating photometric stereo by holistic reflectance symmetry analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1498–1505, 2013.
  • [59] W. Xie, C. Dai, and C. C. Wang. Photometric stereo with near point lighting: A solution by mesh deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4585–4593, 2015.
  • [60] S. Yamaguchi, S. Saito, K. Nagano, Y. Zhao, W. Chen, K. Olszewski, S. Morishima, and H. Li. High-fidelity facial reflectance and geometry inference from an unconstrained image. ACM Transactions on Graphics (TOG), 37(4):162, 2018.
  • [61] L. Zhang, N. Snavely, B. Curless, and S. M. Seitz. Spacetime faces: High-resolution capture for~ modeling and animation. In Data-Driven 3D Facial Animation, pages 248–276. Springer, 2008.
  • [62] S. Zhang and P. S. Huang. High-resolution, real-time three-dimensional shape measurement. Optical Engineering, 45(12):123601, 2006.
  • [63] M. Zollhöfer, J. Thies, P. Garrido, D. Bradley, T. Beeler, P. Pérez, M. Stamminger, M. Nießner, and C. Theobalt. State of the art on monocular 3d face reconstruction, tracking, and applications. In Computer Graphics Forum, volume 37, pages 523–550. Wiley Online Library, 2018.