SPLINE-Net: Sparse Photometric Stereo through Lighting Interpolation and Normal Estimation Networks

05/10/2019 · Qian Zheng et al. · Microsoft · Nanyang Technological University · Peking University

This paper solves sparse photometric stereo through Lighting Interpolation and Normal Estimation using a generative network (SPLINE-Net). SPLINE-Net contains a lighting interpolation network that generates dense lighting observations given a sparse set of lights as inputs, followed by a normal estimation network that estimates surface normals. Both networks are jointly constrained by the proposed symmetric and asymmetric loss functions, which enforce the isotropy constraint and perform outlier rejection of global illumination effects. SPLINE-Net is verified to outperform existing methods for photometric stereo of general BRDFs by using only ten images under different lights instead of nearly one hundred.


1 Introduction

The problem of photometric stereo [33] inverts the radiometric image formation model to recover surface normals from different appearances of an object under various lighting conditions with a fixed camera view. The classic method [33] assumes an ideal Lambertian image formation model without global illumination effects (such as inter-reflection and shadows), which deviates from realistic scenarios and prevents photometric stereo from handling real-world objects. To make photometric stereo practical, the major difficulty lies in dealing with objects of general reflectance and global illumination effects. This can be achieved either by exploring analytical Bidirectional Reflectance Distribution Function (BRDF) representations (e.g., [12]) and general BRDF properties (e.g., [28]) to model non-Lambertian interactions of lighting and surface normals, or by suppressing global effects by treating them as outliers (e.g., [35]). Recently, deep learning based approaches have been introduced to address these difficulties by implicitly learning both the image formation process and global illumination effects from training data (e.g., [7, 16]).

Figure 1: An illustration of observation maps corresponding to two surface normals (a brief introduction of observation maps can be found in Section 3.2 and recent work [16]). (a) Two surface normals and their observation maps with dense lights, (b) sparse observation maps with 10 order-agnostic lights, (c) dense observation maps generated by our SPLINE-Net given the sparse observation maps in (b) as inputs, and (d) ground truth of dense observation maps with 1000 lights. We use $v$ to represent the viewing direction and $n$ to represent the surface normal in our experiments.

According to a comprehensive benchmark evaluation [26] (including quantitative results for representative methods published before 2016) and additional results reported in the most recent works [7, 16, 38], a moderately dense lighting distribution (e.g., around 100 directional lights randomly sampled from the visible hemisphere) is required to achieve reasonably good normal estimation for objects with general materials (e.g., low angular error for a shiny plastic). This is because multi-illumination observations under a dense set of lights are required to fit the parameters of analytic BRDF models [12], to analyze general BRDF properties [28], to observe sufficient inliers and outliers [35], and to ensure the convergence of training neural networks [7]. How to achieve high normal estimation accuracy for objects with general BRDFs given a sparse set of lights (e.g., 10), which we call sparse photometric stereo in this paper, remains an open and challenging problem [26].

In this paper, we propose to solve Sparse Photometric stereo through Lighting Interpolation and Normal Estimation Networks, namely SPLINE-Net. SPLINE-Net is composed of two sub-networks: the Lighting Interpolation Network (LI-Net), which generates dense observations given a sparse set of input lights, and the Normal Estimation Network (NE-Net), which estimates surface normals from the generated dense observations. LI-Net takes advantage of a learnable representation for dense lighting called the observation map [16]; we treat sparse observation maps as damaged paintings and generate dense ones through inpainting (as shown in Figure 1; note that the holes in the ground truth of observation maps are produced by the discrete projection from a limited number of lights to a grid of limited resolution, e.g., from 1000 lighting directions in this figure). NE-Net then follows LI-Net to infer surface normals guided by the dense observation maps. To accurately guide the lighting interpolation and normal estimation, especially in the photometric stereo context, we propose a symmetric loss and an asymmetric loss that explicitly consider general BRDF properties and outlier rejection. More specifically, the symmetric loss is derived from the isotropy property of general reflectance: it constrains pixel values on a generated observation map to be symmetrically distributed w.r.t. an axis determined by the corresponding surface normal. The asymmetric loss is derived from observation maps contaminated by global illumination effects: it constrains the difference between values of symmetrically distributed pixels to be equal to a non-zero amount. SPLINE-Net is validated to achieve superior normal estimation accuracy given a small number of input images (e.g., 10) compared to state-of-the-art methods using a much larger number (e.g., 96), which greatly relieves the labor of data capture and lighting calibration for photometric stereo with general BRDFs. The contributions of this paper are two-fold:

  • We propose the SPLINE-Net to address the problem of photometric stereo with general BRDFs using a small number of images through an integrated learning procedure of lighting interpolation and normal estimation.

  • We show how symmetric and asymmetric loss functions can be formulated to facilitate the learning of lighting interpolation and normal estimation, taking into account the isotropy constraint and outlier rejection of global illumination effects.

2 Related Work

In this section, we briefly review traditional methods and deep learning based methods for non-Lambertian photometric stereo with general materials and known lightings. For other generalizations of photometric stereo, we refer readers to survey papers in [13, 1, 26].

Traditional methods.

The classical method [33] for photometric stereo assumes Lambertian surface reflectance and recovers surface normals pixel by pixel. Such an assumption is too strong for accurate recovery in the real world, where non-Lambertian reflectance is densely observed due to materials with diverse reflectance and global illumination effects. To handle non-Lambertian reflectance from broad classes of materials, modern algorithms attempt to describe the BRDF in a mathematically tractable form. Analytic models exploit all available data to fit a nonlinear analytic BRDF, such as the Blinn-Phong model [31], the Torrance-Sparrow model [11], the Ward model and its variations [10, 12, 2], the specular spike model [9, 36], and the microfacet BRDF model [8]. Empirical models consider general properties of a BRDF, such as isotropy and monotonicity. Some basic derivations for isotropic BRDFs are provided in [4, 29, 6]. Excellent performance has been achieved by methods based on empirical models, including combining isotropy and monotonicity with a visibility constraint [14], using the isotropy constraint for the estimation of elevation angles [3, 27, 20], and approximating isotropic BRDFs by bivariate functions [23, 17, 28]. However, most of these methods based on analytic or empirical models are pixel-wise and therefore cannot explicitly consider global illumination effects such as inter-reflection and cast shadows. Outlier rejection based methods have been developed to suppress global illumination effects by treating them as outliers. Earlier works select a subset of Lambertian images from the inputs for accurate recovery of surface normals [22, 32, 21, 37, 35]. Recent methods apply robust analysis by assuming that non-Lambertian reflectance is sparse [34, 18]. However, these methods still rely on the existence of a dominant Lambertian reflectance component.

Figure 2: The framework of the proposed SPLINE-Net. The lighting interpolation network generates dense observation maps given sparse observation maps as inputs. The normal estimation network estimates surface normals given the sparse and the generated dense observation maps as inputs. Both networks are trained in a supervised manner, where the ground truth of observation maps and surface normals is known.

Deep learning based methods.

Recently, following the great success of neural networks in both high-level and low-level computer vision tasks, researchers have introduced deep learning based methods to solve the problem of photometric stereo. Instead of explicitly modeling the image formation process and global illumination effects as in traditional methods, deep learning based methods attempt to learn such information from data. DPSN [24] is the first attempt; it uses a deep fully-connected network to regress surface normals from observations captured under pre-defined lightings in a supervised manner. However, the pre-definition of lightings limits its practicality for photometric stereo, where the number of inputs often varies. PS-FCN [7] is proposed to address this limitation and handles images under various lightings in an order-agnostic manner by aggregating features of the inputs using the max-pooling operation. CNN-PS [16] is another work accepting order-agnostic inputs; it introduces the observation map, a fixed-shape representation invariant to the inputs. Besides neural networks trained in a supervised manner, Taniai and Maehara [30] presented an unsupervised learning framework where surface normals and BRDFs are recovered by minimizing a reconstruction loss between the inputs and images synthesized based on a rendering equation.

Only a few earlier works in the literature address the problem of photometric stereo with general reflectance using a small number of images (e.g., the analytic model based method [12] and the shadow analysis based method [5]). Our paper revisits this problem because of its low labor cost for data capture and lighting calibration.

3 The Proposed SPLINE-Net

In this section, we introduce our solution to the problem of photometric stereo with general reflectance using a small number of images. We first present the framework of our SPLINE-Net in Section 3.1. Then we detail the symmetric loss and the asymmetric loss in Section 3.2.

3.1 Framework

As illustrated in Figure 2, our SPLINE-Net, which consists of a Lighting Interpolation Network (LI-Net) and a Normal Estimation Network (NE-Net), is optimized in a supervised manner. LI-Net (represented as a regression function $f$) interpolates a dense observation map $\hat{O}$ from a sparse observation map $O^s$ (i.e., a sparse set of lights),

$\hat{O} = f(O^s).$   (1)

Such densely interpolated observation maps are then concatenated with the original inputs to help estimate surface normals in NE-Net (represented as a regression function $g$),

$\hat{n} = g(O^s, \hat{O}).$   (2)

LI-Net and NE-Net are trained in an alternating manner, fixing one network while optimizing the other. Specifically, we update LI-Net once after updating NE-Net five times. The loss function of each network comprises a reconstruction loss, a symmetric loss, and an asymmetric loss.
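To make the alternating schedule concrete, below is a minimal PyTorch-style sketch. It is our own illustration, not the paper's code: the stand-in architectures, the 32×32 map size, and the single loss terms shown are simplifying assumptions (the paper's LI-Net is an encoder-decoder, NE-Net a DenseNet variant, and each uses the full losses of Eqs. (3) and (5)).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Stand-in networks for illustration only.
    li_net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                           nn.Conv2d(16, 1, 3, padding=1))
    ne_net = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                           nn.Linear(16, 3))
    opt_li = torch.optim.Adam(li_net.parameters())  # Adam with default parameters
    opt_ne = torch.optim.Adam(ne_net.parameters())

    for step in range(60):                        # toy loop with random data
        o_sparse = torch.rand(4, 1, 32, 32)       # sparse observation maps
        o_dense_gt = torch.rand(4, 1, 32, 32)     # ground-truth dense maps
        n_gt = F.normalize(torch.randn(4, 3), dim=1)  # ground-truth normals

        o_dense = li_net(o_sparse)                # Eq. (1)
        if step % 6 == 5:
            # One LI-Net update after every five NE-Net updates (NE-Net fixed).
            loss = F.l1_loss(o_dense, o_dense_gt)      # reconstruction term of Eq. (3)
            opt_li.zero_grad(); loss.backward(); opt_li.step()
        else:
            # NE-Net update with LI-Net held fixed (hence the detach).
            n_est = F.normalize(
                ne_net(torch.cat([o_sparse, o_dense.detach()], dim=1)), dim=1)  # Eq. (2)
            loss = (n_est - n_gt).pow(2).sum(dim=1).mean()  # reconstruction term of Eq. (5)
            opt_ne.zero_grad(); loss.backward(); opt_ne.step()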

Lighting Interpolation Network.

The basic idea of LI-Net is to inpaint sparse observation maps to obtain dense ones, based on learnable properties of observation maps (e.g., spatial continuity). LI-Net adopts an encoder-decoder structure due to its excellent image generation capacity [19, 39]. The loss function of LI-Net is formulated as

$\mathcal{L}_{\mathrm{LI}} = \mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{sym}} \mathcal{L}_{\mathrm{sym}} + \lambda_{\mathrm{asym}} \mathcal{L}_{\mathrm{asym}}.$   (3)

The reconstruction loss is defined as (we experimentally find that L1 and L2 distances provide similar results, and here we compute the reconstruction loss using the L1 distance)

$\mathcal{L}_{\mathrm{rec}} = \| M \odot ( \hat{O} - O ) \|_1,$   (4)

where $n$ and $O$ are the ground truth of a surface normal and its corresponding dense observation map, respectively, $M$ is a binary mask indicating the positions of non-zero values of $O$, and $\odot$ represents element-wise multiplication. $\mathcal{L}_{\mathrm{sym}}$ and $\mathcal{L}_{\mathrm{asym}}$ are our symmetric and asymmetric losses, to be introduced in Section 3.2.
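As a reference, the masked L1 term of Eq. (4) can be implemented in a few lines. The sketch below is our own; normalizing by the number of valid cells is an illustrative choice, not necessarily the paper's.

    import torch

    def li_reconstruction_loss(o_pred, o_gt):
        # Masked L1 loss of Eq. (4): only grid positions observed in the
        # ground-truth dense map (non-zero values) contribute.
        mask = (o_gt != 0).float()                   # binary mask M
        diff = torch.abs(mask * (o_pred - o_gt))     # M * (O_hat - O)
        return diff.sum() / mask.sum().clamp(min=1)  # average over valid cells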

Normal Estimation Network.

We use the same architecture as that in [16] (a variation of DenseNet [15]) for NE-Net due to its excellent capacity to model the relation between observation maps and surface normals. The loss function of NE-Net is formulated as

$\mathcal{L}_{\mathrm{NE}} = \mathcal{L}_{\mathrm{rec}}' + \lambda_{\mathrm{sym}}' \mathcal{L}_{\mathrm{sym}} + \lambda_{\mathrm{asym}}' \mathcal{L}_{\mathrm{asym}},$   (5)

where $\mathcal{L}_{\mathrm{sym}}$ and $\mathcal{L}_{\mathrm{asym}}$ are the symmetric and asymmetric losses, and the reconstruction loss is

$\mathcal{L}_{\mathrm{rec}}' = \| \hat{n} - n \|_2^2.$   (6)
Figure 3: An illustration of the orthogonal projection from a hemisphere surface to its base (gray) and an interpretation of the isotropy property for a dense observation map. Colored dotted lines in (b) are projected from the lines of the same colors in (a). (a) Front view: $l_1$ and $l_2$ represent two lighting directions which are symmetric about the plane spanned by the viewing direction $v$ and the surface normal $n$ (orange plane). (b) Top view: the irradiance values, whose positions are the projections of $l_1$ and $l_2$, are numerically equal due to isotropy.

3.2 Symmetric Loss and Asymmetric Loss

In this section, we first revisit the observation map introduced in [16]. Then, we further investigate its characteristics by considering isotropic BRDFs and global illumination effects. Finally, we introduce our symmetric and asymmetric loss functions.

Observation maps.

As introduced in [16], each point on a surface normal map corresponds to an observation map (as shown in Figure 3 (a)). Elements of such a map describe observed irradiance values under different lighting directions. The lighting directions are mapped to element positions through an orthogonal projection. As illustrated in Figure 3 (a), a dense observation map can be regarded as generated by projecting a hemisphere surface onto its base plane, where each point on the hemisphere surface represents a lighting direction and its projected value describes the irradiance observed under that light. This projection relation motivates us to introduce isotropy to narrow the solution space of our SPLINE-Net.
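Concretely, building an observation map from calibrated lights amounts to dropping the z component of each unit lighting direction and rasterizing the remaining (x, y) coordinates into a square grid. A numpy sketch of this construction follows (our own helper; the 32×32 resolution and nearest-cell rasterization are illustrative assumptions).

    import numpy as np

    def observation_map(lights, irradiance, w=32):
        # lights: (N, 3) unit lighting directions; irradiance: (N,) values
        # observed at one pixel. Returns a w x w observation map [16].
        om = np.zeros((w, w), dtype=np.float32)
        # Orthogonal projection: keep (x, y) and map [-1, 1] onto grid indices.
        x = np.clip(((lights[:, 0] + 1.0) / 2.0 * w).astype(int), 0, w - 1)
        y = np.clip(((lights[:, 1] + 1.0) / 2.0 * w).astype(int), 0, w - 1)
        om[y, x] = irradiance   # sparse inputs leave most cells zero ("holes")
        return om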

Isotropic BRDFs and global illumination effects.

Isotropic BRDFs for general materials have the property that reflectance values are numerically equal if the lighting directions are symmetric about the plane spanned by $v$ and $n$, as shown in Figure 3 (a). Considering the one-to-one mapping between lighting directions and positions of observed irradiance values, these values are numerically equal if their positions are symmetrically distributed with respect to the axis projected from the surface normal on the observation map, as shown in Figure 3 (b). However, such symmetric patterns can be destroyed by global illumination effects, since observation maps are generated pixel-wise: unpredictable shapes produce cast shadows or inter-reflections under certain lighting directions, leading to sudden changes of irradiance values on observation maps. Figure 4 illustrates examples of isotropy, cast shadows, and inter-reflection.

Figure 4: Six dense observation maps of the data buddha from CyclesPS [16] with (a) isotropic reflectance, (b) cast shadows, and (c) inter-reflection. Red dotted lines indicate the directions of the corresponding surface normals.

Symmetric and asymmetric loss functions.

In order to further narrow the solution space of the lighting interpolation of dense observation maps, and thereby facilitate accurate estimation of surface normals, we propose symmetric and asymmetric loss functions that exploit the above observations for LI-Net and NE-Net. More specifically, given a dense observation map $\hat{O}$ and its corresponding surface normal $n$, the symmetric loss is introduced to enforce the isotropy property of general BRDFs, which holds for a wide range of real-world reflectance. That is, it constrains irradiance values that are symmetrically distributed w.r.t. an axis determined by the surface normal (red dotted lines in Figure 4) to be numerically equal,

$\mathcal{L}_{\mathrm{sym}} = \| \hat{O} - \mathrm{mir}_n(\hat{O}) \|_1,$   (7)

where the function $\mathrm{mir}_n(\cdot)$ mirrors the observation map w.r.t. the axis determined by $n$. Different from the symmetric loss, the asymmetric loss is introduced to model the asymmetric pattern brought by outliers such as global illumination effects. It constrains the difference between values of symmetrically distributed pixels to be equal to a non-zero amount $\eta$,

$\mathcal{L}_{\mathrm{asym}} = \lambda \, \| \mathrm{avg}( | \hat{O} - \mathrm{mir}_n(\hat{O}) | ) - \eta \|_1,$   (8)

where $\lambda$ is a weight parameter and the function $\mathrm{avg}(\cdot)$ performs an average pooling operation with a stride of 2 to ensure the spatial continuity of observation maps. Empirically, we use the same value of $\eta$ for all experiments. Both $\mathcal{L}_{\mathrm{sym}}$ and $\mathcal{L}_{\mathrm{asym}}$ aim to better fit observations with symmetric and asymmetric patterns (as illustrated in Figure 4) during training. We integrate the symmetric and asymmetric loss functions to optimize LI-Net and NE-Net by choosing the weights in Eq. (3) and Eq. (5), respectively; the asymmetric loss is empirically given a small weight, since global illumination effects are typically observed only in small regions of real-world objects.
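For intuition, the mirroring operation and the two losses can be sketched as follows. This is a numpy illustration under our own assumptions: bilinear resampling for the reflection, 2×2 average pooling with stride 2, an optional validity mask, and placeholder values for lam and eta (the paper sets these weights empirically).

    import numpy as np
    from scipy.ndimage import map_coordinates

    def mirror_about_normal(om, n):
        # Reflect map `om` about the axis through the map center along the
        # projection (n_x, n_y) of the surface normal onto the map plane.
        w = om.shape[0]
        d = np.array([n[0], n[1]], dtype=np.float64)
        d /= (np.linalg.norm(d) + 1e-8)               # unit axis direction
        ys, xs = np.mgrid[0:w, 0:w].astype(np.float64)
        c = (w - 1) / 2.0
        px, py = xs - c, ys - c                       # coordinates about center
        dot = d[0] * px + d[1] * py
        rx = 2 * dot * d[0] - px + c                  # reflection: 2 (p.d) d - p
        ry = 2 * dot * d[1] - py + c
        return map_coordinates(om, [ry, rx], order=1, mode='nearest')

    def sym_asym_losses(om, mask, n, lam=1.0, eta=0.1):
        # Symmetric loss (Eq. 7) and asymmetric loss (Eq. 8) for one map;
        # lam and eta are illustrative, not the paper's values.
        diff = np.abs(om - mirror_about_normal(om, n)) * mask
        l_sym = diff.sum()
        w = om.shape[0] // 2 * 2                      # crop to an even size
        pooled = diff[:w, :w].reshape(w // 2, 2, w // 2, 2).mean(axis=(1, 3))
        l_asym = lam * np.abs(pooled - eta).sum()     # deviation from eta
        return l_sym, l_asym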

4 Experiments

In this section, we report experimental results on one synthetic dataset and one real dataset in Sections 4.1 and 4.2, respectively. We further analyze the effectiveness of LI-Net, the symmetric loss, and the asymmetric loss through ablation studies in Section 4.3.

Settings and implementation details.

A recent benchmark work [26] implies that photometric stereo with general BRDFs shows a significant performance drop if only around 10 images are provided. Therefore, we set the number of sparse lights to 10 and use 10 randomly sampled lights as inputs to SPLINE-Net for both training and testing. The resolution of our observation maps and the batch size are set to the same values as in [16] for easier comparisons. All experiments are performed on a machine with a single GeForce GTX 1080 Ti and 12 GB of RAM. The Adam optimizer with default parameters is used to optimize our networks.

Datasets and evaluation.

We use the CyclesPS data provided by [16] as our training data. There are 45 training data in total, covering 15 shapes with 3 categories of reflectance (diffuse, metallic, and specular). Our testing sets are built from public evaluation datasets: we construct 100 instances for each testing object, where each instance contains images illuminated under 10 randomly selected lights, so as to cover as many lighting conditions as possible. Quantitative results are averaged over the 100 instances for each testing object.
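For reproducibility of this protocol, a sketch of the instance construction and of the evaluation metric follows (our own helper code; the angular error is the standard per-pixel angle between unit normals, averaged inside the object mask).

    import numpy as np

    def make_instances(num_lights, num_instances=100, k=10, seed=0):
        # Each test instance picks k of the available lights at random.
        rng = np.random.default_rng(seed)
        return [rng.choice(num_lights, size=k, replace=False)
                for _ in range(num_instances)]

    def mean_angular_error(n_est, n_gt, mask):
        # n_est, n_gt: (H, W, 3) unit normal maps; mask: (H, W) binary.
        dot = np.clip((n_est * n_gt).sum(axis=-1), -1.0, 1.0)
        return np.degrees(np.arccos(dot))[mask > 0].mean()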

Compared methods.

We compare our method with five methods: the linear least squares based Lambertian photometric stereo method LS [33], an outlier rejection based method IW12 [18], two state-of-the-art methods exploring general BRDF properties, ST14 [28] and IA14 [17], and a deep learning based method, CNN-PS [16]. (We have also conducted evaluations using the same training data and testing data for PS-FCN [7]. However, the performance of PS-FCN [7] on CyclesPS-Test [16] (19.29 and 18.10) and Diligent [26] (24.41) is not as expected. The major reason is that PS-FCN [7] requires training data with various shapes, while our training data CyclesPS [16] only contains three shapes with diverse reflectance. For a fair comparison, we do not compare with PS-FCN [7] in our experiments.)

We re-train CNN-PS [16] to take ten observed irradiance values as inputs, in order to deal with photometric stereo using a small number of images. (Considering the overall quantitative results of CNN-PS [16] with its default settings, i.e., taking 50 observed irradiance values as inputs, on CyclesPS-Test [16] (31.08 and 34.90) and Diligent [26] (14.10), we report results re-trained with our setting.)

Subset L17:

Method        paperbowl      sphere         turtle         Avg.
              M      S       M      S       M      S
LS [33]       41.47  35.09   18.85  10.76   27.74  19.89   25.63
IW12 [18]     46.68  33.86   16.77   2.23   31.83  12.65   24.00
ST14 [28]     42.94  35.13   22.58   4.18   34.30  17.01   26.02
IA14 [17]     48.25  43.51   18.62  11.71   30.59  23.55   29.37
CNN-PS [16]   37.14  23.40   17.44   6.99   22.86  10.74   19.76
SPLINE-Net    29.87  18.65    6.59   3.82   15.07   7.85   13.64

Subset L305:

Method        paperbowl      sphere         turtle         Avg.
              M      S       M      S       M      S
LS [33]       43.09  37.36   20.19  12.79   28.51  21.76   27.28
IW12 [18]     48.01  37.10   21.93   3.19   34.91  16.32   26.91
ST14 [28]     44.44  37.35   25.41   4.89   36.01  19.06   27.86
IA14 [17]     49.01  45.37   21.52  13.63   32.82  26.27   31.44
CNN-PS [16]   38.45  26.90   18.25   9.04   23.91  14.36   21.82
SPLINE-Net    33.99  23.15    9.21   6.69   17.35  12.01   17.07

Table 1: Quantitative comparisons in terms of angular error (in degrees) on the CyclesPS-Test dataset [16]. Results of three shapes (paperbowl, sphere, turtle) with metallic (M) and specular (S) materials are reported. All results are averaged over 100 random trials. For subset L17 (top) and subset L305 (bottom), results over the 6 data are averaged (Avg.) for each method.
Figure 5: Visual quality comparisons of normal maps and angular error maps (in degrees) on the data sphere with specular materials (top) and paperbowl with metallic materials (bottom) from L17 of CyclesPS-Test [16].
Methods        ball    bear    buddha  cat     cow     goblet  harvest pot1    pot2    reading Avg.
LS [33]         4.41    9.05   15.62    9.03   26.42   19.59   31.31    9.46   15.37   20.16   16.04
IW12 [18]       3.33    7.62   13.36    8.13   25.01   18.01   29.37    8.73   14.60   16.63   14.48
ST14 [28]       5.24    9.39   15.79    9.34   26.08   19.71   30.85    9.76   15.57   20.08   16.18
IA14 [17]      12.94   16.40   20.63   15.53   18.08   18.73   32.50    6.28   14.31   24.99   19.04
CNN-PS [16]    17.86   13.08   19.25   15.67   19.28   21.56   21.52   16.95   18.52   21.30   18.50
LI-Net+NE-Net   6.06    7.01   10.69    8.38   10.39   11.37   19.02    9.42   12.34   16.18   11.09
Nets+Sym.       5.04    5.89   10.11    7.79    9.38   10.84   19.03    8.91   11.47   15.87   10.43
SPLINE-Net      4.96    5.99   10.07    7.52    8.80   10.43   19.05    8.77   11.79   16.13   10.35

Table 2: Quantitative comparisons in terms of angular error (in degrees) on the Diligent dataset [26]. All results are averaged over 100 random trials. Results over the 10 data are averaged (Avg.) for each method.
Figure 6: Visual quality comparisons of normal maps and angular error maps (in degrees) on the data cow (top) and pot1 (bottom) from Diligent [26].

4.1 Synthetic Data

CyclesPS-Test is a testing dataset published by [16], which consists of two subsets generated under different numbers of lightings (17 or 305). Both subsets (denoted L17 and L305 in this paper) contain three shapes: paperbowl, sphere, and turtle. Each shape is spatially divided into 100 parts, and each part is rendered with varied reflectance, either metallic or specular, to approximate general reflectance with diverse materials. There are 6 data in each of the two subsets. For all 12 data, we construct 100 instances, each of which contains 10 randomly selected images, to build the testing set.

As can be observed from Table 1, metallic materials are more challenging than specular materials when using a small number of images, due to the more abrupt changes in their BRDFs. Even for a simple shape like sphere, which contains few global illumination effects, all methods fail to estimate accurate surface normals for metallic materials. The performance advantage of the proposed SPLINE-Net is clear: the overall performance (13.64 and 17.07) is much better than the second best, achieved by CNN-PS [16] (19.76 and 21.82). Interestingly, two traditional methods, IW12 [18] and ST14 [28], outperform other methods on the data sphere with specular materials. However, they fail to achieve accurate results on complex shapes like paperbowl and turtle, or on metallic materials, because they ignore global illumination effects, while our method consistently achieves the best performance.

Visual quality comparisons in Figure 5 further validate the effectiveness of our method. Even though the overall performance is worse than that of IW12 [18] on the data sphere, our method deals with specular reflectance in a more robust manner. The error maps on the more difficult shape paperbowl show that our method consistently produces the best estimation for most regions. Both the quantitative performance and the visualization results on synthetic data demonstrate the effectiveness of our method in addressing photometric stereo with general BRDFs using a small number of images.

Figure 7: Illustration of observation maps. From left to right: sparse observation maps (inputs, 10 order-agnostic lights), dense observation maps generated by LI-Net+NE-Net, Nets+Sym., and SPLINE-Net, and the ground truth (1000 lights).
Figure 8: Visual quality comparisons of normal maps and angular error maps (in degrees) on the data goblet (top) and reading (bottom) from Diligent [26].
Figure 9: Visual quality comparisons of normal maps on the diffuse-dominant data buddha (top) and sheep (bottom).

4.2 Real Data

Diligent [26] is a benchmark dataset consisting of 10 real data with various reflectance. Each object provides 16-bit images from 96 known lighting directions distributed on a regular grid. As with the synthetic data, for each of the 10 data we construct 100 instances, each containing 10 randomly selected images, to build the testing set.

The quantitative results are reported in Table 2. The proposed SPLINE-Net demonstrates obvious superiority on most data, except for ball and pot1, where it obtains similar or worse results (4.96 and 8.77) compared to the two traditional methods LS [33] (4.41 and 9.46) and IW12 [18] (3.33 and 8.73). The reason is that these two data are diffuse-dominant, so traditional methods with the Lambertian assumption fit well even for a small number of observed irradiance values. In contrast, our data-driven approach treats general reflectance and global illumination effects evenly during model optimization and hence may underfit Lambertian surfaces with simple shapes. Unlike its excellent performance on synthetic data, CNN-PS [16] achieves lower accuracy on real data, i.e., its performance on real data is not even comparable to that of traditional methods. We attribute this mainly to overfitting during training: the synthetic testing data are constructed in a similar manner to the training data. Our method achieves the best accuracy on most of the data, such as cow (metallic paint materials), goblet, and harvest (where most regions contain inter-reflection or cast shadows).

Visual quality comparisons on the data cow and pot1 are shown in Figure 6. Our method provides much more accurate results on cow thanks to its excellent modeling capacity for metallic materials, and this performance advantage is consistent with that on synthetic data with metallic materials reported in Table 1. Even though our method achieves performance similar to the other methods (LS [33], IW12 [18], ST14 [28]) on the center regions of pot1, it provides more accurate estimation at boundaries (e.g., the regions of the spout and kettle-holder). Both the quantitative and visual results on real data further validate the effectiveness of our method.

4.3 Ablation Studies

In this section, we perform ablation studies to further investigate the contributions of the important components of SPLINE-Net. Specifically, the effectiveness of LI-Net, the symmetric loss, and the asymmetric loss is studied independently. Unless otherwise stated, all methods in this section use exactly the same settings as in Sections 4.1 and 4.2. We use the same testing set as in Section 4.2, built from the real dataset Diligent [26], for evaluation.

Effectiveness of LI-Net.

Since NE-Net shares the same network structure as CNN-PS [16] and our SPLINE-Net is composed of LI-Net and NE-Net, we compare our SPLINE-Net without the symmetric or asymmetric loss (denoted 'LI-Net+NE-Net') with CNN-PS [16] to validate the effectiveness of LI-Net. Their quantitative and qualitative performance is shown in Table 2 and Figure 8, respectively. As can be observed, LI-Net+NE-Net significantly outperforms CNN-PS [16], which verifies the effectiveness of using LI-Net to generate dense observation maps for estimating surface normals.

Effectiveness of symmetric loss.

We then validate the effectiveness of the symmetric loss by comparing the performance of LI-Net+NE-Net with that of the same method trained with an additional symmetric loss (denoted 'Nets+Sym.'). This experiment verifies the effectiveness of enforcing the isotropy property. Figure 7 shows that the symmetric loss helps generate more reliable dense observation maps. As displayed in Table 2, the quantitative performance is consistently improved for all data by introducing the symmetric loss. Visual comparisons in Figure 8 intuitively show this advantage for general materials (orange rectangles).

Effectiveness of asymmetric loss.

We compare Nets+Sym. and SPLINE-Net in this experiment to verify the effectiveness of our method in considering global illumination effects. As can be observed from the observation maps in Figure 7, the asymmetric loss helps produce more reliable dense observation maps. Interestingly, our methods (Nets+Sym. and SPLINE-Net) successfully inpaint regions damaged by global illumination effects, as shown in the second row of Figure 7. The overall quantitative performance is improved by introducing the asymmetric loss, as shown in Table 2. Most of the improvements are observed for data with heavy global illumination effects, e.g., goblet (inter-reflection) and harvest (cast shadows). Visual comparisons in Figure 8 intuitively show this advantage (red rectangles).

5 Conclusion

This paper proposes SPLINE-Net to address the problem of photometric stereo with general reflectance and global illumination effects using a small number of images. The basic idea of SPLINE-Net is to generate dense lighting observations from a sparse set of lights to guide the estimation of surface normals. SPLINE-Net is further constrained by the proposed symmetric and asymmetric loss functions, which enforce the isotropy constraint and perform outlier rejection of global illumination effects.

Limitations.

Interestingly, even though deep learning based methods achieve superior performance on non-Lambertian reflectance, their performance drops on diffuse-dominant surfaces that can be well fitted by traditional methods with the Lambertian assumption. Figure 9 illustrates the results of four traditional methods and three deep learning based methods (including PS-FCN [7]) on two real data with diffuse surfaces. (Buddha is courtesy of Dan Goldman and Steven Seitz, found at http://www.cs.washington.edu/education/courses/csep576/05wi//projects/project3/project3.htm; sheep is from [25].) These results, which are consistent with those of ball and pot1 in Table 2, indicate a limitation of deep learning methods on diffuse surfaces. Explicitly considering diffuse surfaces while maintaining the performance advantage of deep learning based methods on non-Lambertian surfaces could be one direction for future work.

References

  • [1] J. Ackermann and M. Goesele. A survey of photometric stereo techniques. Foundations and Trends in Computer Graphics and Vision, 9(3-4):149–254, 2015.
  • [2] J. Ackermann, F. Langguth, S. Fuhrmann, and M. Goesele. Photometric stereo for outdoor webcams. In Proc. of Computer Vision and Pattern Recognition, pages 262–269, 2012.
  • [3] N. Alldrin, T. Zickler, and D. Kriegman. Photometric stereo with non-parametric and spatially-varying reflectance. In Proc. of Computer Vision and Pattern Recognition, pages 1–8, 2008.
  • [4] N. G. Alldrin and D. J. Kriegman. Toward reconstructing surfaces with arbitrary isotropic reflectance: A stratified photometric stereo approach. In Proc. of International Conference on Computer Vision, pages 1–8, 2007.
  • [5] M. Chandraker, S. Agarwal, and D. Kriegman. Shadowcuts: Photometric stereo with shadows. In Proc. of Computer Vision and Pattern Recognition, pages 1–8, 2007.
  • [6] M. Chandraker, J. Bai, and R. Ramamoorthi. On differential photometric reconstruction for unknown, isotropic BRDFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12):2941–2955, 2013.
  • [7] G. Chen, K. Han, and K.-Y. K. Wong. PS-FCN: A flexible learning framework for photometric stereo. In Proc. of European Conference on Computer Vision, pages 3–19, 2018.
  • [8] L. Chen, Y. Zheng, B. Shi, A. Subpa-Asa, and I. Sato. A microfacet-based reflectance model for photometric stereo with highly specular surfaces. In Proc. of International Conference on Computer Vision, pages 3162–3170, 2017.
  • [9] T. Chen, M. Goesele, and H.-P. Seidel. Mesostructure from specularity. In Proc. of Computer Vision and Pattern Recognition, volume 2, pages 1825–1832, 2006.
  • [10] H.-S. Chung and J. Jia. Efficient photometric stereo on glossy surfaces with wide specular lobes. In Proc. of Computer Vision and Pattern Recognition, pages 1–8, 2008.
  • [11] A. S. Georghiades. Incorporating the Torrance and Sparrow model of reflectance in uncalibrated photometric stereo. In Proc. of International Conference on Computer Vision, volume 3, page 816, 2003.
  • [12] D. B. Goldman, B. Curless, A. Hertzmann, and S. M. Seitz. Shape and spatially-varying BRDFs from photometric stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6):1060–1071, 2010.
  • [13] S. Herbort and C. Wöhler. An introduction to image-based 3D surface reconstruction and a survey of photometric stereo methods. 3D Research, 2(3):1–17, 2011.
  • [14] T. Higo, Y. Matsushita, and K. Ikeuchi. Consensus photometric stereo. In Proc. of Computer Vision and Pattern Recognition, pages 1157–1164, 2010.
  • [15] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proc. of Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
  • [16] S. Ikehata. CNN-PS: CNN-based photometric stereo for general non-convex surfaces. In Proc. of European Conference on Computer Vision, pages 3–18, 2018.
  • [17] S. Ikehata and K. Aizawa. Photometric stereo using constrained bivariate regression for general isotropic surfaces. In Proc. of Computer Vision and Pattern Recognition, pages 2179–2186, 2014.
  • [18] S. Ikehata, D. Wipf, Y. Matsushita, and K. Aizawa. Robust photometric stereo using sparse regression. In Proc. of Computer Vision and Pattern Recognition, pages 318–325, 2012.
  • [19] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Proc. of European Conference on Computer Vision, pages 694–711. Springer, 2016.
  • [20] S. Li and B. Shi. Photometric stereo for general isotropic reflectances by spherical linear interpolation. Optical Engineering, 54(8):083104, 2015.
  • [21] D. Miyazaki, K. Hara, and K. Ikeuchi. Median photometric stereo as applied to the segonko tumulus and museum objects. International Journal of Computer Vision, 86(2-3):229, 2010.
  • [22] Y. Mukaigawa, Y. Ishii, and T. Shakunaga. Analysis of photometric factors based on photometric linearization. JOSA A, 24(10):3326–3334, 2007.
  • [23] F. Romeiro, Y. Vasilyev, and T. Zickler. Passive reflectometry. In Proc. of European Conference on Computer Vision, pages 859–872, 2008.
  • [24] H. Santo, M. Samejima, Y. Sugano, B. Shi, and Y. Matsushita. Deep photometric stereo network. In Proc. of International Conference on Computer Vision Workshops (ICCVW), pages 501–509, 2017.
  • [25] B. Shi, Y. Matsushita, Y. Wei, C. Xu, and P. Tan. Self-calibrating photometric stereo. In Proc. of Computer Vision and Pattern Recognition, pages 1118–1125, 2010.
  • [26] B. Shi, Z. Mo, Z. Wu, D. Duan, S.-K. Yeung, and P. Tan. A benchmark dataset and evaluation for non-Lambertian and uncalibrated photometric stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2):271–284, 2019.
  • [27] B. Shi, P. Tan, Y. Matsushita, and K. Ikeuchi. Elevation angle from reflectance monotonicity: Photometric stereo for general isotropic reflectances. In Proc. of European Conference on Computer Vision, pages 455–468. Springer, 2012.
  • [28] B. Shi, P. Tan, Y. Matsushita, and K. Ikeuchi. Bi-polynomial modeling of low-frequency reflectances. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6):1078–1091, 2014.
  • [29] P. Tan, L. Quan, and T. Zickler. The geometry of reflectance symmetries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2506–2520, 2011.
  • [30] T. Taniai and T. Maehara. Neural inverse rendering for general reflectance photometric stereo. In International Conference on Machine Learning, pages 4864–4873, 2018.
  • [31] S. Tozza, R. Mecca, M. Duocastella, and A. Del Bue. Direct differential photometric stereo shape recovery of diffuse and specular surfaces. Journal of Mathematical Imaging and Vision, 56(1):57–76, 2016.
  • [32] F. Verbiest and L. Van Gool. Photometric stereo with coherent outlier handling and confidence estimation. In Proc. of Computer Vision and Pattern Recognition, pages 1–8, 2008.
  • [33] R. J. Woodham. Photometric method for determining surface orientation from multiple images. Optical Engineering, 19(1):191139–191139, 1980.
  • [34] L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang, and Y. Ma. Robust photometric stereo via low-rank matrix completion and recovery. In Proc. of Asian Conference on Computer Vision, pages 703–717. 2010.
  • [35] T.-P. Wu and C.-K. Tang. Photometric stereo via expectation maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3):546–560, 2010.
  • [36] S.-K. Yeung, T.-P. Wu, C.-K. Tang, T. F. Chan, and S. J. Osher. Normal estimation of a transparent object using a video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(4):890–897, 2015.
  • [37] C. Yu, Y. Seo, and S. W. Lee. Photometric stereo from maximum feasible Lambertian reflections. In Proc. of European Conference on Computer Vision, pages 115–126, 2010.
  • [38] Q. Zheng, A. Kumar, B. Shi, and G. Pan. Numerical reflectance compensation for non-Lambertian photometric stereo. IEEE Transactions on Image Processing, 2019.
  • [39] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proc. of International Conference on Computer Vision, pages 2223–2232, 2017.