1 Introduction
In 3D computer vision problems, the input data is often unstructured (i.e., the number of input images varies and the images are unordered). A good example is the multi-view stereo problem, where the scene geometry is recovered from unstructured multi-view images. Due to this unstructuredness, 3D reconstruction from multiple images has relied less on supervised learning-based algorithms, except for some structured problems such as binocular stereopsis
[1] and two-view SfM [2], whose number of input images is always fixed. However, recent advances in deep convolutional neural networks (CNNs) have motivated researchers to address unstructured 3D computer vision problems with deep neural networks. For instance, Kar et al. [3] presented an end-to-end learned system for multi-view stereopsis, while Kim et al. [4] presented a learning-based surface reflectance estimation from multiple RGB-D images. Both works intelligently merged all the unstructured input into a structured, intermediate representation (i.e., a 3D feature grid [3] and a 2D hemispherical image [4]).

† This work was supported by JSPS KAKENHI Grant Number JP17H07324.

Photometric stereo is another 3D computer vision problem whose input is unstructured: surface normals of a scene are recovered from appearance variations under different illuminations. Photometric stereo algorithms typically solve an inverse problem of the point-wise image formation model, which is based on the Bidirectional Reflectance Distribution Function (BRDF). While effective, a BRDF-based image formation model generally cannot account for global illumination effects such as shadows and inter-reflections, which are often problematic when recovering non-convex surfaces. Some algorithms attempt robust outlier rejection to suppress non-Lambertian effects [5, 6, 7, 8]; however, the estimation fails when non-Lambertian observations are dominant. This limitation inevitably occurs because multiple interactions of light with a surface are difficult to model in a mathematically tractable form.

To tackle this issue, this paper presents an end-to-end CNN-based photometric stereo algorithm that learns the relationship between surface normals and their appearances without physically modeling the image formation process. For better scalability, our approach remains pixel-wise and rather inherits from the conventional robust approaches [5, 6, 7, 8]: we learn a network that automatically “neglects” the global illumination effects and estimates the surface normal from the “inliers” in the observation. To achieve this goal, we train our network on as many synthetic input patterns “corrupted” by global effects as possible. Images are rendered with various complex objects under diverse material and illumination conditions.
Our challenge is to apply a deep neural network to the photometric stereo problem, whose input is unstructured. Similarly to recent works [3, 4], we merge all the photometric stereo data into an intermediate representation called the observation map, which has a fixed shape and is therefore naturally fed to a standard CNN. As with many photometric stereo algorithms, our work is primarily concerned with isotropic materials, whose reflections are invariant under rotation about the surface normal. We will show that this isotropy can be taken advantage of in the form of the rotational pseudo-invariance of the observation map, both for augmenting the input data and for reducing prediction errors. To train the network, we create a synthetic photometric stereo dataset (CyclesPS) by leveraging the physics-based Cycles renderer [9] to simulate complex global light transport. To cover diverse real-world materials, we adopt Disney’s principled BSDF [10], which was proposed to let artists render various scenes by controlling a small number of parameters.
We evaluate our algorithm on the DiLiGenT Photometric Stereo Dataset [11], a real benchmark dataset containing images and calibrated lightings. We compare our method against conventional photometric stereo algorithms [12, 13, 14, 15, 5, 6, 16, 7, 17, 18, 8, 19, 20, 21] and show that our end-to-end learning-based algorithm recovers the non-convex, non-Lambertian surfaces most successfully among all the algorithms concerned.
Our contributions are summarized as follows:
(1) We are the first to propose a supervised CNN-based calibrated photometric stereo algorithm that takes unstructured images and lighting information as input.
(2) We present a synthetic photometric stereo dataset (CyclesPS) with a careful injection of global illumination effects such as cast shadows and inter-reflections.
(3) Our extensive evaluation shows that our method performs best on the DiLiGenT benchmark dataset [11] among various conventional algorithms, especially when the surfaces are highly non-convex and non-Lambertian.
Henceforth we rely on the classical assumptions of the photometric stereo problem (i.e., a fixed, linear orthographic camera and known directional lighting).
2 Related Work
Diverse appearances of real-world objects can be encoded by a BRDF ρ, which relates the observed intensity I_j to the associated surface normal n, the j-th incoming lighting direction l_j, its intensity s_j, and the outgoing viewing direction v via
I_j = s_j ρ(n, l_j, v) max(n^⊤ l_j, 0) + ε_j,    (1)
where max(n^⊤ l_j, 0) accounts for attached shadows and ε_j is an additive error to the model. Eq. (1) is generally called the image formation model. Most photometric stereo algorithms assume a specific form of ρ and recover the surface normals of a scene by inversely solving Eq. (1) from a collection of observations under m different lighting conditions. All the effects that are not represented by the BRDF (image noise, cast shadows, inter-reflections and so on) are typically lumped together into ε_j. Note that when the BRDF is Lambertian and the additive error is removed, Eq. (1) simplifies to the traditional Lambertian image formation model [12].
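For reference, in the Lambertian case with all global effects ignored, Eq. (1) reduces to a linear system that can be solved pixel-wise by least squares. The sketch below is the classical baseline [12], not the proposed method; `lambertian_photometric_stereo` is an illustrative helper name.

```python
import numpy as np

def lambertian_photometric_stereo(I, L):
    """Classical least-squares photometric stereo [12] at a single pixel.

    I: (m,) observed intensities (already divided by light intensities s_j).
    L: (m, 3) unit lighting directions, one row per l_j.
    Assumes Lambertian reflectance I_j = rho * n^T l_j with no shadows,
    so the model error eps_j is zero.
    """
    # Solve L @ (rho * n) = I in the least-squares sense.
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    rho = np.linalg.norm(g)      # albedo is the magnitude of g
    return g / rho, rho          # unit surface normal and albedo
```

With three or more non-coplanar lights and no shadowed observations, the solution is exact up to noise; this is the model that the robust approaches below treat as the inlier structure.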
Since Woodham first introduced the Lambertian photometric stereo algorithm, the extension of his work to non-Lambertian scenes has been a problem of significant interest. Photometric stereo approaches to dealing with non-Lambertian effects are mainly categorized into four classes: (a) robust approaches, (b) reflectance modeling with non-Lambertian BRDFs, (c) example-based reflectance modeling and (d) learning-based approaches.
Many photometric stereo algorithms recover the surface normals of a scene via a simple diffuse reflectance model (e.g., Lambertian) while treating other effects as outliers. For instance, Wu et al. [5] proposed a rank-minimization approach to decompose images into a low-rank Lambertian image and non-Lambertian sparse corruptions. Ikehata et al. extended this method by constraining the rank-3 Lambertian structure [6] (or the general diffuse structure [7]) for better computational stability. Recently, Quéau et al. [8] presented a robust variational approach for inaccurate lighting as well as various non-Lambertian corruptions. While effective, a drawback of this class of methods is that the estimation fails without dense diffuse inliers.
Despite their computational complexity, various algorithms employ parametric or non-parametric models of non-Lambertian BRDFs. In recent years, there has been an emphasis on representing a material with a small number of fundamental BRDFs. Goldman et al. [22] approximated each fundamental BRDF by the Ward model [23], and Alldrin et al. [13] later extended this to a non-parametric representation. Since the high-dimensional ill-posed problem may cause instability in the estimation, Shi et al. [18] presented a compact biquadratic representation of isotropic BRDFs. On the other hand, Ikehata et al. [17] introduced the sum-of-lobes isotropic reflectance model [24] to account for all frequencies in isotropic observations. To improve the efficiency of the optimization, Shen et al. [25] presented a kernel regression approach, which can be transformed into an eigendecomposition problem. This class of methods works well as long as the resultant image formation model is correct and free of model outliers.

A few photometric stereo algorithms fall into the example-based approach, which takes advantage of the surface reflectance of objects with known shape, captured under the same illumination environment as the target scene. The earliest example-based approach [26] requires a reference object whose material is exactly the same as that of the target object. Hertzmann et al. [27] eased this restriction to handle uncalibrated scenes and spatially varying materials by assuming that materials can be expressed as a small number of basis materials. Recently, Hui et al. [20] presented an example-based method without a physical reference object by taking advantage of virtual spheres rendered with various materials. While effective, this approach also suffers from model outliers and has the drawback that the lighting configuration of the reference scene must be carried over to the target scene.
Machine learning techniques have been applied in a few very recent photometric stereo works [21, 19]. Santo et al. [19] presented a supervised learning-based photometric stereo method using a neural network that takes as input a normalized vector where each element corresponds to an observation under a specific illumination. A surface normal is predicted by feeding the vector through one dropout layer and six adjacent dense layers. While effective, this method has the limitation that the lightings must remain the same between the training and test phases, making it inapplicable to unstructured input. Another work, by Taniai and Maehara [21], presented an unsupervised learning framework where surface normals and BRDFs are predicted by a network trained by minimizing the reconstruction loss between observed and synthesized images via a rendering equation. While their network is invariant to the number and permutation of the images, the rendering equation is still based on a point-wise BRDF and is intolerant to model outliers. Furthermore, they reported slow running times (i.e., 1 hour for 1000 SGD iterations per scene) due to the self-supervised manner.

In summary, there is still a constant struggle in the design of photometric stereo algorithms among complexity, efficiency, stability and robustness. Our goal is to solve this dilemma. Our end-to-end learning-based algorithm builds upon a deep CNN trained on synthetic datasets, abandoning the modeling of the complicated image formation process. Our network accepts unstructured input (i.e., it is invariant to both the number and order of input images) and works for various real-world scenes where non-Lambertian reflections are intermingled with global illumination effects.
3 Proposed Method
Our goal is to recover the surface normals of a scene (a) with spatially-varying isotropic materials, (b) with global illumination effects (e.g., shadows and inter-reflections), and (c) illuminated by an unknown number of lights. To achieve this goal, we propose a CNN architecture for the calibrated photometric stereo problem which is invariant to both the number and order of input images. The tolerance to global illumination effects is learned from synthetic images of non-convex scenes rendered with a physics-based renderer.
3.1 2D observation map for unstructured photometric stereo input
We first present the observation map, which is generated by a pixel-wise hemispherical projection of the observations based on the known lighting directions. Since a lighting direction l = [l_x, l_y, l_z]^⊤ is a vector spanned on a unit hemisphere, there is a bijective mapping from l to (l_x, l_y) (s.t., l_x^2 + l_y^2 + l_z^2 = 1) obtained by projecting the vector onto the x-y coordinate system, which is perpendicular to the viewing direction v (we preliminarily tried the projection onto the spherical coordinate system (θ, φ), but the performance was worse than on the standard x-y coordinate system). Then we define an observation map O ∈ R^{w×w} as
O[int(w(l_x^j + 1)/2), int(w(l_y^j + 1)/2)] = α I_j / s_j,  ∀ j ∈ {1, …, m},    (2)
where “int” is an operator that rounds a floating value to an integer and α is a scaling factor that normalizes the data. Once all the observations and lightings are stored in the observation map, we take it as the input of the CNN. Despite its simplicity, this representation has three major benefits. First, its shape is independent of the number and size of the input images. Second, the projection of observations is order-independent (i.e., the observation map does not change when the i-th and j-th images are swapped). Third, it is unnecessary to explicitly feed the lighting information into the network.
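A minimal NumPy sketch of Eq. (2); the choice of the scaling factor α below (max-normalization) is an assumption for illustration, since the exact normalization is an implementation detail:

```python
import numpy as np

def observation_map(I, L, w=32):
    """Project per-pixel observations onto a w x w observation map (Eq. (2)).

    I: (m,) intensities at one pixel, already divided by light intensities s_j.
    L: (m, 3) unit lighting directions; (l_x, l_y) lies in the unit disk.
    The map shape is independent of m and of the image order, and the
    lighting information is encoded implicitly by the projection.
    """
    O = np.zeros((w, w), dtype=np.float32)
    alpha = 1.0 / max(I.max(), 1e-8)   # normalization factor (an assumption)
    # "int" operator of Eq. (2): map l_x, l_y from [-1, 1] to grid indices
    x = np.clip((w * (L[:, 0] + 1) / 2).astype(int), 0, w - 1)
    y = np.clip((w * (L[:, 1] + 1) / 2).astype(int), 0, w - 1)
    O[x, y] = alpha * I
    return O
```

Because each observation is written to a cell determined only by its own lighting direction, shuffling the input images leaves the map unchanged.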
Fig. 1 illustrates examples of the observation maps of two objects, SPHERE and PAPERBOWL; one is purely convex and the other is highly non-convex. Fig. 1(a) indicates that the target point could be on a convex surface, since the values of the observation map gradually decrease to zero as the light direction moves away from the true surface normal. The local concentration of large intensity values also indicates a narrow specularity on a smooth surface. On the other hand, the abrupt change of values in Fig. 1(b) evidences the presence of cast shadows or inter-reflections on a non-convex surface. Since there is no local concentration of intensity values, the surface is likely to be rough. In this way, an observation map reasonably encodes the geometry, material and behavior of the light around a surface point.
3.2 Rotation pseudo-invariance for the isotropy constraint
An observation map is sparse in a general photometric stereo setup (e.g., assuming w = 32 and 100 input images, the ratio of non-zero entries in O is only about 10%). Missing data is generally considered problematic as CNN input and is often interpolated [4]. However, we empirically found that smoothly interpolating the missing entries degrades the performance, since an observation map is often non-smooth and zero values carry an important meaning (i.e., shadows). Therefore we instead improve the performance by taking into account the isotropy of the material.

Many real-world materials exhibit the same appearance when the surface is rotated about the surface normal. This behavior is referred to as isotropy [29, 30]. Isotropic BRDFs are parameterized in terms of three values instead of four [31] as
ρ(n, l, v) = ρ̃(n^⊤ l, n^⊤ v, l^⊤ v),    (3)
where ρ̃ is an arbitrary reflectance function (note that there are other parameterizations of an isotropic BRDF [32]). Combining Eq. (3) with Eq. (1), we get the following image formation model:
I = s ρ̃(n^⊤ l, n^⊤ v, l^⊤ v) max(n^⊤ l, 0).    (4)
Note that the lighting index j and the model error ε are omitted for brevity. Let’s consider the rotation of the surface normal and the lighting direction around the z-axis (i.e., the viewing axis) as n′ = Rn and l′ = Rl, where R is an arbitrary rotation matrix around the z-axis. Then,
n′^⊤ l′ = (Rn)^⊤ (Rl) = n^⊤ R^⊤ R l = n^⊤ l,    (6)
n′^⊤ v = n^⊤ R^⊤ v = n^⊤ v,  l′^⊤ v = l^⊤ v,    (7)

where Eq. (7) uses R^⊤ v = v, which holds because R rotates around the viewing axis.
Feeding them into Eq. (4) gives the following equation:

I(Rl, Rn) = s ρ̃(n^⊤ l, n^⊤ v, l^⊤ v) max(n^⊤ l, 0) = I(l, n).    (8)

Therefore, the rotation of the lighting and surface normal around the z-axis does not change the appearance, as illustrated in Fig. 2(a). Note that this property holds even for the indirect illumination in non-convex scenes, by rotating all the geometry and environment illumination around the viewing axis. This result is important for our CNN-based algorithm. Suppose a neural network is a mapping function f that maps x (i.e., a set of images and lightings) to y (i.e., a surface normal), and R is a rotation operator applied to the lighting/normal at the same angle around the z-axis. From Eq. (8), we get f(R(x)) = R(f(x)). We call this relationship rotational pseudo-invariance (the standard rotation invariance is f(R(x)) = f(x)). Note that the rotational pseudo-invariance also applies to the observation map, since rotating the lightings around the viewing axis rotates the observation map around the z-axis (strictly speaking, we rotate the lighting directions instead of the observation map itself, so we do not suffer from the boundary issue, unlike standard rotational data augmentation).
We constrain the network with the rotational pseudo-invariance in a manner similar to how rotation invariance is commonly achieved. Within the CNN framework, two approaches are generally adopted to encode rotation invariance: one applies rotations to the input image [33], and the other applies rotations to the convolution kernels [34]. We adopt the first strategy due to its simplicity. Concretely, we augment the training set with many rotated versions of the lightings and surface normal, which allows the network to learn the invariance without explicitly enforcing it. In our implementation, we rotate the vectors at regular intervals from 0 to 360 degrees.
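This augmentation can be sketched as follows; `rotate_z` and `augment` are illustrative helper names, and the number of rotations K is a parameter:

```python
import numpy as np

def rotate_z(vectors, angle_deg):
    """Rotate 3D row vectors around the z (viewing) axis."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
    return vectors @ R.T

def augment(L, n, K=10):
    """Yield K rotated copies of (lightings, normal) at regular intervals
    from 0 to 360 degrees; the intensities are unchanged by the rotation
    thanks to Eq. (8)."""
    for k in range(K):
        a = 360.0 * k / K
        yield rotate_z(L, a), rotate_z(n[None, :], a)[0]
```

Each rotated pair produces a rotated observation map with the same intensity values, so one rendered scene yields K training samples.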
3.3 Architecture details
In this section, we describe the framework of training and prediction. Given images and lightings, we produce observation maps following Eq. (2). The data is augmented to achieve the rotational pseudo-invariance by rotating both lighting and surface normal vectors around the viewing axis. Note that color images are converted to grayscale. The size w of the observation map should be chosen carefully: as w increases, the observation map becomes sparser, while a smaller observation map has less representability. Considering this trade-off, we empirically found that w = 32 is a reasonable choice among the sizes we tried, showing the best performance when the number of images is less than one thousand.
A variant of the densely connected convolutional network (DenseNet [28]) architecture is used to estimate a surface normal from an observation map. The network architecture is shown in Fig. 2(b). The network includes two 2-layer dense blocks, each consisting of one activation layer (ReLU), one convolution layer and a dropout layer, with a concatenation from the previous layers. Between the two dense blocks, there is a transition layer that changes the feature-map size via convolution and pooling. We do not insert a batch normalization layer, which we found to degrade the performance in our experiments. After the dense blocks, the network has two dense layers followed by one normalization layer which converts a feature to a unit vector. The network is trained with a simple mean squared loss between predicted and ground-truth surface normals. The loss function is minimized using the Adam solver [35]. We should note that since our input data size is relatively small (i.e., w × w observation maps), the choice of the network architecture is not a critical component of our framework (we compared AlexNet, VGGNet and DenseNet as well as much simpler architectures with only two or three convolutional layers and dense layers; among the architectures we tested, the current one was slightly better).

The prediction module is illustrated in Fig. 3. Given observation maps, we predict surface normals using the trained network. Since it is practically impossible to train a perfectly rotational pseudo-invariant network, the estimated surface normals for differently rotated observation maps are not identical (typically, the difference in angular errors between any two different rotations was less than 10%-20% of their average). To further enforce the rotational pseudo-invariance, we again augment the input data by rotating the lighting vectors at certain angles and then merge the outputs into one. Suppose the surface normal n_k is the prediction from the input data rotated by R_k; then we simply average the inversely rotated surface normals as follows:
n = Σ_{k=1}^{K} R_k^{−1} n_k / ‖Σ_{k=1}^{K} R_k^{−1} n_k‖.    (9)
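Eq. (9) amounts to undoing each rotation R_k on the corresponding prediction and averaging the results; a minimal sketch (`merge_predictions` is an illustrative helper name):

```python
import numpy as np

def merge_predictions(normals, angles_deg):
    """Average inversely rotated surface normals (Eq. (9)).

    normals: (K, 3) predictions n_k obtained from inputs whose lightings
    were rotated by angles_deg[k] around the viewing (z) axis.
    Returns the merged unit surface normal.
    """
    acc = np.zeros(3)
    for n, a in zip(normals, angles_deg):
        t = np.deg2rad(-a)                       # inverse rotation R_k^{-1}
        R_inv = np.array([[np.cos(t), -np.sin(t), 0.0],
                          [np.sin(t),  np.cos(t), 0.0],
                          [0.0,        0.0,       1.0]])
        acc += R_inv @ n
    return acc / np.linalg.norm(acc)             # renormalize to a unit vector
```

For a perfectly pseudo-invariant network the K terms would be identical; in practice the averaging suppresses the residual rotation-dependent error.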
3.4 Training dataset (CyclesPS dataset)
In this section, we present our CyclesPS training dataset. DiLiGenT [11], the largest real photometric stereo dataset, contains only ten scenes with a fixed lighting configuration. Some works [18, 17, 19] attempted to synthesize images with the MERL BRDF database [29]; however, only one hundred measured BRDFs cannot cover the tremendous variety of real-world materials. Therefore, we decided to create our own training dataset with diverse materials, geometries and illumination.
For rendering scenes, we collected high-quality 3D models under royalty-free licenses from the internet (references to each 3D model are included in the supplementary). We carefully chose fifteen models for training and three models for testing, whose surface geometry is sufficiently complex to cover a diverse surface normal distribution. Note that we empirically found that the 3D models in ShapeNet [36], which was used in a previous work [4], are generally too simple (e.g., models are often low-polygonal and mostly planar) to train the network.

The representation of the reflectance is also important to make the network robust to a wide variety of real-world materials. Due to its representability, we chose Disney’s principled BSDF [10], which integrates five different BRDFs controlled by eleven parameters (baseColor, subsurface, metallic, specular, specularTint, roughness, anisotropic, sheen, sheenTint, clearcoat, clearcoatGloss). Since our target is isotropic materials without subsurface scattering, we neglect the subsurface and anisotropic parameters. We also neglect specularTint, which artistically colorizes the specularity, and clearcoat and clearcoatGloss, which do not strongly affect the rendering results. While the principled BSDF is effective, we found that there are some unrealistic parameter combinations that we want to skip (e.g., metallic = 1 and roughness = 0, or metallic = 0.5). To avoid those unrealistic parameters, we divide the entire parameter set into three categories: (a) Diffuse, (b) Specular and (c) Metallic. We generate three datasets individually and merge them evenly when training the network. The value of each parameter is randomly selected within a specific range per parameter (see Fig. 4(a)). To realize spatially varying materials, we divide the object region in the rendered image into superpixels (i.e., 5000 for the training data) and use the same set of parameters at all pixels within a superpixel (see Fig. 4(b)).
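The per-superpixel material sampling can be sketched as below. The parameter ranges are placeholder assumptions (the actual ranges appear in Fig. 4(a)), and `sample_parameter_map` is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ranges per category; these numbers are assumptions, not the
# ranges from Fig. 4(a) of the paper.
CATEGORIES = {
    "diffuse":  {"metallic": (0.0, 0.0), "specular": (0.0, 0.3), "roughness": (0.3, 1.0)},
    "specular": {"metallic": (0.0, 0.0), "specular": (0.3, 1.0), "roughness": (0.0, 0.7)},
    "metallic": {"metallic": (1.0, 1.0), "specular": (0.3, 1.0), "roughness": (0.1, 0.7)},
}

def sample_parameter_map(segments, category):
    """Assign one random principled-BSDF parameter set per superpixel.

    segments: (H, W) integer superpixel labels; all pixels in a superpixel
    share the same material, realizing spatially varying materials.
    Returns one (H, W) map per parameter.
    """
    ranges = CATEGORIES[category]
    n_seg = segments.max() + 1
    per_seg = {k: rng.uniform(lo, hi, size=n_seg) for k, (lo, hi) in ranges.items()}
    return {k: v[segments] for k, v in per_seg.items()}
```

Sampling the three categories separately and merging them evenly mirrors the dataset construction described above while avoiding the unrealistic parameter combinations.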
For simulating complex light transport, we use the Cycles [9] renderer bundled in Blender [37]. An orthographic camera and directional lights are specified. For each rendering, we choose a set consisting of an object, BSDF parameter maps (one for each parameter), and a lighting configuration (i.e., roughly 1300 lights are uniformly distributed on the hemisphere, and small random noises are then added to each light). Once the images were rendered, we create the CyclesPS dataset by generating observation maps pixel-wise. To make the network robust to test data with any number of images, the observation maps are generated from a pixel-wise varying number of images. Concretely, when generating an observation map, we pick a random subset of images whose corresponding elevation angle of the light direction is more than a random threshold value (the minimum subset size is 50, to avoid too sparse an observation map, and we only picked lights whose elevation angles were more than 20 degrees, since it is practically unlikely that the scene is illuminated from the side). The training process takes 10 epochs for 150 image sets (
i.e., 15 objects × 10 rotations for the rotational pseudo-invariance). Each image set contains around 50000 samples (i.e., the number of pixels in the object mask).

4 Experimental Results
We evaluate our method on synthetic and real datasets. All experiments were performed on a machine with three GeForce GTX 1080 Ti GPUs and 64 GB of RAM. For training and prediction, we use the Keras library [38] with the TensorFlow backend and default learning parameters. The training process took around 3 hours.
4.1 Datasets
We evaluated our method on three datasets: two synthetic and one real.
MERLSphere is a synthetic dataset where images are rendered with the one hundred isotropic BRDFs in the MERL database [29], from diffuse to metallic. We generated 32-bit HDR images of a sphere with a ground-truth surface normal map and a foreground mask. There are no cast shadows or inter-reflections.
CyclesPS-Test is a synthetic dataset of three objects: SPHERE, TURTLE and PAPERBOWL. TURTLE and PAPERBOWL are non-convex objects where inter-reflections and cast shadows appear in the rendered images. This dataset was generated in the same manner as the CyclesPS training dataset, except for the number of superpixels in the parameter map and that the material condition was either Specular or Metallic (note that the objects and parameter maps in CyclesPS-Test are NOT in CyclesPS). Each data contains 16-bit integer images under 17 or 305 known uniform lightings.
DiLiGenT [11] is a public benchmark dataset of 10 real objects of general reflectance. Each data provides 16-bit integer images with a resolution of 612×512 from 96 different known lighting directions. The ground-truth surface normals for the orthographic projection and the single-view setup are also provided.
4.2 Evaluation on MERLSphere dataset
We compared our method (with the rotational averaging of Eq. (9)) against one of the state-of-the-art isotropic photometric stereo algorithms, IA14 [17], on the MERLSphere dataset (we used the authors’ implementation of [17] with the retro-reflection handling turned on; attached shadows were removed by simple thresholding; note that our method takes into account all the input information, unlike [17]). In the absence of global illumination effects, we simply evaluate the ability of our network to represent a wide variety of materials, compared to the sum-of-lobes BRDF [24] introduced in IA14. The results are illustrated in Fig. 5. We observed that our CNN-based algorithm performs comparably well, though not better than IA14, for most materials, which indicates that Disney’s principled BSDF [10] covers various real-world materials. We should note that, as was commented in [10], some very shiny materials, particularly the metals (e.g., chrome-steel and tungsten-carbide), exhibited asymmetric highlights suggestive of lens flare or perhaps anisotropic surface scratches. Since our network was trained on purely isotropic materials, these inevitably degrade the performance.
4.3 Evaluation on CyclesPSTest dataset
To evaluate the ability of our method to recover non-convex surfaces, we tested it on CyclesPS-Test. Our method was compared against two robust algorithms, IW12 [6] and IW14 [7], two model-based algorithms, ST14 [18] and IA14 [17], and BASELINE [12] (we used the authors’ implementations of [6] and [7], and our own implementation of [18]). When running the algorithms other than ours, we discarded samples whose intensity values were below a threshold in the 16-bit integer images for shadow removal. In this experiment, we also studied the effect of the number of images and of the rotational merging in the prediction (we still augment the data by rotations in the training step). Concretely, we tested our method on 17 or 305 images, with and without the rotational merging of Eq. (9). We show the results in Table 1 and Fig. 6. We observed that all the algorithms worked well on the convex specular SPHERE dataset. However, when the surfaces were non-convex, all the algorithms except ours failed in the estimation due to strong cast shadows and inter-reflections. It is interesting to see that even the robust algorithms (IW12 [6] and IW14 [7]) could not deal with the global effects as outliers. We also observed that the rotational averaging based on the rotational pseudo-invariance consistently improved the accuracy, though not by much.
4.4 Evaluation on DiLiGenT dataset
Finally, we present a side-by-side comparison on the DiLiGenT dataset [11]. We collected existing benchmark results for the calibrated photometric stereo algorithms [12, 13, 14, 15, 5, 6, 16, 7, 17, 18, 8, 19, 20, 21]. Note that we compared the mean angular errors of [12, 13, 14, 15, 5, 16, 17, 18] as reported in [11], those reported in their own works for [19, 20, 21], and those from our experiments using the authors’ implementations of [6, 7, 8] (as for [8], we used the default settings of their package, except that we gave the camera intrinsics provided by [11] and changed the noise variance to zero).

The results are illustrated in Table 2. Due to the space limit, we only show the top-10 algorithms w.r.t. the overall mean angular error, plus BASELINE [12] (please find the full comparison in our supplementary). We observed that our method achieved the smallest errors averaged over the 10 objects, with the best scores for 6 of the 10 objects. It is valuable to note that the other top-ranked algorithms [20, 21] are time-consuming: HS17 [20] requires dictionary learning for every different light configuration and TM18 [21] needs unsupervised training for every estimation, while our inference time is less than five seconds per dataset on a CPU. Taking a closer look at each object, Fig. 7 provides some important insights. HARVEST is the most non-convex scene in DiLiGenT, and other state-of-the-art algorithms (TM18 [21], IW14 [7], ST14 [18]) failed in the estimation of the normals inside the “bag” due to strong shadows and inter-reflections. Our CNN-based method estimated much more reasonable surface normals there, thanks to the network trained on the carefully created CyclesPS dataset. On the other hand, our method did not work best (though not badly) for READING, which is another non-convex scene. Our analysis indicated that this is because of the inter-reflection of high-intensity narrow specularities that were rarely observed in our training dataset (narrow specularities appear only when roughness in the principled BSDF is near zero).

5 Conclusion
In this paper, we have presented a CNN-based photometric stereo method which works for various kinds of isotropic scenes with global illumination effects. By projecting photometric images and lighting information onto the observation map, unstructured information is naturally fed into the CNN. Our detailed experimental results have shown the state-of-the-art performance of our method for both synthetic and real data, especially when the surface is non-convex. Building a better training set to handle narrow inter-reflections is our future direction.
References
 [1] Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., Bry, A.: End-to-end learning of geometry and context for deep stereo regression. Proc. ICCV (2017)
 [2] Vijayanarasimhan, S., Ricco, S., Schmid, C., Sukthankar, R., Fragkiadaki, K.: SfM-Net: Learning of structure and motion from video. arXiv preprint arXiv:1704.07804 (2017)
 [3] Kar, A., Häne, C., Malik, J.: Learning a multi-view stereo machine. Proc. NIPS (2017)
 [4] Kim, K., Gu, J., Tyree, S., Molchanov, P., Niessner, M., Kautz, J.: A lightweight approach for on-the-fly reflectance estimation. Proc. ICCV (2017)
 [5] Wu, L., Ganesh, A., Shi, B., Matsushita, Y., Wang, Y., Ma, Y.: Robust photometric stereo via low-rank matrix completion and recovery. In: Proc. ACCV. (2010)
 [6] Ikehata, S., Wipf, D., Matsushita, Y., Aizawa, K.: Robust photometric stereo using sparse regression. In: Proc. CVPR. (2012)
 [7] Ikehata, S., Wipf, D., Matsushita, Y., Aizawa, K.: Photometric stereo using sparse bayesian regression for general diffuse surfaces. IEEE Trans. Pattern Anal. Mach. Intell. 36(9) (2014) 1816–1831
 [8] Quéau, Y., Wu, T., Lauze, F., Durou, J.D., Cremers, D.: A non-convex variational approach to photometric stereo under inaccurate lighting. In: Proc. CVPR. (2017)
 [9] Cycles. https://www.cycles-renderer.org/
 [10] Burley, B.: Physically-based shading at Disney, part of Practical Physically Based Shading in Film and Game Production. SIGGRAPH 2012 Course Notes (2012)
 [11] Shi, B., Mo, Z., Wu, Z., Duan, D., Yeung, S.K., Tan, P.: A benchmark dataset and evaluation for non-Lambertian and uncalibrated photometric stereo. IEEE Trans. Pattern Anal. Mach. Intell. (2018) (to appear)
 [12] Woodham, R.J.: Photometric method for determining surface orientation from multiple images. Opt. Eng. 19(1) (1980) 139–144
 [13] Alldrin, N., Zickler, T., Kriegman, D.: Photometric stereo with non-parametric and spatially-varying reflectance. In: Proc. CVPR. (2008)
 [14] Goldman, D.B., Curless, B., Hertzmann, A., Seitz, S.M.: Shape and spatially-varying BRDFs from photometric stereo. IEEE Trans. Pattern Anal. Mach. Intell. 32(6) (2010) 1060–1071
 [15] Higo, T., Matsushita, Y., Ikeuchi, K.: Consensus photometric stereo. In: Proc. CVPR. (2010)
 [16] Shi, B., Tan, P., Matsushita, Y., Ikeuchi, K.: Elevation angle from reflectance monotonicity. In: Proc. ECCV. (2012)
 [17] Ikehata, S., Aizawa, K.: Photometric stereo using constrained bivariate regression for general isotropic surfaces. In: Proc. CVPR. (2014)
 [18] Shi, B., Tan, P., Matsushita, Y., Ikeuchi, K.: Bi-polynomial modeling of low-frequency reflectances. IEEE Trans. Pattern Anal. Mach. Intell. 36(6) (2014) 1078–1091

 [19] Santo, H., Samejima, M., Sugano, Y., Shi, B., Matsushita, Y.: Deep photometric stereo network. In: International Workshop on Physics Based Vision meets Deep Learning (PBDL) in Conjunction with IEEE International Conference on Computer Vision (ICCV). (2017)
 [20] Hui, Z., Sankaranarayanan, A.C.: Shape and spatially-varying reflectance estimation from virtual exemplars. IEEE Trans. Pattern Anal. Mach. Intell. 39(10) (2017) 2060–2073
 [21] Taniai, T., Maehara, T.: Neural inverse rendering for general reflectance photometric stereo. In: Proc. ICML. (2018)
 [22] Goldman, D., Curless, B., Hertzmann, A., Seitz, S.: Shape and spatially-varying BRDFs from photometric stereo. In: Proc. ICCV. (October 2005)
 [23] Ward, G.: Measuring and modeling anisotropic reflection. Computer Graphics 26(2) (1992) 265–272
 [24] Chandraker, M., Ramamoorthi, R.: What an image reveals about material reflectance. In: Proc. ICCV. (2011)
 [25] Shen, H.L., Han, T.Q., Li, C.: Efficient photometric stereo using kernel regression. IEEE Transactions on Image Processing 26(1) (2017) 439–451
 [26] Silver, W.M.: Determining shape and reflectance using multiple images. Master’s thesis, MIT (1980)
 [27] Hertzmann, A., Seitz, S.: Example-based photometric stereo: shape reconstruction with general, varying BRDFs. IEEE Trans. Pattern Anal. Mach. Intell. 27(8) (2005) 1254–1264
 [28] Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proc. CVPR. (2017)
 [29] Matusik, W., Pfister, H., Brand, M., McMillan, L.: A data-driven reflectance model. ACM Trans. on Graph. 22(3) (2003) 759–769
 [30] Alldrin, N., Kriegman, D.: Toward reconstructing surfaces with arbitrary isotropic reflectance: A stratified photometric stereo approach. In: Proc. ICCV. (2007)
 [31] Stark, M., Arvo, J., Smits, B.: Barycentric parameterizations for isotropic brdfs. IEEE Trans. on Visualization and Computer Graphics 11(2) (2011) 126–138
 [32] Montes, R., Urena, C.: An overview of brdf models. Technical report, LSI2012001 en Digibug Coleccion: TIC167  Articulos (2012)
 [33] Simard, P.Y., Steinkraus, D., Platt, J.C.: Best practices for convolutional neural networks applied to visual document analysis. In Proc. ICDAR (2003)
 [34] Schmidt, U., Roth, S.: Learning rotation-aware features: From invariant priors to equivariant descriptors. Proc. CVPR (2012)
 [35] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In proc. ICLR (2014)
 [36] Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., Yu, F.: ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University — Princeton University — Toyota Technological Institute at Chicago (2015)
 [37] Blender. https://www.blender.org/
 [38] Chollet, F., et al.: Keras. https://github.com/keras-team/keras (2015)