3D face has been widely used in areas of face recognition and verification, expression recognition and facial animations. Face reconstruction is usually based on a scene that the subject is fixed and scanned with special equipment under a lab condition, video sequences or multi-view photographs. Recently, more challenging works are reconstruction from wild images with unknown camera calibration.
As a whole, there are two kinds of reconstruction technologies: high-cost equipment relied technologies under lab condition and low-cost 2D image based techniques out of lab. The former ones have implemented high quality face reconstruction such as laser , stereo [2, 3, 24], structural lighting , and light stages . These technologies always rely on high resolution camera, synchronized data or calibration. The latter kinds of reconstructions are based on 2D informations, such as single-image method , video sequences based method , multi-view photographs [16, 11], and even wild images based methods [15, 18]. In contrast, the latter are low-cost and convenient to be distributed. It is greatly significant but challengingly to research on face reconstruction from images.
. SFS assumes that the subject has a Lambertian surface and relatively distant illumination, to compute surface normals. SFM is based on assumption that it exists a coordinate transformation between image coordinate system and camera coordinate system, to compute the positions of surface points. And the idea of 3DMM is that human faces are within a linear subspace, and that any novel face shape can be represented by linear combination of shape eigenvectors deduced by PCA. SFS and SFM give the geometrical and physical descriptions of face shape and imaging, and 3DMM concentrates on the statistical explanation of 3D meshes or skeletons. We believe face reconstruction is rather a geometrical optimization problem than a statistical problem, as 3DMM is just an assist of geometrical method when building detailed shape,eg..
Motivated by the geometric and physical thought in SFS and SFM, we propose to use B-spline surface to represent face shape, because it is a continual surface model, while 3D mesh or skeleton model are just approximation of continuous face by using discontinuous fragments. Deep in relation between B-spline face and face images, we find that motion and shading in images are from 0th- and 1st- order derivatives, respectively, of B-spline face surface, by which a complete geometrical optimization for B-spline surface reconstruction can be introduced: 1) the dense shading information can be applied to optimize local B-spline shape; and 2) the sparse motion can be used to optimize the global. During optimization, 2nd-order informations are also used to guarantee a smooth warping and local information keeping. The processes 1) and 2) iterate until convergence. Significantly, the final B-spline face is high-order differential, and can approximate the true shape in arbitrary precision.
Shape in Shading and Structure in Motion. SFS has been widely used for reconstruction, e.g. single-view reconstruction , multiple frontal images based reconstruction , and wild image based reconstruction [15, 18]. As single-view is ill posed , a reference is always needed . For wild images, photometric stereo is applied to obtain accurate normals locally [15, 18]. SFM uses multiple frame or images to recover sparse 3D structure of feature points of an object . Spatial-transformation approach 
only estimates the depth of facial points. Bundle adjustment fits the large scale rigid object reconstruction, but it cannot generate the dense model of non-rigid face. Incremental SFM approach  is proposed to build a 3D generic face model for non-rigid face. The paper  optimizes the local information with normals from shading, based on a 3D feature points-driven global warping. Therefore, shading and motion are very important and different geometric information of face, will enhance the reconstruction when they are combined. In our method, they are integrated into B-spline face model as 1st- and 0th- order derivatives of spline surface.
Face Surface Modeling.
The surface modeling depends on the data input (point cloud, noise, outlier, etc), output (point cloud, mesh, skeleton), and types of shape(man-made shape, organic shape). Point cloud, skeleton and mesh grid are the widely used man-made shape type for face reconstruction. Spatial transformation method estimates positions of sparse facial feature points. Bundle adjustment  builds the dense point cloud for large scale rigid object with a great number of images. Jingu et al  reconstruct face dense mesh based on skeleton and 3DMM. Kemelmacher et al  apply integration of normals to get discrete surface points. Roth et al 
reconstruct face based on Laplace mesh editing. The method of surface-smoothness priors is also needed for reconstructing smooth mesh on point cloud, e.g. radial basis function and Poisson surface reconstruction . Due to the fact that point cloud and 3D mess are discontinuous geometric shape, they cannot approximate the true shape of face in arbitrary precision. In contrast, our technology is based on a continuous free-form surface that approximates a organic shape, and reconstruct B-spline face from images directly, instead of intermediate point data. What’s more, because of B-spline surface is a case of NURBS (Non-Uniform Rational B-Spline), the B-spline reconstruction also can be imported to 3D modeling software like Rhino3D for further editing, analysis, and transformation conveniently by adjusting the B-spline control points.
In our work, initial local normals and motion estimation are obtained from images by referring to the work , and they are updated during B-spline optimization. The B-spline face is optimized and updated locally by using normals and globally by 3D feature points respectively, and the process iterates until convergence.
In summary, there are two primary contributions:
1. The face is described as a B-spline face whose 1st- and 0th- order derivatives can be discovered from shading and motion informations respectively in the images; and iterative multi-least-square (IMLS) optimization algorithm is also given.
2. We introduce an automatic free-form surface modeling for face reconstruction from multiple images with variation such as different poses, illuminations, and expressions etc., even from images in the wild.
The proposed method is operated on a set of face images with different poses, illuminations, and expression, of an individual. In this section, face is modeled as a uniform B-spline surface, whose 1st- and 0th- order derivatives have intimate relations with normals in shading and 3D feature points from motion respectively in the images in Section 2.1. Therefore, local and global B-spline optimization by shading and motion are formulated in Section 2.2 and 2.3 severally. Finally, an IMLS optimization algorithm is given in Section 2.4.
2.1 Modeling B-spline Face
Assuming human face is a uniform B-spline surface, surface curvature and normal at each point are described by basis function and control points, and face can be reconstructed conveniently by computing the B-spline control points.
We model face as a uniform B-spline surface F of degree , with as the control points, and as knots spliting parameter plane in and direction. Then
Particularly, is -continuous, and z-component of a B-spline face surface and its high order partial derivatives w.r.t. and are shown by the maps in Fig.1. The continuous derivability means that F can approximate the true shape in arbitrary precision with deterministic -ordered partial derivative and , , and .
2.1.1 Surface normals from 1st-order derivative
The 1st-order partial derivatives of F w.r.t and are
respectively. And a more brief form of 1st-order derivatives at can be written linearly as
where vectorrepresents B-spline control points, and and are 3-by- matrixes stacking the 1st-order coefficient of control points, being referred as 1st-order coefficient matrixes.
Therefore, the surface normal vector at can be computed by the cross product
2.1.2 3D structure by 0th-order derivative
Particularly, 3D structure defined by feature points on the surface, as shown in Fig.2, is given intuitively as
Equ.4 can be rewritten linearly as
where is a 3-by- 0th-order coefficient matrix. is also a sparse matrix as every feature point is determined by only 16 control points according to .
Surface normals are the geometric details of a surface, and structure is a sparse but global shape of a face. They are 1st- and 0th- order information that can be represented by control parameter b, seen in Equ.2, Equ.3 and Equ.5. Conversely, B-spline parameter b can be rebuilt from normals and structure, to build the face surface of an individual.
2.2 Local B-spline Shape From Shading
The shading in images covers normals and albedo of a surface, which can be used to optimize B-spline control points of a B-spline face. Firstly, photometric stereo method is used to estimate the normal information by assuming that face images are captured from a B-spline shape by a camera model of scaled orthographic projection, relatively distant illumination and Lambertian reflectance. Secondly, a normals driven control points optimization is proposed to optimize the local shape of B-spline face.
2.2.1 Photometric stereo
Photometric stereo is also based on assumption that a set of images of a same person are captured under different illumination conditions, and that the shape of each point can be solved by the observed variation in shading of the images.
Being faced with the variations in the wild photos, it is difficult to define the ”shape” of a person’s face, for which the normal information is also undetermined. Our solution is to solve a face shape that is locally similar to as many photos as possible from the normalized wild images as done in the paper . Firstly, images are normalized and clipped to frontal-posed with pixels, and input into . Secondly, we estimate the initial shape and lighting by factorizing via SVD. and , where .
To approach the true normal information, we estimate the shape and ambiguity by following the approach in paper . Firstly, we update firstly by minimizing to get the initial normal , where , with initial . Secondly, we solve ambiguity by minimizing , where is the normal shape of a template and it is consistent with at level of pixel position. At last, the surface normals are obtained by .
Particularly, the consistent normal shape are from the normals of template , where symbol is compound operator of cross-product in Equ.3.
The definition of operation is , where w and v are vectors containing normals.
Therefore, the vecorization of is
, where vector is , and is a selection operator that selects rows of 1st-order coefficients corresponding to normals consistent with ; and , whose value depends on is a diagonal matrix that stores reciprocals of lengths of the selected normals; and represent the control points of B-spline template face.
Similarly, the obtained can also be vectorized as h that is . Then h can be used to optimize control points b for B-spline objective face conversely, which will be introduced in the following subsection.
2.2.2 Local B-spline optimization
According to the relation of normals vector h and B-spline control points b, B-spline face can be solved by
which is also referred to as 1st-order optimization.
However, there exists two issues: 1) the low-dimension h may not guarantee an unique solution of high-dimension b; and 2) the system is not simply linear, which is difficult to be solved.
Therefore, a frontal constraint with is applied to make a unique solution; And a strategy of approximating to linearization is also proposed to make a linear solution.
Frontal Constraint. As the normals are estimated actually from frontal-face images, a frontal constraint will enhance an effective solution of 1st-order optimization surely by
where matrix T is the stacked coefficients of control points, and is a selection operator that sets the coefficients corresponding to z- components of control points to zeros, retaining the coefficients corresponding to x- and y- components.
This constraint is also referred as 0th-order regularization for local B-spline optimization as following,
Particularly, the item is a linear form, but the first item is not a simple linear form. Therefore, an approximating to linearization is proposed.
Approximating to Linearization. According to the characteristics of the cross-product “”, the first item can be rewritten as a linear-like formulation:
Particularly, the operation makes vector a matrix , where , and
If b is a known parameter, e.g as , for and , the minimization of both will be a linear system.
In fact, formulation is used to optimize the control points in parameter space of by fixing , while is used to optimize the control points in parameter space of by fixing .
A practical skill is to optimize the control points on the two parameter spaces alternately and iteratively. The two iteration items for can be rewritten as
As shown in Fig.3, two partial derivatives and at are updated until converges to n. Finally, the whole surface approaches to the optimal.
By integrating with , the final formulation for local optimization is the two iteration items as following:
2.3 Global B-spline Shape From Motion
The motion informations across images cover the geometry projections of feature points of a 3D object on 2D projection planes, which also can be used to optimize B-spline control points of a B-spline face. What’s important, the motion informations feature their sparsity and globality. Firstly, we use SVD method to re-estimate the sparse motion information by assuming that face images are captured from a B-spline shape by a camera model of a scaled orthographic projection. Secondly, the re-estimated motion information is used to optimize the shape of B-spline face globally. The two steps also are conducted iteratively.
2.3.1 Re-estimating the Motion
According to relation between facial points on image and on B-spline face: , pose parameters , and
for each image are recovered based on the linear transformation and SVD method.
Note a vector stacking feature points of images as the the re-estimated 2D positions, and a projection matrix stacking views of parameters .
2.3.2 Global B-spline Warping
According to the projection relation between 2D feature points and and 3D feature points, warping is based on an optimized B-spline face that contains local information of objective face:
where is a selection operator that selects rows of coefficients corresponding to feature points, and is a matrix that stacks . Regularity item is the distance of local information between faces b and , and is a weight factor. The first item is also referred to 0st-order item.
To eliminate affect of geometric rotation brought by 0st-order warping which changes the local normals greatly, and to make sure a smoothness of shape, we regularize the 0st-order item by using 2nd-order derivative information.
2nd-order Regularization. According to Equ.2 that represents 1st-order partial derivative as , the 2nd-order partial derivative can be introduced as
in a similar way. The regularization item representing the 2nd-order approximation between template and objective face is given as:
which is also referred to as 2nd-order regularization.
With the 0th- and 2nd- order information, a globally warping of B-spline face can be formulated as
An IMLS (iterative multi-least-square) algorithm is proposed to optimize the B-spline face, seen in Algorithm 1. Processes of computing local B-spline shape from shading and warping B-spline globally via motion are carried on iteratively to get the final surface, differing from the work in  where the uses of shading and motion are in two separate processes. Particularly, Minimization (6) and Minimization (7) during local optimization, and Minimization (8) during global optimization are linear system with closed-from solutions of least square, seen in line 6,7,12 in Algorithm 1.
In this section experiments is resented to verify our method of MsSfMS. We first describe the pipeline to prepare a collection of face images of a same person for B-spline face reconstruction. Then quantitative and qualitative comparisons are demonstrated with baseline method on ground truth data  with various expressions, illuminations and poses. Finally, challenging reconstructions are also conducted and compared based on real unconstrained data taken from the Labeled Faces in Wild (LFW)  database.
3.1 Pipeline of experimental data
Ground truth data with expression The ground truth data is from the space-times faces  which contains 3D face models with different expressions. And different poses and illuminations can also be simulated by the spaces-times faces, seen in Fig.5. Images with various poses and illuminations are collected, and feature points manually labeled. The reconstruction is evaluated by the error to the ground truth model.
3.2 Standard images
We conduct 5 session of reconstructions: the first four is reconstruction expression S1, S2, S3 and S4 by using their corresponding images, and the firth reconstruction S5 is based on images with different expressions. Each session contains 40 images with various illumination and different poses. Reconstruction result is compared with the reimplemented method.
To compare the approaches numerically, we compute the shortest point-to-point distance from ground truth to reconstructed surface. Point clouds are sampled from B-spline face and aligned according to absolute orientation problem. Mean euclidean distance and the root mean square (RMS) of the distances, after normalized by the eye-to-eye distance, are reported in Table.1. Visual comparison is shown in Fig.4. Our results demonstrate promise performance. An important fact is that method cannot make a credible depth information and global shape, but our method solves the problem by local optimization and global warping.
3.3 Real unconstrained data
Our method is also tested based on real unconstrained data. Unlike the experiment in paper using hundreds of images, we conduct reconstruction with limited number of images, as so many images are not always available for some task like criminal investigation and small simple size problem. 35 images are collected from LFW for each of six persons (Arnold Schwarzenegger, Colin Powell, Donald Rumsfeld, George W.Bush, Hugo Chavez and Gloria Macapagal Arroyo), covering different pose and expressions. Reconstructions and comparisons are shown in Fig.6. Our method steadily produce good looking model from different viewpoints. Especially for case of Gloria Macapagal Arroyo, while the method is not working, our method generates satisfactory result. It is convinced that the reconstructions express the individual face structure well and fit well with pose-variety photos.
We presented an B-spline free-form surface modeling for face reconstruction,by leveraging the spline cues of motion and shading in a set of face images. The method works well for data with different poses and expressions, even for wild cases. The key contributions are that the proposed B-spline shape model describes the motion and shading geometrically in 0th- and 1st- order derivatives respectively, and that an optimization algorithm is also supplied for reconstruction. In the future, the model also can be extented for other research areas, such as expression tracking and facial animation by adjusting the control points locally, and recognition and verification based on local B-spline shape information.
-  S. Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski. Building rome in a day. Communications of the ACM, 54(10):105–112, 2011.
-  T. Beeler, B. Bickel, P. Beardsley, B. Sumner, and M. Gross. High-quality single-shot capture of facial geometry. ACM Transactions on Graphics (TOG), 29(4):40, 2010.
-  T. Beeler, F. Hahn, D. Bradley, B. Bickel, P. Beardsley, C. Gotsman, R. W. Sumner, and M. Gross. High-quality passive facial performance capture using anchor frames. In ACM Transactions on Graphics (TOG), volume 30, page 75. ACM, 2011.
-  V. Blanz, A. Mehl, T. Vetter, and H. P. Seidel. A statistical method for robust 3d surface reconstruction from sparse data. In 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004. Proceedings. 2nd International Symposium on, pages 293–300, 2004.
-  V. Blanz and T. Vetter. A morphable model for the synthesis of 3d faces. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 187–194, 1999.
-  X. P. Burgos-Artizzu, P. Perona, and P. Dollár. Robust face landmark estimation under occlusion. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1513–1520. IEEE, 2013.
-  J. C. Carr, R. K. Beatson, J. B. Cherrie, T. J. Mitchell, W. R. Fright, B. C. Mccallum, and T. R. Evans. Reconstruction and representation of 3d objects with radial basis functions. Computer Graphics Pages–76 Acm Siggraph, pages 67–76, 2001.
-  J. Fortuna and A. M. Martinez. Rigid structure from motion from a blind source separation perspective. International journal of computer vision, 88(3):404–424, 2010.
-  A. Ghosh, G. Fyffe, B. Tunwattanapong, J. Busch, X. Yu, and P. Debevec. Multiview face capture using polarized spherical gradient illumination. ACM Transactions on Graphics (TOG), 30(6):129, 2011.
-  J. Gonzalez-Mora, F. De la Torre, N. Guil, and E. L. Zapata. Learning a generic 3d face model from 2d image databases using incremental structure-from-motion. Image and Vision Computing, 28(7):1117–1129, 2010.
J. Heo and M. Savvides.
In between 3D Active Appearance Models and 3D Morphable Models.
Computer Vision and Pattern Recognition, 2009.
-  G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
-  M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In Proceedings Symposium on Geometry Processing (SGP ’06, 32(3):10–10, 2006.
-  I. Kemelmacher, Shlizerman and R. Basri. 3d face reconstruction from a single image using a single reference face shape. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(2):394–405, 2011.
-  I. Kemelmacher, Shlizerman and S. M. Seitz. Face reconstruction in the wild. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1746–1753. IEEE, 2011.
-  H.-S. Koo and K.-M. Lam. Recovering the 3d shape and poses of face images based on the similarity transform. Pattern Recognition Letters, 29(6):712–723, 2008.
-  E. Prados and O. Faugeras. Shape from shading: a well-posed problem? In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 870–877. IEEE, 2005.
-  J. Roth, Y. Tong, and X. Liu. Unconstrained 3d face reconstruction. Trans. Graph, 33(4):43, 2014.
-  A. K. Roy-Chowdhury and R. Chellappa. Statistical bias in 3-d reconstruction from a monocular video. Image Processing, IEEE Transactions on, 14(8):1057–1062, 2005.
-  Z.-L. Sun, K.-M. Lam, and Q.-W. Gao. Depth estimation of face images using the nonlinear least-squares model. Image Processing, IEEE Transactions on, 22(1):17–30, 2013.
-  C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision, 9(2):137–154, 1992.
-  H. Wang, H. Wei, and Y. Wang. Face representation under different illumination conditions. In Multimedia and Expo, 2003. ICME ’03. Proceedings. 2003 International Conference on, pages 285–288, 2003.
-  T. Weise, B. Leibe, and L. Van Gool. Fast 3d scanning with automatic motion compensation. In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, pages 1–8. IEEE, 2007.
-  C. Wu, C. Stoll, L. Valgaerts, and C. Theobalt. On-set performance capture of multiple actors with a stereo camera. ACM Transactions on Graphics (TOG), 32(6):161, 2013.
-  C. Yang, J. Chen, N. Su, and G. Su. Improving 3d face details based on normal map of hetero-source images. In 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 9–14, 2014.
-  A. L. Yuille, D. Snow, R. Epstein, and P. N. Belhumeur. Determining generative models of objects under varying illumination: Shape and albedo from multiple images using svd and integrability. International Journal of Computer Vision, 35(3):203–222, 1999.
-  L. Zhang, N. Snavely, B. Curless, and S. M. Seitz. Spacetime faces: High-resolution capture for~ modeling and animation. In Data-Driven 3D Facial Animation, pages 248–276. Springer, 2007.
-  R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah. Shape-from-shading: a survey. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 21(8):690–706, 1999.