I. Introduction
Reconstructing non-rigid (i.e., moving and deforming) objects, such as people and animals, has a wide variety of applications, such as novel view synthesis [1, 2]. Unlike rigid object/scene reconstruction (e.g., [3, 4, 5, 6, 7, 8]), which can cast 3D reconstruction as an alignment problem, captured frames of non-rigid objects must be handled as a sequence rather than as different views of the same scene, since the object's shape may change from one frame to the next. This makes the problem challenging.
Some approaches have addressed this problem without using any prior knowledge of the object [9, 10, 11, 12] or with only the assumption of articulated objects (e.g., [13]). These approaches have an inherent weakness in synthesizing unobserved shapes and textures, which is critical for some applications, since they often require view synthesis from arbitrary viewpoints, even from unobserved directions.
If we know in advance the object we are going to capture, we can exploit prior knowledge to address this issue. A 3D geometry template is one possible source of prior knowledge for full-body reconstruction. Template-based methods acquire a shape template of the target before capturing it in motion and subsequently fit the template to measurements obtained from cameras or RGB-D sensors [13, 11]. This approach relies largely on non-rigid 3D registration and may suffer from insufficient constraints on the possible motion of the target object, so the fitting may be trapped in a local minimum far from the global one.
For capturing humans in particular, we can use human shape models (e.g., [14]) as prior knowledge. Statistical human shape models (e.g., [15, 16, 17, 18]) can serve as a strong regularizer on the possible variations and deformations of human bodies, such as poses and body shapes (e.g., tall, short, slim, and sturdy). These statistical models are trained on a large number of full-body measurements. By reducing the number of parameters, they make it more likely to find a local minimum sufficiently close to the optimum, even with partial measurements.
Generally, human bodies exhibit non-rigid deformations according to their poses (e.g., bending an arm deforms muscles, skin, and clothes), which we call pose-dependent deformations, and statistical models can describe such pose-dependent deformations only partially. Existing datasets, such as [19], can be used for training a statistical model but contain measurements of people only in skin-tight clothes; therefore, pose-dependent deformations of muscles and skin are encoded in the model, but those of clothes are not.
As mentioned above, the key role of such statistical human shape models is to interpolate unobserved surfaces in measurements of human bodies. However, people rarely wear skin-tight clothes in real situations, and the gap between real measurements and those in the dataset may hinder plausible interpolation (for example, clothing folds in unobserved regions may be smoothed out during the fitting process). A statistical model alone cannot fill in unobserved surfaces with clothing folds, which may cause significant visual artifacts in rendered models.
This paper proposes a method for full-body reconstruction of moving non-rigid 3D objects, primarily humans, from RGB-D measurements. Our method also uses a statistical model for rough reconstruction. In this sense, it is similar to the method by Bogo et al. [17], who developed a multi-resolution statistical model and a sophisticated fitting technique that simultaneously optimizes shape, a single set of textures, and a displacement map. They tested their method on people in skin-tight clothes.
In contrast, our method is designed to handle people in loose clothes, for which a statistical model does not work very well. The main idea of our method is to estimate pose-dependent deformations of unobserved surfaces, represented by a relatively small number of parameters via PCA. Instead of finding an accurate 3D mesh directly, we use a rough 3D mesh as the base shape and apply precise deformations to it. In this way, our method can handle pose-dependent deformations, such as clothing folds.
To achieve this, we propose to use the eigentexture method [20, 21], which embeds the view- and light-dependent texture of each triangle in a 3D mesh into a low-dimensional space, provided that the textures and deformations have a certain regularity. Under the assumptions that we can measure a human body in various poses from various directions while capturing the person in motion and that deformation depends solely on pose, we can synthesize the pose-dependent components of a human body. The main contributions of this paper are summarized as follows:

We introduce eigentexturing to textured full-body reconstruction in order to compress the texture representation as well as to synthesize textures on unobserved surfaces.

We propose eigendeformation, which embeds the displacement between a statistical model and a fully-fitted 3D mesh into a low-dimensional space, enabling displacement estimation for unobserved surfaces with a relatively small number of parameters.

To estimate the eigentexture and eigendeformation parameters for unobserved body parts, we develop a neural network (NN)-based coefficient regression that synthesizes textures and deformations for arbitrary poses as well as viewing directions.
II. Overview
The difficulty in statistical-shape-model-based full-body reconstruction of a moving human body in loose clothing lies mostly in reproducing pose-dependent deformations that are not described by the statistical shape model. Our idea is to represent such deformations by texture and by individual displacements of mesh vertices, both of which are embedded into low-dimensional spaces (eigentexture and eigendeformation). The individual displacements represent the difference between the statistical shape model's mesh and one fully registered to the measurement, and account for the relatively large deformations, while the texture reproduces the details. As in the eigentexture method [20, 21], our system compresses the storage required for individual textures and displacements by keeping only a small number of eigenvectors and their coefficients. In addition, it can interpolate unobserved surfaces by using the full-body bases.
Fig. 1 shows an overview of our full-body reconstruction system. At the preprocessing stage, the system registers a statistical shape model to sequences of point clouds obtained from RGB-D measurements. We use the SMPL model for non-rigid registration. Through non-rigid registration, we obtain the parameters of the statistical model (body shape parameters and a set of joint angles), as well as a mesh fully registered to each point cloud (bottom left of Fig. 1). As displacements, we compute the difference between the fully registered mesh and the statistical model mesh, where the latter can be computed solely from the estimated parameters. Then, the displacements are embedded into a low-dimensional subspace (eigendeformation) in a manner similar to the eigentexture method (top left of Fig. 1). At the rendering stage, the statistical model mesh is first recovered from the parameters, and then the textures and the displacements are reconstructed from their coefficients, which are estimated by NN-based regression. Adding all the displacements to the statistical model mesh, the clothed mesh is reconstructed.
III. Non-rigid registration with SMPL
We use the SMPL model [18] for non-rigid registration to simulation data and scanned data. First, we manually select a few correspondence points between the SMPL model and the simulation or scanned data, which serve as anchor points. Using the anchor points, we fit the SMPL model by estimating its pose; this fitted model is used as the naked model in the regression stage, as in Fig. 7 (the second model from the right). Then, we obtain additional correspondence points at boundaries, where we compute closest points. Using these correspondences, the SMPL model is inflated to fit the target model. Without this step, the SMPL model frequently collapses due to large shape differences at the initial registration (e.g., the left side of an arm corresponding to the right side of the arm). Finally, we supersample the vertices and establish correspondences between the two models with the nearest-neighbor method, as in Fig. 7 (the rightmost model). After these steps, we conduct non-rigid registration by minimizing the following energy function:
E = Σ_{k∈A} ||v_k − p_k||² + Σ_{k∈C} ||v_k − q_k||² + λ_1 Σ_{j∈J} ψ_joint(θ_j) + λ_2 Σ_{j∈J} ||Δθ_j||² + λ_3 Σ_{k∈V} ||d_k||²,   (1)

where A is the set of anchor point indices, C is the set of corresponding point indices established in the final step, J is the set of all joint indices, V is the set of vertex indices, v_k is the k-th vertex of the SMPL model, p_k and q_k are the corresponding target points, ψ_joint(θ_j) is a penalty for implausible joint angles of the elbows and knees, ||Δθ_j||² is a penalty for large changes in joint angles, ||d_k||² is a penalty for large displacement values, and λ_1, λ_2, and λ_3 are weight values.
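As a concrete illustration, the energy above can be evaluated as a weighted sum of squared point distances plus the three regularizers. The following sketch is illustrative only: the hinge-style joint penalty, the argument names, and the weights are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def registration_energy(V, anchors, corrs, theta, theta_prev, d,
                        w1=1.0, w2=1.0, w3=1.0):
    """Evaluate a registration energy of the form of Eq. (1).

    V:       (N, 3) current SMPL vertex positions.
    anchors: list of (vertex index, (3,) target point) anchor pairs.
    corrs:   list of (vertex index, (3,) target point) closest-point pairs.
    theta:   (J,) joint angles; theta_prev: joint angles before this step.
    d:       per-vertex displacement magnitudes.
    """
    e_anchor = sum(np.sum((V[i] - p) ** 2) for i, p in anchors)
    e_corr = sum(np.sum((V[i] - p) ** 2) for i, p in corrs)
    e_joint = w1 * np.sum(np.maximum(0.0, -theta) ** 2)   # implausible angles (assumed hinge form)
    e_smooth = w2 * np.sum((theta - theta_prev) ** 2)     # large angle changes
    e_disp = w3 * np.sum(d ** 2)                          # large displacement values
    return e_anchor + e_corr + e_joint + e_smooth + e_disp
```

In practice such an energy would be minimized over the pose, shape, and displacement variables with a nonlinear least-squares solver.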
IV. Eigentexture
One widely accepted way to reproduce deformations, especially small ones, is texture mapping. Under the assumption of skin-tight clothes, as in Bogo et al. [17], the texture can be a static image; we relax this assumption by using dynamically changing textures. As a texture representation, we use an eigendecomposition method to reduce the storage size required for texture images. In addition, our body scans, obtained from two RGB-D sensors, usually have some unobserved areas. The system reconstructs a plausible full-body mesh thanks to the statistical shape model; however, the textures of triangles in such areas are not recoverable due to the unavailability of prior knowledge about the texture. Eigentexture finds the manifold on which the textures of the same triangle lie and thus can synthesize textures for unobserved regions.
To extract the texture of a triangle, we first assess the visibility of each triangle in the mesh. The system renders the mesh with the standard OpenGL pipeline to obtain the depth map in the camera coordinate system. We also render a triangle ID map by color-coding the ID of each triangle. Each pixel with a certain triangle ID is back-projected onto the corresponding triangle in the mesh and then projected onto the depth map. The pixel is considered visible if the difference between the corresponding depth value in the depth map and the third coordinate of the point on the triangle is smaller than a certain threshold. We judge a triangle to be visible only if all pixels in the triangle are visible, since a partly occluded triangle may significantly spoil the reconstruction. For each visible triangle, the 3D positions of its three vertices are projected onto the RGB image to extract the texture for the triangle. Hereinafter, the triangle index is omitted as long as it is not ambiguous.
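The all-pixels-must-pass depth test described above can be sketched as follows; the threshold value, the array layout, and the function name are illustrative assumptions.

```python
import numpy as np

def triangle_fully_visible(depth_on_triangle, depth_map_values, eps=5e-3):
    """Depth-based visibility test for one triangle.

    depth_on_triangle: (n,) depths of the back-projected pixel points
                       lying on the triangle (camera coordinates).
    depth_map_values:  (n,) rendered depth-map values at the same pixels.
    A pixel is visible when the two depths agree within eps; the whole
    triangle counts as visible only if every pixel passes, so partly
    occluded triangles are rejected.
    """
    return bool(np.all(np.abs(depth_map_values - depth_on_triangle) < eps))
```

Rejecting partly occluded triangles trades coverage for texture quality: a triangle whose texture mixes foreground and occluder pixels would corrupt the eigentexture basis.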
Then, we apply eigendecomposition [20] to each triangle, which we briefly introduce here to make the paper self-contained. Let X be a P × F matrix whose f-th column x_f ∈ R^P is the vectorized texture of a certain visible triangle, where P is the number of pixels in the triangle and F is the number of frames in which the triangle is visible. P can be set to an arbitrary, sufficiently large number because we can warp the texture arbitrarily. All column vectors in X are centralized, i.e., x̃_f = x_f − x̄, to form X̃, where x̄ is the texture averaged over the frames in which the triangle is visible. We can factorize X̃ as

X̃ X̃^T = Σ_i λ_i e_i e_i^T,   (2)

where λ_i is the i-th eigenvalue and e_i the i-th eigenvector. We can embed a texture into the subspace spanned by a subset of the eigenvectors. The low-dimensional representation (i.e., the coefficient for each eigenvector) a of a texture x can be computed by

a = E^T (x − x̄),   (3)

where E = [e_1, …, e_K] is a matrix whose columns are the K eigenvectors with the largest eigenvalues (K is the dimension of the subspace). We can also reconstruct the texture from a with

x ≈ E a + x̄.   (4)

This means that unobserved textures can be synthesized if we can regress the low-dimensional coefficient vector a.
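The per-triangle embedding and reconstruction can be sketched with a thin SVD, whose left singular vectors are the eigenvectors of X̃ X̃^T; the function names are illustrative.

```python
import numpy as np

def eigentexture_basis(X, K):
    """Eigentexture compression for one triangle.

    X is the P x F matrix whose f-th column is the vectorized texture of
    the triangle in frame f. Returns the mean texture, the K leading
    eigenvectors E of the centralized matrix, and the coefficients A.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                          # centralize the columns
    # Left singular vectors of Xc are the eigenvectors of Xc Xc^T (Eq. 2).
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    E = U[:, :K]
    A = E.T @ Xc                           # Eq. (3) applied to every frame
    return mean, E, A

def reconstruct_texture(mean, E, a):
    """Eq. (4): recover one texture from its coefficient vector a."""
    return E @ a + mean.ravel()
```

The reconstruction error is non-increasing in K, which is why a small number of bases already reproduces the textures well.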
V. Eigendeformation
Our eigentexture method is a powerful tool for visually representing deformations caused by loose clothes. However, such texture-based compensation may not be sufficient for large deformations. In order to synthesize deformations on unobserved surfaces, as well as to compress the storage for individual vertex positions, we propose eigendeformation, inspired by the eigentexture method.
The basic idea is almost the same as eigentexture, but eigendeformation deals with vertex positions. There is an inherent difference between textures and deformations: given that the mesh is well registered to the point cloud, the variations in textures are not significant, since they are caused solely by local deformations such as wrinkles. The variations in vertex positions, on the other hand, come from body poses. That is, a change in, e.g., the shoulder joint angle results in large changes in the vertex positions of the forearm. Thus, direct application of eigendecomposition to vertex positions may not work well.
To improve the representational power, we compute displacement vectors of each body part between the statistical model mesh and the fully registered mesh. We represent each displacement vector in a coordinate system associated with its body part; therefore, only the difference between the two meshes is counted in the displacement vector. Body parts are divided at each joint, and their indices are shown in Fig. 3 (right).
The displacement vector of the k-th vertex of the j-th body part is computed as d_{j,k} = R_j (v′_{j,k} − v_{j,k}), where v_{j,k} and v′_{j,k} are the k-th vertices of the j-th body part in the statistical model mesh and the fully registered mesh, respectively, and R_j is the rigid transformation between the entire-body coordinate system and the body part's coordinate system. We concatenate these displacement vectors to form a column vector and aggregate such vectors over all frames in which the triangle is visible. These vectors are centralized, as in eigentexture, and concatenated to form a matrix, to which we apply eigendecomposition to obtain the eigenvectors. We can then embed and reconstruct the displacement vectors into and from the subspace spanned by the eigenvectors. Examples of cumulative contribution ratios for three body parts, computed from measurements of a real human body, are shown in Fig. 4. The ratio increases drastically with only a small number of eigenvectors.
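The per-part displacement computation can be sketched as follows, assuming each part's rigid transform is given by a rotation from the entire-body frame into the part frame (the translation component of a rigid transform cancels when differencing two points); names and data layout are illustrative.

```python
import numpy as np

def part_local_displacements(V_body, V_cloth, part_ids, R_parts):
    """Per-vertex displacement expressed in each body part's frame.

    V_body:   (N, 3) vertices of the statistical model mesh.
    V_cloth:  (N, 3) vertices of the fully registered mesh.
    part_ids: (N,) body-part index of each vertex.
    R_parts:  dict mapping a part index to the (3, 3) rotation from the
              entire-body frame into that part's local frame.
    Returns the concatenated displacement column vector for one frame.
    """
    D = np.empty_like(V_body)
    for j, R in R_parts.items():
        sel = part_ids == j
        # rotate the world-frame offsets into the part's coordinate system
        D[sel] = (V_cloth[sel] - V_body[sel]) @ R.T
    return D.reshape(-1)
```

Stacking these vectors over all frames, centralizing, and eigendecomposing mirrors the eigentexture pipeline; the part-local frames are what keep the pose-induced global motion out of the displacement statistics.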
When reconstructing the clothed mesh, we first recover the statistical model mesh from the body shape parameters and joint angles obtained by minimizing Eq. (1), and then recover the displacement vectors using a small number of eigenvectors. Finally, the displacement of each vertex of each body part is transformed back into the entire-body coordinate system and added to the statistical model mesh. Examples of shapes of a real human body recovered with small numbers of eigenvectors are shown in Fig. 3, where the error in the recovered shape is visualized in pseudo-color, demonstrating that a larger number of eigenvectors decreases the error.
VI. NN-based coefficient regression
For full-body reconstruction, we interpolate textures and deformations on unobserved surfaces. We do this via coefficient regression in the eigentexture and eigendeformation spaces. Since the illumination and the RGB-D sensors are fixed in our setting, the variations of the textures and displacements of the same triangle are explained solely by the person's pose. More specifically, the joint angles, represented by rotation matrices, mostly determine them. This implies that the coefficients for the eigenvectors (i.e., the coordinates in the low-dimensional spaces) can be regressed from the rotation matrices. Therefore, we train NN-based regressors that map a joint angle (a rotation matrix) to the coefficients.
Let r be the vectorization of the rotation matrix that represents a certain body part's joint angle. Since the relationship between rotation matrices and coefficients is unknown, we use a NN with two layers to represent the nonlinearity. Our regressor gives the coefficients a by

a = W_2 σ(W_1 r + b_1) + b_2,   (5)

where W_1 ∈ R^{H×9} and W_2 ∈ R^{K×H}, H is the number of hidden units, and σ(·) is a nonlinear activation function. The regressor is trained with the gradient descent algorithm. For regularization, we employ weight decay. Examples of coefficients estimated from poses are shown in Fig. 6, and reconstructed shapes are shown in Fig. 6, demonstrating the validity of the algorithm.
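A minimal sketch of such a two-layer regressor trained by gradient descent with weight decay follows; the tanh activation, hidden size, learning rate, and decay strength are our assumptions, since the paper only specifies a two-layer NN with weight decay.

```python
import numpy as np

class CoeffRegressor:
    """Map a vectorized 3x3 rotation (9-dim input) to K eigen-coefficients."""

    def __init__(self, K, H=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(H, 9))
        self.b1 = np.zeros(H)
        self.W2 = rng.normal(scale=0.1, size=(K, H))
        self.b2 = np.zeros(K)

    def forward(self, r):
        h = np.tanh(self.W1 @ r + self.b1)
        return self.W2 @ h + self.b2

    def train_step(self, r, c_true, lr=1e-2, decay=1e-4):
        # forward pass
        h = np.tanh(self.W1 @ r + self.b1)
        c = self.W2 @ h + self.b2
        g = c - c_true                          # gradient of 0.5*||c - c*||^2
        # backward pass, with weight decay added to both weight gradients
        gW2 = np.outer(g, h) + decay * self.W2
        gz = (self.W2.T @ g) * (1.0 - h ** 2)   # tanh derivative
        gW1 = np.outer(gz, r) + decay * self.W1
        self.W2 -= lr * gW2; self.b2 -= lr * g
        self.W1 -= lr * gW1; self.b1 -= lr * gz
        return 0.5 * float(g @ g)
```

One such regressor would be trained per triangle (or per body part) on the frames in which the target is visible, then evaluated at unseen poses to synthesize the missing coefficients.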
VII. Experiments
VII-A. Evaluation with synthetic data
We applied our method to synthetic data for evaluation purposes. We used a commercially available 3D mesh model of an entire human body, with and without clothes. A skeleton (bones) was attached to the mesh model so that we could create sequences of 3D meshes and rendered images using 3DCG software (3ds Max). Since muscle deformation and cloth simulation were employed in rendering, realistic shape deformations with complicated shading effects were represented. Some examples of rendered images are shown in Fig. 7. In the following, the sequences of rendered images, 3D meshes with and without clothes, and poses are used as inputs.
First, we applied eigentexture and eigendeformation. Cumulative contribution ratios are shown in Fig. 10, and differences from the ground-truth meshes (i.e., the clothed meshes) are shown in Fig. 8. As shown in the plots and figures, 10 eigenvectors are sufficient to represent the original mesh; this is equivalent to 2.52% of the original data.
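As a back-of-the-envelope check of such compression rates, the eigentexture storage (mean texture, K bases, and K coefficients per frame) can be compared against storing the raw texture for every frame. The pixel and frame counts in the example are hypothetical, not the paper's dataset sizes, so the exact 2.52% figure is not reproduced here.

```python
def texture_compression_ratio(P, F, K):
    """Floats stored by eigentexture (mean + K bases of length P, plus K
    coefficients per frame) relative to the raw P-pixel texture in all
    F frames."""
    compressed = P * (K + 1) + K * F
    return compressed / (P * F)

# hypothetical sizes: a 400-pixel triangle texture, 500 frames, 10 bases
ratio = texture_compression_ratio(400, 500, 10)   # = 0.047, i.e. 4.7%
```

The ratio shrinks as the sequence grows, since the fixed cost of the bases is amortized over more frames.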
Next, we evaluated our regressors. Results are shown in Fig. 9. As shown in the figures, our NNs worked well with the synthetic data, and most body parts have small errors. There are large errors around the arm joints and the crotch; these errors mainly come from failures in our fitting algorithm. Although these areas are usually invisible in rendered images and thus less critical for practical use, we will seek a solution in future work.
We also evaluated interpolation accuracy under two scenarios, short-term and long-term interpolation, as explained in Fig. 13. Results for interpolation and extrapolation are shown in Fig. 13 and Fig. 13, and coefficient values for extrapolation are shown in Fig. 14. As shown in the figures, extrapolation tends to produce larger errors than interpolation; however, the regressed coefficients still follow a trend similar to the ground truth.
The textured results with interpolated shapes are shown in Fig. 15. Considering the compression rate of 2.52%, the visual quality of the rendered images is comparable to video compression, with the significant added advantage of arbitrary-viewpoint rendering.
VII-B. Demonstration with real data
For the real-data experiment, we used two calibrated RGB-D sensors to capture two sequences of a moving person, from the front and from the back. The pair of depth measurements from each corresponding pair of frames was integrated according to the RGB-D sensors' relative poses to form a single point cloud. This point cloud, as well as the corresponding RGB images, still had unobserved surfaces due to, e.g., self-occlusion, although they were not large. Our system was able to synthesize the deformations on such surfaces. Note that our eigentexture- and eigendeformation-based approach can potentially be applied even to a single RGB-D sequence.
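The integration step above amounts to transforming one sensor's points into the other's coordinate frame with the calibrated relative pose and concatenating; a minimal sketch, with illustrative argument names:

```python
import numpy as np

def merge_point_clouds(P_front, P_back, R_rel, t_rel):
    """Integrate two depth measurements into one point cloud.

    P_front, P_back: (N, 3) and (M, 3) points from the two sensors.
    R_rel, t_rel:    calibrated relative pose mapping back-sensor
                     coordinates into the front sensor's frame.
    """
    P_back_in_front = P_back @ R_rel.T + t_rel   # x' = R x + t, row-wise
    return np.vstack([P_front, P_back_in_front])
```

With only two opposing viewpoints, the merged cloud still has gaps at grazing angles and self-occlusions, which is exactly where the eigendeformation and eigentexture interpolation takes over.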
VIII. Conclusion
In this paper, we presented the eigentexture and eigendeformation methods, enabling full-body reconstruction of people in loose clothes. By using low-dimensional embeddings of texture and deformation (i.e., 10 coefficients for our datasets), the storage size required for our model representation is drastically reduced. The method is also capable of long-term interpolation. We evaluated our method using both synthetic and real data, demonstrating its effectiveness both visually and quantitatively. In the future, more complicated garments, such as skirts, should be taken into account.
Acknowledgment
This work was supported by JSPS/KAKENHI 16H02849, 16KK0151, MIC/SCOPE 171507010 and MSR CORE12.
References

[1] J.-Y. Guillemaut, J. Kilner, and A. Hilton, “Robust graph-cut scene segmentation and reconstruction for free-viewpoint video of complex dynamic scenes,” in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2009, pp. 809–816.
[2] A. Collet, M. Chuang, P. Sweeney, D. Gillett, D. Evseev, D. Calabrese, H. Hoppe, A. Kirk, and S. Sullivan, “High-quality streamable free-viewpoint video,” ACM Trans. Graphics (Proc. ACM SIGGRAPH), vol. 34, no. 4, pp. 69:1–69:13, 2015.
[3] M. Jancosek and T. Pajdla, “Multi-view reconstruction preserving weakly-supported surfaces,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2011, pp. 3121–3128.
[4] Y. Furukawa and J. Ponce, “Accurate, dense, and robust multiview stereopsis,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 8, pp. 1362–1376, 2010.
[5] P. J. Besl and N. D. McKay, “A method for registration of 3-D shapes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
[6] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon, “KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera,” in Proc. ACM Symposium on User Interface Software and Technology (UIST), 2011, pp. 559–568.
[7] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, “KinectFusion: Real-time dense surface mapping and tracking,” in Proc. IEEE Int. Symp. Mixed and Augmented Reality (ISMAR), 2011, pp. 127–136.
[8] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald, “Kintinuous: Spatially extended KinectFusion,” in Proc. RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, 2012.
[9] B. Amberg, S. Romdhani, and T. Vetter, “Optimal step nonrigid ICP algorithms for surface registration,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
[10] H. Li, R. W. Sumner, and M. Pauly, “Global correspondence optimization for non-rigid registration of depth scans,” Computer Graphics Forum (Proc. Symposium on Geometry Processing), vol. 27, no. 5, pp. 1421–1430, 2008.
[11] M. Zollhöfer, M. Nießner, S. Izadi, C. Rehmann, C. Zach, M. Fisher, C. Wu, A. Fitzgibbon, C. Loop, C. Theobalt, and M. Stamminger, “Real-time non-rigid reconstruction using an RGB-D camera,” ACM Trans. Graphics (Proc. ACM SIGGRAPH), vol. 33, no. 4, pp. 156:1–156:12, 2014.
[12] R. A. Newcombe, D. Fox, and S. M. Seitz, “DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 343–352.
[13] H. Li, B. Adams, L. J. Guibas, and M. Pauly, “Robust single-view geometry and motion reconstruction,” ACM Trans. Graphics (Proc. ACM SIGGRAPH), vol. 28, no. 5, pp. 175:1–175:10, 2009.
[14] C. Malleson, M. Klaudiny, A. Hilton, and J.-Y. Guillemaut, “Single-view RGBD-based reconstruction of dynamic human geometry,” in Proc. IEEE Int. Conf. Computer Vision Workshops, 2013, pp. 307–314.
[15] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis, “SCAPE: Shape completion and animation of people,” ACM Trans. Graphics (Proc. ACM SIGGRAPH), vol. 24, no. 3, pp. 408–416, 2005.
[16] Y. Chen, Z. Liu, and Z. Zhang, “Tensor-based human body modeling,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2013, pp. 105–112.
[17] F. Bogo, M. J. Black, M. Loper, and J. Romero, “Detailed full-body reconstructions of moving people from monocular RGB-D sequences,” in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2015, pp. 2300–2308.
[18] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, “SMPL: A skinned multi-person linear model,” ACM Trans. Graphics (Proc. SIGGRAPH Asia), vol. 34, no. 6, pp. 248:1–248:16, 2015.
[19] L. Pishchulin, S. Wuhrer, T. Helten, C. Theobalt, and B. Schiele, “Building statistical shape spaces for 3D human modeling,” arXiv preprint arXiv:1503.05860, 2015.
[20] K. Nishino, Y. Sato, and K. Ikeuchi, “Eigen-texture method: Appearance compression and synthesis based on a 3D model,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1257–1265, 2001.
[21] Y. Nakashima, F. Okura, N. Kawai, H. Kawasaki, A. Blanco, and K. Ikeuchi, “Real-time novel view synthesis with eigen-texture regression,” in Proc. British Machine Vision Conference (BMVC), 2017.