1 Introduction
Recovering human shape from a single image is a challenging task in computer vision. The task aims to predict both human pose and shape parameters simultaneously, and it can be applied to a large variety of applications such as 3D human reconstruction and human-computer interaction. However, the task remains unsolved due to depth ambiguities and the self-occlusions of human poses.
Recent results have shown that optimizing 3D-2D consistency between a 3D human body model and 2D image cues benefits both training-based and optimization-based approaches [4, 15, 12]. Existing methods mostly fit the projected body model to 2D keypoints and silhouettes in the image space but ignore body part information, which is critical for resolving depth ambiguity through occlusion reasoning. As shown in Figure 1, the model predicted in (a) is consistent with the ground-truth 2D keypoints and silhouettes, but the pose is still incorrect because the two forearms are at the wrong depth. A model that can represent body parts, as in (c), can be optimized against correct part segmentation, as in (b), to achieve the proper pose and shape.
Current representations for the human body have advanced 3D pose estimation but still leave a great number of ambiguities. The reason is the lack of a body model with an efficient body part representation in both 2D and 3D. For example, the skeleton is a simple and effective representation for 3D pose estimation. However, with only joint positions it is impractical to reason about occlusion and collision or to tell the somatotype of the estimated body. On the other hand, parametric models like SMPL [19] and SMPL-X [22] use meshes with thousands of faces and can reconstruct a more detailed human body. These models are expressive, but they encode the shape with a few latent parameters, which makes it hard to refine each body part independently. Moreover, SMPL is computationally expensive due to its redundant number of faces. An intermediate representation that can model body parts would resolve these limitations and bridge the gap between the skeleton and SMPL. Such a model should be lightweight, with fewer vertices and faces, to reduce inference time.
Due to the lack of 3D data, especially body shape data, most approaches exploit projected keypoints as 2D supervision. Body part segmentation, which provides critical semantic information, is rarely used; the reason is that rendering a 3D model to a 2D image is hard to make differentiable. Differentiable rendering is an approximate way to utilize segmentation as supervision, refining the human pose and shape via full-body silhouettes [26]. However, differentiable renderers usually focus on the overall human silhouette. With part segmentation, we can infer the orientation, position, length, and thickness of different parts. Moreover, part segmentation clearly indicates the occlusion between different body parts.
In this paper, we propose a simple geometry-based body representation called EllipBody. The representation utilizes several ellipsoids to represent different body parts and takes the part lengths, thicknesses, and orientations as explicit parameters. It is a lightweight model for differentiable rendering and is flexible enough to adjust each part. Optionally, the EllipBody can be converted to a detailed body model by using a Multi-Layer Perceptron to retarget the pose and minimizing an ICP loss to obtain the shape parameters of SMPL [19]. To utilize part segmentation as supervision, we extend the object-level differentiable neural renderer [13] to a part-level differentiable neural renderer. This module takes the silhouette of each body part as supervision and refines each part of EllipBody iteratively. We also design a depth-aware loss in the part renderer to identify occlusions between different parts and keep occlusion consistency with respect to the part segmentation.
To predict the body shape from a single image, we first train an end-to-end network to obtain the parameters of EllipBody as an initial prediction. We then perform a post-optimization that minimizes the part segmentation loss to correct errors in the network prediction. With the EllipBody model and our part-level neural renderer, the performance on the Human3.6M [10] and LSP [11] datasets is competitive with the state of the art.
With all that in mind, our contributions are threefold.


We propose an intermediate human body representation (EllipBody) for human pose and shape recovery. It is lightweight and part-based, which accelerates part rendering and optimization.

We propose an occlusion-aware part-level differentiable renderer (PartDR) to utilize part segmentation as supervision for learning.

We implement a framework combining a deep neural network with an iterative post-optimization driven by the part segmentation loss computed by PartDR, which achieves state-of-the-art performance on human pose and shape recovery.
2 Related Work
2.1 Representations for Human Bodies
Various representations have been proposed for human pose and shape estimation. Among them, the 3D skeleton is a simple and effective representation of human pose and has been adopted by many previous methods [31, 24, 29, 30, 28]. In skeleton-based methods, joint positions [20, 28] and volumetric heatmaps [30, 23] are often used to predict the 3D skeleton with neural networks and significantly improve 3D human pose estimation. However, these methods focus only on pose estimation and ignore human shape. Moreover, since the 3D skeleton is represented by a small set of joints, kinematic constraints are often neglected.
Human shape recovery methods rely on statistical parametric models to represent pose and shape simultaneously. These models are typically learned from human body scan data and encode both pose and shape parameters. Loper et al. [19] propose the skinned multi-person linear model (SMPL), which generates a realistic human body with thousands of triangular faces. Recently, Pavlakos et al. [22] provide a detailed parametric model, SMPL-X, that models the body, face, and hands. These models contain thousands of vertices and faces and can represent a more detailed human body. However, the large number of vertices and faces often slows down optimization-based methods [12], and because the shape parameters are implicit, it is hard to refine individual parts independently. We propose a lightweight model as an intermediate representation for these models. It has far fewer vertices and faces and further speeds up optimization-based methods.
2.2 Differentiable Rendering
Rendering connects the image plane with 3D space. Recent works on inverse graphics [8, 18, 13] have put great effort into making this process differentiable, so that a renderer can serve as a module in learning-based approaches. Loper et al. [18] propose a differentiable renderer called OpenDR, which obtains derivatives with respect to the model parameters. Kato et al. [13] present a neural renderer that approximates the gradient of rasterization as a linear function. These methods support recent approaches [21, 26] that exploit segmentation as supervision to improve their performance. Previous differentiable renderers output the shape and textures successfully but ignore the different parts of the object. We extend the differentiable renderer to the part level and propose a depth-aware loss function for part-level rendering. Our renderer module thus allows us to explore the spatial relations between the parts of a single model.
2.3 Human Shape Recovery
Recovering both pose and shape was first addressed with optimization-based solutions. Guan et al. [9] optimize the parameters of the SCAPE [3] model with 2D keypoint annotations. Bogo et al. [4] employ a CNN to obtain 2D keypoints and then propose SMPLify to optimize the parameters of the SMPL model [19]. Lassner et al. [15] take silhouettes and dense 2D keypoints as additional features and use the SMPLify method to obtain more accurate results. The recent expressive human model SMPL-X [22] integrates the face, hands, and full body. Pavlakos et al. optimize VPoser [22], a latent space over the SMPL parameters, together with a collision penalty and a gender classifier. Since optimization-based approaches often take a long time to reach the final result, using deep neural networks to regress the parameters has become the major trend. Pavlakos et al. [25] use a CNN to estimate the parameters from silhouettes and 2D joint heatmaps. Kanazawa et al. [12] present an end-to-end network, called HMR, to predict the shape parameters, employing a large dataset to train a discriminator that keeps the parameters plausible. Kolotouros et al. [14] propose a framework called GraphCMR, which regresses the position of each vertex through a graph CNN. These solutions usually assume fixed camera intrinsic and extrinsic parameters, which may cause uncontrolled results due to the lack of generalization. Considering the shortcomings and benefits of both optimization-based and CNN-based methods, we employ a convolutional neural network to estimate the pose and shape parameters of EllipBody and then refine the model through an optimization-based process in a limited number of iterations.
3 Methodology
The goal of our work is to estimate an entire configuration of the human body from a single color image. Our framework is illustrated in Figure 2. Given an image of a human body, the backbone deep convolutional network infers the parameters of our model, EllipBody. We then feed the EllipBody into a part-level differentiable renderer (PartDR) to produce individual silhouettes for the body parts. The objective function minimizes the difference between the rendered and the predicted part segmentation. Finally, we use a linear regressor to acquire a realistic body shape.
3.1 EllipBody: An Intermediate Representation
Statistical parametric models have significantly benefited the human shape recovery community; however, they still have specific limitations. These models encode human body shapes into latent parameters and use them to generate a detailed human body mesh. The latent parameters represent the human shape prior implicitly, which makes it hard to change body parts independently. Moreover, the detailed mesh may slow down the optimization process due to redundant faces.
We propose a lightweight and flexible intermediate representation, called EllipBody, to speed up the optimization process and disentangle the human body parts. We use ellipsoids to represent the body parts and take the position, orientation, and semi-principal axis lengths of each ellipsoid as explicit parameters. The EllipBody representation contains both the human skeleton and surfaces and can adjust body parts independently. We choose the ellipsoid as the part representation because human part silhouettes are mostly ellipsoidal and an ellipsoid produces continuous projections across different views.
The proposed representation is an expansion of the human skeleton and represents body parts, e.g. limbs, torso, and head, with parametric ellipsoids. As each ellipsoid has three independent semi-principal axes, we select one of them as the skeleton axis and the other two as the shape axes. As shown in Figure 3, we align the ellipsoids with the bones along the skeleton axes and locate the end points of each ellipsoid at the human joints belonging to the bone. After this assembly, EllipBody is a more powerful alternative to the skeleton and can represent human pose and shape independently.
The parameters of each ellipsoid contain the bone length along the skeleton axis and the part thicknesses along the two shape axes. We use the position and global rotation of each ellipsoid as the pose parameters: the translation gives the position of its center, and the rotation gives its global orientation. The proposed EllipBody is formulated as follows,
E = { T_i, R_i, l_i, t_i } for i = 1, …, N,   (1)
where T_i and R_i are the center position and global rotation of the i-th ellipsoid, l_i is its bone length along the skeleton axis, and t_i are its thicknesses along the two shape axes.
As the human body is symmetric, ellipsoids in EllipBody share their parameters when they represent the same category of body part. This sharing reduces the number of semi-principal axis parameters. We divide the EllipBody parameters into two groups, one for part lengths and one for thicknesses. The simplified parameters are shown in Table 1. The torso, feet, and hands remain asymmetric, as in the human body.
In the inference phase, we first use the ellipsoid parameters to reconstruct the EllipBody model and then use the pose parameters to recover the human pose with forward kinematics. The process is shown as follows,

J_i = J_p(i) + R_i (l_i v_i),   (2)

where v_i is an offset vector indicating the direction from the parent joint p(i) to the current joint i, so that l_i v_i denotes the local position of joint i in its parent's coordinate frame. Similarly, we use the rotations and thickness parameters to compute the centers of the ellipsoids; the only change is to modify the offset vectors.

Part  Length  Shape  Part  Length  Shape

Ass  Upper legs  
Abdomen  Lower legs  
Chest  Feet  
Neck  Upper arms  
Shoulders  Fore arms  
Head  Hands  
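As a concrete illustration, the forward-kinematics step described above can be sketched as follows; the toy 3-joint chain, function names, and rotation parameterization are ours, not the paper's implementation.

```python
import numpy as np

# Toy forward-kinematics pass: each joint position is its parent's position
# plus the rotated, scaled bone-offset direction (cf. Eq. 2).

def rot_z(angle):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(parents, offsets, lengths, rotations):
    """parents[i]: parent index (-1 for the root); offsets[i]: unit direction
    from parent to joint i in the rest pose; lengths[i]: bone length;
    rotations[i]: global 3x3 rotation of the bone."""
    joints = np.zeros((len(parents), 3))
    for i, p in enumerate(parents):
        if p < 0:
            continue  # the root stays at the origin
        joints[i] = joints[p] + rotations[i] @ (lengths[i] * offsets[i])
    return joints

# Chain: root -> knee -> ankle; both bones point down the y-axis at rest,
# and the second bone is bent 90 degrees about z.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, -1.0, 0.0]])
lengths = np.array([0.0, 0.4, 0.4])
rotations = np.stack([np.eye(3), np.eye(3), rot_z(np.pi / 2)])
joints = forward_kinematics(parents, offsets, lengths, rotations)
```

The ellipsoid centers would follow the same recursion with the offset vectors replaced, as noted above.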
The proposed model is an expansion of the 3D human skeleton and can represent the pose and shape of body parts simultaneously. As shown in Figure 3, we extract specific end points and center points from the reconstructed EllipBody model as the 3D human skeleton for pose. We divide each ellipsoid into several triangles to obtain the human mesh for shape. For convenience, our implementation uses the icosahedron, a 20-face polyhedron whose faces are equilateral triangles and which can be subdivided to generate finer surfaces, as in the classic geodesic polyhedron [32].
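The geodesic subdivision mentioned above can be sketched as follows; this is the standard icosahedron construction with midpoint subdivision, and the vertex ordering is of our choosing rather than taken from the paper's code.

```python
import numpy as np

def icosahedron():
    """12 vertices / 20 triangular faces of a regular icosahedron,
    normalized onto the unit sphere."""
    t = (1.0 + np.sqrt(5.0)) / 2.0  # golden ratio
    verts = np.array([
        [-1,  t, 0], [1,  t, 0], [-1, -t, 0], [1, -t, 0],
        [0, -1,  t], [0, 1,  t], [0, -1, -t], [0, 1, -t],
        [t, 0, -1], [t, 0,  1], [-t, 0, -1], [-t, 0,  1],
    ], dtype=float)
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    faces = [
        (0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
        (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
        (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
        (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1),
    ]
    return verts, faces

def subdivide(verts, faces):
    """Split each triangle into four, pushing new vertices onto the sphere."""
    verts = list(map(tuple, verts))
    cache = {}  # deduplicate edge midpoints shared by two triangles
    def midpoint(a, b):
        key = (min(a, b), max(a, b))
        if key not in cache:
            m = (np.array(verts[a]) + np.array(verts[b])) / 2.0
            m /= np.linalg.norm(m)
            cache[key] = len(verts)
            verts.append(tuple(m))
        return cache[key]
    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return np.array(verts), new_faces

v0, f0 = icosahedron()
v1, f1 = subdivide(v0, f0)  # 42 vertices, 80 faces after one level
```

One subdivision level multiplies the face count by four, which is the trade-off explored in the optimization-performance experiment later.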
3.2 PartDR: PartLevel Differentiable Renderer
Human part segmentation provides effective 3D evidence, e.g. boundaries, occlusions, and locations, for inferring the relationship between body parts. We extend the object-level differentiable neural renderer proposed by Kato et al. [13] to a part-level differentiable renderer (PartDR). PartDR draws human parts independently and generates both a face mapping and a part mapping. In back-propagation, we compute the part-level approximate derivatives following the previous method [13] but omit regions that are occluded by other body parts. We also design a depth-aware occlusion loss to revise incorrectly occluded regions.
Rendering the human parts
Given the camera settings and the EllipBody parameters, the rendering process produces two results: the face index map and the part index map. The face index map indicates the correspondence between image pixels and the faces of the human mesh; for each image position it stores the nearest projected face, i.e. the j-th face of the i-th part. The part index map is a set of binary masks in which a pixel is 1 if it belongs to the i-th part and 0 otherwise.
F, P = Render(M, Π),   (3)
where Render(·) is the rendering function, M is the EllipBody model, and Π is the projection matrix.
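The part-index mapping can be illustrated with a toy z-buffer over per-part depth maps; the synthetic depth maps and function names here are stand-ins for PartDR's rasterizer output, not its actual interface.

```python
import numpy as np

# Given one depth map per rendered body part (np.inf where the part does not
# cover a pixel), the part-index map keeps, per pixel, only the nearest part,
# i.e. a z-buffered part mapping.

def part_index_maps(part_depths):
    """part_depths: (n_parts, H, W) depths, np.inf = background.
    Returns binary masks (n_parts, H, W): masks[i] == 1 where part i wins."""
    nearest = np.argmin(part_depths, axis=0)         # winning part per pixel
    covered = np.isfinite(part_depths.min(axis=0))   # any part at all?
    masks = np.zeros_like(part_depths, dtype=np.uint8)
    for i in range(part_depths.shape[0]):
        masks[i] = (nearest == i) & covered
    return masks

# Two overlapping 4x4 "parts": part 1 is nearer where both overlap.
inf = np.inf
d0 = np.full((4, 4), inf); d0[:, :3] = 2.0   # part 0, farther
d1 = np.full((4, 4), inf); d1[:, 1:] = 1.0   # part 1, nearer
masks = part_index_maps(np.stack([d0, d1]))
```

In the overlap columns part 1 occludes part 0, which is exactly the situation the depth-aware loss below is designed to correct when it disagrees with the target segmentation.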
Approximate derivatives for part rendering
In the training process, we follow the neural renderer approach proposed in [13] to compute approximate derivatives for each part. The neural renderer is a differentiable renderer that approximates the gradient of rasterization to enable end-to-end rendering inside a neural network. It efficiently approximates the gradients of the vertex coordinates with respect to the rendered image.
We use I(p_j) to denote the rendered value at pixel p_j and give the derivative with respect to the x-coordinate x_i of the i-th vertex of a face as follows,

∂I(p_j)/∂x_i = δI / (x_1 − x_0),   (4)

We only show the derivatives on the x-axis for simplicity. I(p_j) is the rendered value of pixel p_j, and δI is the residual between the ground truth P and I(p_j). x_0 is the x-coordinate of the current vertex, and x_1 is the new x-coordinate at which the edge of the rendered face collides with pixel p_j.
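A one-dimensional sketch of this gradient surrogate follows; the variable names are ours, and a real rasterizer would search for the collision coordinate rather than receive it as an argument.

```python
# Rasterization-gradient surrogate: moving a vertex x-coordinate from x0 to
# x1 would change pixel p_j's value by delta_i, so the gradient is
# approximated by the ratio delta_i / (x1 - x0).

def approx_pixel_gradient(x0, x1, value_now, value_after):
    """Linear surrogate for dI(p_j)/dx at vertex coordinate x0."""
    delta_i = value_after - value_now  # change the vertex move would cause
    delta_x = x1 - x0                  # distance the vertex must travel
    if delta_x == 0:
        return 0.0                     # no move needed, no gradient flows
    return delta_i / delta_x

# Moving the vertex 2 units would turn the pixel on (0 -> 1).
g = approx_pixel_gradient(x0=3.0, x1=5.0, value_now=0.0, value_after=1.0)
```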
The neural renderer can be applied to a single ellipsoid. However, the proposed EllipBody contains multiple ellipsoids, which may lead to inaccurate approximations due to self-occlusion. We therefore omit the self-occluded region, shown as the red triangle in Figure 4, from the derivative approximation:

∂I(p_j)/∂x_i = 0,  if p_j lies in a region occluded by another part.   (5)
We propose derivatives along the z-axis (the depth direction) as an extension of the part-level neural renderer. We omit the derivatives in the occluded regions and then design a new approximation of the derivatives along the z-axis to refine incorrectly occluded parts. As shown in Figure 4, we first find the occluded face. We then make the depth derivative directly proportional to the distance between the occluded point and the point occluding it. The derivative is given as

∂I(p_j)/∂z_i = λ · d,   (6)

where d is the depth distance between the occluded point and the corresponding point on the occluding face (the point whose projection is also p_j), and λ is a scaling factor for the term.
3.3 EllipBody Estimation
We propose an end-to-end pipeline to estimate the parameters of EllipBody. As shown in Figure 2, a CNN-based backbone first extracts features from a single image. Based on the image features, we regress the pose and shape parameters. After that, we optimize an objective function that minimizes the part segmentation loss of the rendered parts.
Network Design.
As previous works have had great success training deep CNNs for human pose estimation, we take a simple baseline as our encoder to extract features from an image. The features are fed into a regression block with a structure similar to [20], which outputs the pose, part-length, and thickness parameters. The pose is predicted as local rotation vectors, so we compute the global rotations by forward kinematics [16, 17] of EllipBody. Note that the raw network outputs may not be valid rotations, so we employ the Gram–Schmidt process [6] to guarantee validity. The network also regresses the camera parameters for the weak-perspective model proposed by Kanazawa et al. [12]. The skeleton and the mesh vertices can be calculated from these parameters as described in Section 3.1. Given the camera parameters and the EllipBody, PartDR outputs part maps as the prediction of the part segmentation. We also project the 3D skeleton to obtain the 2D keypoints. The loss function is composed of three terms: a reconstruction loss on 3D joints, a projection loss on 2D keypoints, and a part segmentation loss produced by PartDR. We integrate 2D annotations of in-the-wild images as weak supervision.
L = λ_3D · L_3D + λ_2D · L_2D + λ_seg · L_seg,   (7)

L_3D = ‖J_3D − Ĵ_3D‖²,   (8)

L_2D = ‖J_2D − Ĵ_2D‖²,   (9)

L_seg = ‖S − Ŝ‖²,   (10)

where λ_3D, λ_2D, and λ_seg are the weights for each loss term. We set λ_3D = 0 for images that only have 2D annotations.
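The weighting scheme can be sketched as follows, assuming squared-error terms; the weight values, array shapes, and function names are placeholders, not the paper's settings.

```python
import numpy as np

# Combined loss sketch: 3D joint term + 2D projection term + part
# segmentation term, with the 3D weight zeroed for 2D-only images.

def total_loss(j3d_pred, j3d_gt, j2d_pred, j2d_gt, seg_pred, seg_gt,
               has_3d, w3d=1.0, w2d=1.0, wseg=1.0):
    l3d = np.mean((j3d_pred - j3d_gt) ** 2) if has_3d else 0.0
    l2d = np.mean((j2d_pred - j2d_gt) ** 2)
    lseg = np.mean((seg_pred - seg_gt) ** 2)
    w3d = w3d if has_3d else 0.0  # weak supervision: drop the 3D term
    return w3d * l3d + w2d * l2d + wseg * lseg

# Fully annotated sample: perfect 3D and segmentation, off-by-one 2D joints.
j2d_gt, j2d_pred = np.zeros((2, 2)), np.ones((2, 2))
seg = np.zeros((4, 4))
loss_full = total_loss(np.zeros((2, 3)), np.zeros((2, 3)),
                       j2d_pred, j2d_gt, seg, seg, has_3d=True)
# 2D-only sample: the (wrong) 3D prediction contributes nothing.
loss_weak = total_loss(np.ones((2, 3)), np.zeros((2, 3)),
                       j2d_pred, j2d_gt, seg, seg, has_3d=False)
```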
Optimization with Part Segmentation.
Previous methods have shown the importance of optimization after the neural network prediction, and we likewise adopt an optimization procedure to refine the EllipBody estimate. Moreover, since EllipBody is a part-based model, we can perform the optimization against part segmentation data. We formulate the objective of the optimization as follows,
min_{Θ, π}  λ_2D · L_2D + λ_seg · L_seg + L_reg,   (11)

where Θ denotes the parameters of EllipBody and π the weak-perspective camera settings. L_2D and L_seg are the same losses as in the training objective, and L_reg collects the regularization terms.
We employ the part segmentation predicted by [21] as the target. Since the network provides an accurate initialization, the optimization process can improve the joint positions, the ordinal depth of joints, and the body shape simultaneously within a small number of iterations.
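The refinement pattern can be illustrated on a toy objective: gradient descent on a single ellipse's semi-axes so that its silhouette area matches a target part-segmentation area. The real objective goes through PartDR; this only shows the optimization loop, and all names are illustrative.

```python
import numpy as np

# Toy post-optimization: fit the semi-axes (a, b) of one elliptical
# silhouette so that its area pi*a*b matches a target area.

def fit_thickness(target_area, a=1.0, b=1.0, lr=0.05, steps=200):
    """Minimize 0.5 * (pi*a*b - target_area)^2 with analytic gradients."""
    for _ in range(steps):
        residual = np.pi * a * b - target_area
        a -= lr * residual * np.pi * b     # d(area)/da = pi * b
        b -= lr * residual * np.pi * a     # d(area)/db = pi * a
        a, b = max(a, 1e-3), max(b, 1e-3)  # keep the axes positive
    return a, b

a, b = fit_thickness(target_area=2.0)
```

Because EllipBody exposes lengths and thicknesses directly, such per-part updates do not disturb the other parts, which is the property the optimization stage relies on.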
3.4 Convert EllipBody to SMPL
To visualize a detailed body model, we train a Multi-Layer Perceptron (MLP) to convert the pose of EllipBody to the pose parameters of the SMPL [19] model. We first convert the rotation vectors of both models to rotation matrices through the Rodrigues formula. The loss function is
L_pose = ‖MLP(R_E) − R_S‖²,   (12)
where R_E are the rotation matrices of EllipBody and R_S are the corresponding SMPL rotations.
After that, we perform Iterative Closest Point (ICP) to obtain the rotation and translation between the two models. The objective function is given as

L_shape = Σ_i ‖ICP(V_E^i) − V_S^i‖²,   (13)

where ICP(·) is the Iterative Closest Point process [5], V_E^i are the vertices of the i-th part of EllipBody, and V_S^i are the vertices of the corresponding part of the SMPL model.
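The core step inside ICP, rigid alignment with fixed correspondences, has a closed-form solution (Kabsch, via SVD) and can be sketched as follows; full ICP additionally re-estimates the correspondences between the two vertex sets each round, which is omitted here.

```python
import numpy as np

# Rigid alignment between corresponding vertex sets.

def rigid_align(src, dst):
    """Return (R, t) minimizing sum ||R @ src_i + t - dst_i||^2."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    h = (src - mu_s).T @ (dst - mu_d)       # cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_d - r @ mu_s
    return r, t

# Recover a known rotation + translation from noiseless correspondences.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
theta = 0.3
r_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
dst = src @ r_true.T + t_true
r_est, t_est = rigid_align(src, dst)
```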
4 Experiments
4.1 Datasets
Human3.6M:
It is a large-scale human pose dataset that contains complete motion capture data along with images, camera settings, part segmentation, and depth maps. We use the original mocap pose data for EllipBody and merge its body part segmentation into 14 parts. We use subjects S1, S5, S6, S7, and S8 as training data and test on S9 and S11. We employ two popular error metrics for evaluation: Mean Per Joint Position Error (MPJPE) and Reconstruction Error (PA-MPJPE).
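The two metrics can be sketched as follows; the similarity-Procrustes alignment (rotation, uniform scale, translation) is the usual convention for the reconstruction error, assumed here rather than taken from the paper.

```python
import numpy as np

# MPJPE: mean Euclidean distance between predicted and ground-truth joints.
# PA-MPJPE: the same after removing a similarity transform by Procrustes
# alignment.

def mpjpe(pred, gt):
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid reflections
    dd = np.diag([1.0, 1.0, d])
    r = vt.T @ dd @ u.T                     # optimal rotation
    scale = np.trace(np.diag(s) @ dd) / (p ** 2).sum()
    aligned = scale * p @ r.T + mu_g
    return mpjpe(aligned, gt)

# A prediction differing from the ground truth only by a similarity
# transform has (near-)zero reconstruction error but a large MPJPE.
rng = np.random.default_rng(1)
gt = rng.normal(size=(14, 3))
rot = np.array([[np.cos(0.5), -np.sin(0.5), 0.0],
                [np.sin(0.5),  np.cos(0.5), 0.0],
                [0.0, 0.0, 1.0]])
pred = 1.5 * gt @ rot.T + np.array([0.1, 0.2, 0.3])
```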
Method  Rec. Error
Akhter & Black [1]  181.1
Ramakrishna et al. [27]  157.3
Zhou et al. [34]  106.7
SMPLify [4]  82.3
Lassner et al. [15]  80.7
Pavlakos et al. [25]  75.9
NBF [21]  59.9
HMR [12]  58.1
GraphCMR [14]  51.9
Ours  51.4
Ours + Optimization  47.6
Method  FB Seg. acc.  FB Seg. f1  Part Seg. acc.  Part Seg. f1
SMPLify on GT [4]  92.17  0.88  88.82  0.67
SMPLify [4]  91.89  0.88  87.71  0.64
SMPLify on [26]  92.17  0.88  88.24  0.64
HMR [12]  91.67  0.87  87.12  0.60
BodyNet [31]  92.75  0.84  –  –
GraphCMR [14]  91.46  0.87  88.69  0.66
PartDR + SMPL on GT  94.03  0.91  91.91  0.79
PartDR + EllipBody on GT  94.74  0.92  93.26  0.84
PartDR + EllipBody + Pred. Part  92.13  0.88  90.70  0.74
UP-3D:
LSP:
It is a 2D pose dataset which provides part segmentation annotations. We use the test set of this dataset to evaluate the accuracy and f1 score of part segmentation.
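The segmentation metrics can be sketched as follows; the macro-averaging convention over part labels is our assumption, and the tiny label maps are synthetic.

```python
import numpy as np

# Pixel accuracy and macro-averaged F1 over part labels.

def seg_accuracy(pred, gt):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return (pred == gt).mean()

def macro_f1(pred, gt, n_classes):
    """Per-class F1 from pixel-wise TP/FP/FN, averaged over classes."""
    f1s = []
    for c in range(n_classes):
        tp = ((pred == c) & (gt == c)).sum()
        fp = ((pred == c) & (gt != c)).sum()
        fn = ((pred != c) & (gt == c)).sum()
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 1.0)
    return float(np.mean(f1s))

gt = np.array([[0, 0, 1, 1], [0, 2, 2, 1]])
pred = np.array([[0, 0, 1, 1], [0, 2, 1, 1]])  # one pixel mislabeled
acc = seg_accuracy(pred, gt)
f1 = macro_f1(pred, gt, n_classes=3)
```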
Model  Loss  MPJPE (↓)
3D Joints  –  104.5
SMPL  (full)  75.9
SMPL  (part)  67.1
EllipBody  –  73.8
EllipBody  (full)  67.1
EllipBody  (seg)  65.2
EllipBody  (full)  64.1
EllipBody  (part)  62.8
4.2 Implementation Details
To train the regression network, we adopt the backbone pre-trained by Xiao et al. [33]. The dimension of the regression model is 1024, and each regressor is stacked with two residual blocks. The output segmentation has the same size as the input image. We use the Adam optimizer with a batch size of 128 to train the model for 70 epochs without the segmentation loss, then add the segmentation loss and reduce the learning rate for an additional 30 epochs. When optimizing the EllipBody predicted by the network, we also use the Adam optimizer, with at most 50 iterations. The loss weights are set based on the experimental results. The target segmentation is predicted by the RefineNet proposed by Omran et al. [21].
4.3 Comparison with the State of the Art
3D Pose Estimation.
We compare our approach with other state-of-the-art methods for 3D pose estimation on Human3.6M. The results are presented in Table 2. The 3D pose predicted by the network alone is competitive with the other baselines, and after optimization the Reconstruction Error decreases further, benefiting from reliable body part segmentation. Note that different methods use different annotations. Kolotouros et al. [14] utilize the 3D SMPL meshes on the Human3.6M and UP-3D datasets as supervision. Kanazawa et al. [12] use additional images with 2D keypoint annotations. Omran et al. [21] train only on Human3.6M, while Pavlakos et al. [25] do not use any Human3.6M data. Our method employs the part segmentation annotations in Human3.6M and UP-3D.
Part Segmentation.
To evaluate the shape recovery results, we compare with previous work on the LSP test set using the part segmentation metrics shown in Table 3. Note that both learning-based and optimization-based methods are listed. We first use SMPL as the representation and optimize the result with the ground truth: the accuracy rises on both foreground-background segmentation and part segmentation. When switching the model to EllipBody, the part segmentation accuracy rises further. These results show that our method narrows the gap between full-body segmentation and part segmentation performance. We then take the part segmentation predicted by [21] to optimize EllipBody. Ours outperforms the other methods, and the full-body segmentation is competitive with BodyNet [31].
4.4 Ablative Study
Effectiveness of EllipBody and PartLevel Differentiable Renderer.
We investigate the effectiveness of our lightweight model, EllipBody, for 3D pose estimation. To this end, we compare EllipBody with the popular parametric model SMPL and use 3D joint positions alone as our baseline. As shown in Table 4, both EllipBody and SMPL perform better than the baseline because they embed body priors. When using only 2D annotations, including 2D keypoints and segmentation, EllipBody outperforms SMPL under the MPJPE metric on Human3.6M. Note that in SMPL, the 3D joints are regressed after the whole mesh is determined, which means the joint positions depend on the shape parameters. EllipBody, with explicit parameters for bone lengths, infers the skeleton and the body mesh separately.
Beyond the choice of model, we verify the insight that part segmentation is a 2D annotation carrying 3D information by comparing full-body silhouettes and part silhouettes as supervision. Even when only full-body segmentation is applied, MPJPE decreases with EllipBody. When part segmentation is applied, the error decreases for both SMPL and EllipBody, and the results come close to those obtained by adding 3D annotations.
Optimization Performance
Figure 5 shows the performance of different body model configurations over the optimization process. Since EllipBody can be subdivided to increase the number of faces, we explore the influence of the number of faces on optimization. We find that a smaller number of faces significantly speeds up the fitting process. However, the part segmentation accuracy on the LSP test set stops increasing once the area of a single face falls below one pixel at the image size produced by PartDR.
4.5 Qualitative Evaluation
Figure 6 illustrates qualitative results compared with one of the state-of-the-art methods [14]. With part segmentation, the human body predicted from a single image has a more accurate pose. Figure 7 shows EllipBody and SMPL with different body proportions. Although both models work well and can be converted to each other, the parameters of EllipBody are interpretable due to the explicit meaning of part lengths and thicknesses.
5 Conclusion
In this paper, we present an approach that utilizes part segmentation as supervision to improve human pose and shape recovery. To this end, we propose a lightweight, part-based human model that generates the skeleton and the shape of body parts efficiently. We also extend a differentiable mesh renderer to the part level so that it can recognize the occlusion between body parts. The proposed methods improve precision and speed for both training-based and optimization-based approaches.
References
 [1] (2015) Pose-conditioned joint angle limits for 3d human pose reconstruction. In CVPR, pp. 1446–1455. Cited by: Table 2.
 [2] (2014) 2d human pose estimation: new benchmark and state of the art analysis. In CVPR, pp. 3686–3693. Cited by: §4.1.
 [3] (2005) SCAPE: shape completion and animation of people. In ACM Transactions on Graphics (TOG), Cited by: §2.3.
 [4] (2016) Keep it smpl: automatic estimation of 3d human pose and shape from a single image. In ECCV, pp. 561–578. Cited by: §1, §2.3, Table 2, Table 3.
 [5] (2002) The trimmed iterative closest point algorithm. In Object Recognition Supported by User Interaction for Service Robots, Vol. 3, pp. 545–548. Cited by: §3.4.
 [6] (1976) Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization. Mathematics of Computation 30 (136), pp. 772–795. Cited by: §3.3.
 [7] (2014) Body parts dependent joint regressors for human pose estimation in still images. TPAMI 36 (11), pp. 2131–2143. Cited by: §4.1.
 [8] (2008) Model-based hand tracking with texture, shading and self-occlusions. In CVPR, pp. 1–8. Cited by: §2.2.
 [9] (2009) Estimating human shape and pose from a single image. In ICCV, pp. 1381–1388. Cited by: §2.3.
 [10] (2014) Human3.6M: large scale datasets and predictive methods for 3d human sensing in natural environments. TPAMI 36 (7), pp. 1325–1339. Cited by: §1, Table 2.
 [11] (2010) Clustered pose and nonlinear appearance models for human pose estimation. In BMVC, Note: doi:10.5244/C.24.12 Cited by: §1, §4.1.
 [12] (2018) End-to-end recovery of human shape and pose. In CVPR, Cited by: §1, §2.1, §2.3, §3.3, §4.3, Table 2, Table 3.
 [13] (2018) Neural 3d mesh renderer. In CVPR, Cited by: §1, §2.2, §3.2, §3.2.
 [14] (2019) Convolutional mesh regression for single-image human shape reconstruction. In CVPR, pp. 4501–4510. Cited by: §2.3, Figure 6, §4.3, §4.5, Table 2, Table 3.
 [15] (2017) Unite the people: closing the loop between 3d and 2d human representations. In CVPR, Vol. 2, pp. 3. Cited by: §1, §2.3, Table 2.

 [16] (1988) Kinematic analysis of a three-degrees-of-freedom in-parallel actuated manipulator. IEEE Journal on Robotics and Automation 4 (3), pp. 354–360. Cited by: §3.3.
 [17] (1993) Kinematic analysis of a Stewart platform manipulator. IEEE Transactions on Industrial Electronics 40 (2), pp. 282–293. Cited by: §3.3.
 [18] (2014) OpenDR: an approximate differentiable renderer. In ECCV, pp. 154–169. Cited by: §2.2.
 [19] (2015) SMPL: a skinned multi-person linear model. ACM Transactions on Graphics (TOG) 34 (6), pp. 248. Cited by: §1, §2.1, §2.3, §3.4.
 [20] (2017) A simple yet effective baseline for 3d human pose estimation. In ICCV, Cited by: §2.1, §3.3.

 [21] (2018) Neural body fitting: unifying deep learning and model based human pose and shape estimation. In 3DV, pp. 484–494. Cited by: §2.2, §3.3, §4.2, §4.3, Table 2.
 [22] (2019) Expressive body capture: 3d hands, face, and body from a single image. In CVPR, Cited by: §1, §2.1, §2.3.
 [23] (2018) Ordinal depth supervision for 3d human pose estimation. In CVPR, pp. 7307–7316. Cited by: §2.1.
 [24] (2017) Coarsetofine volumetric prediction for singleimage 3d human pose. In CVPR, pp. 7025–7034. Cited by: §2.1.
 [25] (2018) Learning to estimate 3d human pose and shape from a single color image. In CVPR, Cited by: §2.3, §4.3, Table 2.
 [26] (2018) Learning to estimate 3d human pose and shape from a single color image. In CVPR, pp. 459–468. Cited by: §1, §2.2, Table 3.
 [27] (2012) Reconstructing 3d human pose from 2d image landmarks. In ECCV, pp. 573–586. Cited by: Table 2.
 [28] (2011) Skeletal graph based human pose estimation in real-time. In BMVC, pp. 1–12. Cited by: §2.1.
 [29] (2017) Compositional human pose regression. In ICCV, pp. 2602–2611. Cited by: §2.1.
 [30] (2018) Integral human pose regression. In ECCV, pp. 529–545. Cited by: §2.1.
 [31] (2018) BodyNet: volumetric inference of 3d human body shapes. In ECCV, pp. 20–36. Cited by: §2.1, §4.3, Table 3.
 [32] (1974) Polyhedron models. Cambridge University Press. Cited by: §3.1.
 [33] (2018) Simple baselines for human pose estimation and tracking. In ECCV, Cited by: §4.2.
 [34] (2017) Sparse representation for 3d shape estimation: a convex relaxation approach. TPAMI 39 (8), pp. 1648–1661. Cited by: Table 2.