Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data

04/02/2020 ∙ by Henry M. Clever, et al. ∙ Georgia Institute of Technology ∙ Stanford University

People spend a substantial part of their lives at rest in bed. 3D human pose and shape estimation for this activity would have numerous beneficial applications, yet line-of-sight perception is complicated by occlusion from bedding. Pressure sensing mats are a promising alternative, but training data is challenging to collect at scale. We describe a physics-based method that simulates human bodies at rest in a bed with a pressure sensing mat, and present PressurePose, a synthetic dataset with 206K pressure images with 3D human poses and shapes. We also present PressureNet, a deep learning model that estimates human pose and shape given a pressure image and gender. PressureNet incorporates a pressure map reconstruction (PMR) network that models pressure image generation to promote consistency between estimated 3D body models and pressure image input. In our evaluations, PressureNet performed well with real data from participants in diverse poses, even though it had only been trained with synthetic data. When we ablated the PMR network, performance dropped substantially.


1 Introduction

Humans spend a large part of their lives resting. While resting, humans select poses that can be sustained with little physical exertion. Our primary insight is that human bodies at rest can be modeled sufficiently well to generate synthetic data for machine learning. The lack of physical exertion and absence of motion makes this class of human activities amenable to relatively simple biomechanical models similar to the ragdoll models used in video games [39].

We apply this insight to the problem of using a pressure image to estimate the 3D human pose and shape of a person resting in bed. This capability would be useful for a variety of healthcare applications such as bed sore management [19], tomographic patient imaging [20], sleep studies [11], patient monitoring [12], and assistive robotics [15]. To this end, we present the PressurePose dataset, a large-scale synthetic dataset consisting of 3D human body poses and shapes with pressure images (Fig. 1, left). We also present PressureNet, a deep learning model that estimates 3D human body pose and shape from a low-resolution pressure image (Fig. 1, right).

Prior work on the problem of human pose estimation from pressure images [11, 15, 20, 25, 33] has primarily used real data that is challenging to collect. Our PressurePose dataset has an unprecedented diversity of body shapes, joint angles, and postures, with more thorough and precise annotations than previous datasets (Table 1). While recent prior work has estimated 3D human pose from pressure images [11, 15], to the best of our knowledge PressureNet is the first system to also estimate 3D body shape.

Our synthetic data generation method first generates diverse samples from an 85-dimensional human pose and shape space. After rejecting samples based on self-collisions and Cartesian constraints, our method uses each remaining sample to define the initial conditions for a series of two physics simulations. The first finds a body pose that is at rest on a simulated bed. Given this pose, the second physics simulation generates a synthetic pressure image.

Our method uses SMPL [35] to generate human mesh models and a capsulized approximation of SMPL [6] to generate articulated rigid-body models. The first physics simulation drops a capsulized articulated rigid-body model with low-stiffness, damped joints on a soft-body model of a bed and pressure-sensing mat. Once the articulated body has settled into a statically stable configuration, our method converts the settled capsulized model into a particle-based soft body without articulation. This soft body model represents the shape of the body, which is important for pressure image synthesis. The second physics simulation drops this soft-body model from a small height onto the soft-body bed and sensor model. Once settled, the simulated sensor produces a pressure image, which is stored along with the settled body parameters.

Our deep learning model, PressureNet, uses a series of two network modules. Each consists of a convolutional neural network (CNN) based on [15], a kinematic embedding model from [30] that produces a SMPL mesh [35], and a pressure map reconstruction (PMR) network. The PMR network serves as a model of pressure image generation. It is a novel component that encourages consistency between the mesh model and the pressure image input. Without it, we found that our deep learning models would often make mistakes that neglected the role of contact between the body and the bed, such as placing the heel of a foot at a location some distance away from an isolated high pressure region.

When given a mesh model of the human body, the PMR network outputs an approximate pressure image that the network can more directly compare to the pressure image input. These approximate pressure images are used in the loss function and as input to a second residual network trained after the first network to correct these types of errors and generally improve performance.

In our evaluation, we used a commercially available pressure sensing mat (BodiTrak BT-3510 [38]) placed under the fitted sheet of an Invacare Homecare Bed [29]. This sensing method has potential advantages over line-of-sight sensors, which suffer occlusion of the body from bedding and other sources, such as medical equipment. However, the mat we used provides low-resolution pressure images (64×27) with limited sensitivity and dynamic range that make the estimation problem more challenging.

We only trained PressureNet using synthetic data, yet it performed well in our evaluation with real data from 20 people, including successfully estimating poses that have not previously been reported in the literature, such as supine poses with hands behind the head. To improve the performance of the model with real data, we used custom calibration objects and an optimization procedure to match the physics simulation to the real world prior to synthesizing the training data. We also created a noise model in order to apply noise to the synthetic pressure images when training PressureNet.

Our contributions include the following:


  • A physics-based method to generate simulated human bodies at rest and produce synthetic pressure images.

  • The PressurePose dataset, which consists of (1) 206K synthetic pressure images (184K train / 22K test) with associated 3D human poses and shapes (synthetic dataset: doi.org/10.7910/DVN/IAPI0X) and (2) 1,051 real pressure images and RGB-D images from 20 human participants (real dataset: doi.org/10.7910/DVN/KOA4ML).

  • PressureNet (code: github.com/Healthcare-Robotics/bodies-at-rest), a deep learning model trained on synthetic data that estimates 3D human pose and shape given a pressure image and gender.

Figure 2: We generate the initial pose from scratch, using random sampling of the body shape, joint angles, and global transform on the bed. We use rejection sampling to distribute the poses and remove self-collisions. Then, we rest a dynamic capsulized human model onto a soft bed using DartFleX, a fusion of DART and FleX simulators, to get an updated resting pose. Because this model is a rather rough approximation of human shape, we then use FleX to particlize a finer body representation to get the pressure image.

2 Related work

| work | data | modality | 3D | human repr. | postures | # joints | # identities | # images |
|------|------|----------|----|-------------|----------|----------|--------------|----------|
| [25] | R | P | Y | M | SP+, K | 18 | 1 | ? |
| [20] | R | D, P | N | S | SP, L, P | 10 | 16 | 1.1K |
| [33] | R | P | N | S | SP, L | 8* | 12 | 1.4K |
| [1] | R | D | Y | S | I/O, SP, L | 14 | 10 | 180K |
| [12] | R | RGB | N | S | SP, UNK | 7 | 3 | 13K |
| [15] | R | P | Y | S | SP, ST, K | 14 | 17 | 28K |
| [11] | R | P | Y | S | SP+, L+, ST | 14 | 6 | 60 |
| [34] | R | IRS | N | S | SP+, L+ | 14 | 2 | 419 |
| [43] | R | T | N | S | SP+, L+ | 14 | 109 | 14K |
| Ours | S/R | P | Y | M | SP+, L+, P+, K, CL, HBH, PHU | 24 | 200K/20 | 200K/1K |

column key: data: (R)eal, (S)ynth. modality: (P)ressure, (D)epth, (T)hermal, IRS - infrared selective. 3D: (Y)es, (N)o. human representation: (S)keleton, (M)esh.
posture key: SP - supine. L - lateral. P - prone. K - knee raised. I/O - getting in/out of bed. ST - sitting. CL - crossed legs. HBH - hands behind head. PHU - prone hands up. + indicates a continuum between postures. * indicates limbs.

Table 1: Comparison of Literature: Human Pose in Bed.

Human pose estimation. There is a long history of human pose estimation from camera images [2, 33, 41, 50, 51] and the more recent use of CNNs [53, 54]. The field has been moving rapidly with the estimation of 3D skeleton models [46, 59], and human pose and shape estimation as a 3D mesh [6, 30, 45] using human body models such as SCAPE [5] and SMPL [35]. These latter methods enforce physical constraints to provide kinematically feasible pose estimates, some via optimization [6] and others using learned embedded kinematics models [15, 30, 59]. Our approach builds on these works both directly, through the use of available neural networks (e.g., the SMPL embedding), and conceptually.

While pressure image formation differs from conventional cameras, the images are visually interpretable and methods developed in the vision community are well suited to pressure imagery [10, 30, 54]. PressureNet’s model of pressure image generation relates to recent work on physical contact between people and objects [8, 26, 27]. It also relates to approaches that fine-tune estimates based on spatial differences between maps at distinct stages of estimation [9, 10, 40, 54].

Human pose at rest. Human pose estimation has tended to focus on active poses. Poses in bed have attracted special attention due to their relevance to healthcare. Table 1 provides an overview of work on the estimation of human pose for people in bed. These efforts have used a variety of sensors including RGB cameras [12], infrared lighting and cameras for darkened rooms [34], depth cameras to estimate pose underneath a blanket profile [1], thermal cameras to see through a blanket [43], and pressure mats underneath a person [11, 15, 16, 20, 25, 33].

Researchers have investigated posture classification for people in bed [19, 20, 42]. There has been a lack of consensus on body poses to consider, as illustrated by Table 1. Some works focus on task-related poses, such as eating [1], and stretching [11]. Poses can increase ambiguity for particular modalities, such as lack of contact on a pressure mat (e.g. knee in the air) [15, 24] or overlapping body parts facing a thermal camera [43].

Large datasets would be valuable for deep learning and evaluation. While some bed pose work has used thousands of images, these datasets have either had few participants [12] or poses highly concentrated in some areas, due to many frames being captured when there is little motion [1, 11, 15]. An exception is recent work by Liu et al. [43], which has 109 participants.

Generating data in simulation. Approaches for generating synthetic data that model humans in the context of deep learning include physics-based simulators such as DART [32] and PyBullet [18] and position-based dynamics simulators such as PhysX [17] and FleX [36]. Some have used these tools to simulate deformable objects like cloth [14, 17]. For vision, creating synthetic depth images is relatively straightforward (e.g. [1]) while RGB image synthesis relies on more complex graphics approaches [13, 56, 58].

Some past works have simulated pressure sensors. One approach is to model the array as a deformable volume that penetrates the sensed object, where force is a function of distance penetrated [48]. Others model pressure sensing skin as a mass-spring-damper array [21, 28]; the former considers separate layers for the skin and the sensor, a key attribute of pressure arrays covering deformable objects.

3 PressurePose Dataset Generation

Our data generation process consists of three main stages, as depicted in Fig. 2: sampling of the body pose and shape; a physics simulation to find a body pose at rest; and a physics simulation to generate a pressure image. We use two simulation tools, FleX (Section 3.1) for simulating soft body dynamics, and DART (Section 3.2) for articulated rigid body dynamics.

Figure 3: Physics simulation #2 output: PressurePose synthetic dataset examples.

Sample initial pose and shape. We sample initial pose (i.e. joint angles) and body shape parameters from the SMPL human model [35]. The pose consists of 69 joint angles, Θ, which we sample from a uniform distribution bounded by joint angle limits defined for the hips, knees, shoulders, and elbows in [7, 4, 52]. We initialize the human body above the bed with a uniformly sampled roll, yaw, and 2D translation across the surface of the bed. The pitch is set to zero, and the distance normal to the bed is based on the position of the lowest initial joint. Together, these determine the global transform of the body. The shape of a SMPL human is determined by a set of 10 PCA parameters, β, which we also sample uniformly within bounds, following [49]. We use rejection sampling in three ways when generating initial poses: to distribute the overall pose more uniformly about the Cartesian space (rather than the uniformly sampled joint space), to create a variety of data partitions representing specific common postures (e.g. hands behind the head), and to reject pose samples when there are self-collisions. See Appendix A.1. This step outputs pose and shape parameters {Θ, β}, where Θ is a set of joint angles, conditioned on β, that has passed these criteria.
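The sampling-and-rejection loop above can be sketched as follows. The joint limits, collision check, and Cartesian balancing criterion here are hypothetical stand-ins: the paper's actual limits come from the cited biomechanics references, and its collision check runs on the capsulized body.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder per-axis joint-angle limits in radians (NOT the paper's values).
THETA_MIN = np.array([-1.5, -0.5, -0.8])
THETA_MAX = np.array([0.5, 0.5, 0.8])

def sample_pose(n_joints=23):
    """Uniformly sample joint angles within per-axis limits (23 joints x 3 axes = 69 angles)."""
    u = rng.uniform(size=(n_joints, 3))
    return THETA_MIN + u * (THETA_MAX - THETA_MIN)

def has_self_collision(theta):
    """Stand-in for the capsule-capsule self-collision check; always passes here."""
    return False

def satisfies_cartesian_criteria(theta):
    """Stand-in for the Cartesian-space balancing and distal-limb constraints."""
    return True

def rejection_sample(n_samples):
    """Keep sampling until n_samples poses pass all rejection criteria."""
    accepted = []
    while len(accepted) < n_samples:
        theta = sample_pose()
        if has_self_collision(theta) or not satisfies_cartesian_criteria(theta):
            continue
        accepted.append(theta)
    return np.stack(accepted)

poses = rejection_sample(10)
```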

Physics Simulation #1: Resting Pose. We use FleX [36] to simulate a human model resting on a soft bed, which includes a mattress and a synthetic pressure mat on the surface of the mattress (Fig. 2). The human is modelled as an articulated rigid body system made with capsule primitives, a dynamic variant of the SMPL model. Once the simulation nears static equilibrium, we record the resting pose.

FleX is a position-based dynamics simulator with a unified particle representation that can efficiently simulate rigid and deformable objects. However, FleX does not currently provide a way for particles to influence the motions of rigid capsules. To overcome this limitation, we use DART [32] to model the rigid body dynamics of the capsulized human model. We combine FleX and DART through the following loop: 1) DART moves the capsulized articulated rigid body based on applied forces and moments. 2) FleX moves the soft body particles in response to the motions of the rigid body. 3) We compute new forces and moments to apply in DART based on the state of the FleX particles and the capsulized articulated rigid body. 4) Repeat. We call the combination of the two simulators DartFleX; Section 3.2 provides further details.

Physics Simulation #2: Pressure Image. The settled, capsulized body is insufficient for producing a realistic pressure image: it approximates the human shape too roughly. Instead, we create a weighted, particlized, soft human body in FleX (Figs. 2 and 3) from the SMPL [35] mesh, using the sampled body shape and the resting pose from physics simulation #1. We initialize the particlized human at the settled 2D translation over the surface of the mattress, and set its height so the body is just above the surface of the bed. We then start the simulation, resting the particlized body on the soft bed, and record the pressure image once the simulation has neared static equilibrium. We note that this particlized representation has no kinematics and cannot be used to adjust a body to a resting configuration; hence our use of two separate dynamic simulations.

3.1 Soft Body Simulation with FleX.

We simulate the sensing array by connecting FleX particles in a way that mimics real pressure sensing fabric, and model the mattress with a soft FleX object.

Soft Mattress and Pressure Sensing Mat. Here we describe the soft mattress and pressure sensing array within the FleX environment, as shown in Fig. 4 and further described in Appendix A.3. The mattress is created in a common twin XL size with clusters of particles defined by their spacing, radius, stiffness, and particle mass parameters. We then create a simulated pressure sensing mat on top of the mattress, which is used both to generate pressure images and to help the human model reach a resting pose by computing the force vectors applied to the various segments of the human body. The mat consists of two layers of staggered quad FleX cloth meshes in a square pyramid structure, where each layer is defined by its stretching, bending, and shear stiffnesses, which are spring constraints on the particles that hold the mat together. A compression stiffness determines the bond strength between the two layers, and the mat has its own particle mass.

Figure 4: (a) Synthetic pressure mat structure. Pressure is a function of the penetration of the top layer array particle into the four underlying particles. (b) DartFleX collision between a capsulized limb and the simulated bed and pressure-sensing mat.

We model force applied to the mat as a function of the particle penetration vector, based on the pyramid structure in Fig. 4 (a). Force increases as the particle on the top layer, j, moves closer to the four particles underneath:

    d⃗_j = (d_{j,0} − d_j) n̂_j    (1)

where d_j is the distance between particle j and an approximate underlying plane, d_{j,0} is the initial distance at rest prior to contact, and n̂_j is the normal vector of the approximate underlying plane.

Sensor Model. The BodiTrak pressure-sensing mat has a 64×27 array of pressure-sensing taxels (tactile pixels). The four particles at the base of the pyramid structure in Fig. 4 (a) model the 1” square geometry of a single pressure-sensing taxel. We model the pressure output, P_j, of a single taxel, j, using a quadratic function of the magnitude of the penetration vector:

    P_j = a ‖d⃗_j‖² + b ‖d⃗_j‖ + c    (2)

where a, b, and c are constants optimized to fit calibration data, as described in Section 3.3.
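As a concrete sketch of equations (1) and (2), the following computes a taxel's pressure from a top-layer particle's penetration past its rest distance. The plane fit and the constants a, b, c are placeholders, not the calibrated values from Section 3.3.

```python
import numpy as np

def penetration_vector(p_top, plane_point, plane_normal, d_rest):
    """Eq. (1) sketch: penetration of a top-layer particle toward the plane
    approximating the four base particles of its taxel pyramid."""
    n_hat = np.asarray(plane_normal) / np.linalg.norm(plane_normal)
    d = np.dot(np.asarray(p_top) - np.asarray(plane_point), n_hat)
    depth = max(d_rest - d, 0.0)  # only penetration past the rest distance counts
    return depth * n_hat

def taxel_pressure(d_vec, a=1.0, b=0.0, c=0.0):
    """Eq. (2) sketch: quadratic map from penetration magnitude to pressure.
    a, b, c are placeholder constants."""
    m = np.linalg.norm(d_vec)
    return a * m**2 + b * m + c if m > 0 else 0.0
```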

3.2 DartFleX: Resting a Dynamic Ragdoll Body

The purpose of DartFleX is to allow rigid body kinematic chains to interact with soft objects by coupling the rigid body dynamics solver in DART to the unified particle solver in FleX as shown in Fig. 4 (b).

Dynamic rigid body chain. Our rigid human body model relies on a capsulized approximation to the SMPL model, following [6]. To use this model in a dynamics context, we calculate the per-capsule mass based on volume ratios from a person with average body shape, average body mass, and mass percentage distributions between body parts as defined by Tozeren [55]. For the joint stiffnesses, we tune parameters to achieve the low-stiffness characteristics of a ragdoll model that can settle into a resting pose on a bed due to gravity. We set torso and head stiffness high so that they are effectively immobile, and add joint damping to reduce jitter.

DartFleX Physics. We initialize the same capsulized model in both DART and FleX. We apply gravity in DART and take a step in the DART simulator. We get a set of updated dynamic capsule positions and orientations, and move the static geometry counterparts in FleX accordingly. To transfer force data from FleX to DART, we first check if any top layer pressure mat particles are in contact. Each particle j in contact has a penetration vector d⃗_j (see equation 1), which we convert to a normal force vector using a mass-spring-damper model [47]:

    f⃗_{N,j} = (k ‖d⃗_j‖ + c_d (d/dt)‖d⃗_j‖) n̂_j    (3)

where k is a spring constant, c_d is a damping constant, and (d/dt)‖d⃗_j‖ is the rate of change of the penetration magnitude. We then assign each force to its nearest corresponding capsule c. Given the velocity, v⃗_c, of capsule c and a friction coefficient, μ, we compute the frictional force for the particle in contact:

    f⃗_{F,j} = −μ ‖f⃗_{N,j}‖ P(v̂_c)    (4)

where P is an operator that projects its argument orthogonally onto a straight line parallel to the surface of the mat, and v̂_c is the unit velocity of the capsule. In our simulation, we set the damping constant c_d and the friction coefficient μ, and we find the spring constant k through a calibration sequence described in Section 3.3. We can then compute the total particle force, f⃗_j:

    f⃗_j = f⃗_{N,j} + f⃗_{F,j}    (5)

We then compute a resultant force in FleX for body capsule c, based on the sum of forces from particles in contact with the capsule plus gravity, m_c g⃗:

    F⃗_c = Σ_{j∈c} f⃗_j + m_c g⃗    (6)

The moment M⃗_c is computed on each capsule from the particles in contact, where r⃗_j is the moment arm between a particle and the capsule center of mass:

Figure 5: (a) Rigid calibration capsules with quarters (U.S. coins) shown for size. (b) Simulated capsules. (right) Real and simulated pressure images prior to calibration.
    M⃗_c = Σ_{j∈c} r⃗_j × f⃗_j    (7)

The resultant forces and moments are applied in DART, a step is taken with the forces and gravity applied to each body part, and the DartFleX cycle repeats. We continue until the capsulized model settles, and then record the resting pose, root position, and root orientation.
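The per-capsule force and moment transfer of equations (3)-(7) can be sketched as below. The spring, damping, and friction constants are placeholder values, not the calibrated ones, and the contact representation is simplified to tuples.

```python
import numpy as np

MU = 0.5        # friction coefficient (placeholder value)
K_SPRING = 1e4  # spring constant, found by calibration in the paper (placeholder here)
C_DAMP = 10.0   # damping constant (placeholder value)

def normal_force(d_vec, d_dot, n_hat):
    """Eq. (3) sketch: mass-spring-damper normal force for one contact particle."""
    return (K_SPRING * np.linalg.norm(d_vec) + C_DAMP * d_dot) * n_hat

def friction_force(f_n, v_capsule, n_hat):
    """Eq. (4) sketch: kinetic friction opposing the capsule's tangential velocity."""
    v_t = v_capsule - np.dot(v_capsule, n_hat) * n_hat  # project onto mat plane
    speed = np.linalg.norm(v_t)
    if speed < 1e-9:
        return np.zeros(3)
    return -MU * np.linalg.norm(f_n) * v_t / speed

def capsule_wrench(contacts, v_capsule, mass, g=np.array([0.0, 0.0, -9.81])):
    """Eqs. (5)-(7) sketch: total force and moment applied back to a DART capsule.
    Each contact is (d_vec, d_dot, n_hat, r) with r the moment arm to the CoM."""
    F = mass * g
    M = np.zeros(3)
    for d_vec, d_dot, n_hat, r in contacts:
        f = normal_force(d_vec, d_dot, n_hat)
        f = f + friction_force(f, v_capsule, n_hat)  # Eq. (5): total particle force
        F = F + f                                    # Eq. (6): resultant force
        M = M + np.cross(r, f)                       # Eq. (7): resultant moment
    return F, M
```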

Figure 6: (a) PressureNet: We combine two network modules (“Mod1” and “Mod2”) in series. Mod1 learns a coarse estimate and Mod2 fine-tunes, by learning a residual that takes as input the two maps reconstructed by Mod1 combined with the input to Mod1. (b) Detailed description of a single PressureNet module showing the novel PMR network that reconstructs pressure and contact maps.

3.3 Calibration

We calibrated our simulation using the rigid capsule shapes in Fig. 5 (a). We placed varying weights on them on the real pressure-sensing mat and recorded data, and then created matching shapes in simulation. We first calibrated the FleX environment using the particlized capsules shown in Fig. 5 (b), applying the covariance matrix adaptation evolution strategy (CMA-ES) [22] to match synthetic pressure images to real pressure images of the calibration objects by optimizing the mattress, mat, and sensor model parameters described in Section 3.1.

We also measured how much the real capsules sink into the mattress. We use these measurements to calibrate the mass-spring-damper model in equation 3: we fit the simulated capsule displacement to the real capsule displacement to solve for the spring constant k, and then set the damping constant and friction coefficient. See Appendix A.4 and A.5 for details.

4 PressureNet

Given a pressure image of a person resting in bed and a gender, PressureNet produces a posed 3D body model. PressureNet (Fig. 6 (a)) consists of two network modules trained in sequence (“Mod1” and “Mod2”). Each takes as input a tensor consisting of three channels, shown in Fig. 6 (b): pressure P, edges E, and contact C, as well as a binary flag for gender. P is the pressure image from a pressure sensing mat, E results from an edge detection channel consisting of a Sobel filter applied to P, and C is a binary contact map calculated from all non-zero elements of P. Given this input, each module outputs both an SMPL mesh body and two reconstructed maps produced by the PMR network, which estimate the pressure image that would be generated by the mesh body. Mod2 has the same structure as Mod1, except that it takes in two additional channels: the maps produced by PMR in Mod1. We train PressureNet by training Mod1 to produce a coarse estimate, freezing the learned model weights, and then training Mod2 to fine-tune the estimate.
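The three per-module input channels can be built from the raw pressure image alone. A minimal sketch follows; using scipy's Sobel operator and this exact channel ordering are our assumptions, not details from the paper.

```python
import numpy as np
from scipy import ndimage

def build_input_channels(pressure):
    """Stack the three input channels for one module: pressure P, Sobel edge
    magnitude E, and binary contact map C (all share the 64x27 mat shape)."""
    P = pressure.astype(np.float64)
    ex = ndimage.sobel(P, axis=0)      # vertical gradient
    ey = ndimage.sobel(P, axis=1)      # horizontal gradient
    E = np.hypot(ex, ey)               # edge magnitude
    C = (P > 0).astype(np.float64)     # contact wherever pressure is non-zero
    return np.stack([P, E, C])
```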

CNN. The first component of each network module is a CNN with an architecture similar to the one proposed by Clever et al. [15]. Notably, we tripled the number of channels in each convolutional layer. See Appendix B.1 for details. During training, only the weights of the CNNs are allowed to change; all other parts of the networks are held constant. The convolutional model outputs the estimated body shape, pose, and global transform: the estimated joint angles, body shape parameters, the global translation of the root joint with respect to the bed, and parameters that define a continuous rotation representation for the 3-DOF orientation of the root joint.

SMPL kinematic embedding. The CNN output feeds into a kinematic embedding layer (see Fig. 6), which uses the SMPL differentiable kinematics model from [30] to estimate the shape, pose, and global transform. This embedding outputs 24 joint positions for the human body and a SMPL mesh consisting of 6,890 vertices, and relies on forward kinematics to ensure body proportions and joint angles match real humans.

PMR. The final component of each module, the PMR network, reconstructs two maps based on the relationship between the SMPL mesh and the surface of the bed. The reconstructed pressure map, Q̂, corresponds with the input pressure image, P, and is computed for each pressure image taxel based on the distance that the human mesh sinks into the bed. The reconstructed contact map, Ĉ, corresponds with the input contact map, C, and is a binary map of the non-zero elements of Q̂. See Appendix B for details.

Loss function. We train Mod1 in PressureNet with the following loss function, given Cartesian joint positions and body shape parameters:

    L_Mod1 = (1/24) Σ_{i=1}^{24} ‖s⃗_i − ŝ⃗_i‖ / σ_s + (1/10) Σ_{k=1}^{10} |β_k − β̂_k| / σ_β    (8)

where s⃗_i represents the 3D position of a single joint, ŝ⃗_i and β̂_k are the estimates, and σ_s and σ_β are standard deviations computed over the whole dataset to normalize the terms.

In our evaluations (Section 6), sequentially training two separate network modules improved model performance and the resulting human mesh and pose predictions. For a pressure array of n taxels, we compute a loss for Mod2 by adding the error between the reconstructed pressure maps and the ground truth maps from simulation:

    L_Mod2 = L_Mod1 + (1/n) Σ_{j=1}^{n} ( |Q_j − Q̂_j| / σ_Q + |C_j − Ĉ_j| / σ_C )    (9)

where L_Mod1 uses Mod2 estimates, Q_j and C_j are ground truth maps precomputed from the simulated bodies, and σ_Q and σ_C are computed over the dataset.
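A numerical sketch of the two losses follows, with the normalization constants passed in as precomputed inputs. The exact term weighting is our assumption; the paper's implementation may weight the terms differently.

```python
import numpy as np

def mod1_loss(s_hat, s, beta_hat, beta, sigma_s, sigma_beta):
    """Eq. (8) sketch: normalized mean per-joint position error plus
    normalized mean body shape parameter error."""
    joint_term = np.mean(np.linalg.norm(s - s_hat, axis=1)) / sigma_s
    shape_term = np.mean(np.abs(beta - beta_hat)) / sigma_beta
    return joint_term + shape_term

def mod2_loss(s_hat, s, beta_hat, beta, Q_hat, Q, C_hat, C,
              sigma_s, sigma_beta, sigma_Q, sigma_C):
    """Eq. (9) sketch: Mod1 terms (on Mod2's estimates) plus per-taxel
    errors between reconstructed and ground truth maps."""
    base = mod1_loss(s_hat, s, beta_hat, beta, sigma_s, sigma_beta)
    q_term = np.mean(np.abs(Q - Q_hat)) / sigma_Q
    c_term = np.mean(np.abs(C - C_hat)) / sigma_C
    return base + q_term + c_term
```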

5 Evaluation

To evaluate our methods, we trained our CNN on synthetic data and tested it on both synthetic and real data. We generated 206K synthetic bodies at rest with corresponding pressure images (184K train / 22K test), which we partitioned to represent both a uniformly sampled space and common resting postures. By posture, we mean commonly recognized categories of overall body pose, such as sitting, prone, and supine. We tested 4 network types and 2 training data sets of different sizes.

5.1 PressurePose Data Partitions

We used the rejection sampling method described in Section 3 and Appendix A.1 to generate initial poses and create dataset partitions. Our main partition, the general partition, consists of 116K image and label pairs. In it, we evenly distributed limb poses about the Cartesian space and randomly sampled over body roll and yaw. This partition includes supine, left/right lateral and prone postures, as well as postures in between, and has the greatest diversity of poses. We also created a general supine partition (58K) featuring only supine postures and evenly distributed limb poses. Finally, we generated smaller partitions representing other common postures: hands behind the head (5K), prone with hands up (9K), supine crossed legs (9K), and supine straight limbs (9K). See Appendix A.7 for details.

5.2 PressureNet Evaluation

We normalized all input data by a per-image sum of taxels. We blurred synthetic and real images with a Gaussian filter. We trained Mod1 for 100 epochs with loss function L_Mod1 (equation 8). Then, we pre-computed the reconstruction maps from Mod1 for input to Mod2, and trained Mod2 for 100 epochs using loss function L_Mod2 (equation 9). See Appendix B.3 for training hyperparameters and details.
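The per-image normalization and blur can be sketched as follows; the blur width sigma used here is a placeholder, not the paper's value.

```python
import numpy as np
from scipy import ndimage

def preprocess(pressure, sigma=0.5):
    """Normalize a pressure image by its taxel sum, then apply a Gaussian blur.
    sigma is a placeholder blur width."""
    total = pressure.sum()
    norm = pressure / total if total > 0 else pressure
    return ndimage.gaussian_filter(norm, sigma=sigma)
```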

We investigated 5 variants of PressureNet, all trained entirely with synthetic data, to compare the effect of (1) ablating PMR, (2) adding noise to the synthetic training data, (3) ablating the contact and edge input channels (C and E), and (4) reducing the training data size. Ablating PMR consists of removing the 2 reconstructed maps from the input to Mod2 and using L_Mod1 (equation 8) for training both Mod1 and Mod2. We added noise to the training data to account for real-world variation, such as sensor noise. Our noise model includes per-pixel white noise, additive noise, multiplicative noise, and blur variation. We compared networks trained on 46K vs. 184K images.

5.3 Human Participant Study

We mounted a Microsoft Kinect 2 above our Invacare Homecare bed to capture RGB images and point clouds synchronized with our pressure image data. See details in Appendix A.6. We recruited 20 (10F/10M) human participants with approval from an Institutional Review Board. We conducted the study in two parts to capture (1) participant-selected poses and (2) prescribed poses from the synthetic test set. We began by capturing five participant-selected poses. For the first pose, participants were instructed to get into the bed and get comfortable. For the remaining four, participants were told to get comfortable in supine, right lateral, left lateral, and prone postures. Next, for the prescribed poses, we displayed a pose rendering on a monitor, and instructed the participants to get into the pose shown. We captured 48 prescribed poses per participant, which were sampled without replacement from the synthetic testing set: 24 general partition poses, 8 supine-only poses, and 4 from each of the remaining partitions.

Figure 7: 3D error analysis between a human mesh (6,890 vertices) and a point cloud (8,000 downsampled points).

5.4 Data Analysis

We performed an error analysis as depicted in Fig. 7. For this analysis, we compute the closest point cloud point to each mesh vertex, and the closest mesh vertex to each point cloud point. We introduce 3DVPE (3D vertex-point-error), which is the average of these numbers. We downsample the point cloud to a resolution of 1cm so the number of points is roughly equal to the number of mesh vertices. We clip the mesh vertices and the point cloud at the edges of the pressure mat. The point cloud only contains information from the top surface of the body facing the camera, so we clip the mesh vertices that do not have at least one adjacent face facing the camera. Finally, we normalize the mesh by vertex density: while the density of the point cloud is uniform from downsampling, the mesh vertices are highly concentrated in some areas like the face. We normalize each per-vertex error by the average of its adjacent face surface areas.
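A minimal sketch of the bidirectional closest-point average underlying 3DVPE follows, using KD-trees; the clipping and per-vertex area normalization steps described above are omitted here.

```python
import numpy as np
from scipy.spatial import cKDTree

def three_d_vpe(verts, cloud):
    """3DVPE sketch: average of (a) the mean distance from each mesh vertex
    to its nearest point-cloud point and (b) the mean distance from each
    cloud point to its nearest mesh vertex."""
    d_v2p, _ = cKDTree(cloud).query(verts)  # vertex -> nearest cloud point
    d_p2v, _ = cKDTree(verts).query(cloud)  # cloud point -> nearest vertex
    return 0.5 * (d_v2p.mean() + d_p2v.mean())
```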

Figure 8: PressureNet results on real data with the best performing network (trained with 184K samples).

We evaluated PressureNet on the synthetic test set and compared the results to the real test set. We clip the estimated and ground truth mesh vertices and normalize per-vertex error in the same way as the real data. Additionally, we evaluated per-joint error (24 joints) using mean-per-joint-position error (MPJPE), and per-vertex error (6,890 vertices) using vertex-to-vertex error (v2v) for the synthetic data. We evaluated the network’s ability to infer posture using the participant-selected pose dataset by manually labeling the inferred posture (4 labels: supine, prone, left/right lateral). We also compared to a baseline, BL, where we put a body of mean shape in a supine position in the center of the bed and compare it to all ground truth poses. We positioned the legs and arms to be straight and aligned with the length of the body.

6 Results and Discussion

| Network Description | Training data ct. | 12K synth MPJPE (cm) | 12K synth v2v (cm) | 12K synth 3DVPE (cm) | 1K real 3DVPE (cm) | 99 real 3DVPE (cm) |
|---------------------|-------------------|----------------------|--------------------|----------------------|--------------------|--------------------|
| Best | 184K | 11.18 | 13.50 | 3.94 | 4.99 | 4.76 |
| Noise ablated | 184K | 11.18 | 13.52 | 3.97 | 5.05 | 4.79 |
| Input C, E ablated | 184K | 11.39 | 13.73 | 4.03 | 5.07 | 4.85 |
| Best - small data | 46K | 12.65 | 15.28 | 4.35 | 5.17 | 4.89 |
| PMR ablated | 184K | 12.28 | 14.65 | 4.38 | 5.33 | 4.94 |
| Baseline - mean pose | - | 33.30 | 38.70 | 8.43 | 6.65 | 5.22 |

Table 2: Results comparing testing data and network type.

Overall, we found that using more synthetic data resulted in higher performance in all tests, as shown in Table 2. As expected, ablating the PMR network and ablating noise reduced performance. Ablating contact and edge inputs also reduced performance. We expect that comparable performance could be achieved without them, possibly by changing the details of the CNN. Fig. 8 shows results from the best performing network with 184K training images, noise, and the PMR network.

We compared the error on a set of 99 participant-selected poses, shown in Table 3, using the best performing PressureNet. Results show a higher error for lateral postures, where the body center of mass is further from the mat and the limbs more often rest on other limbs or on the body rather than on the mat. Results on partitioned subsets of data can be found in Appendix B.4. Fig. 9 shows four failure cases.

posture partition | test ct. | 3DVPE (cm) | posture match
no instruction    | 19 | 3.93 | 100%
supine            | 20 | 4.02 | 100%
right lateral     | 20 | 5.45 | 100%
left lateral      | 20 | 5.37 | 100%
prone             | 20 | 4.96 | 95%*

Table 3: Results - participant selected poses. *See Fig. 9-top left.
Figure 9: Some failure cases. (a) Real data. (b) Testing on synthetic training data.

7 Conclusion

With our physics-based simulation pipeline, we generated a dataset, PressurePose, consisting of 206K synthetic pressure images with an unprecedented variety of body shapes and poses. Then, we trained a deep learning model, PressureNet, entirely on synthetic data. With our best performing model, we achieve average pose estimation errors of roughly 5 cm as measured by 3DVPE (Table 2), resulting in accurate 3D pose and body shape estimation with real people on a pressure sensing bed.

Acknowledgement: We thank Alex Clegg. This work was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1148903, NSF award IIS-1514258, NSF award DGE-1545287 and AWS Cloud Credits for Research.

Disclosure: Charles C. Kemp owns equity in and works for Hello Robot, a company commercializing robotic assistance technologies. Henry M. Clever is entitled to royalties derived from Hello Robot’s sale of products.

Appendix A: PressurePose Data Generation

A.1 Initial Pose Sampling

We use rejection sampling to generate initial pose dataset partitions. Our criteria are as follows.

Uniform Cartesian space distribution - Fig. 10 (a). We use rejection sampling to sample poses uniformly with respect to Cartesian space, by discretizing the space and ensuring that a given limb is equally represented in each unit. We define a Cartesian cuboid for checking the presence of the most distal joint of a limb. First, we constrain the cuboid in the horizontal directions to how far the distal joint (e.g. the right foot) can extend from the proximal joint (e.g. the right hip) in a limb. For the legs, we assume that the foot cannot move above the hip. We also constrain the vertical direction to ensure that the distal joint is initially positioned at a height close to that of the proximal joint: for lying poses, the distal joints (feet and hands) are more likely to end up close to the surface of the bed than high in the air, for example. This constraint promotes simulation stability and decreases the time it takes for physics simulation #1 (Fig. 2) to reach an equilibrium state.

Next, we break this cuboid up into a set of smaller non-overlapping cuboids, as shown in Fig. 10-top middle. For each limb, we uniformly sample one of the smaller cuboids and then use rejection sampling on the limb joint angles — in the case of Fig. 10 (a), the right leg — until the distal joint lies within the sampled cuboid.
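The per-limb sampling loop might look like the following sketch, where `cuboids`, `sample_angles`, and `fk` (forward kinematics to the distal joint) are hypothetical stand-ins for the paper's machinery:

```python
import random

def sample_limb_pose(cuboids, sample_angles, fk, max_tries=10000):
    """Rejection-sampling sketch: draw a target sub-cuboid uniformly, then
    resample joint angles until the distal joint lands inside it."""
    lo, hi = random.choice(cuboids)              # target sub-cuboid
    for _ in range(max_tries):
        q = sample_angles()                      # joint angles within limits
        p = fk(q)                                # distal joint position
        if all(l <= x <= h for x, l, h in zip(p, lo, hi)):
            return q, p
    raise RuntimeError("target cuboid unreachable within the angle limits")
```

Sampling the sub-cuboid first (rather than accepting whatever region a pose falls into) is what equalizes the representation of each Cartesian unit.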

Generate common posture partitions - Fig. 10 (b). Some common postures, such as resting with the hands behind the head, are unlikely to be generated when the joint angles are sampled from a uniform distribution. For example, there is a very low probability of generating a hands-behind-the-head pose when sampling joint angles uniformly, so a network trained with so little hands-behind-the-head data has difficulty learning such a pose. We mitigate this issue by checking for the presence of the most distal joint in a cuboid representing where it would be located in such a pose. If the joint is within the cuboid, it passes the criteria and we add the limb pose to the set of checked initial poses.

Figure 10: Rejection sampling criteria. (a) Evenly distributing right leg poses across Cartesian space by sampling from four non-overlapping Cartesian cuboids; pose angles are rejected if the distal joint falls outside the sampled cuboid. (b) For sampling the right arm in the hands-behind-head partition, the right arm pose angles are rejected if the hand falls outside the target cuboid. (c) Pose feasibility checking via collision detection.

Prevent self-collision - Fig. 10 (c). We reject poses that result in self-collision by capsulizing the mesh and using the DART collision detector. We check the hand, forearm, foot, and lower leg capsules for collision with any other capsules except their adjacent capsules (e.g. the forearm and upper arm are allowed to overlap).
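Capsule-capsule collision reduces to a segment-segment distance test against the sum of the radii; a standard formulation (not the DART implementation the paper uses) is:

```python
import numpy as np

def segment_distance(p1, q1, p2, q2):
    """Minimum distance between segments p1q1 and p2q2 (standard closest-point test)."""
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= 1e-12 and e <= 1e-12:
        return np.linalg.norm(r)                 # both segments are points
    if a <= 1e-12:
        s, t = 0.0, np.clip(f / e, 0.0, 1.0)
    else:
        c = d1 @ r
        if e <= 1e-12:
            t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
        else:
            b = d1 @ d2
            denom = a * e - b * b                # 0 when segments are parallel
            s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > 1e-12 else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
            elif t > 1.0:
                t, s = 1.0, np.clip((b - c) / a, 0.0, 1.0)
    return np.linalg.norm((p1 + s * d1) - (p2 + t * d2))

def capsules_collide(p1, q1, r1, p2, q2, r2):
    """Two capsules overlap iff their core segments are closer than r1 + r2."""
    return segment_distance(p1, q1, p2, q2) < r1 + r2
```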

A.2 Dynamic Simulation Details

Weighting particles in FleX. We directly calculate particle mass for the particlized human in physics simulation #2, as well as for the particlized calibration objects depicted in Fig. 5 (b). Since FleX is a position-based dynamics simulator and the mass is defined by units of inverse mass on an arbitrary scale, we begin by defining the inverse mass scale for particles in the particlized human.

For this, we assume that the volume each particle in the human occupies, as well as its density, is the same as that of a water particle. Because volume and density are equal, the masses are equal, so we can also set the inverse masses of human particles equal to those of water particles.

We calculate the inverse mass $m^{-1}_o$ for particles in a calibration object by a density ratio to that of water, given a known weight of the object $W_o$ and the object volume $V_o$:

$$m^{-1}_o = m^{-1}_w \, \frac{\rho_w \, g \, V_o}{W_o} \qquad (10)$$

where $\rho_w$ is the density of water, $m^{-1}_w$ is the inverse mass of a water particle, and $g$ is gravity. In contrast to the humans and objects rested on the bed, the particle inverse masses of the soft mattress and synthetic pressure mat are determined from an optimization described in Appendix A.4.
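The density-ratio scaling can be illustrated as follows, assuming the object weight is given in newtons and that particle inverse mass scales inversely with density (function and parameter names are ours):

```python
RHO_W = 1000.0   # kg/m^3, density of water
G = 9.81         # m/s^2, gravity

def object_particle_inv_mass(inv_mass_water, weight_n, volume_m3):
    """Particle inverse mass for a calibration object, scaled from the
    water-density particle inverse mass by a density ratio (sketch)."""
    rho_obj = weight_n / (G * volume_m3)     # object density from weight and volume
    return inv_mass_water * RHO_W / rho_obj  # denser object -> heavier particles
```

An object with the density of water keeps the water particle inverse mass; a twice-denser object halves it.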

Weighting the capsulized human chain. We compute a per-capsule weight for the articulated capsulized chain in DartFleX based on the weight distribution for an average person and capsule volume ratios. First, we describe how we assign capsule mass for the average person. We use average body mass and mass distribution values from Tozeren [55], and calculate capsule volumes from body shape. We assume the average human of gender $G$ has a mass of $\bar{m}_G$, a mass percentage distribution of $P_b$ for body part $b$, and mean SMPL body shape parameters $\bar{\beta}$. We define the mass of each capsule $c$ in an average person to be:

$$\bar{m}_c = \bar{m}_G \, P_b \, \frac{\bar{V}_c}{\bar{V}_b} \qquad (11)$$

where $\bar{V}_c$ is the volume of capsule $c$ for a mean body shape $\bar{\beta}$, and $\bar{V}_b$ is the sum of volumes for all capsules in body part $b$. Now, we describe how this capsule mass can be converted into masses for people of other shapes. To find the mass $m_c$ of some capsule for a body of particular shape $\beta$, we use a capsule volume ratio between the particular person and an average person:

$$m_c = \bar{m}_c \, \frac{V_c(\beta)}{\bar{V}_c} \qquad (12)$$

where $V_c(\beta)$ is the volume of that capsule for the particular body shape. Computing capsule volume analytically is simple given radius and length, but this is complicated by capsule overlap, which is often substantial in the SMPLIFY capsulized model [6] we use. Instead, we use discretization to compute capsule volume and correct for overlap. First, we use the SMPLIFY regressor to calculate capsule radii and lengths from body shape $\beta$. Besides shape, overlap depends on the particular pose of the capsulized model. We assume that pose-dependent differences in overlap are very small, and hold the pose constant. We then compute the global transform for each capsule using this shape and pose. From capsule radii, lengths, and global transforms, we place all capsules in 3D space and voxelize them at a fixed resolution. This produces a set of 3D masks, which are tagged to their corresponding capsules. Voxels belonging to a unique capsule are allocated directly, while voxels belonging to multiple capsules are allocated fractionally based on the number of capsules sharing the voxel. We compute capsule inertia matrices analytically from capsule radius and length.
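The voxelized volume computation with fractional allocation of shared voxels can be sketched as a brute-force point-in-capsule test on a regular grid (the resolution and helper names here are ours):

```python
import numpy as np

def _seg_dist(pts, a, b):
    # distance from each point to the segment ab (capsule core)
    ab = b - a
    denom = max(float(ab @ ab), 1e-12)
    t = np.clip((pts - a) @ ab / denom, 0.0, 1.0)
    return np.linalg.norm(pts - (a + t[:, None] * ab), axis=1)

def capsule_volumes(capsules, res=0.005):
    """Discretized capsule volumes with overlap correction (sketch).
    `capsules` is a list of (endpoint_a, endpoint_b, radius); a voxel shared
    by k capsules contributes 1/k of its volume to each of them."""
    lo = np.min([np.minimum(a, b) - r for a, b, r in capsules], axis=0)
    hi = np.max([np.maximum(a, b) + r for a, b, r in capsules], axis=0)
    axes = [np.arange(l, h, res) for l, h in zip(lo, hi)]
    pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    inside = np.stack([_seg_dist(pts, a, b) <= r for a, b, r in capsules])
    share = inside / np.maximum(inside.sum(axis=0), 1)   # fractional allocation
    return share.sum(axis=1) * res ** 3                  # per-capsule volume
```

Two fully overlapping identical capsules each receive exactly half of the shared volume, which is the overlap correction described above.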

Capsulized body joint stiffness. For an average person, we set low joint stiffnesses for the shoulders, elbows, hands, hips, knees and feet, and a very high stiffness for the torso and head. For a person of particular body shape, we weight the joint stiffnesses by the ratio of their body mass to the average body mass. We also apply joint damping. The direction and magnitude of the stiffness force on each joint depends on the joint equilibrium position, i.e. the joint angle where the force is 0. We set the equilibrium position of the joints to the home pose, where the arms are at the sides and the legs are straight. In the SMPLIFY model, the home pose consists of equilibrium joint positions set to 0, except the shoulders, which are bent slightly downward. Rather than setting the equilibrium positions to the initial joint angles, we do this to guide the pose away from extreme angles with a modest force.

Because we set the joint stiffness low, our dataset does not capture non-resting postures, such as when a person is getting in/out of bed (recall Table 1). However, we have been able to generate resting sitting poses by bending the mattress and pressure mat into a sitting configuration and then resting a person on it, like the sitting postures in [15].

Settling criteria - Physics simulation #1. For physics simulation #1, the goal is to slowly allow the body to fall on the bed and settle into a resting pose. We start the capsulized body at a height based on the lowest point on the body. For many randomly sampled poses, the lowest joint is initially much lower than the center of mass, which causes the center of mass to build significant momentum by the time it reaches the bed. We found that this caused bouncing and instability, and was qualitatively different from the motion one might take to assume a resting pose in bed. We alleviate this issue by zeroing the velocity of the capsulized model every 4 iterations in the simulation until a capsule that better represents the center of mass contacts the surface of the bed. For this, we use the capsule approximating the buttocks.
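The settling procedure can be outlined with hypothetical simulator callbacks; the thresholds and iteration cap here are placeholders, not the paper's values:

```python
def settle(step, zero_velocity, buttocks_in_contact, max_vel, max_acc,
           v_thresh=0.01, a_thresh=0.01, zero_every=4, max_iters=5000):
    """Settling-loop sketch: zero the body's velocity every few steps until the
    buttocks capsule contacts the bed, then run until both velocity and
    acceleration fall below their thresholds."""
    contact = False
    for i in range(max_iters):
        step()
        if not contact:
            contact = buttocks_in_contact()
            if not contact and i % zero_every == 0:
                zero_velocity()                  # damp pre-contact momentum
        elif max_vel() < v_thresh and max_acc() < a_thresh:
            return i                             # body is at rest
    raise RuntimeError("did not settle; reject this pose sample")
```

Failing to settle within the iteration cap corresponds to the rejection case described below for unstable samples.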

Finding a resting pose in static equilibrium is hampered by the limited stability of DartFleX: DART uses a more traditional physics solver and FleX uses position-based dynamics, which are challenging to connect in a stable loop. Rather than run the simulation until static equilibrium, we use a cutoff threshold that takes the velocity and acceleration of all capsules into account. We define a resting body as one where the maximum velocity and maximum acceleration over all capsules have fallen below fixed thresholds. In the event the model does not settle within a maximum number of iterations, or the pressure array becomes unstable (defined by separation of particles in the pressure mat, e.g. a limb poking into the mat), the simulation is terminated and the particular pose is rejected. Across the whole dataset, we found a similar rejection rate for both of these criteria.

Settling criteria - Physics simulation #2. We use the same approach as simulation #1 to determine the height from which to drop particlized humans. We found it to always be stable for our purposes, and it took roughly 150 iterations to reach the resting velocity and acceleration previously stated. Because it only uses FleX and the limbs do not move kinematically, it is an order of magnitude faster to run and provides greater flexibility in the settling criteria. We ran simulation #2 for a minimum number of iterations and terminated it once the velocity and acceleration thresholds of the particlized human were reached. In almost all cases, the minimum number of iterations was sufficient.

Computation time. For both physics simulations, we ran 10 parallel simulation environments on a computer with 32 cores and an NVIDIA 1070-Ti GPU. This allowed us to generate roughly 35,000 labeled synthetic pressure images per day.

Figure 11: Size of synthetic pressure mat. Physics simulation #1 uses forces from particles on the entire covered bed. The pressure mat calculated in physics simulation #2 uses a smaller subset representing the size of the real pressure mat.
Figure 12: Pressure mat pyramidal structure showing FleX parameters that we optimized using CMA-ES.

A.3 Pressure Mat Structure Details

Limited pressure sensing area. The sensing portion of the real pressure mat does not cover the entire mattress. We measured a non-sensing border of 6 cm on the sides of the bed and 9 cm at the top and bottom. We built the simulator in the same way: the synthetic pressure mat covers the entire bed (68 x 33 taxels), but only an inner subset (64 x 27 taxels) representing the sensing area of the pressure image array is recorded, as depicted in Fig. 11.

FleX spring constraints. FleX particles in the synthetic pressure mat are bound together by spring constraints with the stiffnesses shown in Fig. 12.

Pressure mat adhesion. For the real pressure mat, velcro and tape are used to prevent sliding across the bed. For the synthetic pressure mat, particles are fixed in horizontal directions across the bed.

A.4 FleX Calibration

Although FleX is able to simulate soft bodies, FleX is not optimized to model real-world physics or to calculate realistic pressures. To optimize our FleX simulation to match the real-world mattress and pressure mat, we place a set of static objects on the real mattress, and record the resulting pressure images from the pressure mat. We then build a similar environment in FleX, and we optimize FleX parameters such that the simulated and real-world measurements closely align.

We jointly optimize 16 deformable bed and pressure sensing array parameters using CMA-ES [23]. These include the 13 FleX parameters in Fig. 12 — 4 soft mattress parameters, 7 pressure array stiffnesses, the spacing between the pressure mat layers, and the particle inverse mass — as well as 3 quadratic taxel force constants. To optimize, we first place a set of real rigid objects of known weight on the real bed, as depicted in Fig. 5 (a). We use capsular objects with 5 weights for each size: 1.3, 2.3, 4.5, 9.1 and 14 kg on the shorter capsules (L = 20 cm), and 1.3, 4.5, 9.1, 14 and 18 kg on the longer capsules (L = 40 cm). We then collect real pressure mat images and measure the distance in centimeters that the mattress compresses normal to the bed surface.

Next, we build a matching set of simulated capsules in FleX with the same weights, one of which is shown in Fig. 5 (b). At each iteration of the optimization, we drop simulated capsules of each weight onto the FleX mattress, re-compute the synthetic pressure images, and compare them to the real ones. The loss function for our optimization takes as input simulated and real pressure images and is computed as:

$$L_{cal} = L_P + L_C + L_D \qquad (13)$$

with terms for force error in the pressure mat, $L_P$, contact locations on the pressure mat, $L_C$, and the amount of mattress compression by the object, $L_D$. For some real object of known weight resting on a soft bed at depth $d$ below the unweighted height of the soft bed, a pressure image measures forces on individual taxels $P_j$, where contact $C_j$ is a binary vector indicating which taxels are measuring non-zero forces. The index $j$ ranges over the taxels of the pressure image. We note that the number of taxels in contact for these calibration images is roughly equal to a fraction of the pressure mat size, because we drop multiple objects simultaneously to speed up the optimization. Similar to the real mat, the corresponding values for the simulated environment, $\hat{P}_j$, $\hat{C}_j$, and $\hat{d}$, are computed from the simulated pressure images. The loss terms are computed as:

(14)
(15)
(16)

The first terms of both the force loss $L_P$ and contact loss $L_C$ account for per-taxel errors between the real and simulated pressure mats. The second terms account for errors in the total measured pressure under an object. All terms are normalized. Since the distances $d$ and $\hat{d}$ are signed, we take the absolute value in the denominator of the compression loss $L_D$ for normalization.

CMA-ES implementation. To optimize the FleX environment with CMA-ES [23], we configured the optimizer's population size, maximum iterations, maximum function evaluations, mean learning rate, function tolerance, function history tolerance, x-change tolerance, maximum standard deviation, and stagnation tolerance. We used a machine with 8 cores and an NVIDIA 1070-Ti GPU, and the optimization took 6 days.

Various combinations of parameters result in simulation instability. We perform a constrained optimization by placing a high cost on the evaluation function, f_eval, when a parameter is suspected of causing instability.


  • Negative FleX parameters can cause instability. If any negative FleX parameter is proposed, a high f_eval is assigned.

  • Large differences between the stretch, bending, and shear stiffnesses (see Fig. 12) cause knotting in the simulated array. If any stretch, bending, or shear stiffness value is outside of an accepted range, we add the deviation from this range to the f_eval.

  • An unusually long simulation time step indicates instability in the parameters. In this event, the particular rollout is terminated and a high f_eval is assigned.

  • If an object takes too long to settle, the rollout is terminated and a high f_eval is assigned.

A.5 DartFleX Calibration

The purpose of this calibration is to determine the force that should be applied to a DART capsule from particle penetration on the FleX pressure mat. This enables the two simulators to be connected through a mass-spring-damper model, which we described in Section 3.2 in the main paper.

We begin with an optimized FleX environment (Appendix A.4) and calibrate the spring coefficient $k_s$ from the mass-spring-damper model. We calibrate $k_s$ so that the dynamic collision geometries displace the FleX mattress in the same way that real objects would. We take the same set of $N$ real objects of various shapes and weights $W_n$ from the FleX calibration, place them on the real mattress, and measure the mattress displacements $d_n$. Then, we recreate the objects as collision geometries in FleX, displace the FleX mattress by $d_n$, and record the sum of particle penetration distances $p_j$ over the underlying taxels. We compute $k_s$ as the average across objects:

$$k_s = \frac{1}{N} \sum_{n=1}^{N} \frac{W_n}{\sum_j p_j \,\big|\, d_n} \qquad (17)$$

where the vertical bar indicates that the object of weight $W_n$ is displaced by distance $d_n$, which results in the particle penetration distances $p_j$. The length of a timestep is uncontrollable in FleX. Thus, the timestep in DART is calculated by dropping objects in both environments from a matching height and equating the time to contact the ground, where both simulators use the same gravity. This process determined the DART timestep.
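One plausible reading of the averaging in Eq. (17), together with the fall-time timestep matching, can be written with hypothetical helper names:

```python
import numpy as np

def spring_coefficient(weights, penetration_sums):
    """Sketch of Eq. (17): at the measured displacement, the summed particle
    penetrations must support the object's weight, so each object gives an
    estimate W_n / sum_j(p_j); k_s is the average of these estimates."""
    w = np.asarray(weights, dtype=float)
    p = np.asarray(penetration_sums, dtype=float)
    return float(np.mean(w / p))

def dart_timestep(drop_height, flex_steps_to_contact, g=9.81):
    """Hypothetical helper: divide the physical fall time sqrt(2h/g) by the
    number of FleX steps observed until ground contact."""
    return (2.0 * drop_height / g) ** 0.5 / flex_steps_to_contact
```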

A.6 Real Dataset Collection Details

Participants donned an OptiTrack motion capture suit with high contrast to the bed sheets to facilitate analysis of pose and body shape. We provided S, M, L and XL sizes, and instructed participants to use a form-fitting size.

We used the IAI Kinect2 package to calibrate the Kinect [57]. Our released dataset consists of RGB images and depth/point cloud data from the Kinect that are synchronized and spatially co-registered to the pressure images. We manually synchronized the modalities; only static poses are captured so the time discrepancy is insignificant. We spatially co-registered the Kinect to the pressure mat by putting 1” tungsten cubes on the corners of the pressure mat, which could be seen with all modalities. We captured a co-registration snapshot for each participant, which was taken after they were finished. We created an interface to click on the tungsten block locations on the images and used CMA-ES to find the 6DOF camera pose and co-register it with the mat.
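Given the clicked correspondences between the tungsten-cube corners in mat coordinates and their camera-frame locations, a 6DOF rigid pose can also be recovered in closed form with the Kabsch/Procrustes algorithm, shown here as an illustrative alternative to the CMA-ES search the authors used:

```python
import numpy as np

def rigid_register(mat_pts, cam_pts):
    """Closed-form rigid registration: find R, t with cam_pts ~= R @ mat_pts + t.
    Inputs are (N, 3) arrays of corresponding points, N >= 3 non-collinear."""
    mu_a, mu_b = mat_pts.mean(axis=0), cam_pts.mean(axis=0)
    H = (mat_pts - mu_a).T @ (cam_pts - mu_b)     # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_b - R @ mu_a
    return R, t
```

With noisy clicked points this returns the least-squares-optimal rotation and translation; an iterative optimizer such as CMA-ES can additionally fold in other costs.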

A.7 Dataset Partitions


pose partition, limb distribution         | gender | limbs on bed | train ct. synth | test ct. synth | test ct. real
general                                   | F | N | 26000 | 3000 | 120
  (even leg space, even arm space)        | M | N | 26000 | 3000 | 119
                                          | F | Y | 26000 | 3000 | 120
                                          | M | Y | 26000 | 3000 | 120
supine general                            | F | N | 13000 | 1500 | 40
  (even leg space, even arm space)        | M | N | 13000 | 1500 | 39
                                          | F | Y | 13000 | 1500 | 40
                                          | M | Y | 13000 | 1500 | 40
supine hands behind head                  | F | Y |  2000 |  500 | 40
  (even leg space, arms per Fig. 10(b))   | M | Y |  2000 |  500 | 40
prone hands up                            | F | Y |  4000 |  500 | 40
  (even leg space, hands above shoulders) | M | Y |  4000 |  500 | 40
supine crossed legs                       | F | N |  2000 |   -  |  -
  (even leg space, even arm space,        | M | N |  2000 |   -  |  -
  feet must cross per Fig. 10(a))         | F | Y |  2000 |  500 | 40
                                          | M | Y |  2000 |  500 | 38
supine straight limbs                     | F | N |  2000 |   -  |  -
  (even leg space, even arm space,        | M | N |  2000 |   -  |  -
  elbows and knees straight)              | F | Y |  2000 |  500 | 40
                                          | M | Y |  2000 |  500 | 36
TOTAL                                     | - | - | 184000 | 22000 | 952

Table 4: Partitions for synthetic data and prescribed poses. For evening the leg space, see Fig. 10(a). For evening the arm space, an additional four subspaces are chosen because the most distal joint (hand) is allowed to extend all the way below and above the limb root joint (shoulder).

Table 4 presents a detailed description of the data partitions. We split the data by gender. We also split by whether the initial limb positions are required to be over the surface of the bed, meaning that the Cartesian cuboids used for initial pose sampling (recall Fig. 10) are clipped in the horizontal directions at the edge of the mattress.

A.8 Dataset Limitations

Domain gap. The real pressure mat has a larger force range. Additionally, as a result of putting a blanket on the bed during the real study, the overall pressure magnitude was reduced, which was not reflected in the synthetic data calibration. To correct for this, we normalize as described in Appendix B.1.

Figure 13: Uncomfortable or infeasible poses outside of typical human movement range (left, middle). Impossible pose where the thighs are in collision (right).
Figure 14: PressureNet: Convolutional Neural Network (CNN) with five convolutional layers, one max pooling layer, and one fully connected layer. Input images are normalized by per-image division by the sum of taxels. * indicates that the number of channels shown (3) represents Mod1 in Fig. 6 (a), whereas Mod2 in Fig. 6 (a) uses 5 input channels.

Synthetic body joint limits. We observed that a fraction of the synthetic poses appear uncomfortable or infeasible for a real person (Fig. 13). This work could be improved by using pose-conditioned joint angle limits such as [3] instead of constant limits. Fig. 13-right shows an impossible pose where the thighs are in collision. We were not able to check collisions between the thighs using the capsulized model because the thigh capsules are often in collision for valid poses.

Figure 15: PressureNet: Differentiable SMPL human mesh reconstruction from Kanazawa et al. [30]. Our additions to [30] include input constraints (shown in the light grey box) and the root joint rotation and translation.
Figure 16: PressureNet: Pressure Map Reconstruction (PMR). PMR is fully differentiable, and performs sorting, filtering and patching to reconstruct spatial maps from the human mesh.

Appendix B: PressureNet

B.1 PressureNet Architecture Details

CNN - Convolutional Neural Network. Our CNN architecture, depicted in Fig. 14, is similar to that of Clever et al. [15], and uses the same kernel sizes, layers, and dropout. The first layer is a convolutional layer with a 7x7 kernel, and uses a stride of 2 and zero padding of size 3 on the sides of the input images. The max pooling layer has a stride of 2 and padding of 0. All other convolutional layers are 3x3 with a stride of 1 and padding of 0. We use 192 channels in the first two convolutional layers and in the max pooling layer, and 384 channels in the last two convolutional layers. This CNN also differs from [15] in that we use tanh activation functions instead of ReLU. Through informal testing on smaller data sizes (e.g. 46K images), we observed that networks with tanh activations had less overfitting. We normalize the input and output of the network. To normalize the input channels, we divide each input image by its sum of taxels. To normalize the output, we multiply it by the range of shape, pose, and posture parameters from the synthetic training dataset. We compute the range from the lower and upper limits of all parameters in the training dataset. For joint angle limits (i.e. pose), we use values from [52, 7, 4]. For body shape, we use sampling bounds from [49]. For global rotation, we use our sampling bounds for roll and yaw, and for global translation, we use the size of the bed.
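The input and output normalization can be sketched as follows; the output mapping from a tanh-range value back to the training-set parameter range is one plausible form, not necessarily the exact one used:

```python
import numpy as np

def normalize_input(images):
    """Per-image input normalization: divide each channel image by its taxel sum."""
    s = images.sum(axis=(-2, -1), keepdims=True)
    return images / np.maximum(s, 1e-8)          # guard against empty images

def denormalize_output(y, lower, upper):
    """Map a network output in [-1, 1] back to the parameter range [lower, upper]."""
    return lower + (y + 1.0) * 0.5 * (upper - lower)
```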

SMPL - Human Mesh Reconstruction. Following the CNN, we use the generative human model part of the HMR network [30], which takes as input the estimated shape, pose, and posture, and outputs a differentiable human mesh reconstruction as well as a set of Cartesian joint positions. This generative SMPL model, implemented in PyTorch [37], along with our modifications, is presented in Fig. 15.

In addition to using the generative kinematic SMPL embedding part of the full HMR network, our implementation constrains the input parameters to keep joint angles within human limits and body shape parameters inside our initial sampling range. To constrain the input parameters, we normalize each parameter to a range based on its lower and upper limits, and use a tanh function as a soft limit that is more amenable to gradient descent. Then, we perform a reverse normalization to scale back up. To prevent the tanh from clipping feasible values at the angle limits, for example a straight knee that is at 0 degrees, we inflate the angle range by a small factor as shown in the figure.
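The tanh soft limit with an inflated range can be written as a small helper; the inflation factor here is an arbitrary placeholder:

```python
import numpy as np

def soft_limit(x, lo, hi, inflate=1.05):
    """Soft parameter constraint (sketch): center on the range midpoint, squash
    with tanh over a slightly inflated half-range, and scale back. Inflation
    keeps feasible boundary values (e.g. a fully straight knee) from being
    clipped by the tanh asymptote."""
    mid = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo) * inflate
    return mid + half * np.tanh((x - mid) / half)
```

Near the middle of the range the mapping is nearly the identity, so gradients pass through almost unchanged; only values approaching the limits are compressed.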

PMR - Pressure Map Reconstruction. PMR, a novel component of PressureNet, takes as input a human mesh in global space and outputs a set of reconstructed spatial maps, which resemble a real pressure image and indicate where contact occurs between the estimated mesh and the bed. We reconstruct these maps differentiably, as depicted in Fig. 16, meaning that we can backpropagate gradients through PMR to train the CNN. The PMR loss is based on the error between the estimated spatial maps and the ground truth spatial maps. PMR works by projecting the mesh onto the surface of the bed and computing the distance that it sinks into the bed over each taxel. This amounts to finding the distance between the lowest vertex within the area of each taxel and the undeformed height of the bed.

The PMR input is in units of meters, which we convert to units of taxels so that the mesh can be indexed on the scale of the pressure image. We then use a process involving sorting, filtering, and patching to recreate the spatial maps, which is detailed in Fig. 16.
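The core of PMR — the per-taxel sinking depth — can be sketched as below; the taxel pitch, grid shape, and the simple per-taxel maximum (in place of the paper's sorting/filtering/patching pipeline) are our assumptions:

```python
import numpy as np

def reconstruct_pressure_map(verts_m, bed_height_m, taxel_m=0.0286, shape=(64, 27)):
    """PMR sketch: for each taxel, the distance the lowest vertex above it
    sinks below the undeformed bed height. `verts_m` is an (N, 3) array of
    mesh vertices in meters; taxel pitch and grid shape are assumed values."""
    q = np.zeros(shape)
    rows = np.floor(verts_m[:, 0] / taxel_m).astype(int)  # meters -> taxel indices
    cols = np.floor(verts_m[:, 1] / taxel_m).astype(int)
    ok = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
    sink = np.maximum(bed_height_m - verts_m[:, 2], 0.0)  # depth below bed surface
    for r, c, s in zip(rows[ok], cols[ok], sink[ok]):
        q[r, c] = max(q[r, c], s)                         # deepest vertex per taxel
    return q
```

The real module performs this reduction with differentiable operations so gradients flow back to the mesh parameters; the hard `max` here is only for illustration.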

Figure 17: PressureNet deep learning in action, showing an example from our synthetic test set. The first network module (“Mod1”) outputs an initial coarse pose estimate (right leg shown) and a reconstructed pressure map. The second network module (“Mod2”) corrects the estimated black mesh by a small angle difference based on the spatial residual between the reconstructed map and the input pressure image.

B.2 PressureNet Loss Function

We compute a loss on joint error rather than vertex error because the vertices are highly concentrated in some areas like the face and hands for aesthetic reasons, rather than for representing overall pose. Moreover, training the first network module (“Mod1”) with reconstruction of 24 joint positions rather than a full set of vertices is much faster.

The purpose of the second network module (“Mod2”) is to fine-tune an initial estimate from Mod1 using both reconstructed pressure maps as input and a loss function with spatial map awareness. Fig. 17 shows a real example of how Mod2 corrects the initial mesh estimate from Mod1 using PMR. Note the spatial difference in the input images for Mod2, where the reconstructed map of the foot pressure is shifted further right than the corresponding information in the input pressure image.

B.3 PressureNet Training Details

We build PressureNet in PyTorch [44], which is shown at a high level in Figure 6 (b). For both Mod1 and Mod2, we used the same learning rate and weight decay as in [15]. We used the Adam optimizer for gradient descent [31]. Training Mod1 for 100 epochs using 184K images took 3 days on an Nvidia Tesla K80 GPU. Training Mod2 took 8 days due to the increased computation from PMR.

B.4 Results for Separate Partitions

pose partition            | test ct. real | test ct. synth | 3DVPE real (cm) | 3DVPE synth (cm)
supine straight limbs     |  76 | 1000 | 3.71 | 2.68
supine general            | 159 | 2000 | 4.51 | 3.40
supine crossed legs       |  78 | 1000 | 4.49 | 3.41
prone hands up            |  80 | 1000 | 5.12 | 4.24
general, varied roll      | 479 | 6000 | 5.39 | 4.30
supine hands behind head  |  80 | 1000 | 5.09 | 4.40
gender partition          |     |      |      |
F                         | 480 | 6000 | 4.88 | 3.85
M                         | 472 | 6000 | 5.10 | 4.04

Table 5: Partitioned results for prescribed poses with the best network, for both real and synthetic data.

Table 5 shows the results of our PressureNet evaluated on prescribed resting poses from participants in bed, partitioned by pose and by gender.

B.5 Additional Failure Cases

We present additional failure cases in Fig. 18. One limitation is that our network does not have an interpenetration error term, so the limbs sometimes intersect, e.g. the left hand in Fig. 18(a)-top left. Our network also failed for some limbs when there was little or no contact information. This issue is related to the limitations of the sensor, which were explored in [15]. Our network failed for non-resting poses, such as those in [15]; however, these are not part of the training or testing PressurePose dataset. We observed some inaccuracies when testing on training data (Figs. 9 and 18), which suggests that there is a performance limitation on the network’s ability to extract pressure image features in some scenarios.

Figure 18: (a) Real data failure cases. Self penetration of inferred left hand into chest (top), lack of information on mat leading to inaccurate pose (bottom). (b) Synthetic data failure cases: testing on training data, various inaccuracies.

References

  • [1] F. Achilles, A. Ichim, H. Coskun, F. Tombari, S. Noachtar, and N. Navab (2016) Patient mocap: human pose estimation under blanket occlusion for hospital monitoring applications. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 491–499. Cited by: Table 1, §2, §2, §2, §2.
  • [2] A. Agarwal and B. Triggs (2006) Recovering 3d human pose from monocular images. Pattern Analysis and Machine Intelligence, IEEE Transactions on 28 (1), pp. 44–58. Cited by: §2.
  • [3] I. Akhter and M. J. Black (2015) Pose-conditioned joint angle limits for 3d human pose reconstruction. In CVPR, Cited by: §.8.
  • [4] G. B. Andersson. (1982) Normal range of motion of the hip, knee and ankle joints in male subjects, 30–40 years of age. Acta Orthopaedica Scandinavica 53 (2), pp. 205–208. Cited by: §3, §.1.
  • [5] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis (2005) SCAPE: shape completion and animation of people. ACM Transactions on Graphics 24 (3), pp. 408–416. Cited by: §2.
  • [6] F. Bogo, A. Kanazawa, C. Lassner, P. Gehler, J. Romero, and M. J. Black (2016) Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. In ECCV, pp. 561–578. Cited by: §1, §2, §3.2, §.2.
  • [7] D. C. Boone and S. P. Azen (1979) Normal range of motion of joints in male subjects. Journal of Bone and Joint Surgery 61 (5), pp. 756–759. Cited by: §3, §.1.
  • [8] S. Brahmbhatt, C. Ham, C. C. Kemp, and J. Hays (2019) ContactDB: analyzing and predicting grasp contact via thermal imaging. In CVPR, pp. 8709–8719. Cited by: §2.
  • [9] A. Bulat and G. Tzimiropoulos (2016) Human pose estimation via convolutional part heatmap regression. In ECCV, pp. 717–732. Cited by: §2.
  • [10] J. Carreira, P. Agrawal, K. Fragkiadaki, and J. Malik (2016) Human pose estimation with iterative error feedback. In CVPR, pp. 4733–4742. Cited by: §2.
  • [11] L. Casas, N. Navab, and S. Demirci (2019) Patient 3d body pose estimation from pressure imaging. International Journal of Computer Assisted Radiology and Surgery, pp. 1–8. Cited by: §1, §1, Table 1, §2, §2, §2.
  • [12] K. Chen, P. Gabriel, A. Alasfour, C. Gong, W. K. Doyle, O. Devinsky, D. Friedman, P. Dugan, L. Melloni, T. Thesen, D. Gonda, S. Sattar, S. Wong, and V. Gilja (2018) Patient-specific pose estimation in clinical environments. IEEE Journal of Translational Engineering in Health and Medicine 6, pp. 1–11. Cited by: §1, Table 1, §2, §2.
  • [13] W. Chen, H. Wang, Y. Li, H. Su, Z. Wang, C. Tu, D. Lischinski, D. Cohen-Or, and B. Chen (2016) Synthesizing training images for boosting human 3d pose estimation. In Conference in 3D Vision, pp. 479–488. Cited by: §2.
  • [14] A. Clegg, W. Yu, Z. Erickson, J. Tan, C. K. Liu, and G. Turk (2017) Learning to navigate cloth using haptics. In International Conference on Intelligent Robots and Systems, pp. 2799–2805. Cited by: §2.
  • [15] H. M. Clever, A. Kapusta, D. Park, Z. Erickson, Y. Chitalia, and C. C. Kemp (2018) 3D human pose estimation on a configurable bed from a pressure image. In IROS, pp. 54–61. Cited by: §1, §1, §1, Table 1, §2, §2, §2, §2, §4, §.2, §.1, §.3, §.5.
  • [16] V. Davoodnia, S. Ghorbani, and A. Etemad (2019) In-bed pressure-based pose estimation using image space representation learning. arXiv preprint arXiv:1908.08919. Cited by: §2.
  • [17] Z. Erickson, H. M. Clever, G. Turk, C. K. Liu, and C. C. Kemp (2018) Deep haptic model predictive control for robot-assisted dressing. In International Conference on Robotics and Automation (ICRA), pp. 1–8. Cited by: §2.
  • [18] Z. Erickson, V. Gangaram, A. Kapusta, C. K. Liu, and C. C. Kemp (2020) Assistive gym: a physics simulation framework for assistive robotics. In International Conference on Robotics and Automation (ICRA), Cited by: §2.
  • [19] M. Farshbaf, R. Yousefi, M. B. Pouyan, S. Ostadabbas, M. Nourani, and M. Pompeo (2013) Detecting high-risk regions for pressure ulcer risk assessment. In BIBM, pp. 255–260. Cited by: §1, §2.
  • [20] R. Grimm, S. Bauer, J. Sukkau, J. Hornegger, and G. Greiner (2012) Markerless estimation of patient orientation, posture and pose using range and pressure imaging. International Journal of Computer Assisted Radiology and Surgery 7 (6), pp. 921–929. Cited by: §1, §1, Table 1, §2, §2.
  • [21] A. Habib, I. Ranatunga, K. Shook, and D. O. Popa (2014) SkinSim: a simulation environment for multimodal robot skin. In International Conference on Automation Science and Engineering, pp. 1226–1231. Cited by: §2.
  • [22] N. Hansen, Y. Akimoto, and P. Baudis (2019-02) CMA-ES/pycma on GitHub. Note: Zenodo, DOI:10.5281/zenodo.2559634 External Links: Document, Link Cited by: §3.3.
  • [23] N. Hansen, S. D. Müller, and P. Koumoutsakos (2003) Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation 11 (1), pp. 1–18. Cited by: §.4, §.4.
  • [24] T. Harada, T. Mori, Y. Nishida, T. Yoshimi, and T. Sato (1999) Body parts positions and posture estimation system based on pressure distribution image. In ICRA, Vol. 2, pp. 968–975. Cited by: §2.
  • [25] T. Harada, T. Sato, and T. Mori (2001) Pressure distribution image based human motion tracking system using skeleton and surface integration model. In ICRA, Vol. 4, pp. 3201–3207. Cited by: §1, Table 1, §2.
  • [26] M. Hassan, V. Choutas, D. Tzionas, and M. J. Black (2019) Resolving 3D human pose ambiguities with 3D scene constraints. In ICCV, Cited by: §2.
  • [27] Y. Hasson, G. Varol, D. Tzionas, I. Kalevatykh, M. J. Black, I. Laptev, and C. Schmid (2019) Learning joint reconstruction of hands and manipulated objects. In CVPR, pp. 11807–11816. Cited by: §2.
  • [28] B. Hollis, S. Patterson, J. Cui, and J. Trinkle (2018) BubbleTouch: a quasi-static tactile skin simulator. In Artificial Intelligence for Human-Robot Interaction, Cited by: §2.
  • [29] Invacare 5410IVC Full Electric Homecare Bed. Note: www.invacare.com/cgi-bin/imhqprd/inv_catalog/prod_cat_detail.jsp?s=0&prodID=5410IVC, Last accessed on 2020-02-27 Cited by: §1.
  • [30] A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik (2018) End-to-end recovery of human shape and pose. In CVPR, pp. 7122 – 7131. Cited by: §1, §2, §2, §4, Figure 15, §.1.
  • [31] Cited by: §.3.
  • [32] J. Lee, S. Ha, T. Kunz, S. Jain, Y. Ye, S. S. Srinivasa, M. Stilman, and C. K. Liu (2018) DART: dynamic animation and robotics toolkit. Journal of Open Source Software 3 (22). Cited by: §2, §3.
  • [33] J. J. Liu, M. Huang, W. Xu, and M. Sarrafzadeh (2014) Bodypart localization for pressure ulcer prevention. In EMBC, pp. 766–769. Cited by: §1, Table 1, §2, §2.
  • [34] S. Liu, Y. Yin, and S. Ostadabbas (2019) In-bed pose estimation: deep learning with shallow dataset. IEEE Journal of Translational Engineering in Health and Medicine 7, pp. 1–12. Cited by: Table 1, §2.
  • [35] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black (2015) SMPL: a skinned multi-person linear model. ACM Transactions on Graphics 34 (6), pp. 248. Cited by: §1, §1, §2, §3, §3.
  • [36] M. Macklin, M. Müller, N. Chentanez, and T. Kim (2014) Unified particle physics for real-time applications. ACM Transactions on Graphics 33 (4). Cited by: §2, §3.
  • [37] MandyMo (2018) PyTorch HMR. Note: GitHub repository, https://github.com/MandyMo/pytorch_HMR Cited by: §.1.
  • [38] Vista Medical BodiTrak pressure mapping system solutions. Note: www.boditrak.com/products/medical.php, Last accessed on 2020-02-27 Cited by: §1.
  • [39] I. Millington (2010) Game physics engine development: how to build a robust commercial-grade physics engine for your game. CRC Press. Cited by: §1.
  • [40] A. Newell, K. Yang, and J. Deng (2016) Stacked hourglass networks for human pose estimation. In ECCV, pp. 483–499. Cited by: §2.
  • [41] R. Okada and S. Soatto (2008) Relevant feature selection for human pose estimation and localization in cluttered images. In ECCV, pp. 434–445. Cited by: §2.
  • [42] S. Ostadabbas, M. B. Pouyan, M. Nourani, and N. Kehtarnavaz (2014) In-bed posture classification and limb identification. In BioCAS, pp. 133–136. Cited by: §2.
  • [43] S. Liu and S. Ostadabbas (2019) Seeing under the cover: a physics guided learning approach for in-bed pose estimation. In Medical Image Computing and Computer Assisted Intervention, pp. 236–245. Cited by: Table 1, §2, §2, §2.
  • [44] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. Cited by: §.3.
  • [45] G. Pavlakos, V. Choutas, N. Ghorbani, T. Bolkart, A. A. Osman, D. Tzionas, and M. J. Black (2019) Expressive body capture: 3D hands, face, and body from a single image. In CVPR, pp. 10975–10985. Cited by: §2.
  • [46] G. Pavlakos, X. Zhou, K. G. Derpanis, and K. Daniilidis (2017) Coarse-to-fine volumetric prediction for single-image 3D human pose. In CVPR, Cited by: §2.
  • [47] S. Payandeh and N. Azouz Finite elements, mass-spring-damper systems and haptic rendering. In ICRA, pp. 224–229. Cited by: §3.2.
  • [48] Z. Pezzementi, E. Jantho, L. Estrade, and G. D. Hager (2017) Characterization and simulation of tactile sensors. In IEEE Haptics Symposium, pp. 199–205. Cited by: §2.
  • [49] A. Ranjan, J. Romero, and M. J. Black (2018) Learning human optical flow. In British Machine Vision Conference, Cited by: §3, §.1.
  • [50] N. Sarafianos, B. Boteanu, C. Ionescu, and I. A. Kakadiaris (2016) 3D human pose estimation: a review of the literature and analysis of covariates. Computer Vision and Image Understanding (152), pp. 1–20. Cited by: §2.
  • [51] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake (2011) Real-time human pose recognition in parts from single depth images. In CVPR, pp. 1297–1304. Cited by: §2.
  • [52] J. M. Soucie, C. Wang, A. Forsyth, S. Funk, M. Denny, K. E. Roach, D. Boone, and H. T. C. Network (2011) Range of motion measurements: reference values and a database for comparison studies. Haemophilia 17 (3), pp. 500–507. Cited by: §3, §.1.
  • [53] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In Advances in neural information processing systems, pp. 1799–1807. Cited by: §2.
  • [54] A. Toshev and C. Szegedy (2014) Deeppose: human pose estimation via deep neural networks. In CVPR, pp. 1653–1660. Cited by: §2, §2.
  • [55] A. Tözeren (1999) Human body dynamics: classical mechanics and human movement. Springer. Cited by: §3.2, §.2.
  • [56] G. Varol, J. Romero, X. Martin, N. Mahmood, M. J. Black, I. Laptev, and C. Schmid (2017) Learning from synthetic humans. In CVPR, pp. 109–117. Cited by: §2.
  • [57] T. Wiedemeyer (2014 – 2015) IAI Kinect2. Institute for Artificial Intelligence, University Bremen. Note: https://github.com/code-iai/iai_kinect2Accessed June 12, 2015 Cited by: §.6.
  • [58] T. Yu, Z. Zheng, Y. Zhong, J. Zhao, Q. Dai, G. Pons-Moll, and Y. Liu (2019) SimulCap: single-view human performance capture with cloth simulation. In CVPR, pp. 5499–5509. Cited by: §2.
  • [59] X. Zhou, X. Sun, W. Zhang, S. Liang, and Y. Wei (2016) Deep kinematic pose regression. In ECCV 2016 Workshops, pp. 186–201. Cited by: §2.