1 Introduction
The accurate acquisition of physical objects has numerous applications, especially in the entertainment industry. While there exist commercial systems for digitizing rigid objects, the acquisition of deforming objects remains a challenge due to the complex changes in geometry over time. A rigid object can be scanned sequentially from multiple viewpoints to accurately capture the complete surface, whereas scanning the entire surface of a deforming object would require a complex and expensive physical setup involving multiple synchronized sensors, which may still be subject to occlusions.
Recently, several techniques were proposed that solve this problem by using a template shape as a geometric and topological prior for the reconstruction and by deforming the template to fit to the observed data [10, 38, 21, 7]. In some of these methods, the observed data comes from a set of singleview scans. The single viewpoint assumption greatly simplifies the acquisition process. Templatebased tracking approaches are shown to lead to visually pleasing results for numerous examples. However, the deformation of the unobserved side of the object is generally only guided by a smoothness function.
We combine a trackingbased approach with fitting a volumetric elastic model to improve the estimation of the unobserved side of the object. We employ a linear finite element method (FEM) to solve for physical deformations when a given force is applied. Our method proceeds in two steps: First, we use a tracking approach to deform the template model. Second, we use the displacements of the observed vertices of the template mesh found using the tracking step in a FEM to predict the displacements of the unobserved vertices. Hence, rather than smoothly deforming the unobserved side of the model, we deform the unobserved side through the volumetric mesh of the FEM model. We repeatedly linearize the deformation in the FEM at its current deformation state. Note that our method allows for tracking data acquired using single, multiple, or moving viewpoints.
While deformable models have been introduced to computer vision and computer graphics 30 years ago
[34], here we combine modern nonrigid templatebased tracking with a volumetric elastic model for completion of the deformation at the unobserved side only. Our major contributions are therefore:
The combination of a nonrigid templatebased tracking approach with a linear finite element model to robustly track the complete geometry of an object whose deformation is captured from a single viewpoint only.

The use of a FEMbased model to deform the unseen side leading to more physically plausible results than by using a smoothness cost in the templatebased tracking.

Tracking linear and nonlinear deformations by repeatedly linearizing the FEM model at its current deformation state.
This paper presents the following three major improvements over the preliminary version of this work [40]. First, we present a computationally more efficient energy formulation to track the deformable object using a nonrigid iterative closest point framework. Second, we propose an iterative method for the FEM estimation that considers forces at supporting surfaces of the model. Our method does not require the measurement of forces but estimates the forces up to a scale factor from the FEM model and the deformation. Third, we evaluate the performance of our method extensively using numerous synthetic and scanned data sequences and compare our results to our preliminary findings. Special attention is paid to the influence of both synthetic and real scanner noise, as well as to the influence of the FEM step on the result.
2 Related Work
This section reviews work related to tracking surfaces and predicting shape deformations using finite element models.
2.1 Tracking
Computing the correspondence between deformed shapes has received considerable attention in recent years and the surveys of van Kaick et al. [37] and Tam et al. [33] give a comprehensive overview of existing methods. The review in this paper focuses on techniques that do not employ a priori skeletal models or manually placed marker positions, as we aim to minimize assumptions about the structure of the surface. None of the following approaches combine physicsbased models with tracking approaches.
Templatebased approaches
The following techniques solve the tracking problem using a template as shape prior. De Aguiar et al. [10] tracked multiview stereo data of a human subject that is acquired using a set of cameras. The algorithm uses a template of the human subject that was acquired in a similar posture as the first frame. The approach used for tracking first uses Laplace deformations of a volumetric mesh to find the rough shape deformation and refines the shape using a surface deformation. The deformation makes use of automatically computed features that are found based on color information. Vlasic et al. [38] developed a similar system to track multiview stereo data of human subjects. Tung and Marsuyama [36] extended de Aguiar et al.’s approach by using 3D shape rather than color information to find the features.
Li et al. [23, 21] proposed a generic datadriven technique for mesh tracking. A template is used to model the rough geometry of the deformed object, and the algorithm deforms this template to each observed frame. A deformation graph is used to derive a coarsetofine strategy that decouples the complexity of the original mesh geometry from the representation of the deformation. Cagniart et al. [7] proposed an alternative where the template is decomposed into a set of patches to which vertices are attached. The template is then deformed to each observed frame using a data term that encourages interpatch rigidity. Cagniart et al. [8]
extended this technique to allow for outliers by using a probabilistic framework.
In this work, we combine a template fitting method with a finite element step.
Templatefree approaches
The following techniques solve the tracking problem without using a shape prior. However, the methods assume prior information on the deformation of the object. Mitra et al. [26] modeled the surface tracking problem as a problem of finding a smooth spacetime surface in fourdimensional space. To achieve this, they exploited the temporal coherence in the densely sampled data with respect to both time and space. Sharf et al. [31] used a similar concept to find a volumetric spacetime solid. Their approach assumes that each cell of the discretized fourdimensional space contains the amount of material that flowed into it.
Wand et al. [39]
used a probabilistic model based on Bayesian statistics to track a deformable model. The surface is modeled as a graph of oriented particles that move in time. The position and orientation of the particles are controlled by statistical potentials that trade off data fitting and surface smoothness. Tevs et al.
[35] extended this approach by first tracking a few stable landmarks and by subsequently computing a dense matching.Furukawa and Ponce [13] proposed a technique to track data from a multicamera setup. Instead of using a template of the shape as prior information, their technique computes the polyhedral mesh that captures the first frame and deforms this mesh to the data in subsequent frames.
2.2 Predicting Shape Deformations Using FEM
Several authors suggested learning the parameters of linear finite element models from a set of observations. We use such a method in combination with a tracking method to find an accurate tracking result of the observed and the unobserved side of the model. For a summary of linear finite element methods for elastic deformations, refer to BroNielsen [6].
Lang and Pai [19] used a surface displacement and contact force at one point along with rangeflow data to estimate elastic constants of homogeneous isotropic materials using numerical optimization. Becker and Teschner [3] presented an approach to estimate the elasticity parameters for isotropic materials using a linear finite element method. The approach takes as input a set of displacement and force measurements and uses them to compute the Young’s modulus and Poisson’s ratio (see BroNielsen [6]) with the help of an efficient quadratic programming technique. Eskandari et al. [12] presented a similar approach to estimate both the elasticity and viscosity parameters for viscoelastic materials using a linear finite element method. This approach reduces the problem to solving a linear system of equations and has been shown to run in realtime.
Syllebranque and Boivin [32] estimated the parameters of a quasistatic finite element simulation of a deformable solid object from a video sequence. The problems of optimizing the Young’s modulus and the Poisson ratio were solved sequentially. Schnabel et al. [30] used finite element models to validate the nonrigid image registration of magnetic resonance images. Nguyen and Boyce [27] presented an approach to estimate the anisotropic material properties of the cornea.
Bickel et al. [5] proposed a physicsbased approach to predict shape deformations. First, a set of deformations is measured by recording both the force applied to an object and the resulting shape of the object. This information is used to learn a relationship between the applied force and the shape deformation, which allows to predict the shape deformation for new force inputs. The technique assumes the object material to have linear elasticity properties. Bickel et al. [4] extended this approach to allow the modeling and fabrication of materials with a desired deformation behavior using stacked layers of homogeneous materials.
Recently, Choi and Szymczak [9] used FEM to predict a consistent set of deformations from a sequence of coarse watertight meshes but their method does not apply to singleview tracking, which is the focus of our method.
Finite element models are used in medical applications to estimate the material parameters of different tissues because the stiffness of a particular tissue can be used to detect anomalies [17, 42, 14]. For instance, elasticity information can help to segment medical data [17] or to detect cancerous tissues [42] or malignant lesions [14]. In this context, Lee et al. [20] proposed a method to estimate material parameters and forces of tissue deformations based on two given frames of observations. This method, which is similar in spirit to our approach, assumes that for both frames, the segmented boundaries of the organ (which are 3D surfaces) are given. The approach proceeds by repeatedly simulating a deformation from the start frame for a set of material parameters and input forces, and by measuring the distance of the simulated deformation to the given target surface. The distance to the target surface is then used to improve the estimated material parameters and input forces using a gradient descent technique. Unlike our method, this approach operates exclusively on a tetrahedral volumetric mesh and therefore has limited resolution. A more serious limitation of the approach is the need for good initial material parameters and force directions. While good initial estimates are known for many organic tissues, we do not have access to good initial estimates in our application. Hence, our approach takes a different strategy to optimize the material parameters that does not require initial estimates.
3 Overview
The input to our method consists of a closed template , a set of contact points on along with force directions that lead to the deformation of the model, and a set of observed 3D video frames (point clouds) capturing the deformation. We assume that the template has roughly the shape of the object before the deformation, and is approximately aligned to the first frame.
The main idea of our approach is to combine tracking the observed point cloud data using a template deformation approach and predicting the deformation on the unobserved side of the model using a linear FEM. To track the data, we use an energy optimization approach that aims to find deformation parameters that deform to be close to the observed data and that maintain the geometric features of . Let denote the template that was deformed to fit to . Afterwards, we displace the vertices of that are not observed in the data with a linear FEM using the given contact point and force direction. To compute the FEM deformation, we use a down sampled version of that is tetrahedralized, which is denoted by in the following. Let denote the deformed tetrahedral mesh. Finally, we readjust the shape of the unobserved side of to take this displacement into account. When multiple frames are recorded, we start the tracking and FEM deformation for frame from and , respectively. Fig. 2 gives a visual overview of the proposed method.
We use the following representation for the deformation of . Let denote the vertices of ,
denote their position vectors, and
denote their homogeneous coordinates. Furthermore, let denote the unit outer normal at . We deform by applying a transformation matrix to each of . The transformation matrix depends on six parameters: three parameters describing a translation, two parameters describing a unit rotation axis, and one parameter describing a rotation angle. That is, describes a rigid transformation as(1) 
where is a coordinate transformation that expresses a point in a coordinate frame centered at and is a coordinate transformation that expresses a point in a coordinate frame rotated by angle around axis . Expressing the transformation in a coordinate system centered at has the advantage that differences between the transformations of neighboring vertices can be measured directly.
4 Tracking of Point Cloud Data
This section presents our energy optimization approach to find the transformation parameters , and that lead to a mesh that is close to the point cloud data.
4.1 Rigid Alignment
We aim to deform the template to the observed data. When fitting to the first frame, we start by deforming using a global rigid transformation to fit the observed data as much as possible. We consider all vertices of and compute the nearest neighbor of the deformed point in the point cloud data. Let denote the unit outer normal at in the point cloud.
To rigidly align to , we find by minimizing
(2) 
with respect to seven degrees of freedom (one for scaling, three for rotation, three for translation), where
is a weight term and denotes the scalar product. Note that the term measures the distance of the transformed point to the supporting plane of its nearest data point , which leads to a faster convergence rate of the algorithm than using the distance from to [29]. Note that we fix the associated point of for this step to obtain a differentiable energy function. The scaling term accommodates slight errors in the calibration of the 3D scanner used to acquire the template shape and/or the calibration of the camera system used to acquire the frames.The weight is used to distinguish valid data observations from invalid ones, and should be one for vertices that have a valid data observation and zero for all other vertices. To exclude data points that are inconsistent with , is set to zero if the distance between and is above or if the angle between and is above , where is the average edge length of the undeformed template and and are parameters. Setting all of the remaining to one has the problem that many vertices of that are close to the acquisition boundary of may pick the same nearest neighbor, which leads to poor assignments. To remedy this problem, we wish to set to zero if is located on the acquisition boundary of
. It is not straight forward to define a “boundary” on a noisy point cloud. We use a heuristic that considers points
of to be part of the boundary if many vertices of choose as nearest neighbor. To find the boundary points in a way that is independent to global resolution changes of both and , we count for each point of the number of vertices of that chose it as nearest neighbor and average this count over all points of that were chosen by at least one point of . A point of is then considered a boundary point if its count exceeds twice the average count. In all remaining cases, is set to one.4.2 NonRigid Alignment
To fit to any frame, we deform in a nonrigid fashion by changing , and to minimize the energy
(3) 
where and are weights for the individual energy terms, is the set of indices corresponding to points that have geodesic distance at most of , and denotes the cardinality of a set. As before, is the average edge length of and is a parameter. As above, is set to zero if the angle between and is above , if the distance between and is above , or if is a boundary point, and to one otherwise. The transformation matrices are computed according to Equation 1.
The data term drives the template mesh to the observed data. However, using only this term results in an illposed problem. We therefore add the smoothness term to act as a regularization term that encourages smooth transformations. Unlike previously used regularization terms [2, 21] that measure the difference between the transformations , measures the differences in translations and rotation angles in a local coordinate system centered at . To obtain results that are invariant under rigid transformations, it is important to initialize the transformation parameters in a way that is invariant to rigid transformations of the scene. Initializing and to zero yields the initial identity transformation regardless of how is initialized. A natural rotationinvariant choice to initialize is . If were to depend on the difference in rotation axes with this initialization, would not be zero in the rest state. Hence, the difference in does not increase . We found that in practice, helps to avoid selfintersections during the deformation, which is important as selfintersections might cause problems in the FEM step.
To minimize , we start by encouraging smooth transformations by setting and . Similar to Li et al. [21], whenever the energy does not change by much, we relax the smoothness weight as to give more weight to the data term. We stop when the relative change in energy
, where is the iteration number, is less than or when is smaller than . To obtain a differentiable energy function, we do not update the associated point of for a fixed set of weights. That is, is updated every time the weight is changed.
4.3 MultiResolution Optimization
Recall that the distance to the nearest neighbor used in is limited by the template resolution. To allow for larger deformations, we use a multiresolution approach as follows. We compute a multiresolution hierarchy of by collapsing edges of the mesh according to Garland and Heckbert’s geometry criterion [15]. We do not collapse edges if this would lead to a selfintersecting mesh. We perform the test whether an edge collapse leads to selfintersections greedily by performing the collapse and testing if selfintersections occur. In each resolution step, we halve the number of vertices. We stop the collapse when the base mesh contains about 1000 vertices or when no more valid edges can be found for the edge collapse operation.
Once is minimized for resolution level , we consider the mesh of the next higher resolution level . For the vertices of level that are present in level , we initialize the transformation parameters to the result of the previous resolution. For the remaining vertices, is initialized to , and and are found by minimizing with respect to the indices that are not present in resolution level .
This multiresolution framework works well when the geometric complexity is approximately linked to the amount of deformation. However, in cases where most of the deformation occurs in featureless regions of the surface, some deformation detail may be lost by this multiresolution framework. A possible remedy is to use a deformation graph to compute a multiresolution framework [21].
4.4 PostProcessing
While the solution yields a globally smooth deformation field by design, it is not guaranteed to give a deformation field that is locally smooth at every vertex . Instead, it may happen that a single vertex is transformed by a significantly different deformation than its neighbors, thereby generating a new feature in the geometry.
If this happens, we can optionally postprocess the result as follows. For every vertex of , we consider the minimum of over all in the onering neighborhood of . If this minimum is larger than two, which means that the distance of to each of its neighbors has at least doubled during the deformation, we set the transformation parameters of to the average of the transformation parameters of the onering neighbors of
. The average rotation axis is computed using spherical linear interpolation.
In our experiments, we observed that the postprocessing step was usually only required because of the FEM step in the previous frame. That is, the FEM step caused some vertices to move relatively far from their neighbors when updating the unobserved side of the previous frame. This in turn led to a lack of smoothing as contained few points. A possible remedy to this problem is to use a more complex multiresolution framework, as discussed above. We chose not to implement this solution, because in practice, we found that this postprocessing was usually not crucial in most examples. In our experiments, less than of the frames were influenced by this postprocessing.
4.5 Comparison to Prior Work
Our preliminary work used a data energy based on a pointtopoint distance, a smoothness energy that varied depending on the differences in rotation axes, and an energy designed to discourage selfintersections by repelling closeby vertices that are not neighbors [40]. If we let denote the complexity to find , and assume to have constant complexity (which holds for regularly sampled templates), a single evaluation of the energy used by Wuhrer et al. has a complexity of for a template mesh with vertices as their energy requires the computation of all distances between pairs of vertices on the mesh. In contrast, evaluating has a complexity of . Furthermore, it is known that using a pointtoplane distance instead of a pointtopoint distance leads to faster convergence rates [29]. Hence, we reduced the computational complexity of our method.
Furthermore, as will be shown in Section 7, using results in less selfintersections and higher data accuracy than using the energy by Wuhrer et al.
5 Displacing Unobserved Vertices Using FEM
Consider the situation after was deformed to frame using the approach outlined in the previous section, and denote the deformation of by . We call the vertices in that were deformed using valid data observations observed vertices, and we call the remaining vertices unobserved vertices. Unobserved vertices were deformed using smoothness assumptions on the deformation field only. This section describes how to displace the unobserved vertices using a linear FEM.
5.1 Linear FEM
We aim to reposition the unobserved vertices of using a finite element model. We use with as start position for the FEM step. A tetrahedral mesh is used to compute the FEM. The initial tetrahedral mesh of is obtained by tetrahedralizing a simplified version of . This simplification is necessary to make the algorithm more time and space efficient. The tetrahedral mesh contains vertices on the surface of the model (these vertices are a subset of the vertices of ) and vertices that are internal to the model. In the following, let denote this tetrahedral mesh.
The FEM linearly relates the displacements of the vertices and the forces applied to the tetrahedral mesh using a stiffness matrix that depends on the geometry of the tetrahedral mesh and on two elasticity parameters, the Young’s modulus and the Poisson ratio . Let denote the vector of forces applied to the vertices of the tetrahedral mesh and let denote the vector of displacements of the vertices of the tetrahedral mesh. Both and have dimension , where is the number of vertices of the tetrahedral mesh. Furthermore, let and denote the force and displacement vectors of vertex . Then,
(4) 
This equation can be used in three ways.

Given all displacements and forces, and can be estimated by solving a linear system of equations, as shown by Becker and Teschner [3]. In principle, if the displacements and forces at all vertices of a single tetrahedron are known, the approach by Becker and Teschner can estimate and . However, due to numerical instabilities when using a single tetrahedron, in practice, redundant observations are commonly used to estimate the material parameters.

Given , , and , we can compute by a matrix multiplication.

Given and along with at least three fixed displacements, Equation 4 can be modified such that is invertible. If for each vertex with nonfixed displacement either the force or the displacement are provided, we can compute the missing displacements and force vectors by rearranging the linear system of equations [6].
We rely on forces in addition to displacements in the estimation of unobserved vertices because overspecified boundary conditions are required to estimate material parameters.
Note that this simple linear FEM is only suitable to model small deformations, as large rotations may cause artifacts. However, this problem does not occur in our case as we linearize the deformation locally at each frame by modeling deformations between consecutive frames, which ensures that only small deformations are considered. Because of using the deformed tetrahedral template from the previous tracking frame as the rest state, the material parameters estimated by our method are not expected to be physically meaningful.
5.2 Estimating the Positions of Unobserved Vertices
It is easy to see that we do not have enough constraints to use Equation 4 directly to solve for all missing information. From the tracking step, we computed displacements for all surface vertices, but not for the internal vertices. These displacements are reliable for observed vertices only, and we aim to use the FEM model to find reliable displacements for the unobserved ones.
Furthermore, we are given the direction of the force at the contact point. Note that we can normalize the length of the force direction at the contact point, since changing the length of only scales (see e.g. Becker and Teschner [3, Equation 17]). The forces at internal vertices can be assumed to be zero as no external forces can act on the interior of the model. Note that other contact surfaces of the model, such as the table the model rests on, are not modeled explicitly in our framework. Instead, we rely on the observed surface points to model these additional constraints.
This leaves us with the following unknown or unreliable quantities: , , the displacements at internal and unobserved vertices, and the forces at surface vertices that are not contact points.
Prior work assumed all forces to be zero, solved for and by only considering the points with known displacements and forces, and used the estimated and to solve for the displacements of unobserved vertices [40]. This approach does not model all physical constraints as forces from the contact with supporting surface are being set to zero. Hence we did not adopt this approach.
We propose an iterative method to find reliable displacements at unobserved vertices and demonstrate in Section 7 that this change leads to an improvement of the tracking results.
The method starts by using the displacements at surface vertices computed using the tracking step as an initial estimate. That is, is computed as the difference between the vertex coordinate of on and its corresponding point on . This estimate is diffused to internal vertices using a thinplate spline (TPS) deformation.
The following description of TPS closely follows the description by Dryden and Mardia [11, Chapter 10.3]. Let denote the matrix of the coordinate vectors of the vertices of , and let denote the matrix of displacement vectors sorted in the same order as in . The TPS deformation is , where is a dimensional vector, is a matrix, is a matrix, and is a dimensional vector with
(5) 
We find by solving the linear system of equations
(6) 
where 1 is a vector containing at each position, and is a matrix containing the vectors as its rows.
We then evaluate at the internal nodes of to obtain an initial estimate for the displacements.
Let denote the vector of length containing all initially estimated displacements. These displacements are used to iteratively update the material parameters using (A1) and the estimated forces using (A2). Finally, the estimated , and along with fixed displacements at observed vertices are used to estimate at unobserved vertices using (A3). Algorithm 1 summarizes this approach.
After the FEM step, the tetrahedral mesh of is obtained by simply updating the vertex positions of using .
In our experiments, we set (since we found this to be sufficient in our experiments).
5.3 Updating the Transformation Parameters
Once the FEM step is completed for frame , it remains to adjust the transformation parameters , and to capture the new deformation. We achieve this by minimizing
(7) 
where is the position of the point corresponding to vertex on the deformed tetrahedral mesh . Note that we only optimize the energy with respect to parameters that influence unobserved vertices of . In our experiments, we set and .
6 Implementation Details
The implementation of the algorithm is in C++ and uses a quasiNewton method [25] for all of the optimization steps. For each optimization step, at most 1000 iterations are used. The tetrahedralization is computed using tetgen (http://tetgen.berlios.de). When tetrahedralizing the model, we find a high quality tetrahedralization by restricting the radiusedge ratio of each tetrahedron to be at most two.
This section discusses implementation details, and in particular the parameter settings used in the experiments. The parameters and used during tracking give the relative weights between the different energy terms, the parameters and control which data points influence the data term, and the parameter influences the neighborhoods considered for the smoothing term. Finally, the parameter controls how many iterations are performed during the FEM estimation.
To make the relative influence of the weights and invariant with respect to scaling, we prescale all of the input models, such that the length of the diagonal of the bounding box of the template model is one. This allows to set most of the parameters to one constant value for all experiments. The weight schedule used for and as well as the choice of has been discussed in Sections 4 and 5. Furthermore, we set .
The parameter is the only parameter that is varied. This parameter gives the smoothing radius with respect to the resolution of the template mesh. It needs to be adjusted depending on the ratio between the mesh resolution and the mesh size. If the mesh resolution (measured as average edge length) is high compared to the size of the model, then can be set relatively low. If the mesh resolution is low compared to the size of the model, then needs to be set to a higher value.
Fig. 3 shows the influence of on the result of tracking scan data of an ear model. The larger the parameter , the more localized shape deformations are penalized by the tracking energy. This has the effect that for small , the template can accurately follow the data at the cost of being influenced by data noise and for large , the template is not significantly affected by noise but cannot follow localized shape deformations. In our experiments, we set for synthetic data, for the ear model, and for the dinosaur model.
7 Evaluation
This section discusses the datasets used in the experiments and shows a synthetic evaluation of the method as well as experiments based on real data. Furthermore, we compare the proposed method to its predecessor [40], denoted by Wuhrer et al. (2012) in the following. More detailed visualizations of some experiments are available in the supplementary material. For all the experiments, the input models are prescaled, such that the length of the bounding box diagonal of the template model is one. This information on the scale of the models serves as reference for the numerical evaluations below.
7.1 Input Data
Synthetic Data. The synthetic datasets (shown in Fig. 4) are created using the bust, hand, and bulldog models from the AIM@Shape repository [1]. We create synthetic deformations of the models by applying different finite element deformations to the models with GetFEM [16]. First, the shapes are deformed using a linear FEM, and second, the shapes are deformed using the incompressible nonlinear Saint VenantKirchhoff model (StVK). The back sides of the deformed models are removed and the remaining front sides (shown in Fig. 4) are used as input to the algorithm. In our simulations, the head of the bust is pushed to the left, the middle finger of the hand is pushed to the left, and the head of the bulldog is pushed to the side. For all deformations, the Lagrange multiplier was set to . Refer to Table 1 for more information on the models and the parameters used to generate these deformations, and to Fig. 5 for the start and end poses of the deformations.
Model  Number  Number  

Vertices  Frames  ()  ()  
Bust  7470  10  
Hand  1257  13  
Bulldog  847  25 
Stereo Data. We acquire the stereo observations of each frame using a commercial machine vision stereo camera (Point Grey Bumblebee 2) with the windowbased matching software of the manufacturer. Typically, we use a matching window of size for subpixel matching with edge filtering based on window in images of size and a disparity range of pixels. We filter the matching result with backandforth verification and a size constraint on the matches in a neighborhood of pixels to give us an incomplete point cloud with mainly reliable matches. We segment the point cloud based on the color image of the reference camera. We adapted the code of Horprasert et al. [18] for background subtraction in our controlled lighting environment, for which we capture a series of background images without the deformable object to be captured. After extracting the foreground object, we estimate normals based on the point cloud in the image domain. We slide a window of pixels over the image and fit a leastsquares plane to the
coordinates using singular value decomposition. If the singular values indicate that the plane approximation is poor or if less than
plus depth values are available in the current window, we discard the current stereo match as an outlier. This gives us a filtered 3D point cloud with normal vectors at every point.Our experiments use two datasets acquired using this setup. The first dataset is a silicon ear used for acupuncture training that is acquired while the helix of the ear is being pushed down. Fig. 6 shows one of the input frames and the template mesh (containing 19993 vertices). The second dataset is a dinosaur plush toy that is acquired in two sequences: first while a flap of the model is being pushed towards the spine and second while the neck of the model is being pushed towards the floor. Fig. 6 shows one of the input frames and the template mesh (containing 25315 vertices). Note that all of these datasets contain noise that is typical for data acquired using stereo algorithms, such as noise along the viewing direction and missing data due to occlusions. Occlusions are especially visible in the areas of the helix of the ear model and the flaps of the dinosaur model. Furthermore, the input data of the dinosaur model contains points located on a tag attached to the tail of the model. This tag is not part of the template model, and handling this discrepancy is challenging.
Range Data. We acquire range data using a Kinect sensor and the point cloud library. With this setup, we acquire a deformation of the dinosaur plush toy that is similar to one of the sequences acquired using the stereo camera. Furthermore, we acquire a deformation where the tail of the dinosaur toy is pushed to the side. The resolution of this type of data is low compared to our stereo data, as shown in Fig. 6.
Force Probe. The position and orientation of the force sensor probe can be tracked by fixing a checkerboard pattern to the probe handle, and by tracking the corners of the checkerboard. The resulting position and orientation of the force sensor can be used to remove the data caused by the sensor from the point cloud. We take advantage of this option for the ear model, where the probe has a similar scale as the smallscale geometry of the helix. For all remaining datasets, the information on the location of the force sensor is not used.
Template Model. For all synthetic datasets, we use the complete undeformed model as template. For the ear and dinosaur models, we generate a template by acquiring the geometry of the model in a rest pose using a 3D scanner. The different views of the scans are merged and processed manually to lead to a watertight mesh.
7.2 Evaluation of Robustness with respect to Noise
Frame  No Noise  Outliers  Gaussian Noise  Resolution  
Input  Output  Input  Output  Input  Output  Input  Output  
1  
4  
7  
10 
Mean Distance  Maximum Distance  

Our first set of experiments aims to show that our approach is robust with respect to noise. To start, consider the synthetic bust data generated with a linear FEM. For this data, the complete ground truth model is known, and can therefore be used to quantitatively evaluate the robustness of our approach with respect to noise. In this experiment, we consider three types of noise. First, we aim to model outliers in a way that simulates the outliers commonly present in stereo data. To model these outliers, we pick a viewpoint for the model. A vertex is perturbed as , where is the unit vector pointing from to the viewpoint and
is a uniformly distributed random number in the range
, andis the resolution of the model. Each vertex of the model is perturbed with probability 1/10. Second, we perturb the vertices of the input data by adding Gaussian noise in the vertices normal direction. The variance of the Gaussian is
of the bounding ball radius of the model. Third, we evaluate the influence of the resolution of the input data on the results by subdividing the input data using one step of Loop subdivision.Ear Sequence (Stereo Data)  

Dinosaur Flap Sequence (Stereo Data)  
Dinosaur Neck Sequence (Stereo Data)  
Dinosaur Neck Sequence (Range Data)  
Fig. 7 shows that the results obtained by our method are not visually affected by noise. As the pervertex distance for a frame can be computed as the distance between the vertex position in the tracking result and its corresponding point in the ground truth model, the error for a frame can then be computed as the average or maximum over all pervertex distances of this frame. The resulting errors are given in Fig. 8. The average distances are small compared to the size of the model and the increase in the distances caused by the presence of outliers, Gaussian noise, and an increase in resolution is insignificant. In our templatebased tracking, distances increase linearly over time.
Second, we qualitatively evaluate the results of tracking noisy stereo and range datasets. Fig. 9 shows the observed sides of four sequences acquired using either a stereo or a range camera. The figure shows the data, the deformed template, and the signed distance between the deformed template and the data. In the visualization of the signed distance, points that do not have a valid nearest neighbor in the data are shown in red. Note that the tracking result captures the overall deformation of the models in spite of large noise, high levels of occlusion, and additional data. Furthermore, it is noteworthy that results of similar quality are obtained for stereo and range data, which have significantly different resolution and noise levels. An additional result for a deformation of the dinosaur model, where the tail is pushed to the side, is shown in the supplementary material.
To quantitatively evaluate how well the shape of the models is preserved during the deformation, we measure the volume and the surface area of the template models and of the deformed template at the last frames of the deformation sequences. Fig. 10 shows that volume and surface area are preserved well throughout the deformation sequences for all four realworld datasets. Since all of these deformations are approximately volume and areapreserving in reality, this result gives supporting evidence that the method performs well.
7.3 Evaluation of FEM Correction
Our second set of experiments shows that the FEM step improves the shape of the unseen side of the model. We show that displacing the unobserved vertices using a linear FEM in every deformation step yields satisfactory results even for synthetic deformations that were generated using a nonlinear FEM and for realworld deformation sequences.
Comparison to Ground Truth Deformations
Bust  Hand  

We consider the synthetic deformation sequences of the bust and hand models generated using linear FEM and the StVK model. For each deformation sequence, tracking is computed using our algorithm. We then evaluate the errors as above. However, instead of considering all vertices of the model for the error computation, we only consider vertices on the unseen side. The bust dataset exhibits a deformation that affects the global shape of the model, while the hand dataset exhibits the strongest deformation locally on one finger. Fig. 11 shows that linear and nonlinear deformations as well as deformations that affect the global shape and deformations that affect the local shape are tracked equally well.
For the models generated using linear FEM, we can also compare the estimated material parameters to the ground truth. In our experiments, the estimated Young’s modulus is stable across the deformation sequence and can be obtained from the ground truth by a scaling. This is to be expected, as we normalize the lengths of the force directions. The estimated Poisson ratio varies across the input frames and on average, the true Poisson ratio is underestimated by approximately 0.25.
Comparison to Global Linear FEM
We compare the result of our method to a deformation computed by a global linear FEM. This experiment uses the synthetic bulldog data and is shown in Fig. 12. The figure shows the result of applying a linear FEM deformation to the bulldog model and our result. The global linear FEM aims to linearize the global deformation that is observed when applying a nonlinear FEM to the model. Observe that applying a linear FEM leads to unrealistic artifacts at the right front leg and the head of the bulldog because the applied force causes a rotation. Recall that our method uses a linear FEM to predict the unobserved side of the model. However, since our method linearizes the deformation locally at each frame, no unrealistic artifacts occur. Furthermore, our approach is significantly closer to the ground truth than using a global linear FEM in terms of mesh volume and area.
Results for end position 
Volume 
7.4 Comparison to SurfaceBased Deformation
We compare the results of our method to the results of only using the surfacebased template deformation method outlined in Section 4. Choosing this surfacebased deformation technique results in deformations that predict the unobserved side using a term that aims to preserve a smooth deformation field. Comparing to this technique directly evaluates the influence of the FEM step on the result. In the following, we refer to the surfacebased template deformation as our method without FEM.
We evaluate the influence of the FEM correction for tracking noisy scans. Consider the neck sequence acquired using the stereo setup. Fig. 13 shows the template (yellow) and the results at the end of the sequence with (green) and without (blue) the FEM correction. Note how the surfacebased deformation finds a solution that deforms the template smoothly, which leads to a translation of the leg rather than a bending. Furthermore, the tail is merely translated upwards. Using our method, the legs slide and bend realistically, and the model’s tail lifts up, as in reality.
Frame  Input  Wuhrer et al.  Ours  Mean Distance  Max. Distance 
(2012)  
1  
4  
7  
10 
Fig. 1 (third from left) shows that the FEM correction also leads to a more physically plausible result for the ear model acquired using the stereo setup. In this case, using the FEM correction prevents a fattening of the object, which is mostly visible at the base and the helix.
7.5 Comparison to Wuhrer et al. (2012)
Finally, we compare the proposed method to its predecessor [40] and demonstrate that the proposed changes yield a significant improvement in the performance of the algorithm. To start, consider the synthetic bust model generated with linear FEM. We track this sequence using both the method by Wuhrer et al. (2012) and our method, and measure the mean and maximum distances over all vertices to the ground truth. Fig. 14 shows the tracking results and the measured distances. Note that the mean and maximum distances of our method are less than a third of the corresponding distances of the method by Wuhrer et al. (2012). As can be seen in the left of the figure, this improvement is obtained because our method tracks the rotation of the head better than the method by Wuhrer et al. (2012) and because the back side of the bust is not flattened by our method.
The improvement of our method over the method by Wuhrer et al. (2012) is especially noticeable for tracking noisy scan data. Fig. 1 shows a comparison of the result obtained using the method by Wuhrer et al. (2012) (blue) to our result (green) for the last frame of the ear dataset acquired using a stereo setup. Note that while the method Wuhrer et al. (2012) does not track the local deformation of the helix, our method tracks the data correctly without resulting in a noisy output. Fig. 15 shows the results of the two methods for different frames of datasets acquired using stereo or range cameras. Note that our method results in less noise and selfintersections of the model (especially visible in the areas of the flaps of the dinosaur models), while at the same time yielding a higher data accuracy. This significant improvement is a consequence of the improvements of both the tracking step and the FEM prediction.
Ear Sequence (Stereo Data)  

Input  Wuhrer et al. (2012)  Our Method  
Dinosaur Flap Sequence (Stereo Data)  
Input  Wuhrer et al. (2012)  Our Method  
Dinosaur Neck Sequence (Stereo Data)  
Input  Wuhrer et al. (2012)  Our Method  
Dinosaur Neck Sequence (Range Data)  
Input  Wuhrer et al. (2012)  Our Method  
7.6 Limitations
The proposed method is currently designed to deform models of homogeneous isotropic material that are deformed by applying external forces on a small number of points. Modeling complex force fields or force fields acting on heterogeneous material, such as deformations of tissue caused by muscle movements, would require a segmentation of the model into regions of homogeneous materials and the input of a full force field. This is too tedious to acquire to be of practical use. However, by combining templatebased tracking with simple physical simulation models, we make a first step in the direction of acquiring the geometry and material properties of an object jointly.
We have demonstrated that our method, which uses a linear FEM model, allows to track both linear and nonlinear material deformations accurately. This flexibility comes at the cost of material parameter estimates that vary over time and are not expected to be physically meaningful. In the future, we plan to explore modeling nonlinear material behaviour explicitly, and to find stable and physically meaningful material parameters for this scenario by considering all available frames for material parameter estimation.
As our approach employs a nonrigid iterative closest point algorithm to fit the template to the data, tracking large deformations may lead to drift. An example of this problem is included in the supplementary material. Furthermore, due to the nonrigid iterative closest point algorithm, our method cannot deform the template accurately if the initial alignment is poor or if there is significant deformation between consecutive frames. Hence, in cases of extreme deformations that are sampled sparsely in time, our tracking may get lost. This is shown in Fig. 16. Here, we simulated the same hand deformation twice; once sampled sparsely in time using 50 frames, and once sampled densely in time using 350 frames. For a particular frame (frame 20 in the sparsely sampled simulation, which corresponds to frame 140 in the densely sampled simulation), the ground truth deformation is shown in yellow, the result for tracking using the sparse sequence is shown in blue, and the result for tracking using the dense sequence is shown in green. Note that by using more frames, the tracking is able to follow the data more closely at the cost of additional drift.
8 Conclusions
We proposed an approach to track the geometry of a surface over time from a set of input point clouds captured from a single viewpoint. We combine the use of a template and the use of a linear finite element method to track the model. By linearizing the deformation at each frame, we show that we can accurately track surfaces that deform in a nonlinear fashion. We demonstrate the robustness of our approach with respect to noise using a synthetic evaluation and using real data captured with a stereo setup and with a depth camera.
We leave the following ideas for future work. The tracking is lost when the distance between consecutive frames is large. This could potentially be addressed by tracking feature points on the model and by using these features to guide the nonrigid motion of the template during tracking. Furthermore, our approach assumes that a template is known a priori. While this assumption is commonly used in 3D tracking approaches, it will be interesting to relax this requirement in the future. One way to relax this requirement would be to assume that the undeformed object is observed from a single moving viewpoint before the deformation, which allows to fuse these views into a template shape automatically.
Acknowledgements
This work has partially been funded by NSERC, Canada, Networks of Centres of Excellence GRAND, Canada, and the Cluster of Excellence on Multimodal Computing and Interaction within the Excellence Initiative of the German Federal Government.
References
 [1] Aim@Shape. http://shapes.aimatshape.net/releases.php.
 [2] Brett Allen, Brian Curless, and Zoran Popović. The space of human body shapes: reconstruction and parameterization from range scans. ACM Transactions on Graphics, 22(3):587–594, 2003. Proceedings of SIGGRAPH.
 [3] Markus Becker and Matthias Teschner. Robust and efficient estimation of elasticity parameters using the linear finite element method. In SimVis, pages 15–28, 2007.
 [4] Bernd Bickel, Moritz Bächer, Miguel Otaduy, Hyunho Richard Lee, Hanspeter Pfister, Markus Gross, and Wojciech Matusik. Design and fabrication of materials with desired deformation behavior. ACM Transactions on Graphics, 29(3), 2010. Proceedings of SIGGRAPH.
 [5] Bernd Bickel, Moritz Bächer, Miguel Otaduy, Wojciech Matusik, Hanspeter Pfister, and Markus Gross. Capture and modeling of nonlinear heterogeneous soft tissue. ACM Transactions on Graphics, 28(3), 2009. Proceedings of SIGGRAPH.
 [6] Morten BroNielsen. Finite element modeling in surgery simulation. Proceedings of the IEEE, 86:490–503, 1998.

[7]
Cedric Cagniart, Edmond Boyer, and Slobodan Ilic.
Freefrom mesh tracking: a patchbased approach.
In
IEEE Conference on Computer Vision and Pattern Recognition
, 2010.  [8] Cedric Cagniart, Edmond Boyer, and Slobodan Ilic. Probabilistic deformable surface tracking from multiple videos. In European Conference on Computer Vision, 2010.
 [9] Jaeil Choi and Andrzej Szymczak. Fitting solid meshes to animated surfaces using linear elasticity. ACM Transactions on Graphics, 28(1):6:1–6:10, 2009.
 [10] Edilson de Aguiar, Carsten Stoll, Christian Theobalt, Naveed Ahmed, HansPeter Seidel, and Sebastian Thrun. Performance capture from sparse multiview video. ACM Transactions on Graphics, 27(3):98:1–98:10, 2008. Proceedings of SIGGRAPH.
 [11] Ian Dryden and Kanti Mardia. Statistical Shape Analysis. Wiley, 2002.
 [12] Hani Eskandari, Septimiu Salcudean, Robert Rohling, and Ian Bell. Realtime solution of the finite element inverse problem of viscoelasticity. Inverse Problems, 27(8):085002:1–16, 2011.
 [13] Yasutaka Furukawa and Jean Ponce. Dense 3d motion capture from synchronized video streams. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
 [14] L. Gao, K. Parker, R. Lerner, and S. Levinson. Imaging of the elastic properties of tissue–a review. Ultrasound in Medicine & Biology, 22:959–977, 1996.
 [15] Michael Garland and Paul S. Heckbert. Surface simplification using quadric error metrics. In International Conference on Computer Graphics and Interactive Techniques, pages 209–216, 1997.
 [16] GetFEM. http://download.gna.org/getfem/html/homepage/.
 [17] Jennifer Hensel, Cynthia Ménard, Peter Chung, Michael Milosevic, Anna Kirilova, Joanne Moseley, Masoom Haider, and Kristy Brock. Development of multiorgan finite elementbased prostate deformation model enabling registration of endorectal coil magnetic resonance imaging for radiotherapy planning. International Journal of Radiation Oncology, Biology and Physics, 68(5):1522–â€“1528, 2007.
 [18] Thanarat Horprasert, David Harwood, and Larry S. Davis. A statistical approach for realtime robust background subtraction and shadow detection. In ICCV FrameRate Workshop, pages 1–19, 1999.
 [19] Jochen Lang and Dinesh Pai. Estimation of elastic constants from 3d rangeflow. In 3D Digital Imaging and Modeling, International Conference on, page 331, 2001.
 [20] HuaiPing Lee, Mark Foskey, Marc Niethammer, Pavel Krajcevski, and Ming Lin. Simulationbased joint estimation of body deformation and elasticity parameters for medical image analysis. IEEE Transactions on Medical Imaging, 31(11):2156–2168, 2012.
 [21] Hao Li, Bart Adams, Leonidas J. Guibas, and Mark Pauly. Robust singleview geometry and motion reconstruction. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia 2009), 28(5), 2009.
 [22] Hao Li, Linjie Luo, Daniel Vlasic, Pieter Peers, Jovan Popović, Mark Pauly, and Szymon Rusinkiewicz. Temporally coherent completion of dynamic shapes. ACM Transactions on Graphics, 31(1):2:1–11, 2012.
 [23] Hao Li, Robert W. Sumner, and Mark Pauly. Global correspondence optimization for nonrigid registration of depth scans. Computer Graphics Forum, 27(5):1421–1430, 2008. Proceedings of SIGGRAPH.
 [24] Miao Liao, Qing Zhang, Huamin Wang, Ruigang Yang, and Minglun Gong. Modeling deformable objects from a single depth camera. In IEEE International Conference on Computer Vision, 2009.
 [25] Dong C. Liu and Jorge Nocedal. On the limited memory method for large scale optimization. Mathematical Programming, 45:503–528, 1989.
 [26] Niloy J. Mitra, Simon Flory, Maks Ovsjanikov, Natasha Gelfand, Leonidas Guibas, and Helmut Pottmann. Dynamic geometry registration. In Symposium on Geometry Processing, 2007.
 [27] Thao Nguyen and Brad Boyce. An inverse finite element method for determining the anisotropic properties of the cornea. Biomechanics and Modeling in Mechanobiology, To appear.
 [28] Tiberiu Popa, Ian SouthDickinson, Derek Bradley, Alla Sheffer, and Wolfgang Heidrich. Globally consistent spacetime reconstruction. Computer Graphics Forum, 29(5):1633–1642, 2010. Proceedings of SGP.
 [29] Szymon Rusinkiewicz and Marc Levoy. Efficient variants of the icp algorithm. In Conference on 3D Digital Imaging and Modeling, pages 145–152, 2001.
 [30] Julia Schnabel, Christine Tanner, Andy CastellanoSmith, Andreas Degenhard, Martin Leach, Rodney Hose, Derek Hill, and David Hawkes. Validation of nonrigid image registration using finiteelement methods: Application to breast MR images. IEEE Transactions on Medical Imaging, 22(2):238–247, 2003.
 [31] Andrei Sharf, Dan Alcantara, Thomas Lewiner, Chen Greif, Alla Sheffer, Nina Amenta, and Daniel CohenOr. Spacetime surface reconstruction using incompressible flow. ACM Transaction on Graphics, 27(5), 2008. Proceedings of Siggraph Asia.
 [32] Cédric Syllebranque and Samuel Boivin. Estimation of mechanical parameters of deformable solids from videos. The Visual Computer, 24:963–972, 2008.
 [33] Gary Tam, ZhiQuan Cheng, YuKun Lai, Frank Langbein, Yonghuai Liu, David Marshall, Ralph Martin, XianFang Sun, and Paul Rosin. Registration of 3d point clouds and meshes: A survey from rigid to nonrigid. IEEE Transactions on Visualization and Computer Graphics, 19:1199â€“–1217, 2013.
 [34] Demitri Terzopoulos. Multilevel reconstruction of visual surfaces: Variational principles and finite element representations. Technical Report AI Memo number 671, MIT, 1982.
 [35] Art Tevs, Alexander Berner, Michael Wand, Ivo Ihrke, Martin Bokeloh, Jens Kerber, and HansPeter Seidel. Animation cartography  intrinsic reconstruction of shape and motion. ACM Transaction on Graphics, 31(2):12:1–15, 2012.
 [36] Tony Tung and Takashi Matsuyama. Dynamic surface matching by geodesic mapping for 3d animation transfer. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1402–1409, 2010.
 [37] Oliver van Kaick, Hao Zhang, Ghassan Hamarneh, and Danial CohenOr. A survey on shape correspondence. In Eurographics Stateoftheart Report, 2010.
 [38] Daniel Vlasic, Ilya Baran, Wojciech Matusik, and Jovan Popovic. Articulated mesh animation from multiview silhouettes. ACM Transactions on Graphics, 27(3):97:1–10, 2008. Proceedings of SIGGRAPH.
 [39] Michael Wand, Philipp Jenke, Qixing Huang, Martin Bokeloh, Leonidas Guibas, and Andreas Schilling. Reconstruction of deforming geometry from timevarying point clouds. In Symposium on Geometry Processing, 2007.
 [40] Stefanie Wuhrer, Jochen Lang, and Chang Shu. Tracking complete deformable objects with finite elements. In Conference on 3D Imaging Modeling Processing Visualization and Transmission, 2012.
 [41] Qian Zheng, Andrei Sharf, Andrea Tagliasacchi, Baoquan Chen, Hao Zhang, Alla Sheffer, and Daniel CohenOr. Consensus skeleton for nonrigid spacetime registration. Computer Graphcis Forum, 29(2), 2010. Proceedings of Eurographics.
 [42] Yanning Zhu, Timothy J. Hall, and Jingfeng Jiang. A finiteelement approach for youngâ€™s modulus reconstruction. IEEE Transactions on Medical Imaging, 22(7):890–901, 2003.
Comments
There are no comments yet.