Projective Dynamics [bouaziz2014projective] is a popular, robust, and efficient iterative scheme for interactive simulation of models governed by the corotational elasticity constitutive model. Equivalent in principle to a quasi-Newton scheme [Liu2017], Projective Dynamics (PD) often delivers significant advantages over traditional Newton-style implicit schemes in terms of stability and efficiency. Robust and stable simulation is guaranteed by casting each time step of PD as an optimization problem, in which each of its alternating components (i.e. the “local” and “global” steps) is assured to monotonically decrease the objective energy. Such guarantees do not exist in a traditional Newton scheme in the absence of linesearch failsafes. Efficiency in Projective Dynamics largely stems from the fact that the modified Hessian it uses, when viewed as a quasi-Newton scheme, is a constant Laplacian-like matrix that can be prefactorized and efficiently solved using forward/backward substitution. This is in contrast to the true Hessian of full-Newton schemes, which varies with deformation and can also become indefinite, limiting the available options for high-performance, yet robust solvers.
While Projective Dynamics enjoys these benefits, it is not without limitations and challenges. Some of these are associated with its specific affinity to corotated elasticity (or close variants [Ichim2017]), making it less than ideal to pair with generic material models. But more importantly, the presence of collisions has the tendency to clash with many of the preconditions that contribute to the robustness, efficiency, and favorable convergence of Projective Dynamics. Collision resolution for volumetric elastic objects, especially in the context of implicit or quasistatic simulations, is most frequently handled via the imposition of penalty forces on parts of the mesh that penetrate into prohibited regions [teschner2005collision, Mcadams2011, mitchell2015non]. This response is materialized in the form of short-lived, zero-restlength springs that connect points on the model surface found to be colliding with the closest surface point on the “other side” of the collision. As such, the proper treatment of such spring forces would be to incorporate them into the Laplacian-like matrix of the global step of Projective Dynamics. This can be absolutely detrimental to the ability of Projective Dynamics to use a factorization-based direct solver, since the update cost (say, of a Cholesky decomposition) would be prohibitive in an interactive application. Models with hundreds of thousands of elements, which could otherwise be simulated using direct solvers for the global step at interactive rates, would no longer enjoy such performance if the pre-factorization opportunity is compromised.
Preserving the ability to use a direct method for the global step typically comes with some type of compromise. It is possible, for example, to build the matrix of the global step under the premise that all collision sites used in detection (often referred to in the literature as “collision proxies” [Mcadams2011]) are engaged in active collision, while the right-hand side can be built with only active proxies taken into consideration; this option was discussed by the original proposers of Projective Dynamics [bouaziz2014projective]. Although this approach retains stability, it adds unnecessary drag on collision proxies that are not actually colliding, and is problematic for self-collisions when the pattern of interaction between colliding parts of the mesh cannot be statically inferred. Later work [Ichim2016] proposed adding linear equality constraints associated with active collisions to the minimization problem in the global step of PD, and using a Schur complement with respect to the constraint equations to build a smaller dense system whose dimension equals the number of active constraints. Although this approach is quite flexible, it requires a somewhat expensive update of the Schur complement at each iteration, and is only practical for a relatively small number (at most a few hundred) of active collision proxies. Our approach also leverages Schur complements, but in a very different context, as we will see.
Proposed method and Scope
In this paper, we propose a new and distinctive approach to reconciling collision processing with the philosophy of Projective Dynamics. Our method safeguards the strong robustness guarantees of PD and its ability to use an accurate, direct solver for the global step, while retaining very attractive performance on models of substantial resolution. But there is a price we consciously have to accept: we commit to an upfront narrowing of our scope of applicability to simulation scenarios that satisfy the following two conditions: (1) We must know in advance which sections of the object’s surface are likely, by-and-large, to ever be engaged in collision; we shall call this the collision-prone region. (2) The simulation nodes that are associated with collision proxies (either by being collision proxies themselves, or embedding them) in the collision-prone region should only be a small fraction of the total nodes in the simulation mesh; e.g. ideally no more than a few thousand nodes of a volumetric mesh with more than 100K vertices, as in our examples.
It is not difficult to identify simulation scenarios that satisfy these stipulations – and others that would not. Figure 1 illustrates such scenarios featured in our demonstrations. Models of the human face would be a prime candidate, if we accept the modeling hypothesis that collisions will only be handled in the immediate vicinity of the mouth. For reasonably resolved face meshes with several hundred thousand tetrahedral elements, it is easy to localize the collision-prone region to no more than a few thousand nodes. On the other hand, this assumption would not hold if we intended to collide the face with external objects without restricting where the contact takes place. Body models would also satisfy this stipulation if we only targeted collisions that appear around joints: the elbow, the underarm area, the region behind the knee, etc. Again, considering collisions with external objects, or non-local self-collisions (e.g. a hand touching the torso), would break our hypothesis.
If, however, these modeling assumptions do hold true for our simulation task, we are presented with a very clear opportunity for highly optimized processing and accurate treatment of collisions within Projective Dynamics, while retaining the stability and convergence of direct solvers. Our method can then separate our simulation mesh into collision-safe and collision-prone regions, and use a partial Cholesky factorization to reduce the computation that needs to occur during the global step into a problem that only involves the collision-prone degrees of freedom. This localized problem is a linear system of equations, using the Schur complement of the traditional global step Laplacian (with respect to the collision-prone nodes) as its coefficient matrix; the core benefit is that updates to the overall scheme due to activation or deactivation of collision proxies are purely sparse, additive updates to the Schur complement. Our formulation also affords the opportunity to update the optimal rotations of elements in the collision-prone region at the same time that we repeat collision detection, but without explicitly updating the collision-safe region and at drastically reduced cost. For models with a resolution in the order of half a million tetrahedral elements, we can perform accurate penalty-based collision handling at no more than twice (and often much less) the cost of the same model simulated without collisions.
Finally, in delineating our scope, we clarify that our method presumes that using a direct solver as opposed to an iterative scheme for the global step is something the user seeks to preserve. This is often motivated by the accuracy and robustness of a direct solver, and avoiding the need to fine-tune the iterative scheme to the model resolution, stiffness of constraints, or abrupt nature of motion. We should disclose, however, that in our experience for models with significantly lower resolution than what we target (e.g. in the order of 50K-100K elements) or in dynamic simulation aided by inertia, we have found the convergence of iterative methods to be very adequate even with modest iteration count. In such instances, an accelerated iterative solver [Komaritzan2019] could be best suited to solving the global step. In Section 5 we comment further on benefits of direct solvers for higher resolution models, such as the ones we target.
2 Related Work
Simulation of deformable bodies using corotated elasticity strikes a good balance between respecting nonlinearity and rotational invariance, while revealing opportunities for interactive simulation. The principle of corotated elasticity first materialized in warped stiffness methods [muller2002stable], and was later made rotationally invariant [muller2004interactive], robust to inversion [irving2004invertible], and resilient to indefiniteness of the stiffness matrix [teran2005robust]. Analytic second derivatives of the corotated energy allowed improved convergence of Newton methods [Mcadams2011, chao2010simple], while the derivative singularity of the model around highly compressed configurations was treated with appropriate modifications [stomakhin2012energetically].
Targeting corotated elasticity as a material model, the concept of Projective Dynamics [bouaziz2014projective] has enjoyed significant adoption and evolution. Analyzed as a quasi-Newton scheme [Liu2017] and related to ADMM optimization [Narain2016], it has been used for developing damping models [Jaillet2018], elastic rod simulations [Soler2018], face animation [Ichim2017], motion control using volumetric actuators [Lee2018], skinning simulation [Komaritzan2018, Komaritzan2019] and reduced models [Brandt2018]. The relation between Projective Dynamics and ADMM has also been investigated [Narain2016, overby2017admm], allowing more general constitutive models and constraints to be used, with iterative solvers utilized for the global step, albeit typically demonstrated at more modest resolutions than we use. Chebyshev iteration has also been used to tackle the global step [wang2015chebyshev], allowing efficient GPU implementation, albeit carrying weaker guarantees for robustness relative to direct solvers. Among these approaches, we find that iterative methods based on GPU-accelerated Conjugate Gradients solvers [Komaritzan2018, Komaritzan2019] are the closest in scope to our work; as we discuss in Section 5, such schemes would be preferable for models of more modest resolution than ours (we use about half a million elements), where CG would converge well and not require localization of collisions.
Skinning and collisions
Collision processing for volumetric objects can leverage more flexible, and occasionally more performant, techniques than those used for cloth simulation, owing to its ability to recover from tangled configurations. Detection and response schemes leveraging implicit geometry representations have seen significant adoption [teschner2005collision, Mcadams2011, mitchell2015non], and typically employ penalty force formulations for collision response. Recent skinning methods that focus on interactive simulation include implicit skinning [vaillant2013implicit], Delta Mush [Le2019], methods that exploit the Projective Dynamics concept [Komaritzan2018, Komaritzan2019], Position-Based Dynamics [abu2015position], and subspace deformation [teng2014simulating], often in conjunction with Projective Dynamics [lan2020medial]. Contact and collision for muscle-based skinning simulations have also leveraged volume-preserving fiber primitives [angles2019viper] and simplified yet anatomy-inspired muscle primitives coupled with the Implicit Skinning concept [roussellet2018dynamic]. Finally, using Projective Dynamics in simulations involving frictional contact [ly2020projective] was recently explored.
3 Technical Background
In this paper, we denote variables that represent aggregate quantities (e.g. concatenated lists of a physical property on all nodes, or all elements) by using boldface type. These aggregate quantities can be either scalar or vector, which are differentiated by an arrow over the variable for vector quantities. For example, $\mathbf{x}$ might be the $x$-coordinates of all vertices in a mesh, while $\vec{\mathbf{x}}$ might be all 3D vertices of the mesh, consisting of the three components $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$. Subscripts in parentheses denote iteration numbers; for example, $\vec{\mathbf{x}}_{(2)}$ might be the 3D values of all the mesh vertices after the second iteration of an algorithm. Matrices are capital Roman letters (non-bolded). Aggregate matrices are boldfaced versions of the single matrix notation.
3.2 Projective Dynamics
We start by reviewing the mathematical formulation for Projective Dynamics [bouaziz2014projective], with the slight modification that we cast the description more in the language of continuum mechanics (including concepts such as stress and force), instead of the style used by the original authors, which was more attuned to a Computational Geometry and Optimization viewpoint.
When simulating an elastic body using the Finite Element Method, the body is first discretized with a volumetric mesh composed of many discrete elements (tetrahedra in our case). Assuming linear tetrahedral elements [sifakis2012fem], we can compute a deformation gradient, $F$, which is constant in each element and is a linear function of the deformed locations of the mesh vertices. The constitutive model of corotated elasticity defines the energy density as a function of the deformation gradient:
$\Psi(F) = \mu \, \| F - R(F) \|_F^2 + \frac{\lambda}{2} \, \mathrm{tr}^2\!\left( R(F)^T F - I \right)$
where $R(F)$ and $S(F)$ are the rotational and symmetric components of the deformation gradient given by the polar decomposition $F = RS$, and $\mu$, $\lambda$ are the Lamé coefficients. In keeping with the typical mode of use of Projective Dynamics, we omit the $\lambda$ term by setting this value to zero. We draw attention to the important detail that $R$ is a dependent function of $F$ in this formulation. Projective Dynamics suggests an alternative formulation of Equation (1) where $R$ is no longer a function of $F$, but rather an independent variable:
$\hat\Psi(F, R) = \mu \, \| F - R \|_F^2$
The fundamental observation at the core of Projective Dynamics is that the conventional description of the constitutive model’s energy density function, $\Psi(F)$, is equal to the minimum over all rotation matrices of the Projective Dynamics energy density $\hat\Psi(F,R)$:
$\Psi(F) = \min_{R \in SO(3)} \hat\Psi(F, R)$
We transition from the (constant) energy density function across each element to an integrated energy function for the same element by multiplying by the (undeformed) volume $V_e$ of each element:
$\hat E_e(\vec{\mathbf{x}}, R_e) = V_e \, \hat\Psi(F_e(\vec{\mathbf{x}}), R_e)$
where we have defined $\hat E_e$ as the integrated elemental energy. The overall energy of the entire body (with rotations momentarily regarded as independent variables) is the sum of all elemental energies:
$\hat E(\vec{\mathbf{x}}, \mathbf{R}) = \sum_e \hat E_e(\vec{\mathbf{x}}, R_e)$
from which we can recover the conventional discrete corotated energy for the entire mesh by minimizing rotations over all elements:
$E(\vec{\mathbf{x}}) = \min_{\mathbf{R}} \, \hat E(\vec{\mathbf{x}}, \mathbf{R})$
where it is implied (for brevity of notation) that the minimum is taken over an aggregate $\mathbf{R}$ of matrices that are all rotations (i.e. in SO(3)). We may intuitively interpret the energy $\hat E$ as a separate constitutive model from corotational elasticity, where each element’s matrix $R_e$ is no longer functionally tied to its deformation gradient, but is simply an element-specific simulation parameter.
Projective Dynamics is usable both in a quasistatic, as well as an implicit Backward Euler time integration scheme, as both cases are ultimately cast as very similar optimization problems. For simplicity of exposition, in this paper we focus on the quasistatic case, with the understanding that our methodology remains fully applicable in the case where Backward Euler is used. In a quasistatic simulation, the deformable system evolves so as to satisfy a force equilibrium condition, or equivalently in pursuit of a minimizer for the energy:
$\vec{\mathbf{x}}^\star = \arg\min_{\vec{\mathbf{x}}} \, E(\vec{\mathbf{x}})$
Therefore, the quasistatic evolution can be seen as a minimization of the modified energy $\hat E$ jointly over both independent parameters $\vec{\mathbf{x}}$ and $\mathbf{R}$. Projective Dynamics chooses to conduct this minimization by alternating the following two steps until convergence:
- Local Step
Treat $\vec{\mathbf{x}}$ as constant, and minimize $\hat E$ over all $\mathbf{R}$.
- Global Step
Treat $\mathbf{R}$ as constant, and minimize $\hat E$ over all $\vec{\mathbf{x}}$.
The stability property of PD results from the fact that each of the aforementioned minimization steps can be iterated while guaranteeing that the energy will monotonically decrease after the application of each one. The local step of minimizing $\hat E$ is actually multiple independent steps of minimizing $\hat E_e$ separately for each element. Because these are independent, the problem is highly parallel. The solution to the minimization of $\hat E_e$ for each element is obtained via the Orthogonal Procrustes Problem and yields the minimizer $R_e = UV^T$, where $F_e = U \Sigma V^T$ is the SVD of the element’s deformation gradient. The minimization associated with the global step will be handled by an application of a Newton-Raphson procedure, producing an iterative update sequence $\vec{\mathbf{x}}_{(k+1)} = \vec{\mathbf{x}}_{(k)} + \delta\vec{\mathbf{x}}$, where $\delta\vec{\mathbf{x}}$ is computed by solving:
$\frac{\partial^2 \hat E}{\partial \vec{\mathbf{x}}^2} \, \delta\vec{\mathbf{x}} = -\left.\frac{\partial \hat E}{\partial \vec{\mathbf{x}}}\right|_{\vec{\mathbf{x}}_{(k)}}$
Let us examine the components of (5) more closely. On the left hand side, we recognize the second derivative of the reformulated energy from Equation (3). Having treated the rotations as an independent parameter, this energy is a pure quadratic function of positions $\vec{\mathbf{x}}$, thus the Hessian is a constant matrix. When the expression in Equation (3) is interpreted as a modified constitutive model with $\mathbf{R}$ being an independent parameter, this Hessian would be intuitively associated with the “stiffness matrix” of this material model, which we denote as $\mathbf{K}_e$ (with the subscript denoting that this is the “elastic” energy, contrasted to collision-spawned contributions discussed later). Similarly, on the right-hand side, we recognize the term as $\vec{\mathbf{f}}(\vec{\mathbf{x}}_{(k)})$, the aggregate elastic forces computed on the mesh nodes from this constitutive model, at position $\vec{\mathbf{x}}_{(k)}$ [Sifakis2015]. This allows us to write an equivalent expression:
$\mathbf{K}_e \, \delta\vec{\mathbf{x}} = \vec{\mathbf{f}}(\vec{\mathbf{x}}_{(k)})$
The stiffness matrix $\mathbf{K}_e$ can be computed either in the fashion of the Projective Dynamics formulation [bouaziz2014projective], or by following the Finite Element route, which would produce exactly the same result. For the force, however, we opt for a computation using the Finite Element paradigm: on each particular element, $e$, we start by calculating the deformation gradient, $F_e$, then calculate the first Piola–Kirchhoff stress tensor, $P_e$, which in turn is used to calculate the force [Sifakis2015]. The first Piola–Kirchhoff stress tensor will be given by $P_e = 2\mu (F_e - R_e)$, which is seemingly the same as that of corotated elasticity [Mcadams2011], with the caveat that $R_e$ is still treated as an independent parameter rather than a function of $F_e$ (the two will have been brought in sync by virtue of the preceding local step).
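The local-step rotation extraction and the stress computation just described can be sketched in a few lines. This is a minimal numpy illustration under stated assumptions: the toy deformation gradient is invented data, and the determinant correction that guarantees a proper rotation (rather than a reflection) is standard practice for the Procrustes solution.

```python
import numpy as np

def optimal_rotation(F):
    """Local step: solve the Orthogonal Procrustes problem for one
    element.  Given the SVD F = U Sigma V^T, the rotation minimizing
    ||F - R||_F is R = U V^T; we flip the last singular direction if
    needed so that det(R) = +1 (a proper rotation, not a reflection)."""
    U, _, Vt = np.linalg.svd(F)
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1
    return U @ Vt

def piola_stress(F, R, mu):
    """First Piola-Kirchhoff stress of the PD material with the
    lambda term omitted: P = 2 mu (F - R), where R is an independent
    parameter (synced to F by the preceding local step)."""
    return 2.0 * mu * (F - R)

# Toy deformation gradient (assumed data) and its optimal rotation.
F = np.array([[1.1, 0.2, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 1.0]])
R = optimal_rotation(F)
P = piola_stress(F, R, mu=1.0)
```

Note that for an undeformed or rigidly rotated element ($F \in SO(3)$), the optimal rotation equals $F$ itself and the stress vanishes, as the energy dictates.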
We conclude our review of the core Projective Dynamics theory with some implementation-minded observations. The stiffness matrix $\mathbf{K}_e$ has been shown [bouaziz2014projective] to be block diagonal (in the sense that it has no cross-terms that straddle different coordinate components among $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$), and also that all three diagonal blocks are identical, which allows the factorization of just one of them to be reused so as to solve Equation (6) with three independent applications of forward/backward substitution, one for each component of $\delta\vec{\mathbf{x}}$.
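This factor-once, solve-three-times pattern can be sketched as follows. The matrix here is a dense, randomly generated SPD stand-in for one diagonal block of the stiffness matrix (an assumption; a real block comes from mesh assembly and is sparse), and generic triangular solves stand in for optimized substitution routines.

```python
import numpy as np

def make_scalar_block(n, seed=0):
    """Stand-in for one diagonal block of the PD stiffness matrix:
    a symmetric positive-definite matrix (toy data; a real block
    is a sparse, Laplacian-like product of mesh assembly)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

n = 50
K = make_scalar_block(n)
L = np.linalg.cholesky(K)          # factor ONCE: K = L L^T

rng = np.random.default_rng(1)
f = rng.standard_normal((n, 3))    # aggregate forces: x, y, z columns

# Three independent forward/backward substitutions reuse the same factor.
dx = np.empty_like(f)
for c in range(3):
    y = np.linalg.solve(L, f[:, c])        # forward substitution
    dx[:, c] = np.linalg.solve(L.T, y)     # backward substitution
```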
In the spirit of prior work [teschner2005collision, Mcadams2011, mitchell2015non], we process collisions by sprinkling a number of points on the surface of our volumetric model that we refer to as collision proxies. These collision proxies can either be selected among the surface vertices of a conforming volumetric mesh, or simply embedded in the mesh, in the sense that each of their locations is barycentrically interpolated from nodes of the containing element. That is, we can represent the location of the $i$-th proxy point, $p_i$, as a weighted sum of all mesh vertex locations:
$p_i = \sum_j W_{ij} \, x_j$
Note that this vector equation can be decomposed into a corresponding equation for each component $c \in \{x, y, z\}$:
$[\mathbf{p}_c]_i = \sum_j W_{ij} \, [\mathbf{x}_c]_j, \quad \text{i.e.} \quad \mathbf{p}_c = \mathbf{W} \mathbf{x}_c$
where $W_{ij}$ is the weight of the $j$-th vertex for the $i$-th proxy. We expect that only 4 of the weights $W_{ij}$ for any given proxy $i$ are non-zero, corresponding to the vertices of the tetrahedron containing the proxy. At each time step of the simulation, we check whether any of these proxies are located in a prohibited region (e.g., inside a kinematic colliding object). Supposing that proxy $i$ is inside a prohibited region, we use the geometric representation of the obstacle (typically an implicit surface) to project the proxy location to the colliding region’s surface. We label that point on the obstacle surface $\hat p_i$. We then instantiate a short-lived, zero-restlength spring connecting $p_i$ and $\hat p_i$. These springs will contribute to the energy of the system that we seek to minimize. We can write the energy contribution due to collisions concretely as
$E_c = \sum_i \frac{k_i}{2} \, s_i \, \| p_i - \hat p_i \|^2$
where $k_i$ is the stiffness coefficient of the $i$-th proxy and $s_i$ is the indicator function
We can further decompose this by components:
$E_c = \sum_{c \in \{x,y,z\}} \frac{1}{2} (\mathbf{W}\mathbf{x}_c - \hat{\mathbf{p}}_c)^T \mathbf{D} \, (\mathbf{W}\mathbf{x}_c - \hat{\mathbf{p}}_c)$
where the diagonal matrix $\mathbf{D}$ satisfies $\mathbf{D}_{ii} = k_i s_i$ and $\mathbf{D}_{ij} = 0$ for $i \neq j$. Let us highlight two subtle but important points about equation (10): First, the only components of the equation that are dependent on the current configuration are the projections $\hat{\mathbf{p}}$ and the matrix $\mathbf{D}$, with the dependence of the latter being due to proxies being flagged as active or inactive as a function of their placement. Second, the three components ($x$, $y$, and $z$) are separable and independent, just as we saw with the global step of Projective Dynamics. These observations suggest we follow the path of the Projective Dynamics derivation further. We can write the collision energy as an energy-minimization problem:
where the “projections” $\hat{\mathbf{p}}$ are selected among all collision-free locations in the ambient space, so as to minimize this energy. Conceptually, this suggests that by freezing $\hat{\mathbf{p}}$ and $\mathbf{D}$ to specific values (determined by collision detection) as part of the local step, we retain both the stability traits of Projective Dynamics and the property that this expression becomes a quadratic function. We denote this quadratic energy by $E_c$, and this energy term can be folded into the Newton scheme in Equation (5) in the global step.
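To make this concrete, the following is a minimal numpy sketch of one collision pass of the local step: proxies embedded through a weight matrix are tested against an implicit obstacle (a sphere, purely an illustrative assumption; the sizes and weights are toy data), penetrating proxies are projected to its surface, and the frozen activations and projections are folded into per-component matrix and right-hand-side contributions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_proxies, n_verts = 4, 12

# Hypothetical embedding: each proxy interpolated from the 4 vertices
# of its containing tetrahedron (rows of W sum to 1, 4 nonzeros each).
W = np.zeros((n_proxies, n_verts))
for i in range(n_proxies):
    tet = rng.choice(n_verts, size=4, replace=False)
    w = rng.random(4)
    W[i, tet] = w / w.sum()

def collision_pass(proxies, center, radius):
    """Detect proxies inside a spherical obstacle phi(p) = |p-c| - r,
    flag them active (s_i = 1), and project them to the surface
    (p_hat_i); both are then frozen for the global step."""
    d = proxies - center
    dist = np.linalg.norm(d, axis=1)
    s = (dist < radius).astype(float)
    p_hat = proxies.copy()
    inside = s > 0
    p_hat[inside] = center + radius * d[inside] / dist[inside][:, None]
    return s, p_hat

proxies = np.array([[0.0, 0.0, 0.5],    # penetrating (dist 0.5 < 1)
                    [0.0, 2.0, 0.0],    # outside
                    [0.3, 0.0, 0.0],    # penetrating
                    [0.0, 0.0, 3.0]])   # outside
s, p_hat = collision_pass(proxies, center=np.zeros(3), radius=1.0)

# Frozen activations/projections yield per-component quadratic terms:
# matrix contribution W^T D W, right-hand-side contribution W^T D p_hat_c.
k = 10.0                                # uniform proxy stiffness (assumed)
D = np.diag(k * s)                      # D_ii = k_i * s_i
K_c = W.T @ D @ W
f_c = {c: W.T @ D @ p_hat[:, c] for c in range(3)}
```

Note that the rank of `K_c` is bounded by the number of active proxies, which is the observation the low-rank discussion of Section 4.1 builds upon.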
4 Proposed Method
We present our method by first partitioning our mesh into a collision-prone and a collision-safe region. We then use a Schur complement method to craft a numerical solution that concentrates on the collision-prone region. Finally, we present a nested iteration that can refine the solution in the vicinity of collisions, at low cost.
4.1 Broader context
The power of Projective Dynamics is largely due to the ability to pre-factorize the system stiffness matrix using a Cholesky factorization. Once this matrix is factorized, performing the global step of Projective Dynamics only incurs the cost of a single forward and backward substitution. However, as we discussed, when the simulation involves collisions, the system matrix changes at each step, compromising the ability to use a constant, pre-factored matrix. Let us start by taking a closer look at the total energy of the global step, which is the sum of the energy due to elastic deformation and the energy due to collisions:
and differentiating again we see the complete left hand side of (6):
$\mathbf{K}_e + \mathbf{W}^T \mathbf{D} \, \mathbf{W}$
Unfortunately, our constant matrix used in the global step has now been polluted by terms that depend on the current configuration $\vec{\mathbf{x}}$. There are two straightforward options we can consider: One option is to perform a refactorization of the matrix at each timestep. Given that the matrices we are targeting will have in the order of hundreds of thousands of nodal degrees of freedom, we cannot tolerate the pre-factorization cost at each time step in an interactive application. A second option is to use an iterative approach to solve the system without factorization. In Section 5 we discuss when this is appropriate, but also note that convergence may then suffer for high-resolution models. Perhaps another option would be to observe that the matrix term that depends on collisions is low-rank: the rank of this collision matrix is a function of the number of active collisions at any one time. We could contemplate using methods for low-rank updates on factorized matrices. The problem is that even our “low rank” matrix has a rank in the hundreds to low thousands for reasonable simulation scenarios; managing updates of such rank with any low-rank update algorithm quickly becomes untenable.
4.2 Domain partitioning for the Global Step
Motivated by these observations, we craft an approach that leverages our modeling hypotheses, namely:
The fraction of the mesh prone to collision is a small subset (on the order of a few percent, in our examples) of the simulated model, and
The region where collisions may occur can be known a priori.
Consider the rank of the components of the global step matrix:
$\mathbf{K}_e + \mathbf{W}^T \mathbf{D} \, \mathbf{W} \in \mathbb{R}^{N \times N} \quad \text{(per coordinate component)}$
Here, $N$ is the number of simulation mesh vertices. The part of the matrix that is changing, however, is a much smaller submatrix. An upper bound on its dimension, $N_d$, is the cardinality of the union of vertices of tetrahedral elements that contain a collision proxy. As a practical matter, in our simulation of the face, $N_d$ is the number of degrees of freedom of the tetrahedral elements surrounding the lips, while in the arm model the same area is localized around the inner fold of the elbow joint, as shown in figure 1. In the experimental examples presented in Section 5, models typically have in the order of 500K tetrahedra and 100K vertices, of which 1000-3000 vertices fit the criterion of anchoring a tetrahedral element that contains a collision proxy.
Referring to figure 2 as an example, we can partition the full set of vertices, $\mathcal{V}$, into two subsets:
where $\mathcal{V}_s$ contains the nodes not in the immediate vicinity of collision proxies, and $\mathcal{V}_d$ contains the nodes that are in the immediate vicinity of collisions. To further clarify, in figure 2, $N$ is the total number of all vertices, and $N_d$ is the number of vertices in $\mathcal{V}_d$. Then, we re-write the global step equation as follows
in block form:
where the two additional right-hand-side terms are the force components induced by collisions on the two partitions. Based on the relative sizes of $N$ and $N_d$, we point out that the entire matrix is largely unchanged and remains constant, with only a very small subset of the matrix being dependent on the current vertex locations. Next, we capitalize on this structure and these relative sizes by using a Schur complement method.
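The partitioning itself amounts to a symmetric permutation of the stiffness matrix. A minimal dense numpy sketch (real code would permute a sparse matrix; the function name and toy sizes are illustrative assumptions):

```python
import numpy as np

def partition_blocks(K, prone):
    """Reorder a symmetric stiffness matrix so that collision-safe
    DOFs (s) come first and collision-prone DOFs (d) come last,
    returning the four blocks K_ss, K_sd, K_ds, K_dd of the
    symmetrically permuted matrix."""
    n = K.shape[0]
    prone = np.asarray(prone)
    safe = np.setdiff1d(np.arange(n), prone)
    perm = np.concatenate([safe, prone])
    Kp = K[np.ix_(perm, perm)]          # symmetric permutation
    ns = safe.size
    return Kp[:ns, :ns], Kp[:ns, ns:], Kp[ns:, :ns], Kp[ns:, ns:]

# Toy 4x4 symmetric matrix; vertices 1 and 3 are collision-prone.
K = np.arange(16.0).reshape(4, 4)
K = K + K.T
K_ss, K_sd, K_ds, K_dd = partition_blocks(K, [1, 3])
```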
4.3 Partial Cholesky Schur Complement Factorization
Consider a linear system, $A\mathbf{u} = \mathbf{b}$, where $A$ is symmetric. Partition $A$, $\mathbf{u}$, and $\mathbf{b}$ such that the system can be written in block format:
$\begin{pmatrix} A_{ss} & A_{sd} \\ A_{ds} & A_{dd} \end{pmatrix} \begin{pmatrix} \mathbf{u}_s \\ \mathbf{u}_d \end{pmatrix} = \begin{pmatrix} \mathbf{b}_s \\ \mathbf{b}_d \end{pmatrix}$
Suppose that the block $A_{ss}$ has the Cholesky factorization $A_{ss} = L_{ss} L_{ss}^T$. Careful multiplication will verify the following factorization of $A$:
$A = \begin{pmatrix} L_{ss} & 0 \\ A_{ds} L_{ss}^{-T} & I \end{pmatrix} \begin{pmatrix} L_{ss}^T & L_{ss}^{-1} A_{sd} \\ 0 & S \end{pmatrix}$
where $S = A_{dd} - A_{ds} A_{ss}^{-1} A_{sd}$ is known as the Schur complement. It is important to realize that the factorization in equation (14) is nothing more than a partial Cholesky factorization, and is typically an intermediate step in the full Cholesky factorization of $A$. The Intel® MKL PARDISO library, which we use, can be invoked so as to compute exactly such a factorization, providing the user with the explicit value of the Schur complement while retaining the partial triangular factors in its internal representation.
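The algebra of this partial factorization is easy to verify on a small dense example. The sketch below (numpy standing in for the sparse MKL PARDISO machinery, with an arbitrary SPD test matrix) computes the partial factors and the Schur complement:

```python
import numpy as np

def partial_cholesky(A, ns):
    """Partial Cholesky of a symmetric positive-definite matrix A,
    partitioned at index ns.  Returns (L_ss, G, S) such that
      A_ss = L_ss L_ss^T,   G = L_ss^{-1} A_sd,
      S    = A_dd - G^T G   (the Schur complement),
    since G^T G = A_ds A_ss^{-1} A_sd."""
    L_ss = np.linalg.cholesky(A[:ns, :ns])
    G = np.linalg.solve(L_ss, A[:ns, ns:])   # forward substitution
    S = A[ns:, ns:] - G.T @ G
    return L_ss, G, S

# Arbitrary SPD test matrix (toy data).
rng = np.random.default_rng(3)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10 * np.eye(10)
ns = 6
L_ss, G, S = partial_cholesky(A, ns)
```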
We should note that our partitioning of nodes into the subsets $\mathcal{V}_s$ and $\mathcal{V}_d$ does incur some slight sub-optimality relative to the stock (non-Schur) Cholesky factorization, by constraining the degrees of freedom in $\mathcal{V}_s$ to appear strictly before those in $\mathcal{V}_d$. Our experiments, however, indicate that any deterioration in sparsity of the resulting factors was minimal in all of our examples.
4.4 Towards an Accelerated Solution
The Intel® MKL PARDISO sparse factorization library provides a highly optimized CPU-based implementation of the partial factorization shown in equation (14) and discussed above. The user provides as input the symmetric system matrix and the degrees of freedom that are to be maintained after the factorization. The library provides as output the (dense) Schur complement matrix, and maintains the partial triangular factors in internal representation.
Now our system shown in (13), after factorization, becomes:
We can solve this in a series of steps:
Step 1: Solve the lower triangular system:
Being a lower triangular system, this can be solved quickly by forward substitution on the CPU using the Intel® MKL PARDISO optimized forward substitution algorithm (PARDISO “phase 331”).
Step 2: Solve the system:
It is trivial to see that the coefficient matrix of this system is the Schur complement $S$ from (14), and that its right-hand side is readily available from the result of Step 1. We will momentarily defer discussion of how to solve this until completing the description of the full solution.
Step 3: Finally, solve the system:
Being an upper triangular system, this can be solved quickly by backward substitution on the CPU using the Intel® MKL PARDISO optimized backward substitution algorithm (PARDISO “phase 333”).
At first glance, a concern might be that, since the overall matrix in our application changes with each time step due to collisions, we may still have to recompute this partial factorization at each timestep to produce a new $S$. Fortunately this is not the case. To see this, refer to (12) and consider the Schur complement of the matrix without any collision forces. Without collision forces, the Schur complement of the block matrix would be
$S = A_{dd} - A_{ds} A_{ss}^{-1} A_{sd}$
In the presence of collisions, the term $A_{dd}$ has been replaced by $A_{dd} + \mathbf{K}^{(c)}_{dd}$, where $\mathbf{K}^{(c)}_{dd}$ is the collision contribution to the collision-prone diagonal block. Because $A_{dd}$ only appears on the right hand side of (19) as a lone term, we can see that in the presence of collisions, we can simply add the term $\mathbf{K}^{(c)}_{dd}$ to yield:
$S^{(c)} = S + \mathbf{K}^{(c)}_{dd}$
This means that we can compute the partial factorization only once, in the absence of collisions, and retain the matrix $S$, to which we can add $\mathbf{K}^{(c)}_{dd}$ at each time step. This addition is performed as a purely additive update to the Schur complement, and is very efficient. Now, knowing that it is easy to produce the correct matrix at each timestep, we return to describing an efficient solution to the system of Step 2. Recalling that the expected size of this system is approximately 1000-3000 degrees of freedom, we are presented with a slightly surprising opportunity that one might otherwise overlook: although for the global, sparse matrix it is not practical to repeat a factorization every time its entries change, for the local, dense matrix refactorization is a perfectly realistic and efficient option.
To support this point, let us explore the cost of factorizing and directly solving the dense Schur complement system. For purposes of illustration, consider a typical instance in our simulation having $n$ degrees of freedom, i.e. a dense matrix of dimension $n \times n$. A full Cholesky factorization requires approximately $n^3/3$ floating point operations (FLOPs), which for the sizes quoted above amounts to on the order of billions of operations (GFLOPs). Modern workstation-grade CPUs are capable of several trillion floating point operations per second (TFLOPs), and modern high-end GPUs of considerably more. This suggests we have the ability to factorize such a system within a budget of a few milliseconds; such an opportunity is not afforded to sparse matrices, as their processing is often bound by memory bandwidth. For such sizes of dense matrices, however, the cubic computational complexity is well counterbalanced by the (typically 100:1) ratio of possible arithmetic computations per memory access on CPUs or GPUs.
With this “order of magnitude” calculation suggesting that we have a promising approach to achieving the desired performance, we describe our implementation of Step 2, which we solve on the GPU. At the start of simulation, we move the matrix $S$ to GPU memory. At each timestep, we move the matrix $\mathbf{D}$ and the right-hand-side vector to the GPU. In the case of self-collisions, we may need to also transmit to the GPU the embedding weight matrix $\mathbf{W}$. Recall that $\mathbf{D}$ is a diagonal matrix, so the bandwidth per timestep is small, and similarly $\mathbf{W}$ is sparse. We then use highly efficient stock algorithms available in the NVIDIA cuSPARSE and cuSOLVER libraries to a) perform the low-rank update on the pre-calculated Schur complement matrix (using cusparseScsrgemm2), b) refactorize it on the fly (with cusolverDnSpotrf), and c) perform the forward and backward substitution (cusolverDnSpotrs) to provide the solution, which we stream back to main memory; we then proceed with Step 3 to complete the solution steps for the current time step.
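The complete pipeline of Section 4.3 and 4.4 can be condensed into a small sketch: precompute the partial factorization once (no collisions), then, per timestep, additively update the Schur complement with the collision block, refactorize the small dense system, and run the three solve steps. This is a dense numpy stand-in for the sparse PARDISO / cuSOLVER machinery, with toy sizes and an invented collision term purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ns = 12, 9                       # 9 collision-safe + 3 collision-prone DOFs
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)         # SPD stand-in for the elastic stiffness

# One-time partial factorization, computed WITHOUT collisions:
# A_ss = L_ss L_ss^T,  G = L_ss^{-1} A_sd,  S = A_dd - G^T G.
L_ss = np.linalg.cholesky(A[:ns, :ns])
G = np.linalg.solve(L_ss, A[:ns, ns:])
S = A[ns:, ns:] - G.T @ G

def schur_solve(S_coll, b):
    """Three-step solve of the collision-augmented system
    [[A_ss, A_sd], [A_ds, A_dd + dK]] x = b, reusing the constant
    partial factors and the additively updated S_coll = S + dK."""
    y_s = np.linalg.solve(L_ss, b[:ns])                 # Step 1: forward subst.
    x_d = np.linalg.solve(S_coll, b[ns:] - G.T @ y_s)   # Step 2: small dense solve
    x_s = np.linalg.solve(L_ss.T, y_s - G @ x_d)        # Step 3: backward subst.
    return np.concatenate([x_s, x_d])

# Per timestep: active collisions only touch the collision-prone block.
dK = 5.0 * np.eye(n - ns)           # toy collision stiffness contribution
b = rng.standard_normal(n)
x = schur_solve(S + dK, b)          # additive Schur update, then Steps 1-3
```

Only the small dense matrix `S + dK` is refactorized per step; the large factors `L_ss` and `G` are computed once and never touched again, which is the crux of the method.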
4.5 Further Optimization of the Solution
A frequent observation in the simulation of volumetric objects with collisions is that the non-linear, highly volatile penalty terms are the main contributor to the need for a large number of iterations per time step to achieve convergence. In particular, the culprit is the changing status of collision proxies, which alternate between active and inactive during the iteration. Although all of these volatile behaviors are localized, traditionally the cost that we pay is global.
In this section, we investigate the possibility of restricting the iterative computation entirely to the immediate vicinity of the collision-prone region. To begin, refer again to figure 2, observing the two partitions:
The elements are partitioned into black elements, , which are elements that do not contain collision proxies, and blue elements, , which are elements that do contain collision proxies.
Vertices of the collision-prone elements will be labeled the collision-prone set (in blue); their complement will be the collision-safe nodes (in black).
In this partitioning the element set is associated with vertices from both and , while is associated with vertices only from . We also refer back to equation (11) that defines the total energy, recalling that the quasistatic solution can be written as a minimization problem under the context of Projective Dynamics (as in (4)):
The final line above separates the equation into contributions from and , being careful to note that the first () term is a function of and while the second () term is a function only of .
To see how we can solve the global step in a nested iteration, first assume that the and have been fixed by the local step. We can then rewrite (22) with fixed rotations as a nested minimization:
Our method solves (22) by a nested iteration that leverages the opportunity to perform additional processing on the collision-affected region () in an inner loop without compromising the correctness of the solution in the non-collision-affected region (). Our approach involves these steps:
- Preamble of outer loop
We optimize only rotations in the collision-safe region, keeping all other variables fixed.
- Inner loop
Treating as fixed, we use the Schur complement to express the minimum (over ) of as a function, , of only and (the latter being a constant). This is equivalent to the elimination of from the global step via the Schur complement, as described in the previous section. The resulting energy is only a function of and at this point, and we iterate on it in the style of Projective Dynamics – freezing each of and and optimizing over the other – combined with collision detection and update of proxies at the local step.
- Conclusion of outer loop
We reconstruct the solution for the collision-safe region via backward substitution.
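The elimination in the inner loop can be made concrete with a NumPy sketch: for a partitioned SPD global-step system (sizes and symbols here are hypothetical), minimizing over the collision-safe block leaves a small condensed system on the collision-prone block whose matrix is the Schur complement, and the safe solution is recovered afterwards by back-substitution:

```python
import numpy as np

rng = np.random.default_rng(1)
ns, nc = 60, 8                 # hypothetical counts of collision-safe / collision-prone DOFs
n = ns + nc

# SPD global-step Hessian, partitioned as [[H_ss, H_sc], [H_cs, H_cc]].
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)
H_ss, H_sc = H[:ns, :ns], H[:ns, ns:]
H_cs, H_cc = H[ns:, :ns], H[ns:, ns:]
b = rng.standard_normal(n)
b_s, b_c = b[:ns], b[ns:]

# Eliminate the collision-safe DOFs: Schur complement and condensed right-hand side.
S = H_cc - H_cs @ np.linalg.solve(H_ss, H_sc)     # small dense matrix
g = b_c - H_cs @ np.linalg.solve(H_ss, b_s)

# Inner-loop solve on the condensed system, then recover the safe DOFs.
x_c = np.linalg.solve(S, g)
x_s = np.linalg.solve(H_ss, b_s - H_sc @ x_c)

# The condensed solve agrees with the full global-step solve.
assert np.allclose(np.concatenate([x_s, x_c]), np.linalg.solve(H, b))
```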
This nested iterative solution procedure is captured in Algorithm 1, where the specific utilization of the MKL PARDISO library is highlighted. Within the inner loop, the update of the Schur-derived Hessian matrix, its dense factorization, and the solution for the local update are hosted on the GPU. As noted above, we perform steps 4.2 through 4.4 of the algorithm on the GPU. We point out that we choose to skip the calculation of in step 1 and instead perform it inside the inner loop, before step 4.1. Updating it as part of each inner iteration improves the rate of convergence, at the minimal extra cost of updating the small number of collision-affected rotations during each inner iteration. In our experience, this extra computation pays off in convergence rates, as Figure 4 illustrates.
5 Results and Evaluation
We demonstrate our method on three simulation scenarios, all of which achieve interactive performance at resolutions on the order of half a million elements. We also compare our direct solver to a GPU-optimized iterative scheme [Komaritzan2019].
All our tests were run on an Intel Core i9-9940X CPU @ 3.30GHz and an NVIDIA TITAN X GPU. See Figure 3 for detailed benchmark results. In our implementation, all outer-loop calculations are run on the CPU, including the Projective Dynamics local step and the forward and backward substitutions. Inside the inner loop, we perform collision detection, update contact constraints, and update the right-hand side of the Schur system on the CPU. The remaining steps, including the low-rank update and the dense Cholesky factorization, are performed on the GPU. For the local step on the CPU, we leverage AVX512 SIMD instructions to vectorize our code, including the update of best-fit rotations and the computation of elemental forces.
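For reference, the best-fit rotation update that we vectorize is the standard SVD-based projection onto the rotation group; a minimal scalar (non-SIMD) sketch:

```python
import numpy as np

def best_fit_rotation(F):
    """Project a 3x3 deformation gradient onto the closest rotation,
    as in the standard Projective Dynamics local step."""
    U, _, Vt = np.linalg.svd(F)
    R = U @ Vt
    if np.linalg.det(R) < 0:      # guard against reflections
        U[:, -1] *= -1
        R = U @ Vt
    return R

F = np.array([[1.2, 0.1, 0.0],
              [0.0, 0.9, 0.2],
              [0.1, 0.0, 1.1]])
R = best_fit_rotation(F)
assert np.allclose(R @ R.T, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)
```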
5.1 Test #1: Arm Flexing at Elbow Joint
The first example we examine involves simulating a human arm bending at the elbow joint. In this case, self-collisions occur only on the flesh surface on the interior side of the joint. The arm model is embedded in a tetrahedron mesh with 644,486 elements and 111,666 simulation nodes. The potential colliding region is predetermined and marked with collision proxies, which results in 3,933 collision-prone nodes (about 3.5% of the total simulation nodes), as seen in Figure 1 (left). Skeletal bones are attached to the simulation mesh by zero-restlength springs uniformly sampled over the bone surface. The scripted motion bends the elbow joint by 2.5 degrees per animation frame, before coming to a stop at a pose that creates significant self-collision. Collision detection and response use rest-pose levelset representations [Mcadams2011] of the arm to detect interpenetration and instance collision springs.
As can be seen in our supplemental video, using a direct solver for the global step allows our method to produce a smooth and visually converged animation even with a single Projective Dynamics loop per animation frame. We experimented both with a single inner loop of localized element-rotation updates in the collision-prone region per global PD iteration, and with 3 or 5 inner loops, which only modestly add to the solver cost (most of the added cost comes from the repeated detection step) but significantly improve convergence in strenuous contact cases, such as the test in Figure 4, where the elbow is brought to a sharp angle with collisions disabled and then attempts to disentangle from this state when collision response is re-enabled. Even in challenging frames of this animation, we achieve 5fps with 1 inner loop and 2.5fps with 5 inner loops.
5.2 Test #2: Face Simulation with Self Collision at Lips
The next case we examine, shown in figure 5, is a human face simulation, where self-collisions occur purely around the lip region. With 644,486 embedding tetrahedral elements, there are 123,784 simulation nodes, only 1,846 of which are contained in the collision-prone region; the ratio of collision-prone nodes is thus only 1.5%. We achieve 5-7fps throughout this animation.
5.3 Test #3: Surgical Cleft Lip and Palate Simulation
Finally, we apply our algorithm in a virtual surgery simulator, where a volumetric facial flesh mesh from a patient with cleft lip and palate is discretized into 503,910 tetrahedra and 97,249 simulation nodes. At some point during surgical manipulation, tissue is excised, leaving behind a simulation mesh with 467K tetrahedra and 92K vertices, as reported in Figure 3. In our test, only collisions between the deformable flesh and the rigid teeth and maxilla are processed, as the surgical repair being modeled relies on comprehensive suturing rather than self-collision to create the final repair and closure. Under this assumption, we mark 1,434 collision-prone nodes in the designated area inside the lip. Screen captures from the interactive simulator are shown in figure 6.
To realistically simulate the elastic response of the model, we note that human skin, and the tissue beneath it, exhibits a distinctive bi-phasic behavior: a drastic increase in stiffness is observed once stress exceeds a certain threshold, preventing further stretching of the tissue beyond a certain extent. To account for this effect, we modified the force density function to the following form:
where $F$ is still the deformation gradient, $\sigma_i$ is the $i$-th singular value of $F$, and the two bounds are the lower and upper limits that we wish to allow our principal strains to assume. The remaining scalar is the increased stiffness applied when the tissue enters the bi-phasic regime. Similar to the PD formulation for corotated elasticity, we have
Again considering all and momentarily as independent variables, the energy over all elements can be modified to:
And the discrete energy is: This energy is fully compatible with the PD paradigm, and its minimization can be performed by computing the SVD of the deformation gradient and clamping its singular values to the bounds in the local step.
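Since the equations above did not survive extraction, the local step for this bi-phasic energy can only be sketched; assuming illustrative strain bounds, it amounts to an SVD of the deformation gradient followed by clamping its singular values:

```python
import numpy as np

def biphasic_local_step(F, s_min=0.9, s_max=1.1):
    """Local-step projection for the strain-limited (bi-phasic) energy:
    clamp the singular values of the deformation gradient to the
    allowed principal-strain range (bounds here are illustrative)."""
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag(np.clip(s, s_min, s_max)) @ Vt

F = np.diag([1.5, 1.0, 0.6])     # over-stretched and over-compressed principal strains
T = biphasic_local_step(F)
assert np.allclose(np.linalg.svd(T, compute_uv=False), [1.1, 1.0, 0.9])
```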
5.4 Comparison with GPU-optimized iterative solver [Komaritzan2019]
We performed a comparison with iterative solvers that could be natural alternatives to our method, especially for models of more modest resolution and detail. We focused on the GPU-accelerated PCG solver of the recent Fast Projective Skinning technique [Komaritzan2019], as the most promising recent technique in terms of efficiency and features. Although the highest-resolution model in their demos (91K tetrahedra) has 6-7 times fewer tets than our target meshes, they demonstrated real-time performance at that scale, making it plausible that their technique might scale up to the half-million elements we accommodate. Although we could not perform an end-to-end comparison, due to differences in the collision-processing components and in the use of embedding versus conforming meshes, we conducted a study of the comparative convergence efficiency of the two methods in the 500K-element regime. Our findings are summarized below, and we have included a video with several comparative benchmarks. We should clarify, however, that for target resolutions below 100K tetrahedral elements, where both methods would in principle yield interactive performance, the GPU-based PCG solver does not require a prescription of collision regions and should be preferred due to its generality.
We focused our comparison purely on the solver stage, excluding any runtime cost of assembling or updating the global-step matrix or performing collision detection. We also modified their solver [Komaritzan2019] to include support for embedded collision proxies, which are crucial to our examples. Jacobi preconditioning was used as suggested, although we did not observe any nontrivial acceleration, as our embedding meshes are perfectly regular. As seen in the supplemental video, for high-resolution models the PCG solver required at least 100 (or more) iterations to start approaching the accuracy of the exact solver, and often exhibited artifacts if inadequate convergence was reached in the global step. As an indication, the cost of 100 iterations (the bare minimum for acceptable or even stable convergence) in the GPU PCG solver was 27ms for our elbow-bending example, and 21ms for our virtual surgery scenario. These times are already 2-4x higher than our inner-loop costs (which produce exact solutions) in instances where multiple inner loops aid convergence, such as the elbow simulation. Our method must sustain the cost of a CPU forward/back-substitution at the beginning/end of each outer PD loop, which takes 70-130ms, but we have experimented with stock GPU solvers for this stage, which typically accelerate it by a factor of 3-4x. The direct solver offers the most accurate and robust convergence behavior, does not require parameter tuning to aid convergence (e.g. dynamics leads to much easier matrices for PCG than quasistatics), and is resilient to the stiffness of constraints such as bone attachments and collision penalty forces.
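The qualitative behavior can be reproduced in a toy experiment (a hypothetical stiff SPD system, not our actual global-step matrix): unpreconditioned CG needs many iterations when stiff constraint entries are present, whereas a direct factorization yields an exact solution in a single pass:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# Hypothetical SPD system: a regularized 1D Laplacian with two very stiff
# diagonal entries mimicking bone attachments / collision penalty springs.
n = 500
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).toarray() + 1e-2 * np.eye(n)
A[0, 0] += 1e3
A[-1, -1] += 1e3
b = np.ones(n)

iters = 0
def count(xk):                  # count CG iterations
    global iters
    iters += 1

x_cg, info = cg(A, b, callback=count)
x_direct = np.linalg.solve(A, b)     # one-pass direct solve for reference

assert info == 0                # CG converged to its default tolerance...
assert iters > 20               # ...but only after dozens of iterations
assert np.linalg.norm(A @ x_cg - b) <= 1e-3 * np.linalg.norm(b)
```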
6 Limitations & Future Work
We are naturally susceptible to the restrictions of Projective Dynamics with respect to the material models (primarily corotated elasticity) that can be accommodated. We did not demonstrate dynamic simulations, but such cases are straightforward extensions of our scheme. Our performance benefits are mostly realized when target simulations are on the order of half a million elements, as in our demonstrations. Models an order of magnitude smaller might afford a full re-factorization in each PD step without losing interactivity, or enjoy adequate convergence with an iterative solver. The most fundamental limitation of our work is our conscious assumption that contact regions are relatively small and static. Obviously, there are many highly relevant simulations that would not satisfy such preconditions. We look forward to investigating alternative methodologies for such problems, especially at even higher scale and resolution, such as multigrid techniques.