1. Introduction
The Material Point Method (MPM) is a versatile and highly effective approach for simulating widely varying material behaviors, ranging from stiff elastodynamics to viscous flows (e.g., Figures 11 and 13), in a common framework. As such, MPM offers the promise of a single unified, consistent and predictive solver for simulating continuum dynamics across diverse and potentially heterogeneous materials. To reach this promise, however, significant hurdles remain. Most significantly, obtaining accurate, consistent and robust solutions within a practical time budget is severely challenged by small timestep restrictions. This is most evident as we vary material properties, increase deformation and/or simulate heterogeneous systems (see Table 1).
While MPM’s Eulerian grid resolution limits time step sizes to the CFL limit (a particle cannot travel more than one grid cell per time step; in practice an even smaller CFL number is often used [gast:2015:tvcg]) [fang2018temporally], the explicit time integration methods commonly employed for MPM often require much smaller time steps. In particular, the stable timestep sizes of explicit MPM time integration methods remain several orders of magnitude below the CFL limit when simulating stiff materials like metal (see Table 1) and snow [stomakhin:2013:snow; fang2018temporally]. A natural solution then is to apply implicit numerical time integration, e.g., implicit Euler, which can enable larger stable time step sizes for MPM [gast:2015:tvcg; Fang:2019:ViscousMPM]. However, doing so requires solving challenging and potentially expensive nonlinear systems at every timestep.
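To make the grid CFL restriction concrete, here is a minimal sketch of the induced step-size bound; the function name and the default CFL number are illustrative, not taken from the paper.

```python
# Sketch: the grid CFL bound for MPM -- a particle should not cross more than
# `cfl` grid cells per step. Names and the default CFL number are illustrative.

def cfl_dt(dx, max_particle_speed, cfl=0.5):
    """Largest dt such that max_particle_speed * dt <= cfl * dx."""
    if max_particle_speed == 0.0:
        return float("inf")  # no motion: no CFL restriction
    return cfl * dx / max_particle_speed

# Example: 1 cm grid cells, fastest particle at 2 m/s
dt = cfl_dt(dx=0.01, max_particle_speed=2.0)  # -> 0.0025 s
```

Explicit integrators for stiff materials must take steps far below even this bound, which is the gap implicit integration targets.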
1.1. Challenges to implicit MPM timestepping
While implicit MPM timestepping methods in engineering provide larger step sizes, they remain limited to impractically small steps even for small-scale 2D systems [nair2012implicit]. This is because, in engineering, the standard Newton method is applied without any stabilization strategy such as line search or trust region to solve the nonlinear timestepping problem [nair2012implicit; charlton2017igimp]; as the time step size grows, the nonlinearity increases and the first-order Taylor expansion becomes less accurate, which can make Newton's method unstable and even cause it to explode. More recently, state-of-the-art implicit MPM methods in graphics have been introduced that enable time steps closer to the CFL limit. Gast and colleagues [gast:2015:tvcg] introduce a globalized Newton-Krylov method for MPM, while Fang et al. [Fang:2019:ViscousMPM] extend ADMM to solve implicit MPM timesteps. However, their convergence and performance are limited for simulations involving heterogeneous and/or stiff materials, leading to slow computations and inconsistent, unpredictable and even unstable results. While ADMM [Fang:2019:ViscousMPM] for MPM is attractively efficient, the underlying ADMM algorithm has no guarantee of convergence for nonlinear continua problems; in practice it can at best achieve linear convergence. As we show in Section 7, even when able to converge, the ADMM solver is thus exceedingly slow to reach reasonable solutions.
On the other hand, inexact Newton-Krylov methods, exemplified by Gast et al. [gast:2015:tvcg], are seemingly ideal for solving implicit MPM problems where the sparsity structure of the Hessian can change at every timestep. Key to the efficiency and stability of these methods is the inexact iterative linear solve of each inner Newton iterate. In turn this requires setting a tolerance to terminate each such inner loop. However, no single tolerance setting works across examples. Instead, as we show in Section 7, working tolerances can and will vary over many orders of magnitude per example and so must be experimentally determined, as setups change, over many expensive, successive simulation trials. Otherwise, as we show in Section 7 and our supplemental document, a tolerance suitable for one simulated scene will generate extremely slow solves, nonphysical artifacts and even explosions in other simulations.
Next, we observe that for each large-scale linear system solve in the inner loop of a Newton-Krylov method, classic Jacobi- and Gauss-Seidel-preconditioned CG solvers lose significant efficiency from slowed convergence as problem stiffness increases (Table 3). For such cases multigrid preconditioners [wang:2018:multigridcloth; Tamstorf:2015:multigridcloth; zhu2010efficient] are often effective solutions, as the underlying hierarchy allows aggregation of multiple approximations of the system matrix inverse across a range of resolutions. This accelerates information propagation across the simulation domain, improving convergence.
However, h-multigrid methods for MPM have not previously been proposed in either the graphics or the engineering literature, and building a multigrid hierarchy for MPM is challenging. As we discuss in Section 7.3, while the direct application of multigrid by coarsening system DOFs on grid nodes improves convergence of the inner linear solves, the computational overhead of this seemingly reasonable hierarchy does not reduce the overall cost of MPM simulations. This is because the construction and evaluation of the system matrices at each coarser level can be as expensive as the fine-level computation, while DOFs at each level may not be matched, especially at the domain boundaries. Alternately, we could try merging particles level by level to reduce matrix construction costs. However, merging lacks error bounds and can worsen DOF mismatch.
1.2. Hierarchical Optimization Time Integration
We propose the HOT algorithm to address these existing limitations and so provide “out-of-the-box” efficient MPM simulation. To enable consistent, automatic termination of both outer Newton iterations and inner inexact linear solves across simulation types, we extend the characteristic norm [Li:2019:DOT; bcqn18] to inhomogeneous MPM materials. As we show in Section 5.2 and Table 1 in our supplemental document, this produces consistent, automatic, high-quality results for inexact Newton-Krylov simulations that match the quality and timing of the best hand-tuned results of Gast et al. [gast:2015:tvcg].
To obtain both improved convergence and performance for MPM systems with multigrid, we develop a new, MPM-customized hierarchy. We begin by embedding progressively finer-level grid nodes into coarser-level nodes with the MPM kernel. We then construct coarse-level matrices directly from their immediate finer-level matrix entries. This avoids computation and storage of each coarse level’s geometric information, automatically handles boundaries and enables flexible control of sparsity via the choice of MPM embedding kernel. As we will see, the resulting multigrid retains the improved convergence while gaining significantly improved performance; see Figure 18.
While offering a significant gain, our MPM-customized multigrid still requires explicit matrix construction. In some elastodynamic applications, matrix construction costs are alleviated by applying only a fixed number of Newton iterations irrespective of convergence. This sacrifices consistency, for example producing artificially softened materials and numerically damped dynamics that artists cannot control. It also generates inaccurate results, so that output cannot be used for applications in engineering, experimentation and design. Following recent developments in mesh-based elasticity decomposition methods [Li:2019:DOT], we instead observe that our hierarchy can be constructed once per timestep and then applied as an efficient second-order initializer, in our case with one V-cycle per iteration, inside a quasi-Newton solve.
1.3. Contributions
HOT’s inner multigrid provides efficient second-order information, while its outer quasi-Newton low-rank updates provide efficient curvature updates. This enables HOT to maintain consistent, robust output with a significant speedup in performance, even as we grow stiffness, increase deformation and widely vary materials across the simulation domain. These, as we show in Section 7, are the regimes where traditional Newton-Krylov methods suffer from widely varying and often degraded performance.
We note that applying L-BFGS without HOT, either with a single-level Hessian inverse approximation or with the direct, baseline multigrid, results in much worse performance (see Table 2 in our supplemental document). Thus, as we demonstrate in our ablation study in Section 7, it is HOT’s combined application of node-embedding multigrid, automatic termination, and customized integration of the multigrid V-cycle into the quasi-Newton loop that jointly provide its significant and consistent performance gains. In summary, HOT’s contributions are:

We derive a novel MPM-specific multigrid model exploiting the regularity of the background grid and constructing a Galerkin coarsening operator consistent with rediscretization via particle quadrature. To our knowledge, this is the first time h-multigrid has been applied successfully to the MPM discretization of nonlinear elasticity.

We develop a new, node-wise Characteristic Norm [Li:2019:DOT; bcqn18] (CN) measure for MPM. Node-wise CN enables unified tolerancing across varying simulation resolutions, material parameters and heterogeneous systems, both for terminating inner solves in inexact Newton and for determining convergence across methods. CN likewise ensures a fair comparison across all solvers in our experiments.

We construct HOT, an out-of-the-box implicit MPM time integrator, by employing our multigrid as an efficient inner initializer inside a performant quasi-Newton MPM time step solve. A carefully designed set of algorithmic choices customized for MPM then achieves both efficiency and accuracy, which we demonstrate on a diverse range of numerically challenging simulations.

We perform and analyze extensive benchmark studies on challenging, industrial-scale simulations to determine the best data structure and algorithmic choices for MPM numerical time integration. Across these simulation examples, we compare HOT against a wide range of alternative, seemingly reasonable algorithmic choices to demonstrate their pitfalls and the carefully designed advantages of HOT.
Across a wide range of challenging elastodynamic and plastic test simulations, we show (see Section 7) that HOT, without the need for any parameter tuning, outperforms existing state-of-the-art, heavily optimized implicit MPM codes. All alternative methods either exhibit significantly slower performance or suffer from large variations across simulated examples. Our study thus suggests HOT as a robust, unified MPM time integrator with fast convergence and outstanding efficiency across a wide range of possible simulation inputs.
2. Related Work
2.1. Material Point Method
MPM was introduced by Sulsky et al. [sulsky:1994:historymaterials] as a generalization of FLIP [brackbill1988flip; zhu:2005:sandfluid] to solid mechanics. The accuracy of MPM, i.e., its convergence to the analytical solution, was demonstrated computationally and explained theoretically by Steffen et al. [steffen:2008:analysis] using a smooth (e.g., quadratic B-spline) basis to represent solutions on the grid, and was further verified by Wallstedt [wallstedt:2009:order] with manufactured solutions.
In graphics, MPM has been shown effective for snow [stomakhin:2013:snow], sand [gao2018gpu; yue2018hybrid; daviet:2016:smp; klar:2016:des], foam [yue:2015:mpmplasticity; ram:2015:foam; Fang:2019:ViscousMPM], cloth [jiang:2017:cloth; guo2018material], rods [han2019hybrid; fei2019mmc], sauce [MS:2019], fracture [Wolper:2019:CDMPM; Wretborn:2017:ACP:3170009.3170083; wang2019simulation], multi-phase flow [stomakhin:2014:augmentedmpm; andre:2017:wetsand; gao2018animating], and even the baking of food [ding2019thermomechanical]. More recently, coupling MPM with rigid bodies was explored in both explicit [hu2018] and implicit [ding2019penalty] settings. While using rigid bodies to approximate stiff materials works sufficiently well in certain animation applications, it is not a viable option for animating elastoplastic yielding or when accurate mechanical response measures are crucial.
An implicit time integration scheme such as implicit Euler is often the preferred choice for stiff materials and large deformations due to the unacceptable sound-speed CFL restriction [fang2018temporally] in explicit integrators, despite the latter's superior level of parallelization. The first implicit MPM [guilkey:2003:implicitmpm] used Newmark integration and demonstrated that, besides stability, the accuracy of the implicit solution was also superior to explicit MPM when compared to a validated finite element solution. Recently, Nair and Roy [nair2012implicit] and Charlton et al. [charlton2017igimp] further investigated implicit generalized interpolation MPM for hyperelasticity and elastoplasticity respectively. On the other hand, researchers in graphics have explored force linearization [stomakhin:2013:snow] and an optimization-stabilized Newton-Raphson solver for both implicit Euler [gast:2015:tvcg] and implicit midpoint [jiang2017angular] to achieve large time step sizes. In practice, to ensure the highly nonlinear systems are solved in an unconditionally stable way, optimization-based time integration is often applied.
2.2. Optimization and nonlinear integrators
Numerical integration of partial differential systems can often be reformulated variationally as an optimization problem. Such reformulations can achieve improved accuracy, robustness and performance by taking advantage of optimization methods. In computer graphics, simulation methods increasingly apply this strategy to both fluid [batty:2007:solidfluid; Weiler2016ud] and solid [gast:2015:tvcg; wang2016descent; bouaziz2014projective; Overby2017ns; Dinev2018pp; Dinev2018zn] dynamics, often enabling large time steps. For optimizations originating from nonlinear problems, Newton-type methods are generally the standard mechanism, delivering quadratic convergence near solutions. However, when the initial guess is far from a solution, Newton's method may fail to provide a reasonable search direction, as the Hessian can be indefinite [Li:2019:DOT; liu2017quasi; Smith2018hv]. Teran et al. [teran2005robust] propose a positive-definite fix that projects the Hessian to a symmetric positive definite form to guarantee a descent direction can be found; we compare with this method and further augment it with a backtracking line search to ensure the energy decreases at each Newton iteration. We refer to this method as projected Newton (PN) throughout the paper.
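As a concrete sketch of the two PN ingredients named above, the hedged fragment below projects a (possibly indefinite) Hessian to symmetric positive definite form by clamping its eigenvalues and applies Armijo backtracking so the energy decreases. Function names, tolerances and the Armijo constant are illustrative, not the paper's implementation.

```python
import numpy as np

def project_spd(H, eps=1e-12):
    # Symmetrize, eigendecompose, clamp eigenvalues to a small positive floor.
    lam, Q = np.linalg.eigh((H + H.T) * 0.5)
    lam = np.maximum(lam, eps)
    return (Q * lam) @ Q.T  # Q diag(lam) Q^T

def backtracking(E, x, p, g, alpha=1.0, c=1e-4, shrink=0.5):
    # Armijo backtracking: shrink alpha until sufficient decrease holds.
    E0 = E(x)
    while E(x + alpha * p) > E0 + c * alpha * g.dot(p):
        alpha *= shrink
    return alpha

# Tiny usage example on a quadratic energy with an indefinite test Hessian.
Hp = project_spd(np.array([[2.0, 0.0], [0.0, -1.0]]))  # now PSD
E = lambda x: x.dot(x)
x = np.array([1.0, 0.0])
g = 2.0 * x            # gradient of E at x
p = -g                 # descent direction
alpha = backtracking(E, x, p, g)
```

In practice the eigen-projection is typically applied per-element (or per-particle) block rather than to the full system Hessian.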
In each PN iteration, a linear system has to be solved. For MPM simulations, which often contain a large number of nodes, Krylov iterative linear solvers such as conjugate gradient (CG) are often more favorable than direct factorizations. To improve CG convergence, different preconditioning options exist. We apply the efficient and straightforward Jacobi (diagonal) preconditioner for our baseline PN, to which we refer as PN-PCG. To further minimize memory consumption and access cost, existing implicit MPM works in graphics apply a matrix-free (MF) version of PN-PCG where only the matrix-vector product is implemented, as a function, without explicitly constructing the system matrix. However, if many CG iterations are required, especially with large time step sizes and/or stiff materials, matrix-free is not necessarily a better option than explicitly building the matrix (Section 7). To further accelerate convergence in these challenging settings, more effective but expensive preconditioners such as multigrid could potentially be applied. However, neither the graphics nor the engineering literature has explored the adoption of h-multigrid in MPM.
2.3. Multigrid methods
Multigrid methods [MGBook] have been widely employed to accelerate existing computational frameworks for solving both solid [McAdams2011kv; zhu2010efficient; wang:2018:multigridcloth; Tamstorf:2015:multigridcloth; xian2019multigrid; tielen2019efficient] and fluid dynamics [fidkowski2005p; mcadams2010parallel; Setaluri:2014:spgrid; Zhang2015lv; Zhang2016hv; Zhang2014io; gao2018animating; Aanjaneya2017mi]. With multilevel structures, information from a particular cell can propagate faster to distant cells, making multigrid methods highly efficient for systems with long-range energy responses or high stiffnesses. Unlike p-multigrid [fidkowski2005p; tielen2019efficient], which uses higher-order shape functions on the same degrees of freedom to improve convergence, h-multigrid constructs coarser degrees of freedom with potentially lower computational cost, which is often more favorable but has not been explored for MPM.
H-multigrid is generally categorized as geometric or algebraic. Unlike algebraic multigrid, which relies purely on algebraic operations to define the coarser system matrices and does not utilize any geometric information [stuben2001review], geometric multigrid constructs the coarser-level system matrices from coarsened grids or meshes. However, the mismatch at irregular boundaries due to geometric coarsening may require specialized treatment to ensure convergence improvement, e.g., extra smoothing at the boundary as in McAdams et al. [mcadams2010parallel]. Instead, Chentanez and Müller [nvidiaMGFlow] take a volume-weighted discretization and demonstrate that more robust results can be obtained without extra smoothing at boundaries. On the other hand, Ando et al. [ando2015pressure] derive a multi-resolution pressure solver from a variational framework which handles boundaries using signed-distance functions.
Galerkin multigrid [strangAndArikka1986], on the other hand, automatically handles boundary conditions by correctly passing them between multigrid levels, but does not maintain sparsity. Xian et al. [xian2019multigrid] design a specialized Galerkin projection criterion based on skinning-space coordinates with piecewise-constant weights to maintain sparsity, but their projection can potentially lead to singular coarser-level matrices. In our work, we derive the prolongation and restriction operators and the coarser-level matrices all from our node-embedding idea, and the resulting formulation turns out to be consistent with Galerkin multigrid. Due to the regularity of the MPM grid, however, our coarse-level matrices are guaranteed to be full-rank, and they support flexible sparsity control via kernel selection.
As in Ferstl et al. [mcadams2010parallel], McAdams et al. [AMGAdaptgrid], and Zhang et al. [Zhang2016hv], one natural approach to utilizing our multigrid model is to augment the Krylov solver with it as a preconditioner. As demonstrated in our benchmark experiments, this superficially straightforward utilization does not outperform existing diagonally-preconditioned alternatives (PN-PCG) due to the repeated, expensive reconstruction of the multilevel hierarchy in each Newton step. We instead develop HOT, a highly performant quasi-Newton L-BFGS solver in which the multigrid model is adopted, in a novel way, as an efficient inner initializer. HOT achieves significant performance gains and outperforms all alternative methods on a wide range of challenging examples.
2.4. Quasi-Newton Methods
Quasi-Newton methods, e.g., L-BFGS, have long been applied to simulating elastica [Deuflhard2011pf]. L-BFGS can be highly effective for minimizing potentials; however, an especially good choice of initializer is required and makes an enormous difference in convergence and efficiency [nocedal:2006:numerical]. Directly applying a lagged Hessian at the beginning of each time step is the most straightforward option and effectively introduces second-order information [Brown2013]; unfortunately, it is generally too costly an option, with limited scalability. Liu et al. [liu2017quasi] propose to instead invert a Laplacian matrix approximating the rest-shape Hessian as the initializer. This provides better scalability and more efficient evaluations, but convergence speed drops quickly in non-uniform deformation cases [Li:2019:DOT]. Most recently, Li et al. [Li:2019:DOT] propose a highly efficient domain-decomposed initializer for mesh-based FE that leverages start-of-time-step Hessians, providing both scalability and fast convergence in challenging elastodynamic simulations.
In the MPM setting, where inexact Newton-Krylov methods are generally required given the scale of MPM simulation discretizations, the convergence of iterative linear solvers, rather than direct solvers, becomes critical for scalability. HOT leverages the start-of-time-step Hessian at the beginning of each incremental potential solve, applying a new, inner hierarchical strategy for L-BFGS to build an efficient method that outperforms or closely matches best-per-example prior methods across all tested cases against state-of-the-art, heavily optimized implicit MPM codes.
Unlike Wen and Goldfarb [wen2009line], which requires many unintuitive parameter settings to alternate between multigrid and single-level smoothing and to convexify the objective function, HOT consistently applies V-cycles on our node-embedding multigrid, constructed from the projected Hessian [teran2005robust], as the inner initializer, without the need for any parameter tuning.
3. Problem Statement and Preliminaries
3.1. Optimization-based Implicit MPM
MPM assembles a hybrid LagrangianEulerian discretization of dynamics. A background Cartesian grid acts as the computational mesh while material states are tracked on particles.
Notation. In the following we apply subscripts $p$ for particle and $i$ for grid quantities respectively. We then remove subscripts entirely, as in $v = \{v_i\}$, to denote vectors constructed by concatenating nodal quantities over all grid nodes. Superscripts $n$ and $n+1$ then distinguish quantities at time steps $t^n$ and $t^{n+1}$.
Implicit MPM time stepping with implicit Euler from $t^n$ to $t^{n+1}$ is then performed by applying the following operations:

Particles-to-grid (P2G) projection. Particle masses $m_p$ and velocities $v_p^n$ are transferred to the grid’s nodal masses $m_i$ and velocities $v_i^n$ by APIC [jiang:2015:apic].

Grid time stepping. Nodal velocity increments $\Delta v_i$ are computed by minimizing implicit Euler’s incremental potential as in Eq. (1) and are then applied to update nodal velocities via $v_i^{n+1} = v_i^n + \Delta v_i$.

Grid-to-particles (G2P) interpolation. Particle velocities $v_p^{n+1}$ are interpolated from the nodal velocities $v_i^{n+1}$ by APIC.

Particle strain-stress update. Particle strains (e.g., deformation gradients $F_p$) are updated from the grid velocity gradient via the updated Lagrangian. Where appropriate, inelasticity is likewise enforced through per-particle strain modification [stomakhin:2013:snow; gao2017adaptive].

Particle advection. Particle positions are advected via $x_p^{n+1} = x_p^n + \Delta t\, v_p^{n+1}$.
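The five operations above can be sketched end-to-end. The following minimal 1D step uses linear hat weights and replaces the implicit grid solve of Step 2 with a trivial explicit gravity update so the data flow stays visible; it is illustrative only (real MPM uses APIC transfers, B-spline kernels, strain updates, and an implicit grid solve).

```python
import numpy as np

# Schematic single MPM step in 1D with linear hat weights. The "grid step"
# here is plain gravity, standing in for the implicit incremental-potential
# solve; strain updates (step 4) are omitted.

def step(xp, vp, mp, dx, dt, n_nodes, g=-9.8):
    # 1. P2G: scatter mass and momentum to grid nodes
    m = np.zeros(n_nodes)
    mv = np.zeros(n_nodes)
    base = np.floor(xp / dx).astype(int)
    fx = xp / dx - base
    for off, w in ((0, 1.0 - fx), (1, fx)):       # linear hat weights
        np.add.at(m, base + off, w * mp)
        np.add.at(mv, base + off, w * mp * vp)
    v = np.divide(mv, m, out=np.zeros_like(mv), where=m > 0)
    # 2. Grid step (stand-in for the implicit solve)
    v = np.where(m > 0, v + dt * g, v)
    # 3. G2P: gather velocities back to particles
    vp_new = np.zeros_like(vp)
    for off, w in ((0, 1.0 - fx), (1, fx)):
        vp_new += w * v[base + off]
    # 5. Advect particles
    return xp + dt * vp_new, vp_new
```

Replacing the explicit grid step with the minimization of Eq. (1) is precisely the subject of the remainder of the paper.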
Here we focus on developing an efficient and robust solver for MPM grid time stepping in Step 2. All other steps are standard (ref. [jiang:2016:course]).
Assuming mechanical responses correspond to an MPM nodal-position-dependent potential energy $\Psi(x)$, e.g., a hyperelastic energy, Gast et al. [gast:2015:tvcg] observe that minimization of
(1)  $E(\Delta v) = \sum_i \tfrac{1}{2} m_i \|\Delta v_i\|^2 + \Psi\big(x^n + \Delta t\,(v^n + \Delta v)\big)$
subject to proper boundary conditions is equivalent to solving the MPM implicit Euler update $m_i \Delta v_i = \Delta t\, f_i(x^{n+1})$, where $f_i = -\partial \Psi / \partial x_i$ is the implicit nodal force. Minimization of a corresponding incremental potential for mesh-based elasticity has been widely explored for stable implicit Euler time stepping [bouaziz2014projective; liu2017quasi; Overby2017ns; Li:2019:DOT]. For MPM, however, a critical difference is that nodal positions are only virtually displaced from the Eulerian grid during the implicit solve, and are then reset to an empty Cartesian scratchpad. Significantly, in different time steps the system matrix can also have different sparsity patterns. This is an essential reason why matrix-free Newton-Krylov methods are generally preferred over direct factorizations.
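The equivalence between minimizing the incremental potential and solving the implicit Euler update can be checked on a scalar toy problem. The sketch below, with an assumed spring energy $\Psi(x) = \tfrac{1}{2}kx^2$ and illustrative constants, minimizes $E(\Delta v)$ by Newton's method and verifies that the minimizer satisfies $m\,\Delta v = \Delta t\, f(x^{n+1})$:

```python
# Scalar sketch: minimizing E(dv) = 0.5*m*dv^2 + Psi(x + dt*(v + dv)) is
# equivalent to the implicit-Euler equation m*dv = dt*f(x_new), f = -Psi'.
# Spring energy Psi = 0.5*k*x^2; all constants are illustrative.

m, k, dt = 2.0, 100.0, 0.01
x, v = 0.3, -1.0

grad = lambda dv: m * dv + dt * k * (x + dt * (v + dv))   # dE/d(dv)

dv = 0.0
for _ in range(20):
    dv -= grad(dv) / (m + dt * dt * k)   # Newton; Hessian is constant here

x_new = x + dt * (v + dv)
force_residual = m * dv - dt * (-k * x_new)  # implicit-Euler equation residual
```

Since the toy energy is quadratic, a single Newton step already drives both the optimality condition and the implicit-Euler residual to zero.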
3.2. Inexact Newton-Krylov Method
To minimize Eq. (1) in high-resolution scenarios, where more than 100K degrees of freedom are common, direct solvers can be too slow or even run out of memory during numerical factorization. Instead, Newton-Krylov methods are generally adopted for scalability, and further computational savings can be achieved by employing inexact Newton. For example, Gast et al. [gast:2015:tvcg] propose using the L2 norm of the incremental potential's gradient to adaptively terminate Krylov iterations, so that computational effort in early Newton iterations can be saved by solving the linear system inexactly. However, Gast and colleagues mainly target softer materials, while in more general cases objects may have Young's moduli as large as that of the metal wheel shown in Fig. 11. It becomes even more challenging when materials with widely varying stiffnesses interact with each other. In these cases the inexact Newton method of Gast et al. [gast:2015:tvcg] can simply fail to converge within practical amounts of time, as in our experiments on the Twist (Fig. 1) and Chain (Fig. 9) examples.
This observation is not new and has motivated researchers to ask whether an early termination can be applied to the Newton iterations while still obtaining visually appealing results that are stable and consistent. Li et al. [Li:2019:DOT] extend the characteristic norm (CN) from distortion optimization [bcqn18] to elastodynamics and demonstrate its capability to obtain consistent, relative tolerance settings across a wide set of elastic simulation examples over a range of material moduli and mesh resolutions. However, for a scene containing materials with drastically different stiffness parameters, the averaging L2 measure does not suffice to capture the multiscale incremental potential gradient in a balanced manner.
We propose an extended scaled-CN in our framework to support common multi-material applications in MPM. The incremental potential gradients are non-uniformly scaled so that the multiscale residuals can be captured effectively. We apply this new characteristic norm to terminate the outer Newton iterations and, with the same idea, improve the inexact strategy for the inner linear solve iterations to create our baseline PN solver. See Algorithm 1 for our inexact Newton method; details can be found in Section 5.
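The following hedged sketch illustrates only the idea of a node-wise scaled termination test: each nodal gradient is nondimensionalized by a per-node characteristic scale before the residual is measured, so stiff and soft regions are weighted comparably. The exact scaling used by our scaled-CN differs; all names, the RMS aggregation, and the tolerance are illustrative.

```python
import numpy as np

# Hedged sketch of node-wise scaled termination. `node_scale` stands in for a
# per-node characteristic force scale (in practice built from local mass and
# stiffness); the paper's precise scaling is not reproduced here.

def scaled_residual(grad_nodes, node_scale):
    # grad_nodes: (n, d) nodal gradients of the incremental potential
    # node_scale: (n,) characteristic scale per node
    r = np.linalg.norm(grad_nodes, axis=1) / node_scale
    return np.linalg.norm(r) / np.sqrt(len(r))   # RMS of scaled residuals

def converged(grad_nodes, node_scale, tol=1e-3):
    return scaled_residual(grad_nodes, node_scale) < tol
```

With such a test, a stiff node with a large raw gradient and a soft node with a tiny one can both count as converged, which an unscaled L2 norm cannot express.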
Even with this extended CN and the improved inexact solving strategy, Newton-Krylov methods may still suffer from poor convergence, simply because the linear systems can easily become ill-conditioned when materials with large Young's moduli are involved. This motivates the design of better preconditioners such as incomplete Cholesky and multigrid. The former, however, is not a viable option here, as elastodynamic system matrices are not M-matrices [kershaw1978incomplete]. Likewise, we observe that the extra computational cost of constructing the multigrid hierarchy per outer iteration may not be well compensated by the convergence improvement within a Newton-Krylov framework. We thus propose a novel MPM multigrid scheme in Section 4; HOT is then built by employing this multigrid as an efficient inner initializer in a highly performant quasi-Newton loop, as described in Section 5.
4. MPM Multigrid
4.1. Motivation
We begin with an $L$-level multigrid hierarchy, denoting level $0$ and level $L-1$ as the finest and coarsest levels respectively. System matrices are constructed for each level, along with the prolongation and restriction operators between adjacent levels for exchanging information. Here $P_l$ is the prolongation matrix from level $l+1$ to level $l$, and $R_l = P_l^T$ is the corresponding restriction matrix.
Unlike McAdams et al. [mcadams2010parallel], where the Poisson matrix can be completely derived from the Cartesian grid, the system matrix of MPM also depends on the particle positions and strains, because the particles serve as the quadrature points. Thus, to construct the matrices for the coarser levels, traditional geometric multigrid methodology would require the definition of “coarsened” particles. The direct geometric solution is to always use the actual particles as the quadrature points at all coarse levels, which is extremely inefficient (Section 7.3). Another straightforward approach is to merge nearby particles into “larger” particles as in Gao et al. [gao2017adaptive] and Yue et al. [yue:2015:mpmplasticity]; however, only ad-hoc or heuristic methods exist for interpolating the deformation gradients, so the resulting Hessians at coarser levels may fail to correctly encode the original problem and lead to slow convergence. Moreover, this method requires constructing nodes with particle-like attributes at each level, introducing computational overhead. To remove these challenges, we propose constructing our hierarchy by embedding finer-level grid nodes into coarser-level grid nodes, analogously to MPM's embedding of particles into grid nodes. Then, by explicitly storing the system matrix, we progressively construct coarser-level matrices directly from the adjacent finer-level matrix entries, avoiding the need to compute or store any coarser-level geometric information. We next show that our multigrid is consistent with Galerkin multigrid, where boundary conditions are automatically handled, and that selecting different node-embedding kernels supports flexible control of sparsity.
4.2. Node-embedding multigrid derivation
We first derive our restriction and prolongation operators, beginning with these two operations between levels $0$ and $1$. Nodal forces at the finest level are
(2)  $f_i(x) = -\dfrac{\partial \Psi}{\partial x_i}$
(3)  $f_i(x) = -\sum_p V_p^0\, P(F_p)\, (F_p^n)^T\, \nabla w_{ip}.$
Here $V_p^0$ is the initial particle volume, $\psi$ is the energy density function, $P = \partial \psi / \partial F$ is the first Piola-Kirchhoff stress, $F_p$ is the particle deformation gradient, and $w_{ip}$ are the particle-grid interpolation weights. Notice that in a multigrid hierarchy, the residuals are restricted from finer levels to coarser levels.
Forces at nodes in the next level are then $f^1_{i'} = -\partial \Psi / \partial x^1_{i'}$. Embedding finer-level nodes into coarser-level nodes, we can then simply apply the chain rule, converting derivatives evaluated at a coarse node to those already available at the finer level:
(4)  $f^1_{i'} = -\sum_i \Big(\dfrac{\partial x_i}{\partial x^1_{i'}}\Big)^{T} \dfrac{\partial \Psi}{\partial x_i}$
(5)  $f^1_{i'} = \sum_i w_{i'i}\, f_i.$
This gives our restriction operation as $f^1 = R\, f^0$ with $R_{i'i} = w_{i'i} I$.
Prolongation is given by the transpose, $P = R^T$. Recall that MPM particle velocities are interpolated from grid node velocities as $v_p = \sum_i w_{ip} v_i$; analogously, embedded fine-node velocities are interpolated from the coarse nodes, so that
(6)  $v_i = \sum_{i'} w_{i'i}\, v^1_{i'},$
or $v^0 = P\, v^1$.
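A minimal 1D sketch of this node-embedding construction: fine nodes are embedded into coarse nodes with a linear hat kernel, giving a prolongation matrix $P$ whose entries are the embedding weights, with restriction $R = P^T$. Function names and the coarsening factor are illustrative.

```python
import numpy as np

def linear_weight(t):
    # 1D linear hat kernel with support radius 1 (in coarse-cell units)
    return np.maximum(0.0, 1.0 - np.abs(t))

def prolongation_1d(n_fine, coarsen=2):
    # Embed fine nodes (at integer positions i) into coarse nodes (at
    # positions coarsen*I), analogously to MPM's particle-grid embedding.
    # P maps coarse-level values to fine-level values.
    n_coarse = (n_fine - 1) // coarsen + 1
    P = np.zeros((n_fine, n_coarse))
    for i in range(n_fine):
        for I in range(n_coarse):
            P[i, I] = linear_weight(i / coarsen - I)
    return P

P = prolongation_1d(5)   # 5 fine nodes -> 3 coarse nodes
R = P.T                  # restriction is the transpose
```

Each row of P sums to one (partition of unity of the embedding kernel), so prolongating a constant coarse field reproduces that constant on the fine grid.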
For coarse matrices, we can similarly verify by computing the second-order derivative of Eq. (1) w.r.t. $x^1$. Applying the chain rule with $x^0$ as the intermediate variable, we obtain
(7)  $H^1 = \dfrac{\partial^2 E}{\partial (x^1)^2} = P^T\, \dfrac{\partial^2 E}{\partial (x^0)^2}\, P,$
where $H^0 = \partial^2 E / \partial (x^0)^2$ is the Hessian of Eq. (1) w.r.t. $x^0$. This gives
(8)  $H^{l+1} = R\, H^l\, P = P^T H^l P,$
consistent with Galerkin multigrid. Since Dirichlet boundary conditions can be enforced by projecting the corresponding rows and columns of the system matrix and the entries of the right-hand side, Galerkin multigrid automatically transfers boundary conditions from finer to coarser levels during this algebraic projection. This will often result in a blending of boundary and non-boundary DOFs at the coarser-level grid nodes, which is more consistent, and thus more effective, than purely geometric multigrid.
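A small numeric sketch of Galerkin coarsening with a Dirichlet DOF projected out first; the 3-node fine Hessian, the fixed DOF, and the embedding weights in P are all illustrative.

```python
import numpy as np

def project_dirichlet(H, fixed):
    # Zero the fixed rows/columns and set a unit diagonal.
    H = H.copy()
    H[fixed, :] = 0.0
    H[:, fixed] = 0.0
    for i in fixed:
        H[i, i] = 1.0
    return H

def galerkin_coarsen(H, P):
    # Coarse matrix via the Galerkin triple product, R = P^T.
    return P.T @ H @ P

H_fine = np.array([[ 2.0, -1.0,  0.0],
                   [-1.0,  2.0, -1.0],
                   [ 0.0, -1.0,  2.0]])   # toy 1D Laplacian-like Hessian
P = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])                 # 3 fine -> 2 coarse embedding
H_coarse = galerkin_coarsen(project_dirichlet(H_fine, [0]), P)
```

Because R = P^T, symmetry and positive semi-definiteness of the projected fine matrix carry over to H_coarse, so the coarse levels stay valid for smoothing and coarse solves.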
Here our node-embedding kernel can simply be an MPM kernel such as the linear or quadratic B-spline. Different choices of kernel result in different sparsity patterns of the coarser matrices, which in turn lead to different convergence and performance (see Table 2 in the supplemental document). Because of the locality of B-splines, the sparsity patterns can be effectively controlled. Interestingly, with a careful selection of the kernel, our node-embedding multigrid is mathematically equivalent to the particle-quadrature geometric multigrid.
4.3. Geometric multigrid perspective
Now, applying the standard MPM derivation of the nodal force at level $l$:
(9)  $f^l_{i'} = -\sum_p V_p^0\, P(F_p)\,(F_p^n)^T\, \nabla w^l_{i'p},$
and rewriting Eq. (5) as
(10)  $f^{l+1}_{i'} = -\sum_p V_p^0\, P(F_p)\,(F_p^n)^T \sum_i w_{i'i}\, \nabla w^l_{ip},$
our multigrid then has a simple geometric interpretation, as illustrated in Fig. 6. The particle-grid weight gradient between level $l+1$ and the original particles is substituted by $\sum_i w_{i'i}\, \nabla w^l_{ip}$. In other words, as a geometric multigrid, we can now define a new weighting function that bridges the particles and the coarse grid nodes, so that the coarse grid can be generated by traversing all particles to find the occupied coarse nodes. Similarly, a concatenation of prolongation operators for each coarse level, right-multiplied by the original weight gradient, gives us the new weight gradients required at each level. With these weight gradients, the Hessian matrix can be easily defined to complete the multigrid model. We use the corresponding weighting functions to plot the curves in Fig. 7.
Sparsity pattern in coarse levels.
As discussed in Zhang et al. [Zhang2016hv], in fluid simulation, when a linear kernel relating fine and coarse cells is adopted and standard Galerkin coarsening is applied, the multigrid levels become increasingly dense with coarsening, and the system matrix can eventually even end up effectively dense. In contrast, in the MPM framework, with the quadratic B-spline as the particle-grid weighting function at the finest level, the coarse levels actually become increasingly sparse when the standard linear kernel is used for prolongation and restriction. As shown on the left of Fig. 7, the kernel width shrinks as the level increases. Another option is to replace the linear kernel with the quadratic B-spline weighting to define the prolongation and restriction operators. In this case, however, the matrices become increasingly dense at coarser levels (c.f. the right half of Fig. 7), making this choice computationally less attractive (see the supplemental document, Table 2, for the comparison). Our MPM multigrid therefore adopts quadratic B-spline weighting for particle-grid transfers at the finest level and the linear kernel for the prolongation and restriction operators between adjacent levels of the hierarchy.
In Fig. 8, we visualize the matrix sparsity pattern for the ArmaCat example (see Fig. 3); the number of nonzero entries per row decreases in our MPM multigrid as the grid coarsens. In contrast, the rightmost visualization shows that when a direct geometric multigrid is used, i.e., when the finest particle set is used to construct the coarse-level matrices, the Hessian is much denser.
5. Hierarchical Optimization Time Integration
Our newly constructed MPM multigrid can be used as a preconditioner, applying one V-cycle (Algorithm 2) per iteration of a conjugate gradient solver to achieve superior convergence within a projected (positive-definite), inexact Newton method. In the following we denote this approach "projected Newton, multigrid-preconditioned conjugate gradient," or PN-MGPCG. In practice, however, the cost of reconstructing the multigrid hierarchy at each Newton iteration of PN-MGPCG is not well compensated by the convergence improvement, providing only a small to moderate speedup over a baseline projected Newton PCG solver (PN-PCG) with a simple diagonal preconditioner for CG (see Figs. 16 and 19). This is because in PN-MGPCG each outer (PN) iteration requires reconstructing the multigrid matrices, while each inner (CG) iteration performs one V-cycle. One reconstruction of the multigrid matrices takes around the time of one V-cycle and well over the time of one Jacobi-preconditioned PCG iteration. Unlike the Poisson systems in Eulerian fluid simulation, the stiffness of an elastodynamic system is often not predictable, as it varies substantially with time step size, deformation, and dynamics. It is therefore hard for PN-MGPCG to consistently accelerate performance across all time steps.
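The PN-MGPCG inner solve is an ordinary preconditioned CG whose preconditioner application would be one V-cycle. A self-contained sketch follows, with a Jacobi stand-in for the V-cycle; the `precond` callable is where Algorithm 2 would plug in (all names are illustrative):

```python
import numpy as np

def pcg(A, b, precond, tol=1e-8, max_iter=200):
    """Preconditioned CG; in PN-MGPCG, precond(r) would apply one V-cycle."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b, precond=lambda r: r / np.diag(A))  # Jacobi stand-in for a V-cycle
```

Swapping the lambda for a V-cycle operator changes only the preconditioner application, leaving the CG loop untouched.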
5.1. Multigrid-initialized quasi-Newton method
Rather than applying our MPM multigrid as a preconditioner for a Krylov method (which can still be slow and increasingly expensive as stiffness grows; see Fig. 19), we apply it as an inner initializer for a modified L-BFGS. In the resulting hierarchical method, multigrid provides efficient second-order information, while the outer quasi-Newton low-rank updates [Li:2019:DOT] provide efficient curvature updates to maintain consistent performance across time steps with widely varying stiffness, deformations, and conditions. Following recent developments, we choose a start-of-time-step lagged model update [Brown2013; Li:2019:DOT]: we reconstruct our multigrid structure once at the start of each time step solve. This enables local second-order information to efficiently bootstrap curvature updates from the successive lightweight, low-rank quasi-Newton iterations.
This completes the core specification of our Hierarchical Optimization Time (HOT) integrator. The HOT multigrid hierarchy is constructed at the beginning of each time step. Then, for each L-BFGS iteration, the multiplication of our initial Hessian-inverse approximation with a vector is applied by our multigrid V-cycle. To ensure the symmetric positive definiteness of the V-cycle operator, we apply colored symmetric Gauss-Seidel as the smoother at the finer levels and employ a Jacobi-preconditioned CG solve for the coarsest-level system (see Algorithm 2). Here PCG is applied at the coarsest level rather than a direct solver because setting up a direct solver requires additional computation (e.g., factorization) that is not compensated by the modest convergence improvement (Section 7.2). Weighted Jacobi, a widely used smoother for multigrid preconditioners in Eulerian fluid simulation, loses its simplicity and performance advantages here due to the difficulty of determining a weight that guarantees efficiency, or even convergence, for the non-diagonally-dominant elastodynamic Hessian. Similarly, a Chebyshev smoother is abandoned because estimating both the upper and lower eigenvalues of the system matrix introduces a large overhead.
In HOT, curvature information is updated by low-rank secant updates within a fixed window size, producing a new descent direction for line search at each L-BFGS iteration. Pseudocode for the HOT method is presented in Algorithm 3; we analyze its performance, consistency, and robustness in Sec. 7.2 with comparisons to state-of-the-art MPM solvers.
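The interplay between the low-rank secant updates and the multigrid initializer can be sketched with the standard L-BFGS two-loop recursion, where `apply_Hinv0` plays the role of HOT's V-cycle (names are illustrative, not from the paper's code):

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list, apply_Hinv0):
    """L-BFGS two-loop recursion; apply_Hinv0 stands in for HOT's
    multigrid V-cycle as the initial Hessian-inverse approximation."""
    q = grad.copy()
    alphas = []
    # First loop: newest-to-oldest secant pairs.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    r = apply_Hinv0(q)            # inner initializer (one V-cycle in HOT)
    # Second loop: oldest-to-newest, consuming alphas in reverse.
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / (y @ s)
        beta = rho * (y @ r)
        r += (a - beta) * s
    return -r                     # descent direction for line search
```

With an exact Hessian inverse as the initializer and a quadratic objective, the recursion reproduces the Newton direction regardless of the secant history.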
5.2. Convergence tolerance
To reliably obtain consistent results across heterogeneous simulations while saving computational effort, we extend the Characteristic Norm (CN) [bcqn18; Li:2019:DOT] from FE meshes to the MPM discretization, taking multi-material domains into account. When multiple materials with significantly varying material properties are coupled in a single simulated system, the traditional averaged L2 measure fails to characterize the multiscale incremental potential's gradient.
For MPM we thus first derive a node-wise CN in the MPM discretization, and then set tolerances with the L2 measure of the node-wise-CN-scaled incremental potential gradient. See our Appendix for details.
5.3. Inexact linear solves
Above we provided a consistent criterion for terminating the outer nonlinear solve. Within each outer nonlinear iteration, inner iterations are performed to solve the corresponding linear system to a certain accuracy. For an inexact Newton-Krylov solver, in the first few Newton iterations, where the nonlinear residual is generally far from convergence, inexact linear solves are preferred. As discussed in Gast et al. [gast:2015:tvcg], inexact linear solves can significantly relieve the computational burden of large-time-step simulations. They therefore set a relative tolerance, proportional to their nonlinear tolerance, on the energy norm (defined by the preconditioning matrix) of the initial residual for each linear solve in a PN iteration, and perform CG iterations until the current residual measure falls below it.
However, this strategy falls short for heterogeneous materials in two respects. First, the L2 norm of the incremental potential gradient does not account for its multiscale nature, potentially providing relative tolerances that are too small for stiff materials. Second, as discussed earlier, the nonlinear tolerance in Gast et al. [gast:2015:tvcg] is hard to tune, especially when stiff materials are present, which can again throw off the scale of the relative tolerances. We therefore modify this inexact strategy for our baseline PN-PCG, using an adjusted relative tolerance to terminate CG iterations. Here the preconditioning matrix in the energy norm has the effect of scaling the per-node residual in awareness of the different material stiffnesses. The tolerance is simply Li et al.'s [Li:2019:DOT] tolerance on the L2 measure, characterizing the stiffest material in the running scene, which ensures the tolerance is not too small for stiff materials.
HOT similarly exploits our inexact solving criterion for the coarsest-level PCG during early L-BFGS iterations. Specifically, in each V-cycle, we recursively restrict the right-hand-side vector to the coarsest level. We then set the tolerance for the CG solver using the norm defined by the diagonal of the coarsest-level system matrix. Note that the same V-cycle and termination criterion are also adopted in our PN-MGPCG. As L-BFGS iterations proceed, the norm of the restricted right-hand side decreases, leading to increasingly accurate solves at the coarsest level. As demonstrated in Sec. 7, this reduces computational effort, especially when the system matrices at the coarsest level are not well conditioned.
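A sketch of the adaptive coarsest-level tolerance, under our reading of the text: the CG tolerance is tied to a diagonal-scaled (Jacobi energy) norm of the restricted right-hand side, so it tightens as that norm decays over L-BFGS iterations. The constant `kappa` and the exact norm used are assumptions for illustration:

```python
import numpy as np

def coarsest_tolerance(b_coarse, D_coarse, kappa=1.0):
    # Hypothetical sketch: CG tolerance at the coarsest level, scaled by
    # the Jacobi (diagonal) energy norm ||b||_{D^-1} of the restricted
    # right-hand side. This shrinks as L-BFGS iterations proceed, so
    # later solves are increasingly accurate.
    return kappa * np.sqrt(b_coarse @ (b_coarse / D_coarse))
```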
6. Implementation
Accompanying the submission, we include all of our code, with scripts for rerunning all examples using HOT and all of the methods we compare against (except ADMM MPM [Fang:2019:ViscousMPM], which has been open-sourced separately by its authors). We will release the code as an open-source library with the publication of this work.
Here we remark on nontrivial implementation details that can significantly influence performance.
Lock-free multithreading.
For all particle-to-grid transfer operations (including each matrix-vector multiplication in PN-PCG(MF)), we adopt the highly optimized lock-free multithreading from Fang et al. [fang2018temporally]. This also enables the parallelization of our colored symmetric Gauss-Seidel smoother for the multigrid V-cycle. All optimizations are consistently utilized (wherever applicable) across all compared methods so that our timing comparisons reliably reflect the algorithmic advantages of HOT.
Sparse matrix storage.
We apply the quadratic B-spline weighting kernel for particle-to-fine-grid transfers. The number of nonzero entries per row of the system matrix at the finest level is then bounded by a constant depending on the dimension d. At coarser levels, the number of nonzero entries decreases due to the linear embedding of nodes in our MPM multigrid, as can be seen in Fig. 7. Similarly, for our restriction/prolongation matrices, the number of nonzero entries per row/column is bounded for the linear kernel. Since in all cases the maximum number of nonzeros per row can be predetermined, we employ the diagonal storage format for all three matrix types to accelerate matrix computations.
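Because the maximum number of nonzeros per row is known in advance, a diagonal (DIA) layout can store each band densely. A minimal sketch of the storage and its matrix-vector product (1-D bands for brevity; the class and names are illustrative, not the paper's implementation):

```python
import numpy as np

class DiaMatrix:
    """Minimal diagonal-format (DIA) storage: each nonzero band is stored
    densely, exploiting the fixed per-row nonzero bound."""
    def __init__(self, n, offsets):
        self.n = n
        self.offsets = offsets
        # data[d, i] holds entry A[i, i + offsets[d]] (where valid).
        self.data = np.zeros((len(offsets), n))

    def matvec(self, x):
        y = np.zeros(self.n)
        for d, off in enumerate(self.offsets):
            if off >= 0:
                y[: self.n - off] += self.data[d, : self.n - off] * x[off:]
            else:
                y[-off:] += self.data[d, -off:] * x[: self.n + off]
        return y
```

The band-wise matvec touches memory contiguously, which is the main appeal of DIA over generic row-compressed formats when the stencil width is fixed.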
Multigrid application.
In our experiments (Section 7), our MPM multigrid is tested both as a preconditioner for the CG solver in each PN-MGPCG outer iteration and as the inner initializer for each L-BFGS iteration of HOT.
Prolongation and restriction operator.
Our prolongation operator is defined similarly to traditional particle-grid transfers in hybrid methods: finer-level nodes are temporarily regarded as particles in the coarser level. Spatial hashing is applied to efficiently record the embedding relations between finer and coarser grid nodes.
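A sketch of this bookkeeping: each finer-level node, treated as a particle, is hashed to its coarse cell and assigned linear (hat) weights. This is a 1-D, hypothetical helper; the real implementation hashes 3-D cell indices:

```python
def coarse_embedding(fine_nodes, coarse_dx):
    """Hash each fine node (treated as a particle) to its coarse cell and
    record the coarse nodes it is embedded in, with linear-kernel weights."""
    table = {}
    for i, x in enumerate(fine_nodes):
        c = int(x // coarse_dx)           # left coarse node of the cell
        t = x / coarse_dx - c             # fractional position in the cell
        table[i] = {c: 1.0 - t, c + 1: t} # hat-function weights
    return table
```

The resulting table directly gives the nonzero entries of one column block of the prolongation operator.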
7. Results and Evaluation
7.1. Benchmark
Methods in comparison
We implement a common test-harness code to enable consistent evaluation comparisons between our HOT method and other Newton-Krylov alternatives, i.e., PN-PCG, matrix-free (MF) PN-PCG, and the prior state-of-the-art (MF) method of Gast et al. [gast:2015:tvcg]. To ensure consistency, PN-PCG and MF PN-PCG adopt our node-wise CN with the same tolerance. For Gast et al. [gast:2015:tvcg], which adopts a global tolerance on the residual norm, we manually picked the largest tolerance value that produces artifact-free results across all experiments. In addition to Gast et al. [gast:2015:tvcg], we also compare to the state-of-the-art ADMM-based implicit MPM method [Fang:2019:ViscousMPM] on the faceless example to demonstrate the difference in order of convergence (Figure 15). Note that, other than Gast et al. [gast:2015:tvcg] and Fang et al. [Fang:2019:ViscousMPM], all other compared methods are ones we apply to MPM for the first time in our ablation study.
To continue our ablation study of how each design choice in HOT shapes performance and convergence, we also compare HOT with other new solvers one might consider designing, including (1) HOT-quadratic: HOT's framework with a quadratic (rather than linear) embedding kernel; (2) LBFGS-GMG: L-BFGS with a more standard geometric multigrid as the initializer; (3) PN-MGPCG: a Newton-Krylov solver replacing PN-PCG's Jacobi-preconditioned CG with HOT's multigrid-preconditioned CG; and (4) an MPM extension of the quasi-Newton LBFGS-H (FEM) method from Li et al. [Li:2019:DOT]. Note that unlike Li et al. [Li:2019:DOT], where LBFGS-H fully factorizes the beginning-of-time-step Hessian with a direct solver, here we only partially invert the Hessian by running Jacobi-preconditioned CG iterations with an adaptive termination criterion identical to that of the coarsest-level solve in HOT (Section 5.3). In other words, it is an inexact LBFGS-H equivalent to a single-level HOT. This inexact LBFGS-H often outperforms direct-solver variants on large-scale problems.
All methods in our ablation study, together with Gast et al. [gast:2015:tvcg], are implemented in C++ and consistently optimized (see Section 6). Assembly and evaluations are parallelized with Intel TBB.
Simulation settings
The time step size is selected to fulfill the CFL condition throughout all simulations. We find throughout our tests that a 3-level multigrid preconditioner with one symmetric Gauss-Seidel smoothing pass and inexact Jacobi-preconditioned PCG as the coarsest-level solver works best for HOT and PN-MGPCG. We observe that a window size of 8 for L-BFGS methods yields favorable overall performance. In our experiments, across a wide range of scenes, these settings deliver consistent visual quality for all examples as we vary materials with widely changing stiffness, shapes, and deformations.
Fixed corotated elasticity (FCR) from Stomakhin et al. [stomakhin:2012:invertible] is applied as the elasticity energy in all examples. In addition to our scenes with purely elastic materials: Twist (Fig. 1), ArmaCat (Fig. 3), Chain (Fig. 9), and Faceless (Fig. 10), we also test plastic models: von Mises for Sauce (Fig. 13) and for the metal in Wheel (Fig. 11), the center box in Boxes (Fig. 4), and the bars in Donut (Fig. 12); and granular flow [stomakhin:2013:snow] in Boards (Fig. 5).
We perform and analyze extensive benchmark studies on challenging simulation settings, where the material parameters are all taken from real-world data and most setups are heterogeneous. This not only significantly simplifies the material-tuning procedure in animation production but also helps achieve realistic physical behavior and intricate coupling between materials of varying stiffness. Fig. 2 demonstrates that simulating a scene of aluminum sheets with inappropriate material parameters can produce unexpected behavior. See Table 1 for everyday physical parameters and Table 2 for the material parameters used in our benchmark examples.
Detailed timing statistics for all examples are assembled in Tables 1 and 2 of the supplemental document. As discussed in Section 5.2, all evaluated methods in our ablation study are terminated with the same tolerance computed from our extended characteristic norm, ensuring fair comparisons across all examples.
7.2. Performance
Example  Particle #  Δx (m)  Density (kg/m³)  Young's modulus (Pa)  Poisson's ratio
Twist (Fig. 1)  k  /  0.4
Boxes (Fig. 4)  k  /  /  0.33
Donut (Fig. 12)  k  /  /  0.33
ArmaCat (Fig. 3)  k  //  //  0.4/0.47/0.4
Chain (Fig. 9)  k  /  /  0.4
Boards (Fig. 5)  k  0.33
Wheel (Fig. 11)  k  /  /  0.4/0.33
Faceless (Fig. 10)  k  0.3
Sauce (Fig. 13)  k  0.33
(Plasticity is modeled by clamping the singular values of the deformation gradient to a fixed interval.)
We analyze the relative performance of HOT in two sets of comparisons. First, we compare HOT against the existing state-of-the-art implicit MPM methods [gast:2015:tvcg; Fang:2019:ViscousMPM]. Second, we perform an extensive ablation study to highlight the effectiveness of each of our design choices.
Comparison to the state of the art
As discussed earlier, the performance and quality of results from Gast15 [gast:2015:tvcg] are highly dependent, per example, on the choice of tolerance parameter. Across many simulation runs we sought optimal settings for this parameter to efficiently produce visually plausible, high-quality outputs. However, we find that suitable nonlinear tolerances vary extensively with simulation conditions as materials and boundary conditions change: the ideal tolerance for the Wheel example (Figure 11) differs from the one we found best for the Faceless example (Figure 10). Applying the former generates instabilities and even explosions in all other examples (see the supplemental document, Figure 1), while the latter produces extremely slow performance, especially for examples containing stiff materials (see the supplemental document, Table 1). As for ADMM MPM [Fang:2019:ViscousMPM], being a first-order method, it exhibits slow convergence; we postpone detailed analysis to our convergence discussion below (Section 7.3). In contrast, HOT requires no parameter tuning. All results, timings, and animations presented here and in the following were generated without parameter tuning, using the same input settings to the solver. As demonstrated, they efficiently converge to consistent, visually high-quality output.
Ablation Study
We start with the homogeneous "faceless" example with a soft elastic material; we rotate and raise its head and then release it. As shown in Fig. 10, for this scene with moderate system conditioning, HOT already outperforms the two PN-PCG methods from our ablation set in almost every frame. There is already a nearly 2× average speedup of HOT over both PN-PCG variants across the full simulation sequence, while the maximum per-frame speedup is around 6×.
We then script a set of increasingly challenging stress-test scenarios across a wide range of material properties, object shapes, and resolutions; see, e.g., Figs. 3 and 13 as well as the supplemental video. For each simulation we apply HOT with three levels so that the number of nodes at the coarsest level is generally several thousand or fewer. In Table 1 of the supplemental document we summarize runtime statistics for these examples, comparing HOT's total wall-clock speedup over the entire animation sequence and its maximum per-frame speedup against PN-PCG, PN-PCG(MF), PN-MGPCG, and LBFGS-H across the full set of these examples.
Timings
Across this benchmark set we observe that HOT has the fastest runtimes for all but two examples (discussed below), measured against the best timing for each example across all methods: PN-PCG, PN-PCG(MF), PN-MGPCG, and LBFGS-H. Note that these ablation variants already well exceed the state-of-the-art Gast15 in most of the examples. In general, HOT ranges from 1.98× to 5.79× faster than PN-PCG, from 1.05× to 5.76× faster than PN-PCG(MF), from 2.26× to 10.67× faster than PN-MGPCG, and from 1.03× to 4.79× faster than LBFGS-H in total timing. The exceptions we observe are the easy Sauce and ArmaCat examples, where the materials are already quite soft and the deformation is moderate. In these two simple examples HOT performs close to the leading LBFGS-H method. However, when simulations become challenging, we observe that LBFGS-H can have trouble converging. This is most evident in the stiff aluminum Wheel example (Fig. 11), where the metal undergoes severe deformation. Here HOT stays consistently efficient, outperforming all other methods; see our convergence discussion below for details. Importantly, across examples, we observe that the alternate methods PN-PCG, PN-PCG(MF), PN-MGPCG, and LBFGS-H swap places as the fastest per example, so it is never clear a priori which would be best as simulation conditions change. While in some examples each method can come within 2× of HOT's performance, each also easily falls well behind both HOT and the other methods in other examples (Fig. 14). In other words, none of these other methods shows even empirically consistent good performance across the tested examples. The seemingly second-best LBFGS-H can even fail in some extreme cases. For most of the scenes with heterogeneous materials or large deformations, e.g., Twist, Boxes, Donut, and Wheel, which require more PN iterations, PN-PCG is faster than its matrix-free counterpart PN-PCG(MF).
Among these examples, only Boxes and Wheel are well accelerated by using MGPCG within PN.
Gauss-Seidel-preconditioned CG
Here we additionally compare symmetric Gauss-Seidel (SGS) and Jacobi-preconditioned PN-PCG to show that a simple trade-off between convergence and per-iteration cost does not easily lead to significant performance gains (Table 3). SGS-preconditioned PN-PCG significantly improves convergence compared to Jacobi preconditioning, as one would expect, but due to its more expensive preconditioning computation, its performance is at the same level and sometimes even worse. This is also why we apply Jacobi-preconditioned CG to solve our coarsest-level system.
Example  SGS avg time  SGS avg CG iter  Jacobi avg time  Jacobi avg CG iter
Twist  492.20  679.91  361.53  1054.16 
Boxes  1368.54  34.13  466.94  248.98 
Donut  410.34  45.59  240.65  375.39 
ArmaCat (soft)  109.08  29.04  111.04  66.43 
ArmaCat (stiff)  148.38  62.63  153.84  157.77 
Chain  398.79  47.27  572.01  92.67 
Boards  371.66  59.17  313.62  206.35 
Wheel  269.08  106.13  206.14  424.13 
Faceless  12.22  16.14  7.21  63.80 
Sauce  22.66  9.98  29.07  27.06 
Changing machines
Across the different consumer-level Intel CPU machines we tested (see the supplemental document, Table 1), HOT similarly maintains the fastest runtimes regardless of available CPU or memory resources, compared against the best timing for each example among PN-PCG, PN-PCG(MF), PN-MGPCG, and LBFGS-H.
7.3. Convergence
HOT balances efficient hierarchical updates with global curvature information from gradient history. In this section we first compare HOT's convergence to the state-of-the-art ADMM MPM [Fang:2019:ViscousMPM], and then analyze convergence behavior in our ablation study. Here we exclude Gast15 as it uses different stopping criteria and, as discussed above, requires intensive parameter tuning.
Comparison to ADMM
Here we compare to ADMM MPM [Fang:2019:ViscousMPM] on a purely elastic example, faceless (Fig. 10), by importing its open-sourced code into our codebase and adopting our node-wise-CN-based termination criterion (Section 5.2). Despite its advantages in efficiently resolving fully implicit visco-elastoplasticity, on this purely elastic example we observe that, as a first-order method, ADMM converges much more slowly than all the Newton-type and quasi-Newton methods including HOT (Figure 15). Although ADMM's per-iteration overhead is generally a few times smaller, the number of iterations required to reach the requested accuracy is orders of magnitude larger. Nevertheless, ADMM MPM is more likely to robustly generate visually plausible results within the first few iterations, while Newton-type or quasi-Newton methods might not.
Ablation Study
In Fig. 16 we compare convergence rates and timings across methods for a single time step of the Wheel example. In terms of convergence rate, we see in the top plot that PN-MGPCG obtains the best convergence, while HOT, PN-PCG, and PN-PCG(MF) converge similarly. In this challenging example, LBFGS-H struggles to reach even a modest tolerance, as shown in the extension in the bottom plots of Fig. 16.
However, in overall simulation time, HOT outperforms all three PN variants and LBFGS-H due to its much lower per-iteration cost. PN-MGPCG, despite the best convergence rate, falls well behind HOT and behaves only slightly better than the two PN-PCG flavors, as the costly reconstruction of the multigrid hierarchy and the stiffness matrix is repeated in each Newton iteration. LBFGS-H struggles further: we observe that many of its linear solves well exceed the PCG iteration cap. At the bottom of Fig. 16, we see that LBFGS-H eventually converges, but only after many more outer iterations. Here, the diagonal preconditioner at the coarsest level in HOT significantly promotes the convergence of the whole solver; in contrast, the same preconditioner in LBFGS-H loses its efficiency at the finest level, where the system is much larger and the conditioning of the system matrix is exponentially exacerbated.
Visualization of Convergence
In Fig. 17 we visualize the progressive convergence of HOT and LBFGS-H w.r.t. the CN-scaled nodal residuals on the stretched-box example. HOT quickly smooths out the low-frequency errors: by iteration 6 the background color of the box becomes blue (small error), and the high-frequency errors are then progressively removed until HOT converges at iteration 25. For LBFGS-H, both the low- and high-frequency errors are removed slowly and simultaneously, and it takes 106 iterations to converge.
Comparison to the baseline geometric multigrid
As discussed earlier, building a geometric multigrid directly from particle quadratures can result in nearly no sparsity gain for the coarser matrices and a mismatch of degrees of freedom. We compare to this baseline geometric multigrid on the ArmaCat example (Fig. 2) by utilizing both multigrids in a PN-MGPCG time integrator. As we see in the top plot of Fig. 18, the geometric multigrid achieves 5× faster convergence than Jacobi-preconditioned PN-PCG but remains less effective than our 10× speedup in this specific time step. The bottom plot shows that this convergence relation among all three candidates remains consistent throughout the ArmaCat animation. However, in a very few cases (e.g., Boards) we occasionally observed the baseline geometric-multigrid-preconditioned PN-MGPCG converging even more slowly than Jacobi-preconditioned PN-PCG.
We then compare HOT to applying GMG within L-BFGS (LBFGS-GMG; see Table 2 in the supplemental document). The convergence of LBFGS-GMG is orders of magnitude slower than HOT for all scenes containing stiff materials. For the two scenes with soft materials only (Sauce and Faceless), even though the convergence is only slightly slower than HOT, the timing is substantially slower, further demonstrating the inefficiency of the multigrid operations in GMG.
7.4. Varying Material Stiffness
Finally, to again consider consistency, we compare the convergence and overall performance of all five methods on the same simulation setup as we progressively increase the Young's modulus of one material. Here we form a bar composed of three materials (see the inset of Fig. 19). The two bar ends are kept at a fixed, constant Young's modulus across all simulations in the experiment, while we progressively increase the Young's modulus of the middle segment.
In the bottom plot of Fig. 19, we see that HOT maintains a low, close-to-flat growth in iterations with increasing stiffness, with PN-MGPCG demonstrating only a modestly greater iteration growth. When we consider compute time, however, the modest iteration growth of PN-MGPCG translates into much less efficient simulations as stiffness increases, due to its much greater per-iteration cost. Despite greater iteration growth, LBFGS-H scales better to greater stiffness thanks to its lower per-iteration cost. HOT, with both a close-to-flat growth in iterations and a low per-iteration cost, maintains consistent behavior with respect to iteration growth and generally the best runtime performance across increasing stiffness.
7.5. Ablation study on HOT’s kernel
Since our multigrid can be constructed using either linear or quadratic B-splines for node embedding, which potentially trades convergence against per-iteration cost through the resulting coarser-level shape functions, we use an ablation study on the two design choices to back up our decision to use linear kernels. As shown in Table 2 of the supplemental document, HOT with linear embedding performs equally well on convergence compared to quadratic embedding; in a few cases, e.g., Twist and Boxes, linear embedding even converges much faster. This is reasonable: as we can see in Fig. 7, the resulting coarser-level shape functions for linear and quadratic node embedding do not differ significantly. But because linear embedding leads to much sparser coarse systems, it is much faster in timing than quadratic embedding.
8. Conclusions and Future Work
Starting from node embedding we derive, to our knowledge, the first MPM Galerkin multigrid. The resulting method is consistent with particle-quadrature geometric multigrid while providing efficient construction and automatic handling of boundary conditions. We then build HOT to utilize our multigrid as an inner initializer within L-BFGS, avoiding the repeated, expensive matrix reconstruction required by traditional PN-MGPCG; efficient curvature updates instead ensure fast yet inexpensive convergence. HOT accelerates implicit MPM time stepping across a wide range of important material behaviors, achieving substantial speedups compared to state-of-the-art methods and standing out among an exhaustive set of variations. We hope this will enable research leveraging HOT-type hierarchies to further address both spatial and temporal discretization limitations in current MPM pipelines.
Semi-implicit plasticity is a limitation of HOT. For our current HOT simulations with plasticity, we have not encountered instabilities. Nevertheless, adding the plasticity return mapping as a constraint within HOT's stable implicit time step optimization is an exciting and challenging direction for future work. Another challenging but interesting extension would be to incorporate additional physical constraints into our minimization framework; one particularly useful case is volume preservation, e.g., for simulating stiff granular materials. Currently HOT offers consistent and performant MPM simulation, gaining the advantages of implicit time stepping without the need for parameter tuning. We look forward to further extensions, such as those above, and to its application as a standard toolkit to accelerate MPM across the physical sciences, animation, engineering, and beyond.
On the implementation side, the construction of the finest-level system matrix is one of the bottlenecks of HOT. In our code it is realized with a scattering scheme, which suffers from cache misses. Exploring the performance potential of alternative gathering schemes for building the stiffness matrices is therefore meaningful future work.
References
Appendix – Nodewise Characteristic Norm
Concretely, we compute the norm of the stress derivative evaluated at the undeformed configuration in the diagonal space for each particle, and transfer this scalar field with mass weighting and normalization to the corresponding grid nodes. Then we compute the node-wise CN as
$\xi_i = \ell \, \bar{\kappa}_i \, \Delta t^{2}$ (11)
per node, where $\ell$ characterizes the discretization, $\bar{\kappa}_i$ the averaged material stiffness per node, and $\Delta t^{2}$ the time step scaling. In mesh-based FE, $\ell$ is the perimeter/area of the polygon/polyhedron formed by the one-ring elements connected to node $i$ in 2D/3D [bcqn18]. For MPM, we correspondingly derive $\ell$ from the uniform backing Cartesian grid discretization in 2D and 3D.
Then, for the convergence check, we scale each entry of the incremental potential gradient vector by the corresponding node-wise CN computed in Eq. (11); termination queries then compare the L2 norm of the scaled gradient against a tolerance scaled by the number of active grid nodes and the selected accuracy. Note, as a check, that when a single uniform material is applied to a simulated domain, our extended CN measure reduces back to Li et al.'s [Li:2019:DOT] L2 measure.
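A hedged sketch of this termination query: each gradient entry is scaled by its node's CN before the L2 comparison. Whether the CN divides each entry and how the active-node count enters the threshold are assumptions made for illustration only:

```python
import numpy as np

def cn_converged(grad, cn_per_node, n_active, eps):
    # Assumption: each gradient entry is divided by its node's CN, and
    # the threshold scales with sqrt(n_active) to mimic an averaged
    # L2 measure. Both choices are illustrative, not from the paper.
    scaled = grad / cn_per_node
    return np.linalg.norm(scaled) < eps * np.sqrt(n_active)
```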