Hierarchical Optimization Time Integration for CFL-rate MPM Stepping

We propose Hierarchical Optimization Time Integration (HOT) for efficient implicit time stepping of the Material Point method (MPM) irrespective of simulated materials and conditions. HOT is a MPM-specialized hierarchical optimization algorithm that solves nonlinear time step problems for large-scale MPM systems near the CFL-limit, e.g., with step sizes around 0.01s. HOT provides "out-of-the-box" convergent simulations across widely varying materials and computational resolutions without parameter tuning. As the first MPM solver enhanced by h-multigrid, HOT is highly parallelizable and, as we show in our analysis, robustly maintains consistent and efficient performance even as we grow stiffness, increase deformation, and vary materials over a wide range of finite strain, elastodynamic and plastic examples. Through careful benchmark ablation studies, we compare the effectiveness of HOT against seemingly plausible alternative combinations of MPM with standard multigrid and other Newton-Krylov models. We show how these alternative designs result in severe issues and poor performance. In contrast, HOT outperforms existing state-of-the-art, heavily optimized implicit MPM codes with an up to 10x performance speedup across a wide range of challenging benchmark test simulations.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 12

page 13

page 19

01/21/2021

A Gauss-Seidel projection method with the minimal number of updates for stray field in micromagnetic simulations

Magnetization dynamics in magnetic materials is often modeled by the Lan...
02/19/2021

Do NHL goalies get hot in the playoffs? A multilevel logistic regression analysis

The hot-hand theory posits that an athlete who has performed well in the...
04/23/2020

Rapid Detection of Hot-spots via Tensor Decomposition with applications to Crime Rate Data

We propose an efficient statistical method (denoted as SSR-Tensor) to ro...
07/25/2017

Kinetic Simulation of Collisional Magnetized Plasmas with Semi-Implicit Time Integration

Plasmas with varying collisionalities occur in many applications, such a...
07/12/2021

Combining p-multigrid and multigrid reduced in time methods to obtain a scalable solver for Isogeometric Analysis

Isogeometric Analysis (IgA) has become a viable alternative to the Finit...
06/07/2021

Smooth Implicit Hybrid Upwinding for Compositional Multiphase Flow in Porous Media

In subsurface multiphase flow simulations, poor nonlinear solver perform...
05/07/2008

Hot Roller Embossing for the Creation of Microfluidic Devices

We report on the hot roller embossing of polymer sheets for the creation...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

Figure 1. HOT is naturally suited for simulating dynamic contact and fracture of heterogeneous solid materials with substantial stiffness discrepancy. In this bar twisting example, compared across all available state-of-the-art, heavily optimized implicit MPM codes, HOT achieves more than 4 speedup overall and up to 10 per-frame. HOT obtains rapid convergence without need for per-example hand-tuning of either outer nonlinear solver nor inner linear solver parameters.

The Material Point method (MPM) is a versatile and highly effective approach for simulating widely varying material behaviors ranging from stiff elastodynamics to viscous flows (e.g. Figures 11 and 13) in a common framework. As such MPM offers the promise of a single unified, consistent and predictive solver for simulating continuum dynamics across diverse and potentially heterogenous materials. However, to reach this promise, significant hurdles remain. Most significantly, obtaining accurate, consistent and robust solutions within a practical time budget is severely challenged by small timestep restrictions. This is most evidenced as we vary material properties, amounts of deformation and/or simulate heterogenous systems (see Table 1).

While MPM’s Eulerian grid resolution limits time step sizes to the CFL limit222A particle cannot travel more than one grid cell per time step while, in practice, a CFL number of is often used [gast:2015:tvcg]. [fang2018temporally], the explicit time integration methods commonly employed for MPM often require much smaller time steps. In particular, the stable timestep sizes of explicit MPM time integration methods remain several orders of magnitude below the CFL limit when simulating stiff materials like metal (see Table 1) and snow [stomakhin:2013:snow; fang2018temporally]. A natural solution then is to apply implicit numerical time integration methods, i.e. implicit Euler, which can enable larger stable time step sizes for MPM [gast:2015:tvcg; Fang:2019:ViscousMPM]. However, doing so requires solving challenging and potentially expensive nonlinear systems at every timestep.

1.1. Challenges to implicit MPM timestepping

While implicit MPM time timestepping methods in engineering provide larger step sizes, they still remain limited to impractically small step sizes, i.e., around s even for a small scale system with grids in 2D [nair2012implicit]. This is because in engineering, standard Newton method is applied without any stabilization strategy like line search or trust region to solve the nonlinear time stepping problem [nair2012implicit; charlton2017igimp]; with larger time step size, the nonlinearity grows and the 1st-order Taylor expansion becomes less accurate, which can then make Newton method unstable and even explode. More recently state-of-the-art implicit MPM methods in graphics have been introduced that enable time steps closer to the CFL limit. Gast and colleagues [gast:2015:tvcg] introduce a globalized Newton-Krylov method for MPM while Fang et al. [Fang:2019:ViscousMPM] extend ADMM to solve implicit MPM timesteps. However, their convergence and performance are limited for simulations involving heterogeneous and/or stiff materials, leading to slow computations and inconsistent, unpredictable and even unstable results. While ADMM [Fang:2019:ViscousMPM] for MPM is attractively efficient, the underlying ADMM algorithm has no guarantee of convergence for nonlinear continua problems. In practice it can at best achieve linear convergence. As we show in Section 7, when able to converge the ADMM solver is thus exceedingly slow to reach reasonable solutions.

On the other hand inexact Newton-Krylov methods exemplified by Gast et al. [gast:2015:tvcg] are seemingly ideal for solving implicit MPM problems where the sparsity structure of the Hessian can change at every timestep. Key to the efficiency and stability of these methods are the inexact iterative linear solve of each inner Newton iterate. In turn this requires setting a tolerance to terminate each such inner loop. However, no single tolerance setting works across examples. Instead, working tolerances, as we show in Section 7, can and will vary over many orders of magnitude per example and so must be experimentally determined as we change set-ups over many expensive, successive simulation trials. Otherwise, as we show in our Section 7 and our supplemental, a tolerance suitable for one simulated scene will generate extremely slow solves, non-physical artifacts and even explosions in other simulations.

Next, we observe that for each large-scale linear system solve in the inner loop of a Newton-Krylov method, classic Jacobi and Gauss-Seidel preconditioned CG solvers lose significant efficiency from slowed convergence as problem stiffnesses increase (Table 3). For such cases multigrid preconditioners [wang:2018:multigridcloth; Tamstorf:2015:multigridcloth; zhu2010efficient] are often effective solutions as the underlying hierarchy allows aggregation of multiple approximations of the system matrix inverse across a range of resolutions. This accelerates information propagation across the simulation domain, improving convergence.

However, h-multigrid methods for MPM have not previously been proposed in either the graphics or engineering and building a multigrid hierarchy for MPM is challenging. As we discuss below in Section 7.3, that while the direct application of multigrid by coarsening system DOFs on grid nodes improves convergence of inner linear solves, this seemingly reasonable hierarchy its computational overhead does not reduce the overall cost of MPM simulations. This is because construction and evaluation of system matrices in each coarser level can be as expensive as the fine level computation while DOFs art each level may not be matched – especially at the domain boundaries. Alternately we could try merging particles level by level to reduce matrix construction costs. However, merging lacks both error bounds and can worsen DOF mismatch.

1.2. Hierarchical Optimization Time Integration

We propose the HOT algorithm to address these existing limitations and so provide “out of the box” efficient MPM simulation. To enable consistent, automatic termination of both outer Newton iterations and inner inexact linear solves across simulation types we extend the characteristic norm [Li:2019:DOT; bcqn18] to inhomogenous MPM materials. As we show in Section 5.2) and Table 1 in our supplemental, this produces consistent, automatic, high-quality results for inexact Newton-Krylov simulations that match the quality and timing of the best hand-tuned results of Gast et al. [gast:2015:tvcg].

To obtain both improved convergence and performance for MPM systems with multigrid we develop a new, MPM-customized hierarchy. We begin by embedding progressively finer level grid nodes into coarser level nodes with the MPM kernel. We then construct coarse level matrices directly from their immediate finer level matrix entries. This avoids computation and storage of each coarse level’s geometric information, automatically handles boundaries and enables flexible control of sparsity via choice of MPM embedding kernel. As we will see this resulting multigrid then retains improved convergence and gains significantly improved performance; see Figure 18.

While offering a significant gain, our MPM-customized multigrid still requires explicit matrix construction. In some elastodynamic applications, matrix construction costs are alleviated by applying only a fixed number of Newton iterations irrespective of convergence. This sacrifices consistency. For example producing artificially softened materials and numerically damping dynamics so that artists can not control output. This also generates inaccurate results so that output can not be used for applications in engineering, experiment and design. Following recent developments in mesh-based elasticity decomposition methods [Li:2019:DOT] we instead observe that our hierarchy can be constructed once per timestep, and then applied as an efficient second-order initializer, in our case with one V-cycle per iteration, inside a quasi-Newton solve.

1.3. Contributions

HOT’s inner multigrid provides efficient second-order information, while its outer quasi-Newton low-rank updates provide efficient curvature updates. This enables HOT to maintain consistent, robust output with a significant speedup in performance – even as we grow stiffness, increase deformation and widely vary materials across the simulation domain. These, as we show in Section 7, are the regimes where traditional Newton-Krylov methods suffer from widely varying and often degraded performance.

We note that applying L-BFGS without HOT, either with a single level Hessian inverse approximation, or with the direct, baseline multigrid results in much worse performance (see Table 2 in our supplemental). Thus as we demonstrate in our ablation study in Section 7, it is HOT’s combined application of node-embedding multigrid, automatic termination, and customized integration of multigrid V-cycle into the quasi-Newton loop that jointly provide its significant and consistent performance gains. Thus, in summary, HOT’s contributions are

  • We derive a novel MPM-specific multigrid model exploiting the regularity of the background grid and constructing a Galerkin coarsening operator consistent with re-discretization via particle quadrature. To our knowledge, this is the first time h-multigrid is applied successfully for the MPM discretization of nonlinear elasticity.

  • We develop a new, node-wise Characteristic Norm [Li:2019:DOT; bcqn18] (CN) measure for MPM. Node-wise CN enables unified tolerancing across varying simulation resolutions, material parameters and heterogenous systems for both termination of inner solves in inexact Newton and convergence determination across methods. CN likewise ensures a fair comparison across all solvers in our experiments.

  • We construct HOT – an out-of-the-box implicit MPM time integrator by employing our multigrid as an efficient inner initializer inside a performant quasi-Newton MPM time step solve. A carefully designed set of algorithmic choices customized for MPM then achieve both efficiency and accuracy that we demonstrate on a diverse range of numerically challenging simulations.

  • We perform and analyze extensive benchmark studies on challenging, industrial scale simulations to determine these best data structure and algorithmic choices for MPM numerical time integration. Across these simulation examples, we compare HOT against a wide range of alternative, seemingly reasonable algorithmic choices to demonstrate their pitfalls and the carefully designed advantages of HOT.

Across a wide range of challenging elastodynamic and plastic test simulations we show (see Section 7) that HOT without the need of any parameter tuning outperforms existing state-of-the-art, heavily optimized implicit MPM codes. All alternative methods either exhibit significantly slower performance or else suffer from large variations across simulated examples. Our study then suggests HOT as robust, unified MPM time integrator with fast convergence and outstanding efficiency across a wide range of possible simulation input.

Density () Young’s modulus (Pa) Poisson’s ratio Yield stress (Pa) Tissue - Rubber - Bone - PVC Metal Ceramic -
Table 1. Parameters for solid materials studied in this paper.
Figure 2. Stiffness comparisons. A stiff lucky cat is smashed onto sheets with different youngs moduli starting from aluminum (6.9Gpa, with yield stress 240Mpa) and then being scaled down by , and . Different stiffness gives drastically different behavior for elatoplastic materials.

2. Related Work

2.1. Material Point Method

MPM was introduced by Sulsky et al. [sulsky:1994:history-materials] as a generalization of FLIP [brackbill1988flip; zhu:2005:sand-fluid] to solid mechanics. The accuracy, or the convergence to the analytical solution of MPM was demonstrated computationally and explained theoretically by Steffen et al. [steffen:2008:analysis] with a smooth e.g. quadratic B-spline basis to represent solutions on the grid, which is further verified by Wallstedt [wallstedt:2009:order] with manufactured solutions.

In graphics, MPM has been shown effective for snow [stomakhin:2013:snow], sand [gao2018gpu; yue2018hybrid; daviet:2016:smp; klar:2016:des], foam [yue:2015:mpm-plasticity; ram:2015:foam; Fang:2019:ViscousMPM], cloth [jiang:2017:cloth; guo2018material], rod [han2019hybrid; fei2019mmc], sauce [MS:2019], fracture [Wolper:2019:CDMPM; Wretborn:2017:ACP:3170009.3170083; wang2019simulation], multiphase flow [stomakhin:2014:augmented-mpm; andre:2017:wetsand; gao2018animating], and even baking of food [ding2019thermomechanical]. More recently, coupling MPM and rigid bodies was explored in both explicit [hu2018] and implicit [ding2019penalty] settings. While using rigid bodies to approximate stiff materials works sufficiently well in certain animation applications, it is however not a viable option for animating elastoplastic yielding or when accurate mechanical response measures are crucial.

An implicit time integration scheme such as implicit Euler is often the preferred choice for stiff materials and large deformations due to the unacceptable sound-speed CFL restriction [fang2018temporally] in explicit integrators, despite their superior level of parallelization. The first implicit MPM [guilkey:2003:implicit-mpm] used Newmark integration and demonstrated that besides stability, the accuracy of the implicit solution was also superior to the explicit MPM when compared to validated finite element solution. Recently, Nair and Roy [nair2012implicit] and Charlton et al. [charlton2017igimp]

further investigated implicit generalized interpolation MPM for hyperelasticity and elastoplasticity respectively. On the other hand, researchers in graphics have explored force linearization

[stomakhin:2013:snow] and an optimization-stabilized Newton-Raphson solver for both implicit Euler [gast:2015:tvcg] and implicit Midpoint [jiang2017angular] to achieve large time step sizes. In practice, to ensure the highly nonlinear systems are solved in an unconditionally stable way, optimization-based time integration is often applied.

Figure 3. ArmaCat. A soft armadillo and a stiff lucky cat are both dropped onto an elastic trampoline, producing interesting interactions between them.

2.2. Optimization and non-linear integrators

Numerical integration of partial differential systems can often be reformulated variationally as an optimization problem. These methods can then often achieve improved accuracy, robustness and performance by taking advantage of optimization methods. In computer graphics, simulation methods are increasingly applying this strategy to simulating both fluid [batty:2007:solid-fluid; Weiler2016-ud] and solid [gast:2015:tvcg; wang2016descent; bouaziz2014projective; Overby2017-ns; Dinev2018-pp; Dinev2018-zn] dynamics, often enabling large time stepping. For optimizations originating from nonlinear problems, Newton-type methods are generally the standard mechanism, delivering quadratic convergence near solutions. However, when the initial guess is far away from a solution, Newton’s method may fail to provide a reasonable search direction as the Hessian can be indefinite [Li:2019:DOT; liu2017quasi; Smith2018-hv]. Teran et al. [teran2005robust] propose a positive definite fix to project the Hessian to a symmetric positive definite form to guarantee a descent direction can be found; we compare with this method and further augment it with a backtracking line search to ensure energy decreases with each Newton iteration. We refer to this method as projected Newton (PN) throughout the paper.

In each PN iteration, a linear system has to be solved. For MPM simulations which often contains a large number of nodes, Krylov iterative linear solvers such as conjugate gradient (CG) are often more favorable than direct factorizations. To improve CG convergence, different preconditioning options exist. We apply the most efficient and straightforward Jacobi (diagonal) preconditioner as our baseline PN to which we refer as PN-PCG. To further minimize memory consumption and access cost, all the existing implicit MPM works in graphics apply a matrix-free (MF) version of PN-PCG where only matrix-vector product is implemented as a function without explicitly constructing the system matrix. Then if a lot of CG iterations are required especially when using large time step sizes and/or stiff materials, matrix-free might not necessarily be a better option than explicitly building the matrix (Section 

7). To further accelerate convergence in these challenging settings, more effective but expensive preconditioners such as the multigrid could potentially be applied. However, neither graphics nor engineering literature has explored the adoption of h-multigrid in MPM.

2.3. Multigrid methods

Multigrid methods [MGBook] have been widely employed in accelerating existing computational frameworks for solving both solid [McAdams2011-kv; zhu2010efficient; wang:2018:multigridcloth; Tamstorf:2015:multigridcloth; xian2019multigrid; tielen2019efficient] and fluid dynamics [fidkowski2005p; mcadams2010parallel; Setaluri:2014:spgrid; Zhang2015-lv; Zhang2016-hv; Zhang2014-io; gao2018animating; Aanjaneya2017-mi]. With multi-level structures, information of a particular cell can propagate faster to distant cells, making multigrid methods highly efficient for systems with long-range energy responses, or high stiffnesses. Unlike p-multigrid [fidkowski2005p; tielen2019efficient] that uses higher order shape functions on the same degree-of-freedoms to improve convergence, h-multigrid constructs coarser degree-of-freedoms that potentially has lower computational cost, which is often more favorable but has not been explored in MPM.

H-multigrid is generally categorized as geometric or algebraic. Unlike algebraic multigrid that relies purely on algebraic operations to define the coarser system matrices and does not utilize any geometric information [stuben2001review], geometric multigrid constructs the coarser level system matrices from coarsened grids or meshes. However, the mismatch at the irregular boundaries due to the geometric coarsening may require specialized treatment to ensure convergence improvement, e.g. extra smoothing at the boundary as in McAdam et al. [mcadams2010parallel]. Instead, Chentanez and Müller [nvidiaMGFlow] take a volume weighted discretization and demonstrate that more robust results can be obtained without extra smoothing at boundaries. On the other hand, Ando et al. [ando2015pressure] derive a multi-resolution pressure solver from a variational framework which handles boundaries using signed-distance functions.

On the other hand, galerkin multigrid [strangAndArikka1986] automatically handles the boundary conditions well by correctly passing the boundary conditions between different multigrid levels but do not maintain sparsity. Xian et al. [xian2019multigrid] designed their special galerkin projection criterion based on skinning space coordinates with piecewise constant weights to maintain sparsity, but their projection could potentially lead to singular coarser level matrices. In our work, we derive prolongation and restriction operators and the coarser level matrices all with our node embedding idea, and the resulting formulation turns out to be consistent with Galerkin multigrids. But due to the regularity of the MPM grid, our coarse level matrices are guaranteed to be full-rank and support sparsity control in a flexible way with kernel selection.

As in Ferstl et al. [mcadams2010parallel], McAdams et al. [AMGAdaptgrid], and Zhang et al. [Zhang2016-hv], one natural approach to utilize our multigrid model is to then augment the Krylov solver as a preconditioner. As demonstrated in our benchmark experiments, this superficially straightforward utilization does not outperform existing diagonally-preconditioned alternatives (PN-PCG) due to repeated expensive reconstruction of the multilevel hierarchy in each Newton step. We instead develop HOT, a highly performant quasi-Newton L-BFGS solver where the multigrid model is novelly adopted as an efficient inner initializer. HOT achieves significant performance gains and outperforms all alternating methods on a wide range of challenging examples.

2.4. Quasi-Newton Methods

Quasi-Newton methods e.g. L-BFGS have long been applied for simulating elastica  [Deuflhard2011-pf]. L-BFGS can be highly effective for minimizing potentials. However, an especially good choice of initializer is required and makes an enormous difference in convergence and efficiency [nocedal:2006:numerical]. Directly applying a lagged Hessian at the beginning of each time step is of course the most straightforward option which effectively introduces second order information [Brown2013]; unfortunately, it is generally a too costly option with limitations in terms of scalability. Liu et al. [liu2017quasi] propose to instead invert the Laplacian matrix which approximates the rest-shape Hessian as initializer. This provides better scalability and more efficient evaluations, but convergence speed drops quickly in nonuniform deformation cases [Li:2019:DOT]. Most recently Li et al. [Li:2019:DOT] propose a highly efficient domain-decomposed initializer for mesh-based FE that leverage start of time step Hessians — providing both scalability and fast convergence in challenging elastodynamic simulations.

For the MPM setting, where inexact Newton-Krylov methods are generally required given the scale of MPM simulation discretizations, the convergence of iterative linear solvers, rather than direct solvers, become critical for scalability. HOT leverages the start of time step Hessians at the beginning of each incremental potential solve, applying a new, inner hierarchical strategy for L-BFGS to build an efficient method that outperforms or closely matches best-per-example prior methods across all tested cases on state-of-the-art, heavily optimized implicit MPM codes.

Unlike Wen and Goldfarb [wen2009line] which requires many unintuitive parameter setting to alternate between multigrid and single-level smoothing and to convexify the objective function, HOT consistently apply V-cycles on our node embedding multigrid constructed from the projected Hessian [teran2005robust] as inner initializer, without the need of any parameter tuning.

3. Problem Statement and Preliminaries

3.1. Optimization-based Implicit MPM

MPM assembles a hybrid Lagrangian-Eulerian discretization of dynamics. A background Cartesian grid acts as the computational mesh while material states are tracked on particles.

Notation. In the following we apply subscripts for particles and for grid quantities respectively. We then remove subscripts entirely, as in , to denote vectors constructed by concatenating nodal quantities over all grid nodes. Superscripts , and then distinguish quantities at time steps , and .

Implicit MPM time stepping with implicit Euler from to is then performed by applying the operations:

  • Particles-to-grid (P2G) projection. Particle masses and velocities are transferred to the grid’s nodal masses and velocities by APIC [jiang:2015:apic].

  • Grid time stepping. Nodal velocity increments are computed by minimizing implicit Euler’s incremental potential as in Eq. (1) and are then applied to update nodal velocities by .

  • Grid-to-particles (G2P) interpolation. Particle velocities are interpolated from by APIC.

  • Particle strain-stress update. Particle strains (e.g. deformation gradients ) are updated by the velocity gradient via the updated Lagrangian. Where appropriate, inelasticity is likewise enforced through per-particle strain modification [stomakhin:2013:snow; gao2017adaptive].

  • Particle advection. Particle positions are advected by .

Here we focus on developing an efficient and robust solver for MPM grid time stepping in Step 2. All other steps are standard (ref. [jiang:2016:course]).

Assuming mechanical responses correspond to an MPM nodal-position dependent potential energy , e.g. a hyperelastic energy, Gast et al. [gast:2015:tvcg] observe that minimization of

(1)

subject to proper boundary conditions is equivalent to solving the MPM implicit Euler update , where is the implicit nodal force. Minimization of a corresponding incremental potential for the mesh-based elasticity has been widely explored for stable implicit Euler time stepping [bouaziz2014projective; liu2017quasi; Overby2017-ns; Li:2019:DOT]. For MPM, however, a critical difference is that nodal positions are virtually displaced from the Eulerian grid during the implicit solve, and are then reset to an empty Cartesian scratchpad. Significantly, in different time steps, the system matrix can also have different sparsity patterns. This is an essential reason why matrix-free Newton-Krylov methods are generally preferred over direct factorizations.

Figure 4. Boxes. A metal box is concatenated with two elastic boxes on both sides. As the sphere keeps pushing the metal box downwards, the elastic boxes end up being torn apart.
Given: ,
Output:
Initialize and Precompute:
      ,  
          // is defined in Eq. 1
while do    // termination criteria (§5.2)
       // [teran2005robust]
       // adaptive inexactness (§5.3)
       // as relative tolerance
       // back-tracking line search
   
   
   
end while
ALGORITHM 1 Inexact Newton-Krylov Method

3.2. Inexact Newton-Krylov Method

To minimize Eqn.(1) for high resolution scenarios where more than 100K degree of freedoms are common, direct solvers can be too slow or even run out of memory altogether for numerical factorization. Instead Newton-Krylov methods are generally obtained for scalability, and further computational savings can be achieved by employing inexact Newton. For example, Gast et al. [gast:2015:tvcg] propose to use L2 norm of incremental potential’s gradient to adaptively terminate Krylov iterations and computational effort in early Newton iterations can be saved by inexactly solving the linear system. However, Gast and colleagues mainly target softer materials while in more general cases objects may have Youngs modulus as large as e.g. for metal wheel as shown in Fig. 11. It becomes even more challenging when materials with widely varying stiffnesses interact with each other. In these cases the inexact Newton method in Gast et al. [gast:2015:tvcg] can simply fail to converge within practical amounts of time in our experiments on the Twist (Fig. 1) and Chain (Fig. 9) examples.

This observation is not new and has motivated researchers to question whether we can apply an early termination to the Newton iterations such that we can still get visually appealing results that are stable and consistent. Li et al. [Li:2019:DOT] extend the characteristic norm (CN) from distortion optimization [bcqn18] to elastodynamics and demonstrate its capability to obtain consistent, relative tolerance settings across a wide set of elastic simulation examples over a range of material moduli and mesh resolutions. However, for a scene with materials with drastically different stiffness parameters, the averaging L2 measure will not suffice to capture the multiscale incremental potential gradient in a balanced manner.

We propose an extended scaled-CN in our framework to support common multi-material applications in MPM. The incremental potential gradients are nonuniformly scaled such that the multiscale residuals can be captured effectively. We apply this new characteristic norm to terminate outer Newton iterations, and also improve the inexact strategy for the inner linear solve iterations with the same idea to create our baseline PN solver. See Algorithm 1 for our inexact Newton and details can be found in Section 5.

With this extended CN and the improved inexact solving strategy, Newton-Krylov methods may still suffer from poor convergence simply because the linear systems could easily become ill-conditioned when materials with large youngs moduli are involved. This triggers the desire of designing better preconditioning methods such as incomplete Cholesky and multigrid. However, the former one is not a viable option here as elastodynamic system matrices are not M-matrices [kershaw1978incomplete]. Likewise, we observe that the extra computational cost of the construction of the multigrid hierarchy per outer iteration may not be well compensated by the convergence improvement within our Newton-Krylov framework. We thus propose a novel MPM multigrid scheme in Section 4 and then HOT can be built up by employing this hybrid multigrid as an efficient inner initializer in a highly performant quasi-Newton loop as in Section 5.

Figure 5. Boards. A granular flow is dropped onto boards with varying Young’s moduli, generating coupled dynamics.

4. MPM-Multigrid

4.1. Motivation

We begin with an -level multigrid hierarchy. We denote level and level as the finest and coarsest levels respectively. System matrices are constructed for each single level along with the prolongation and restriction operators between each adjacent levels for exchanging information. Here is the prolongation matrix between level and level , and is the corresponding restriction matrix.

Unlike McAdams et al. [mcadams2010parallel] where the Poisson matrix can be completely derived from the Cartesian grid, the system matrix of MPM also depends on the particle positions and strains because the particles are used as the quadrature points. Thus for constructing the matrices for the coarser levels, traditional geometric multigrid methodology would require the definition of “coarsened” particles. The direct geometric solution is then to always use the actual particles as the quadrature points in all the coarse levels, which is extremely inefficient (Section 7.3). Another straightforward way is to merge nearby particles into “larger” particles as in Gao et al. [gao2017adaptive] and Yue et al. [yue:2015:mpm-plasticity]

; however, there only exists ad-hoc or heuristic methods for interpolating the deformation gradients that the resultant Hessians at coarser levels may fail to correctly encode the original problem and lead to slow convergence. Moreover, this method requires to construct nodes with particle-like attributes at each level introduces computating overhead. To remove these challenges, we propose to construct our hierarchy by embedding finer level grids into the coarser level grids analogously to MPM’s embedding of particles into grid nodes. Then by explicitly storing the system matrix we progressively constructing coarser level matrices, directly from the adjacent finer level matrix entries, avoiding the need to compute or store any coarser level geometric information. We next show that our multigrid is consistent with Galerkin multigrid where boundary conditions are automatically handled and by selecting different node embedding kernels we support flexible control on sparsity.

4.2. Node-embedding multigrid derivation

We first derive our restriction/prolongation operators. We begin by considering these two operations between levels and . Nodal forces in the finest level are

(2)
(3)

Here is the initial particle volume, is the energy density function, is the first Piola-Kirchhoff stress, is the deformation gradient and are the weights. Notice that in multigrid hierarchy, the residuals are restricted from finer levels to coarser levels.

Forces at nodes in the next level are then

. Embedding finer level nodes to coarser level nodes, we then can simply apply the chain rule, converting derivatives evaluated at a coarse node to those already available at the finer level:

(4)
(5)

This gives our restriction operation as with .

Prolongation is given by transpose . Recalling that MPM particle velocities are interpolated from grid node velocities as , so that

(6)

or

For coarse matrices, we similarly can verify by computing the second-order derivative of Eq. (1) w.r.t. . Applying chain rule to take as intermediate variable we then obtain

(7)

where is the Hessian of Eq. (1) w.r.t. . This gives

(8)

consistent with Galerkin multigrid. Since Dirichlet boundary conditions can be achieved by projecting the corresponding rows and columns of the system matrix and the entries of the right-hand-side, Galerkin multigrid automatically transfers boundary conditions from finer levels to coarser levels during the projection algebraically. This will often result in the blending of the boundary DOF’s and non-boundary DOF’s on the coarser level grid nodes, which is more consistent and thus effective compared to purely geometric multigrids.

Here our node embedding kernel can simply be MPM kernels such as linear or quadratic B-splines. Different choice of kernels can result in different sparsity pattern of the coarser matrices, which also leads to different convergence and performance (see the supplemental document Table 2). But because of the locality of B-splines, the sparsity patterns can be effectively controlled. Interestingly, by a careful selection of the kernel, our node-embedding multigrid can be mathematically equivalent to the particle quadrature geometric multigrid.

Figure 6. Algebraic-geometric equivalence. Left: in the finest level, particles’ properties are transferred to the grid nodes via the B-spline quadratic weighting function; middle: then the finer nodes transfer information to coarser nodes via linear embedding relationships — based on which we perform Galerkin coarsening; right: Galerkin coarsening can then be re-interpreted as a new weighing function, with a smaller kernel width, connecting coarser nodes directly with the particles.

4.3. Geometric multigrid perspective

Now applying the standard MPM derivation of the nodal force in level :

(9)

and rewriting Eq. (5) as

(10)

our multigrid then has a simple geometric interpretation as illustrated in Fig. 6. The particle-grid weight gradient between level and the original particles is substituted by . In other words, as a geometric multigrid, we can now define a new weighting function that bridges the particles and the coarse grid nodes such that the coarse grid can be generated by traversing all particles to find occupied coarse nodes. Similarly, a concatenation of prolongation operators for each coarse level, right multiplied by the original weight gradient, gives us the new weight gradients required in each level. With this weight gradient, the Hessian matrix can be easily defined to complete the multigrid model. We use the corresponding weighting function to plot the curves in Fig. 7.

Sparsity pattern in coarse levels.

As discussed in Zhang et al. [Zhang2016-hv], in fluid simulation, when the linear kernel relating fine and coarse cells is adopted for defining and standard Galerkin coarsening is applied, the sparsity of the multigrid levels become increasingy dense with coarsening and eventually the system matrix can even possibly end up being effectively a dense matrix. In contrast, in the MPM framework, with the B-spline quadratic function being the particle-grid weighting function in level , the sparsity pattern of coarse levels actually becomes increasingly sparser with the standard linear kernel for prolongation and restriction. As shown on the left of Fig. 7, the kernel width reduces from to as the level increases. Another option is to replace the linear kernel by the B-spline quadratic weighting to define prolongation and restriction operations. However, in this case the matrices would become increasingly denser in coarser levels (c.f. the right half of Fig. 7), making it computationally less attractive (see the supplemental document Table 2 for the comparison). Thus our MPM multigrid assumes that B-spline quadratic weighting is adopted for particle-grid transfers in finest level and the linear kernel is used for defining prolongation and restriction operators between adjacent levels in the hierarchy.

In Fig. 8, we visualize the matrix sparsity pattern for the ArmaCat example (see Fig. 3); the number of nonzero entries per row in our matrices decrease in our MPM multigrid as the grid coarsens. In contrast, the rightmost visualization shows that when the direct geometric multigrid is used, i.e. when the finest particle set are used to construct the matrix of level , the Hessian matrix is much denser.

Figure 7. Kernel width. The width of our geometric weighting function, equivalently in our algebraic derivation, changes with level increase. For linear embedding (our choice for HOT), width becomes smaller with coarsening while for quadratic embedding width becomes larger.
Figure 8. Sparsity pattern. Our MPM multigrid system matrices become increasingly sparse as levels increases: see the left two for our level-0 and level-4 matrices; while direct geometric multgrid would lead to a denser matrix for increasingly coarse levels: see the rightmost matrix at coarsened level of the direct geometric multigrid. Note that although in simulation we only used a 3-level multigrid, we here illustrate the sparsity patterns with 5 levels for clarity.

5. Hierarchical Optimization Time Integration

Our newly constructed MPM multigrid structure can be used as a preconditioner by applying one V-cycle (Algorithm 2) per iteration for a conjugate gradient solver to achieve superior convergence in a positive-definite fixed, inexact Newton method. In the following we denote this approach as “projected Newton, multigrid preconditioned conjugate gradient” or PN-MGPCG. However, in practice, the cost of reconstructing the multigrid hierarchy at each Newton iteration of PN-MGPCG is not well-compensated by the convergence improvement, providing only little or moderate speedup compared to a baseline projected Newton PCG solver (PN-PCG) (see Figs. 16 and 19) where a simple diagonal preconditioner is applied to CG. This is because in PN-MGPCG, each outer iteration (PN) requires reconstructing the multigrid matrices, and each inner iteration (CG) performs one V-cycle. One reconstruction of the multigrid matrices would take around the time for one V-cycle and over the time for one Jacobi preconditioned PCG iteration. Unlike the Poisson system in Eulerian fluids simulation, the stiffnesses of elastodynamic systems are often not predictable as it varies a lot under different time step sizes, deformation, and dynamics. Therefore, it is hard for PN-MGPCG to consistently well accelerate performance in all time steps.

5.1. Multigrid initialized quasi-Newton method

Rather than applying MPM multigrid as a preconditioner for a Krylov method (which can still both be slow and increasingly expensive as we grow stiffness; see Fig. 19) we apply our MPM multigrid as an inner initializer for a modified L-BFGS. In the resulting hierarchical method, multigrid then provides efficient second-order information, while our outer quasi-Newton low-rank updates [Li:2019:DOT] provide efficient curvature updates to maintain consistent performance for time steps with widely varying stiffness, deformations and conditions. In turn, following recent developments we choose a start of time step lagged model update [Brown2013; Li:2019:DOT]. We re-construct our multgrid structure once at the start of each time step solve. This enables local second order information to efficiently bootstrap curvature updates from the successive light-weight, low-rank quasi-Newton iterations.

Given: , ,    
Input: ,
Output:
for
   
   
   
end for
for
   
   
end for
ALGORITHM 2 Multigrid V-Cycle Preconditioner

This completes the core specifications of our Hierarchical Optimization Time (HOT) integrator algorithm. HOT multigrid hierarchy is constructed at the beginning of each time step. Then, for each L-BFGS iteration, the multiplication of our initial Hessian inverse approximation to the vector is applied by our multigrid V-cycle. To ensure the symmetric positive definiteness of the V-cycle operator, we apply colored symmetric Gauss-Seidel as the smoother for finer levels and employ Jacobi preconditioned CG solves for our coarsest level system (see Algorithm 2). Here, PCG is applied for the coarsest level rather than a direct solver because setting up a direct solver requires additional computations (e.g. factorization) which could not be compensated by the subtle convergence improvement (Section 7.2

). Weighted Jacobi as a widely-used smoother for multigrid preconditioner in Eulerian fluids simulations loses its advantages on simplicity and performance here due to the difficulty in determining a proper weight that guarantees efficiency or even convergence for the non-diagonally dominant elastodynamic Hessian. Similarly, Chebyshev smoother is abandoned because estimating both upper and lower eigenvalues of the system matrix introduce large overhead.

In HOT, curvature information is updated by low-rank secant updates with window size , producing a new descent direction for line search at each L-BFGS iteration. Pseudocode for the HOT method is presented in Algorithm 3 and we analyze its performance, consistency and robustness in Sec. 7.2 with comparisons to state-of-the-art MPM solvers.

Given: , , , ,     Output: Initialize and Precompute:       ,  
          // is defined in Eq. 1
          // [teran2005robust]           // Eq. 8
// Quasi-Newton loop to solve time step :
while do   // termination criteria (§5.2)
        // L-BFGS low-rank update
    for    // break if                          end for        // Algorithm 2     // L-BFGS low-rank update
    for    // skip (continue) until                   end for            // back-tracking line search             end while
ALGORITHM 3 Hierarchical Optimization Time Integrator (HOT)

5.2. Convergence tolerance

To reliably obtain consistent results across heterogenous simulations while saving computational effort, we extend the Characteristic Norm (CN) [bcqn18; Li:2019:DOT] from FE mesh to MPM discretization, taking multi-material domains into account. To simulate multiple materials with significantly varying material properties, coupled in a single simulated system, the traditional averaging L2 measure fails to characterize the multiscale incremental potential’s gradient.

For MPM we thus first derive a node-wise CN in MPM discretization, and then set tolerances with the L2 measure of the node-wise CN scaled incremental potential gradient. See our Appendix for details.

Figure 9. Rotating chain. A chain of alternating soft and stiff rings is rotated until soft rings fracture, dynamically releasing the chain.

5.3. Inexact linear solves

Above we provide a consistent termination criterion for terminating outer nonlinear solve iterates. Within each outer nonlinear iteration, inner iterations are performed to solve the corresponding linear system to a certain degree of accuracy. For inexact Newton-Krylov solver, in the first few Newton iterations where the initial nonlinear residuals are generally far from convergence, inexact linear solves are preferred. As discussed in [gast:2015:tvcg], inexact linear solves can significantly relieve the computational burden in large time step simulations. Thus they set relative tolerance on the energy norm ( is the preconditioning matrix) of the initial residual for each linear solve in PN iteration as where is their nonlinear tolerance, and perform CG iterations until the current is smaller than .

However, this barely works for heterogeneous materials in two aspects. First, the L2 norm of the incremental potential does not take into account its multi-scale nature, potentially providing too small relative tolerances for stiff materials. Second, as discussed earlier, the nonlinear tolerance in Gast et al. [gast:2015:tvcg] is hard to tune especially when stiff materials are present, thus again can make the scale of relative tolerances off. Therefore, we modify this inexact strategy for our baseline PN-PCG by using as the relative tolerance to terminate CG iterations. Here the preconditioning matrix in the energy norm has the effect of scaling per-node residual in awareness of the different material stiffnesses. is simply Li et al.’s [Li:2019:DOT] tolerance on L2 measure characterizing the most stiff material in the running scene, ensuring that the tolerance would not be too small for stiff materials.

HOT similarly exploits our inexact solving criterion for the coarsest level PCG during early L-BFGS iterations. Specifically, in each V-cycle, we recursively restrict the right hand side vector to the coarsest level to obtain . We then set the tolerance for the CG solver to be where is the diagonal matrix extracted from the system matrix at level . Note that the the same V-cycle and termination criterion is also adopted in our PN-MGPCG. As L-BFGS iterations proceed, the norm of decreases, leading to increasingly accurate solves at the coarsest level. As demonstrated in Sec. 7, this reduces computational effort — especially when the system matrices at the coarsest level are not well conditioned.

6. Implementation

Accompanying the submission, we submit all of our code including scripts for re-running all examples using HOT and all the methods we are comparing against (except for ADMM MPM [Fang:2019:ViscousMPM]

, which has been open-sourced separately by its authors). We will release the code as an open-source library with the publication of this work.

Here we provide remarks to the nontrivial implementation details that can significantly influence performance.

Lock-free multithreading.

For all particle-to-grid transfer operations (including each matrix-vector multiplication in PN-PCG(MF)), we adopt the highly-optimized lock-free multithreading from Fang et al. [fang2018temporally]. This also enables the parallelization of our colored symmetric Gauss Seidel smoother for the multigrid V-cycle. All optimizations are thus consistently utilized (wherever applicable) across all compared methods so that our timing comparisons more reliably reflect algorithmic advantages of HOT.

Sparse matrix storage.

We apply the quadratic B-spline weighting kernel for particle fine-grid transfers. The number of non-zero entries per row of the system matrix at the finest level can then be up to where d denotes dimension. In more coarsened levels, the number of non-zero entries decreases due to the linear embedding of nodes in our MPM multigrid, as can be seen from Fig. 7. Similarly, for our restriction/prolongation matrix, the number of non-zero entries per row/column is for linear kernel. Notice that in all cases, the maximum number of nonzeros per row can be pre-determined, thus we employ diagonal storage format in our implementation to store all three matrix types for accelerating matrix computations.

Multigrid application

In our experiments (Section 7), our MPM multigrid is tested both as a preconditioner for the CG solver in each PN-MGPCG outer iteration, and as the inner initializer for each L-BFGS iteration of HOT.

Prolongation and restriction operator.

Our prolongation operator is defined similarly to traditional particle-grid transfers in hybrid methods – finer nodes are temporarily regarded as particles in the coarser level. Spatial hashing is applied to record the embedding relation between finer and coarser grid nodes for efficiency.

Figure 10. Faceless. We rotate the cap and then released the head of the “faceless” mesh. Upon release dynamic rotation and expansion follow. Here we plot the compute time of each single time step frame for HOT and PN-PCG (both matrix-based and matrix-free). HOT outperforms across all time step solves during the simulation.

7. Results and Evaluation

7.1. Benchmark

Methods in comparison

We implement a common test-harness code to enable the consistent evaluation comparisons between our HOT method and other possible Newton-Krylov methods, i.e. PN-PCG, matrix free (MF) PN-PCG, and prior state-of-the-art (MF) from Gast et al. [gast:2015:tvcg]. To ensure consistency, PN-PCG and MF PN-PCG adopt our node-wise CN with the same tolerance. For Gast et al. [gast:2015:tvcg] which adopts a global tolerance for the residual norm, we manually pick the largest tolerance value () that produce artifact-free results for all experiments. In addition to Gast et al. [gast:2015:tvcg], we also compare to the ADMM-based state-of-the-art implicit MPM method [Fang:2019:ViscousMPM] on the faceless example to demonstrate their difference in the order of convergence (Figure 15). Note that other than Gast et al. [gast:2015:tvcg] and Fang et al. [Fang:2019:ViscousMPM], all other compared methods are ones we apply to MPM for the first time in our ablation study.

To continue our ablation study on how each of our design choices in HOT shapes our performance and convergence, we also compare HOT with other new solvers that one may consider design, including (1) HOT-quadratic: HOT’s framework with quadratic (rather than linear) embedding kernel; (2) LBFGS-GMG: LBFGS with a more standard geometric multigrid as the initializer; (3) PN-MGPCG: A Newton-Krylov solver replacing PN-PCG’s Jacobi-precoditioned CG with HOT’s multigrid-preconditioned CG; (4) and an MPM extension of the quasi-Newton LBFGS-H (FEM) from Li et al. [Li:2019:DOT]. Note that unlike in Li et al. [Li:2019:DOT] where the LBFGS-H is based on fully factorizing the begining of time step hessian matrix with direct solver, here we only partially invert the hessian by conducting Jacobi preconditioned CG iterations with adaptive terminating criteria identical to that of the coarsest-level solve in HOT (Section 5.3). In other words, it is an inexact LBFGS-H equivalent to a single-level HOT. This inexact LBFGS-H often leads to better performance than those with direct solvers in large-scale problems.

All methods in our ablation study together with Gast et al. [gast:2015:tvcg] are implemented in C++ and consistently optimized (see Section 6). Assembly and evaluations are parallelized with Intel TBB.

Figure 11. Wheel. Our HOT integrator enables unified, consistent and predictive simulations of metals with real-world mechanical parameters (with stress magnitude visualized).
Figure 12. Donut. An elastic torus is mounted between two metal bars. Collisions with rigid balls deform the attaching bars that then break the torus.
Figure 13. HOT Sauce. HOT sauce is poured onto a turkey.

Simulation settings

Time step size for all simulations is set to throughout all the simulations. Here is selected to fulfill the CFL condition. We find out throughout our tests that 3-level multigrid preconditioner with 1 symmetric Gauss-Seidel smoothing and inexact Jacobi PCG as the coarsest-level solver works best for HOT and PN-MGPCG. We observe a window size of 8 for LBFGS methods yields favorable overall performance. In our experiments, across a wide range of scenes, delivers consistent visual quality for all examples as we vary materials with widely changing stiffness, shapes and deformations.

Fixed corotated elasticity (FCR) from Stomakhin et al. [stomakhin:2012:invertible] is applied as our elasticity energy in all examples. In addition to our scenes with purely elastic materials: Twist (Fig. 1), ArmaCat (Fig. 3), Chain (Fig. 9), and Faceless (Fig. 10), we also test with plastic models: von Mises for Sauce (Fig. 13) and for metal in Wheel (Fig. 11), the center box in Boxes (Fig. 4), and the bars in Donut (Fig. 12); and granular flow [stomakhin:2013:snow] in Boards (Fig. 5).

We perform and analyze extensive benchmark studies on challenging simulation settings, where the material parameters are all from real world data, and most of them are heterogenous. This not only significantly simplifies the material tuning procedure in animation production, but also helps to achieve realistic physical behavior and intricate coupling between materials with varying stiffness. Fig. 2 demonstrates that simulating a scene with aluminum sheets with inappropriate material parameters could end up getting unexpected behavior. See Table 1 for the physical parameters in daily life and Table 2 for material parameters used in our benchmark examples.

Detailed timing statistics for all examples are assembled in Table 1 and Table 2 from the supplemental document. As discussed in Section 5.2, all evaluated methods in our ablation study are terminated with the same tolerance computed from our extended characteristic norm for fair comparisons across all examples.

7.2. Performance

Example Particle # (m) density (kg/m) Young’s modulus (Pa)
(Fig. 1) Twist k / 0.4
(Fig. 4) Boxes k / / 0.33
(Fig. 12) Donut k / / 0.33
(Fig. 3) ArmaCat k // // 0.4/0.47/0.4
(Fig. 9) Chain k / / 0.4
(Fig. 5) Boards k - 0.33
(Fig. 11) Wheel k / / 0.4/0.33
(Fig. 10) Faceless k 0.3
(Fig. 13) Sauce k 0.33
Table 2. Material parameters. The von Mises yield stress: Pa; Pa. The plastic flow from Stomakhin et al. [stomakhin:2013:snow]

, where singular values of the deformation gradient are clamped into

.

We analyze relative performance of HOT in two sets of comparisons. First, we compare HOT against the existing state-of-the-art implicit MPM methods [gast:2015:tvcg; Fang:2019:ViscousMPM]. Second, we perform an extensive ablation study to highlight the effectiveness of each of our design choices.

Comparison to state-of-the-art

As discussed earlier, the performance and quality of results from Gast15 [gast:2015:tvcg] are highly dependent, per-example, to choice of tolerance parameter. Across many simulation runs we sought optimal settings for this parameter to efficiently produce visually plausible, high-quality outputs. However, we find that suitable nonlinear tolerances vary extensively with simulation conditions that vary materials and boundary conditions. For example, we found an ideal tolerance for the Wheel example (Figure 11) at , while for the Faceless example (Figure 10) we found best. On the other hand applying the tolerance generates instabilities and even explosions for all other examples (see the supplemental document Figure 1), while using tolerance produces extremely slow performance especially for examples containing stiff materials (see the supplemental document Table 1). As for ADMM MPM [Fang:2019:ViscousMPM], as it is a first-order method we observe slow convergence. Thus we postpone detailed analysis to our convergence discussion below (Section 7.3). In contrast HOT requires no parameter tuning. All results, timings and animations presented here and in the following were generated without parameter tuning using the same input settings to the solver. As demonstrated they efficiently converge to generate consistent visual quality output.

Ablation Study

We start with the homogeneous “faceless” example with a soft elastic material ( Pa); we rotate and raise its head and then release. As shown in Fig. 10, for this scene with moderate system conditioning, HOT already outperforms the two PN-PCG methods from our ablation set in almost every frame. Here there is already a nearly 2 average speedup of HOT for the full simulation sequence compared to both the two PN-PCG variations; while the overall maximum speedup per frame is around 6.

We then script a set of increasingly challenging stress-test scenarios across a wide range of material properties, object shapes, and resolutions; see, e.g., Figs. 3 and 13 as well as the supplemental video. For each simulation we apply HOT with three levels so that the number of nodes is generally several thousand or smaller at the coarsest level. In Table 1 from the supplemental document we summarize runtime statistics for these examples comparing HOT’s total wall clock speedup for the entire animation sequence, and maximum speedup factor per frame compared to PN-PCG, PN-PCG(MF), PN-MGPCG, and LBFGS-H across the full set of these examples.

Timings

Across this benchmark set we observe HOT has the fastest runtimes, for all but two examples (see below for discussion of these), over the best timing for each example across all methods: PN-PCG, PN-PCG(MF), PN-MGPCG, and LBFGS-H. Note that these variations for our ablation study are already well exceed the state-of-the-art Gast15 in most of the examples. In general HOT ranges from 1.98 to 5.79 faster than PN-PCG, from 1.05 to 5.76 faster than PN-PCG(MF), from 2.26 to 10.67 faster than PN-MGPCG, and from 1.03 to 4.79 faster than LBFGS-H on total timing. The exceptions we observe are for the easy Sauce (Young’s Pa) and ArmaCat (Young’s Pa) examples, where materials are already quite soft and the deformation is moderate. In these two simple examples HOT performs closely to the leading LBFGS-H method. However, when simulations become challenging we observe that LBFGS-H can have trouble converging. This is most evident in the stiff aluminum Wheel example (Fig. 11), where the metal is undergoing severe deformation. Here HOT stays consistently efficient, outperforming all other methods. See our convergence discussion below for more details. Importantly, across examples, we observe that alternate methods PN-PCG, PN-PCG(MF), PN-MGPCG, and LBFGS-H swap as fastest per example so that it is never clear which would be best a priori as we change simulation conditions. While in some examples each method can have comparable performance within 2 slower than HOT, they also easily fall well behind to both HOT and other methods in other examples (Fig. 14). In other words, none of these other methods have even empirically consistent good performance across tested examples. The seemingly second best LBFGS-H can even fail in some extreme cases. For most of the scenes with heterogenous materials or large deformations, e.g. Twist, Boxes, Donut, and Wheel, which results in more PN iterations, PN-PCG is faster than its matrix-free counterpart PN-PCG(MF). Among these examples only Boxes and Wheel can be well-accelerated by using MGPCG for PN.

Figure 14. Speedup overview. Here (Top) we summarize method timings for all benchmark examples measuring the total runtime of each method normalized w.r.t the timing of the HOT algorithm (“how HOT”) over each simulation sequence; and so determine HOT’s speed-up. Bottom we comparably report the normalized maximum frame-wise timing of each method w.r.t HOT across all benchmark examples; and so again determine per-frame max speed-up of HOT.

Gauss-Seidel preconditioned CG

Here we additionally compare the symmetric Gauss-Seidel (SGS) and Jacobi preconditioned PN-PCG to show that a simple trade-off between convergence and per-iteration cost might not easily lead to siginificant performance gain (Table 3). SGS preconditioned PN-PCG significantly improves convergence compared to Jacobi preconditioning as one would expected, but due to its more expensive preconditioning computation, the performance is right at the same level and sometimes even worse. This is also why we applied Jacobi preconditioned CG to solve our coarsest level system.

Example SGS Jacobi
avg time avg CG iter avg time avg CG iter
Twist 492.20 679.91 361.53 1054.16
Boxes 1368.54 34.13 466.94 248.98
Donut 410.34 45.59 240.65 375.39
ArmaCat (soft) 109.08 29.04 111.04 66.43
ArmaCat (stiff) 148.38 62.63 153.84 157.77
Chain 398.79 47.27 572.01 92.67
Boards 371.66 59.17 313.62 206.35
Wheel 269.08 106.13 206.14 424.13
Faceless 12.22 16.14 7.21 63.80
Sauce 22.66 9.98 29.07 27.06
Table 3. Jacobi v.s. Gauss-Seidel preconditioned PN-PCG: Here we compare symmetric Gauss-Seidel and Jacobi preconditioned PN-PCG. The runtime environment for all benchmark examples are identical to Table 1 from the supplemental document. avg time measures average absolute cost (seconds) per playback frame, avg iter measures the average number of PCG iterations (per method) required per time step to achieve the requested accuracy.

Changing machines

Across different consumer-level Intel CPU machines we tested (see the supplemental document Table 1), we see that HOT similarly maintains the fastest runtimes across all machines regardless of available CPU or memory resources, over the best timing for each example between PN-PCG, PN-PCG(MF), PN-MGPCG, and LBFGS-H.

7.3. Convergence

HOT balances efficient, hierarchical updates with global curvature information from gradient history. In this section we first compare HOT’s convergence to the state-of-the-art ADMM MPM [Fang:2019:ViscousMPM], and then analyze the convergence behavior based on our ablation study. Here we exclude Gast15 as it uses different stopping criteria and, as discussed above, requires intensive parameter tuning.

Comparison to ADMM

Here we compare to the ADMM-MPM [Fang:2019:ViscousMPM] on a pure elasticity example faceless (Fig. 10) by importing their open-sourced code into our codebase and adopted our nodewise CN based termination criteria (Section 5.2). Despite their advantages on efficiently resolving fully-implicit visco-elasto-plasticity, on this pure elasticity example we observe that as a first-order method, ADMM converges much slower than all other Newton-type or Quasi-Newton methods including HOT (Figure 15). Although the overhead per iteration of ADMM is generally few times smaller, the number of iterations required to reach the requested accuracy is orders of magnitudes more. Nevertheless, ADMM-MPM is more likely to robustly generate visually plausible results within first few iterations, while Newton-type or Quasi-Newton methods might not.

Figure 15. Comparison to ADMM-MPM. CN-scaled gradient norm to timing and iteration plots for the first time step of the faceless example (Fig. 10) of all methods including ADMM-MPM [Fang:2019:ViscousMPM]. With much lower per-iteration cost, ADMM quickly reduces the residual within first few iterations (top). However, as a first-order method it converges slowly to the requested accuracy compared to all others which converges super-linearly. As a result, it is slower than our HOT.

Ablation Study

In Fig. 16 we compare convergence rates and timings across methods for a single time step of the Wheel example. In terms of convergence rate, we see in the top plot, that PN-MGPCG obtains the best convergence, while HOT, PN-PCG and PN-PCG(MF) converge similarly. then, in this challenging example, LBFGS-H struggles to reach even a modest tolerance as shown in the extension in the bottom plots of Fig. 16.

However, for overall simulation time, HOT outperforms all three variations of PN and LBFGS-H due to its much lower per-iteration cost. PN-MGPCG, although with the best convergence rate, falls well behind HOT and only behaves slightly better than the two PN-PCG flavors, as the costly reconstruction of the multigrid hierarchy as well as the stiffness matrix is repeated in each Newton iteration. LBFGS-H then struggles where we observe that many linear solves well-exceed the PCG iteration cap at . At the bottom of Fig. 16, we see that LBFGS-H eventually converges after outer iterations. Here, it appears that the diagonal preconditioner at the coarsest level in HOT significantly promotes the convergence of the whole solver; in contrast, while the same preconditioner in LBFGS-H loses its efficiency at the finest level — the system is much much larger and conditioning of the system matrix exponentially exacerbates.

Figure 16. Convergence comparisons. Top: the iteration counts for the Wheel example w.r.t. CN of different methods are visualized. Here PN-MGPCG demonstrates best convergence. Middle: total simulation times of all methods w.r.t. CN are plotted; here HOT, with low per-iteration cost obtains superior performance across all methods. Bottom: in this extreme deformation, high-stiffness example LBFGS-H converges at an extremely slow rate.

Visualization of Convergence

In Fig. 17 we visualize the progressive convergence of HOT and LBFGS-H w.r.t. the CN-scaled nodal residuals for the stretched box example. Here HOT quickly smooths out low-frequency errors as in iteration 6 the background color of the box becomes blue (small error) and the high-frequency errors are progressively removed until HOT converges in iteration 25. For LBFGS-H, both the low- and high-frequency errors are simultaneously removed slowly and it takes LBFGS-H 106 iterations to converge.

Figure 17. Convergence visualization on stretched box. A soft box deforms as its deformation gradient was initialized to some diagonal matrix with the diagonal entries randomly sampled in . The nodal characteristic norms of different iterations in the first time step are visualized on the rest shape. Here HOT quickly removes low-frequency errors and converges in 25 iterations, while LBFGS-H converges in 106 iterations, removing both low- and high-frequency errors simultaneously.

Comparison to the baseline geometric multigrid

As discussed earlier, building a geometric multigrid directly from particle quadratures can result in nearly no speedup for coarser matrices and mismatch of degree-of-freedoms. We compare to this baseline geometric multigrid on the ArmaCat example (Fig. 2) by utilizing both multigrids in a PN-MGPCG time integrator. As we see in the top plot of Fig. 18, geometric multigrid effectively achieves 5 faster convergence than PN-PCG with Jacobi preconditioner, but still less effective than our 10 speedup in this specific time step. Then the bottom plot shows that this convergence relation among all three candidates remains consistent throughout the animation of the ArmaCat example. However, in very few cases (e.g. Boards) we occasionally observed that baseline geometric multigrid preconditioned PN-MGPCG converges even more slowly than Jacobi preconditioned PN-PCG.

Figure 18. Comparison to the baseline geometric multigrid. Top: CG iteration counts in one of the time steps of the ArmaCat example w.r.t. CN of all methods. Bottom: per frame CG iteration counts. Here the convergence of PN-MGPCG when using the baseline geometric multigrid is worse than using our node embedding multigrid but better than Jacobi preconditioned PN-PCG. Moreover, its overall timing is worse than the other two.

Then we compare HOT to applying GMG in LBFGS (LBFGS-GMG, see Table 2 in supplemental). We see that the convergence of LBFGS-GMG is orders of magnitude slower than HOT for all scenes containing stiff materials. For the two scenes with soft materials only (Sauce and Faceless), even the convergence is only slightly slower than HOT, the timing is more than slower, which further demonstrate the inefficiency of the multigrid operations in GMG.

7.4. Varying Material Stiffness

Finally, to again consider consistency, we compare the convergence and overall performance of all the five methods on the same simulation set-up as we progressively increase the Young’s moduli of a material. Here we form a bar composed of three materials (see inset of Fig.19). The two bar ends are kept with a fixed constant Young’s modulus of Pa across all simulations in the experiment. We then progressively increase the Young’s modulus of the middle bar from Pa up to Pa.

In the bottom plot of Fig. 19, we see that HOT maintains a low and close to flat growth in iterations for increasing stiffness with PN-MGPCG demonstrating only a modestly greater iteration growth for stiffness increase. When we consider compute time however, the modest growth iterations for PN-MGPCG translates into much less efficient simulations as stiffness increases, due to PN-MGPCG’s much greater per-iteration cost. Here, despite greater iteration growth, L-BFGS-H does better for scaling to greater stiffness due to it’s lower per-iteration cost. However, HOT with both a close-to-flat growth in iterations and low per-iteration cost maintains consistent behavior both with respect to iteration growth and generally best runtime performance with low growth across increasing stiffness.

Figure 19. Convergence and performance consistency for increasing stiffness. Twisting a multimaterial bar we keep ends with fixed Young’s moduli and progressively increase Young’s for the middle segment. Across increasing stiffness HOT exhibits the best consistency w.r.t. both iteration count and the overall simulation time.

7.5. Ablation study on HOT’s kernel

Since our multigrid can be constructed using either linear or quadratic B-splines for node embedding which potentially leads to a trade-off between convergence and per-iteration cost because of the resulting coarser level shape functions, we here use an ablation study on the two design choices to back up our decision on using linear kernels. As shown in Table 2 from the supplemental document, HOT with linear embedding performs equally well on convergence compared to using quadratic embedding. In a few cases e.g. “Twist” and “Boxes”, linear embedding even converges much faster. This is reasonable as we can see in Fig. 7, the resulting shape functions on the coarser level when using linear or quadratic node embedding do not have significant differences. But because linear embedding leads to much sparser coarse systems, it is much faster on timing than quadratic embedding.

8. Conclusions and Future Work

Starting from node embedding we derive, to our knowledge, the first MPM Galerkin multigrid. The resulting method is consistent with particle quadrature geometric multigrid while still providing efficiency in construction and automatically handling boundary conditions. Then we build HOT to utilize our multigrid as an inner initializer within L-BFGS to avoid the repeated expensive matrix reconstruction cost required in traditional PN-MGPCG. Instead efficient curvature updates ensures fast yet inexpensive convergence. HOT accelerates implicit MPM time-stepping to a wide range of important material behaviors, achieving up to speedup compared to the state-of-the-art methods and shines among an exhaustive set of variations. We hope that this will enable research to leverage HOT-type hierarchies to further address both spatial and temporal discretization limitations in current MPM pipelines.

Semi-implicit plasticity is a limitation of HOT. For current HOT simulations with plasticity, we have not encountered instabilities. Nevertheless, adding plasticity return mapping as a constraint within HOT’s stable implicit time-step optimization is an exciting and challenging future direction to explore. Another challenging but interesting extension would be to incorporate additional physical constraints in our minimization framework. One particularly useful case is volume preservation for e.g. simulating stiff granular materials. Currently HOT offers consistent and performant MPM simulations to gain the advantages of implicit time stepping without need for parameter tuning. We look forward to further extensions, such as those above, and its application as a standard tool-kit to accelerate the applications of MPM for the physical sciences, animation, engineering and beyond.

On the implementation side, the construction of the finest level system matrix is one of the bottlenecks for HOT. In our code it is realized with scattering scheme, which suffers from cache misses. Therefore, exploring the performance potential of alternative gathering schemes for building the stiffness matrices can be a meaningful future work.

References

Appendix – Node-wise Characteristic Norm

Concretely, we compute the norm of the stress derivative evaluated at the undeformed configuration in the diagonal space, , for each particle and transfer this scalar field with mass weighting and normalization to corresponding grid nodes quantities . Then we compute the node-wise CN as

(11)

per node where characterizes discretization, characterizes averaged material stiffness per node, and characterizes time step scaling. In mesh-based FE, is the perimeter/area of the polygon/polyhedron formed by the one-ring elements connecting to node in 2D/3D [bcqn18]. For MPM, we correspodningly have in 2D and in 3D from the uniform backing Cartesian grid discretization.

Then, for convergence check, we scale each entry of the incremental potential gradient vector with the corresponding node-wise CN computed in Eq. (11), obtaining , termination queries then compare against , where represents the number of active grid nodes and is the selected accuracy tolerance. Note, as a check, that when a single uniform material is applied to a simulated domain, our extended CN measure reduces back to Li et al.’s [Li:2019:DOT] L2 measure.

See pages 1-last of supdoc_small.pdf