Fast and lightweight methods for animating 3D characters are desirable in various applications, including computer games and film visual effects. Traditional skinning-based mesh deformation provides a fast geometric approach but often lacks realistic dynamics. On the other hand, physically-based simulation can add plausible secondary motion to skinned animations, augmenting them with visually realistic and vivid effects, but at the cost of heavy computation.
Recent research has explored deep learning methods to approximate physically-based simulation in a much more time-efficient manner. While some approaches have focused on accelerating specific parts of the simulation [luo2018nnwarp, fulton2019latent, meister2020deep], others have proposed end-to-end solutions that predict dynamics directly from mesh-based features [bailey2018fast, holden2019subspace, santesteban2020softsmpl]. While demonstrating impressive results, these methods still have some limitations. Most of them assume a fixed mesh topology and thus need to train different networks for different character meshes. Moreover, in order to avoid the computational complexity of training networks on high-resolution meshes, some methods operate on reduced subspaces with limited degrees of freedom, leading to low accuracy.
In this paper, we propose a deep learning approach to predict secondary motion, i.e., the deformable dynamics of given skinned animations of 3D characters. Our method addresses the shortcomings of the recent learning-based approaches by designing a network architecture that reflects the actual underlying physical process. Specifically, our network models the simulation using a volumetric mesh consisting of uniform tetrahedra surrounding the character mesh, where the mesh edges encode the internal forces that depend on the current state (i.e., displacements, velocities, accelerations), material properties (e.g., stiffness), and constraints on the vertices. Mesh vertices encode the inertia. Motivated by the observation that within a short time interval the secondary dynamics of a vertex is mostly affected by its current state, as well as the internal forces due to its neighbors, our network operates on local patches of the volumetric mesh. In addition to avoiding the computational complexity of encoding high-resolution character meshes as large graphs, this also enables our method to be applied to any character mesh, independent of its topology. Finally, our network encodes per-vertex material properties and constraints, giving the user the ability to easily prescribe varying properties to different parts of the mesh to control the dynamic behaviour.
As a unique benefit of the generalization capability of our model, we demonstrate that it is not necessary to construct a massive training dataset of complex meshes and motions. Instead, we construct our training data from primitive geometries, such as a volumetric mesh of a sphere. Our network trained on this dataset can generate detailed and visually plausible secondary motions on much more complex 3D characters during testing. By assigning randomized motions to the primitives during training, we are able to let the local patches cover a broad motion space, which improves the network’s online predictions in unseen scenarios.
We evaluate our method on various character meshes and complex motion sequences. We demonstrate visually plausible and stable secondary motion while being over 30 times faster than the implicit Euler method commonly used in physically-based simulation. We also provide comparisons to faster methods such as the explicit central differences method and other learning-based approaches that utilize graph convolutional networks. Our method outperforms those approaches both in terms of accuracy and robustness.
2 Related Work
2.1 Physically based simulation methods
Complementing skinning-based animations with secondary motion is a well-studied problem. Traditional approaches resort to physically-based simulation [Zhang:CompDynamics:2020, Wang:2020:ACS]. However, it is well-known that physically-based methods often suffer from high computational complexity. Therefore, in the last decade, a series of methods were proposed to accelerate the computation, including example-based dynamic skinning [shi2008example], efficient elasticity calculation [mcadams2011efficient], formulation of the motion equations in the rig subspace [hahn2012rig, hahn2013efficient], and the coupling of skeleton dynamics and soft body dynamics [liu2013simulation]. These approaches still have limitations, such as robustness issues due to explicit integration, or unnatural deformation effects due to remeshing, whereas our method is much more robust in handling various characters and complex motions.
2.2 Learning based methods
Grzeszczuk et al. [grzeszczuk1998neuroanimator] presented one of the earliest works that demonstrated the possibility of replacing numerical computations with a neural network. Since then research in this area has advanced, especially in the last few years. While some approaches have presented hybrid solutions where a neural network replaces a particular component of the physically based simulation process, others have presented end-to-end solutions.
In the context of hybrid approaches, plug-in deep neural networks were applied in combination with the Finite Element Method (FEM) to help accelerate the simulation. For example, the node-wise NNWarp [luo2018nnwarp] was proposed to efficiently map linear nodal displacements to nonlinear ones. Fulton et al. [fulton2019latent] utilized an autoencoder to project the target mesh to a lower-dimensional space to increase the computation speed. Similarly, Tan et al. [tan2020realtime] designed a CNN-based network for dimension reduction to accelerate thin-shell deformable simulations. Romero et al. [ROCP20] built a data-driven statistical model to kinematically drive the FEM mechanical simulation. Meister et al. [meister2020deep] explored the use of neural networks to accelerate the time integration step of Total Lagrangian Explicit Dynamics (TLED) for complex soft tissue deformation simulation. Finally, Deng et al. [deng2020alternating] modeled the force propagation mechanism in their neural networks. These approaches improved efficiency, but at the cost of accuracy, and are not friendly to end users unfamiliar with physically-based techniques. Ours, instead, allows the user to adjust the animation by simply painting the constraints and stiffness properties.
End-to-end approaches assume the target mesh is provided as input and directly predict the dynamic behaviour. For instance, Bailey et al. [bailey2018fast] enriched real-time skinning animation by adding the nonlinear deformations learned from film-quality character rigs. The work of Holden et al. [holden2019subspace] first trained an autoencoder to reduce the simulation space and then learned to efficiently approximate the dynamics projected to the subspace. Similarly, SoftSMPL [santesteban2020softsmpl] modeled realistic soft-tissue dynamics based on a novel motion descriptor and a neural-network-based recurrent regressor that runs in the nonlinear deformation subspace extracted from an autoencoder. While all these approaches presented impressive results, their main drawback is the assumption of a fixed mesh topology, requiring different networks to be trained for different meshes. Our approach, on the other hand, operates at a local patch level and can therefore generalize to different meshes at test time.
Lately, researchers have started to utilize Graph Convolutional Networks (GCNs) for simulation tasks due to their advantage in handling topology-free graphs. A GCN encodes the vertex positional information and aggregates the latent features at a given node by using a propagation rule. For particle-based systems, graphs are constructed based on the local adjacency of the particles at each frame and fed into GCNs [li2018learning, ummenhofer2019lagrangian, sanchez2020learning, de2020combining]. Concurrently, Pfaff et al. [pfaff2020learning] proposed a GCN for surface mesh-based simulation. While these GCN models interpret mesh dynamics prediction as a general spatio-temporal problem, we incorporate physics into the design of our network architecture, e.g., inferring latent embeddings for inertia and internal forces, which enables us to achieve more stable and accurate results (Section 4.3).
3 Method
Given a 3D character and its primary motion sequence obtained using standard linear blend skinning techniques [skinningcourse:2014], we first construct a volumetric (tetrahedral) mesh and a set of barycentric weights to linearly embed the vertices of the character's surface mesh into the volumetric mesh [James:2004:Squashing], as shown in Figure 1. Our network operates on the volumetric mesh and, at each frame, predicts the updated vertex positions with deformable dynamics (the secondary motion), given the primary motion, the constraints, and the material properties. The updated volumetric mesh vertex positions then drive the original surface mesh via the barycentric embedding, and the surface mesh is used for rendering; such a setup is standard in computer animation.
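The barycentric embedding that lets the volumetric mesh drive the surface mesh can be sketched as follows. This is a minimal illustration; the function name and data layout are ours, not taken from the paper's implementation:

```python
# Sketch of barycentric interpolation: a surface vertex is driven by the four
# vertices of its enclosing tetrahedron, weighted by fixed barycentric weights.

def embed_surface_vertex(tet_positions, bary_weights):
    """Interpolate a surface vertex from its enclosing tetrahedron.

    tet_positions: list of four (x, y, z) tuples, the tet's vertex positions.
    bary_weights:  four barycentric weights, non-negative and summing to 1.
    """
    assert abs(sum(bary_weights) - 1.0) < 1e-9
    return tuple(
        sum(w * p[axis] for w, p in zip(bary_weights, tet_positions))
        for axis in range(3)
    )
```

Because the weights are computed once on the reference mesh, updating the surface after each network prediction is a cheap linear operation.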
We denote the reference tetrahedral mesh and its number of vertices by $\bar{\mathcal{M}}$ and $n$, respectively. The skinned animation (primary motion) is represented as a set of time-varying positions $\bar{u}^t \in \mathbb{R}^{3n}$. Similarly, we denote the predicted dynamic mesh by $\mathcal{M}$ and its positions by $u^t \in \mathbb{R}^{3n}$.
Our method additionally encodes mass and stiffness properties. The stiffness is represented as Young’s modulus. By painting different material properties per vertex over the mesh, users can control the dynamic effects, namely the deformation magnitude.
In contrast to previous works [santesteban2020softsmpl, pfaff2020learning] which trained neural networks directly on the surface mesh, we choose to operate on the volumetric mesh for several reasons. First, volumetric meshes provide a more efficient coarse representation and can handle character meshes that consist of multiple disconnected components. For example, in our experiments the volumetric mesh of the "Michelle" character (see Figure 1) has far fewer vertices than the corresponding surface mesh. In addition, the "Big Vegas" character mesh (see the teaser figure) has eight disconnected components, requiring the artist to build a watertight mesh first if using a method that learns directly on the surface mesh. Furthermore, volumetric meshes capture not only the surface of the character but also the interior, leading to more accurate learning of the internal forces. Finally, we use a uniformly voxelized mesh subdivided into tetrahedra as our volumetric mesh, which enables our method to generalize across character meshes with varying shapes and resolutions.
Next, we will first explain the motion equations in physically-based simulation and then discuss our method in detail, drawing inspiration from the physical process.
3.1 Physically-based Motion Equations
In constraint-based physically-based simulation [baraff2001physically], the equations of motion are

M\ddot{u} + D\dot{u} + f_{int}(u) = 0, \quad Cu = S\bar{u}, \tag{1}

where M is the diagonal (lumped) mass matrix (as commonly employed in interactive applications), D is the Rayleigh damping matrix, and u, \dot{u} and \ddot{u} represent the positions, velocities and accelerations, respectively. The quantity f_{int}(u) represents the internal elastic forces. Secondary dynamics occurs because the constrained part of the mesh "drives" the free part of the mesh. Constraints are specified via the constraint matrix C and the selection matrix S. In order to leave room for secondary dynamics for 3D characters, we typically do not constrain all the vertices of the mesh, but only a subset. For example, in the Big Vegas example (see the teaser figure), we constrain the legs, the arms and the core inside the torso and head, but do not constrain the belly and hair, so that we can generate secondary dynamics in those unconstrained regions.
One approach to timestep Equation 1 is to use an explicit integrator, such as central differences:

\ddot{u}^{t} = M^{-1}\big({-D\dot{u}^{t} - f_{int}(u^{t})}\big), \quad u^{t+1} = 2u^{t} - u^{t-1} + \Delta t^{2}\,\ddot{u}^{t}, \tag{2}

where u^{t} and u^{t+1} denote the state of the mesh in the current and next frames, respectively, and \Delta t is the timestep. While explicit integration is fast, it suffers from stability issues. Hence, the slower but stable implicit backward Euler integrator is often preferred in physically-based simulation [Baraff:1998:LSI]:

M\ddot{u}^{t+1} + D\dot{u}^{t+1} + f_{int}(u^{t+1}) = 0, \quad \dot{u}^{t+1} = \frac{u^{t+1} - u^{t}}{\Delta t}, \quad \ddot{u}^{t+1} = \frac{\dot{u}^{t+1} - \dot{u}^{t}}{\Delta t}. \tag{3}
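The stability gap between the two integrators can be seen on a toy example. The sketch below applies central differences to a single damped spring (m\ddot{x} = -kx - c\dot{x}); it is an illustration of the stability issue only, not the Vega FEM solver used in the paper, and all parameter values are ours:

```python
def central_differences_rollout(x0, dt, steps, mass=1.0, stiffness=100.0, damping=0.2):
    """Roll out 1D spring dynamics  m*a = -k*x - c*v  with central differences.

    Starts at rest (x^{t-1} = x^t = x0).  For this system the scheme is only
    stable for timesteps below roughly 2/omega, omega = sqrt(k/m).
    """
    x_prev, x = x0, x0
    for _ in range(steps):
        v = (x - x_prev) / dt                       # finite-difference velocity
        a = (-stiffness * x - damping * v) / mass   # Newton's second law
        x_prev, x = x, 2.0 * x - x_prev + dt * dt * a
    return x
```

With omega = 10 here, a timestep of 0.01 stays bounded while a timestep of 0.5 diverges within a handful of steps, which mirrors the explosions of the explicit baseline reported in Section 4.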
We propose to approximate implicit integration as

\ddot{u}^{t+1} \approx f_{\theta}\big(u^{t}, u^{t-1}, u^{t-2}, \bar{u}^{t}, \bar{u}^{t-1}, \bar{u}^{t-2}\big), \tag{4}

where f_{\theta} is a differentiable function constructed as a neural network with learned parameters \theta.
3.2 Network design
Training a network to predict all the degrees of freedom at once would lead to a huge and impractical network, which would furthermore not be applicable to input meshes with varying numbers of vertices and topologies. Inspired by the intuition that within a very short time interval, the motion of a vertex is mostly affected by its own inertia and the internal forces from its neighboring vertices, we design our network to operate on a local patch instead. As illustrated in Figure 2, the 1-ring local patch consists of one center vertex along with its immediate neighbors in the volumetric mesh. Even though two characters might have very different mesh topologies, as shown in Figure 1, their local patches will often be similar, boosting the generalization ability of our network.
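Extracting such 1-ring patches from a tetrahedral mesh amounts to collecting, for every vertex, the vertices that share a tetrahedron with it. A minimal sketch follows; the tetrahedra-as-index-tuples data layout is an assumption of ours:

```python
# Build 1-ring neighborhoods from a tet mesh given as 4-tuples of vertex indices.

def one_ring_patches(tets, num_vertices):
    """Return, for every vertex, the set of its immediate (1-ring) neighbors."""
    neighbors = [set() for _ in range(num_vertices)]
    for tet in tets:
        for i in tet:
            for j in tet:
                if i != j:          # a vertex is not its own neighbor
                    neighbors[i].add(j)
    return neighbors
```

A local patch is then the center vertex together with `neighbors[center]`, which is all the connectivity the per-vertex network needs.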
The internal forces are caused by the local stress, and the aggregation of the internal forces acts to pull the vertices back to their positions in the reference motion, reducing the elastic energy. Thus, knowledge of the per-edge deformation and the per-vertex reference motion is needed for secondary motion prediction.
Hence, we propose to emulate this process as follows:

e_{i} = f_{\mathrm{inertia}}(\cdot), \quad d_{ij} = f_{\mathrm{force}}(\cdot), \quad \ddot{u}_{i}^{t+1} = f_{\mathrm{agg}}\Big(e_{i}, \sum_{j \in N(i)} d_{ij}\Big),

where f_{\mathrm{inertia}}, f_{\mathrm{force}} and f_{\mathrm{agg}} are three different multi-layer perceptrons (MLPs) as shown in Figure 3, N(i) denotes the neighboring vertices of vertex i (excluding i itself), and the double index ij denotes the central vertex i and a neighbor j. The quantities e_{i} and d_{ij} are high-dimensional latent vectors that represent an embedding for the inertial dynamics and for the internal forces from each neighboring vertex, respectively. Perceptron f_{\mathrm{agg}} receives the concatenation of e_{i} and the sum of the d_{ij} to predict the final acceleration of vertex i. In practice, for simplicity, we train f_{\mathrm{agg}} to directly predict u_{i}^{t+1}, since we assume a fixed timestep of \Delta t = 1/24 s in our experiments.
We implement all three MLPs with four hidden fully connected layers activated by the Tanh function, plus one output layer. During training, we provide the ground truth positions of the dynamic mesh as input. During testing, we feed the network its own predictions in a recurrent manner. Next, we discuss the details of these components.
This perceptron focuses on the center vertex itself, encoding the "self-inertia" information: the center vertex tends to continue its current motion, driven by both the velocity and the acceleration. Its input is the position of the center vertex in the last three frames, on both the dynamic and the skinned mesh, as well as its material properties. The positions are represented in local coordinates with respect to the current position of the center vertex in the reference motion. The positions in the last three frames implicitly encode the velocity and the acceleration. Since the net force applied on the central vertex is divided by its mass in Equation 4, and it is relatively hard for a network to learn multiplication or division, we also include the mass explicitly in the input. The hidden layer and output size is 64.
For an unconstrained center vertex, this perceptron encodes the "internal forces" contributed by its neighbors. Its input is similar to that of the inertia perceptron, except that we provide information both for the center vertex and for its neighbors. For each neighboring vertex, we also provide its constraint state (e.g., 0 if a free vertex; 1 if constrained). Each neighbor contributes one latent vector to the central vertex. The hidden layer and output size is 128.
This module receives the concatenation of the inertia embedding and the aggregated neighbor embeddings, and predicts the final displacement of the central vertex in the dynamic mesh. The input and hidden layer size is 192.
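Putting the three components together, the per-vertex forward pass can be sketched as follows. This is an illustrative skeleton in plain Python with random weights; the input feature sizes (21 and 43) are placeholders of ours, while the hidden/output widths (64, 128, 192, 3) follow the sizes stated above:

```python
import math
import random

random.seed(0)  # deterministic toy weights

def mlp(sizes):
    """Random weight matrices for a Tanh MLP; sizes = [in, hidden..., out]."""
    return [[[random.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(n)]
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for k, w in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        if k < len(layers) - 1:               # Tanh on hidden layers only
            x = [math.tanh(v) for v in x]
    return x

# Four hidden layers plus one output layer each, as described in the paper.
inertia_net = mlp([21, 64, 64, 64, 64, 64])       # center-vertex embedding
force_net   = mlp([43, 128, 128, 128, 128, 128])  # per-neighbor embedding
agg_net     = mlp([192, 192, 192, 192, 192, 3])   # 64 + 128 = 192 -> 3-D output

def predict_vertex(center_feat, neighbor_feats):
    e = forward(inertia_net, center_feat)                 # inertia embedding
    d = [forward(force_net, f) for f in neighbor_feats]   # one per neighbor
    summed = [sum(col) for col in zip(*d)]                # sum over neighbors
    return forward(agg_net, e + summed)                   # concatenate, decode
```

The neighbor sum makes the prediction invariant to the ordering and (up to capacity) the count of neighbors, which is what allows one network to serve patches of any valence.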
We train the final network with the mean square error loss:

\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \big\| u_{i}^{t+1} - \hat{u}_{i}^{t+1} \big\|_{2}^{2},

where \hat{u}_{i}^{t+1} is the ground truth. We adopted the Adam optimizer for training, with a learning rate starting from 0.0001 along with a decay factor of 0.96 at each epoch.
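The loss and learning-rate schedule are straightforward to express; a brief sketch (function names are ours):

```python
def mse_loss(pred, target):
    """Mean squared error over all predicted coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def learning_rate(epoch, base=1e-4, decay=0.96):
    """Exponential per-epoch decay, matching the stated training schedule."""
    return base * decay ** epoch
```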
3.3 Training Primitives
Because our method operates on local patches, it is not necessary to train it on complex character meshes. In fact, we found that a training dataset constructed by simulating basic primitives, such as a sphere (under various motions and material properties), is sufficient to generalize to various character meshes at test time. Specifically, we generate random motion sequences by prescribing random rigid body motion of a constrained beam-shaped core inside the spherical mesh. The motion of this rigid core excites dynamic deformations in the rest of the sphere volumetric mesh. Each motion sequence starts by applying, to the rigid core, a random acceleration and angular velocity with respect to a random rotation axis. Next, we reverse the acceleration so that the primitive returns back to its starting position, and let the primitive's secondary dynamics ring out for a few frames. While the still motions ensure that we cover the cases where local patches are stationary (but there is still residual secondary dynamics from primary motion), the random accelerations help to sample a diverse set of motions of local patches as much as possible. Doing so enhances the network's prediction stability.
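One training sequence can be sketched in one dimension as follows. This illustrates only the accelerate/reverse/rest pattern; the actual dataset prescribes rigid motions with random accelerations and random 3D rotation axes, and the ranges below are placeholders of ours:

```python
import random

def random_core_motion(frames=456, dt=1.0 / 24.0):
    """1-D sketch of one rigid-core training sequence: a random acceleration
    phase, a mirrored deceleration phase that brings the velocity back to
    zero, and a still phase that lets the secondary dynamics ring out."""
    accel = random.uniform(-5.0, 5.0)   # illustrative range
    half = frames // 3
    x, v, positions = 0.0, 0.0, []
    for frame in range(frames):
        if frame < half:
            a = accel                   # random acceleration phase
        elif frame < 2 * half:
            a = -accel                  # reversed acceleration phase
        else:
            a = 0.0                     # still frames
            v = 0.0                     # clamp velocity for the still phase
        v += a * dt
        x += v * dt
        positions.append(x)
    return positions
```

The still tail of every sequence is what exposes the network to stationary patches that still carry residual dynamics.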
4 Experiments
In this section, we show qualitative and quantitative results of our method, as well as comparisons to other methods. We also run an ablation study to verify why explicitly providing the position information of the reference mesh as input is necessary.
4.1 Dataset and evaluation metrics
For training, we use a uniform tetrahedral mesh of a sphere. We generate random motion sequences at 24 fps, using the Vega FEM simulator [Vega, sin2013vega]. For each motion sequence, we use seven different material settings. Each motion sequence consists of 456 frames resulting in a total of 255k frames in our training set.
We evaluate our method on 3D character animations obtained from Adobe’s Mixamo dataset [mixamo]. Neither the character meshes nor the primary motion sequences are seen in our training data. We create test cases for five different character meshes as listed in Table 1 and 15 motions in total. The volumetric meshes for the test characters use the same uniform tetrahedron size as our training data. For all the experiments, we report three types of metrics:
Single-frame RMSE: We measure the average root-mean-square error (RMSE) between the prediction and the ground truth over all frames, while providing the ground truth positions of the previous frames as input.
Rollout RMSE: We provide the previous predictions of the network as input to the current frame in a recurrent manner and measure the average RMSE between the prediction and the ground truth over all frames.
Elastic energy statistics: We use the concept of elastic energy in physically-based simulation to detect abnormalities in the deformation sequence, or any possible mesh explosions. For each frame, we calculate the elastic energy based on the current mesh displacements with respect to the reference state. We list the mean elastic energy, as well as the standard deviation, to show the energy distribution across the animation.
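The three metrics can be sketched as follows. In this illustration the step function consumes a single previous frame for brevity (the actual network consumes three), and the elastic-energy proxy uses simple per-edge springs rather than the paper's FEM energy; both simplifications are ours:

```python
import math

def rmse(a, b):
    """Root-mean-square error between two flat coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def single_frame_rmse(step, ground_truth):
    """Average RMSE when the previous frame fed to the model is ground truth."""
    errors = [rmse(step(ground_truth[t]), ground_truth[t + 1])
              for t in range(len(ground_truth) - 1)]
    return sum(errors) / len(errors)

def rollout_rmse(step, ground_truth):
    """Average RMSE when the model consumes its own predictions recurrently."""
    state, errors = ground_truth[0], []
    for t in range(len(ground_truth) - 1):
        state = step(state)
        errors.append(rmse(state, ground_truth[t + 1]))
    return sum(errors) / len(errors)

def elastic_energy(positions, rest_positions, edges, stiffness=1.0):
    """Spring-like energy 0.5*k*(|e| - |e_rest|)^2 summed over mesh edges;
    a blow-up of this quantity flags an exploding simulation."""
    total = 0.0
    for i, j in edges:
        rest = math.dist(rest_positions[i], rest_positions[j])
        cur = math.dist(positions[i], positions[j])
        total += 0.5 * stiffness * (cur - rest) ** 2
    return total
```

The contrast between the two RMSE variants is exactly the difference between teacher-forced evaluation and the recurrent deployment setting, which is why rollout error accumulates over a sequence.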
4.2 Analysis of Our Method
In Table 1, we show the speed of our method, the ground truth method, and a baseline method. For each method, we record the time to compute the dynamic mesh, excluding other components such as initialization, rendering, and mesh interpolation.
We adopted the implicit backward Euler approach (Equation 3) as ground truth and the faster explicit central differences integration (Equation 2) as the baseline. Both our baseline and ground truth were optimized using the deformable object simulation library Vega FEM [Vega, sin2013vega], and accelerated across multiple cores via Intel Thread Building Blocks (TBB), with 8 cores for assembling the internal forces and 16 cores for solving the linear system. The experiment platform is a 2.90 GHz Intel Xeon(R) CPU E5-2690 (32 GB RAM), which provides a highly competitive baseline/ground truth implementation. We ran our trained model on a GeForce RTX 2080 graphics card (8 GB RAM). We also tested it on the CPU, without any multi-thread acceleration.
Moreover, we also provide performance results for the same character mesh (Big Vegas) at different voxel resolutions. To handle different resolutions of test meshes, we resize the volumetric mesh so that the local patches are similar to the training data (i.e., the shortest edge length is 0.2).
Results indicate that when run on a GPU (CPU), our method is around 30 (20) times faster per frame than the implicit integrator and 3 (2) times faster than the explicit integrator. As the number of vertices increases, our method performs even more competitively. Although the explicit method has comparable speed to ours, its simulation explodes after a few frames. In practice, explicit methods require much smaller time steps (an additional 100 sub-steps per frame in our experiments) to achieve stable quality. We provide a more detailed report on the speed-stability relationship of explicit integration in the supplementary material.
We train the network on the sphere dataset and achieve a single frame RMSE of 0.0026 on the testing split of this dataset. As listed in Table 2, when tested on characters, our method achieves a comparably low single frame RMSE, showing remarkable generalization capability. The mean rollout error increases over the course of whole sequences due to error accumulation, but the elastic energy statistics are still close to the ground truth. From the visualization of the ground truth and our results in Figure 6, we can see that although the predicted secondary dynamics deviate slightly from the ground truth, they are still visually plausible. We further plot the rollout prediction RMSE and elastic energy of the Big Vegas character in Figure 4. The prediction error remains bounded, and the mean elastic energy of our method stays close to the ground truth for the whole sequence, whereas the baseline method explodes quickly. We provide rollout prediction plots for all characters, together with video results, in the supplementary material.
Figure 5 shows how to control the dynamics by painting non-homogeneous material properties over the mesh. Varying stiffness values are painted on the hair and the breast region of the volumetric mesh. For better visualization, we render the material settings on the surface mesh in the figure. We display three different material settings, obtained by assigning different stiffness values. A larger stiffness value means stiffer material, hence the corresponding region exhibits less dynamics; in contrast, regions with smaller stiffness show significant dynamic effects. This result demonstrates that our method correctly models the effect of material properties while providing an interface for the artist to efficiently adjust the desired dynamic effects.
To demonstrate that it is necessary to incorporate the reference mesh motion into the input features of our network, we performed an ablation study. To ensure that the constrained vertices still drive the dynamic mesh in the absence of the reference information, we update the positions of the constrained vertices based on the reference motion at the beginning of each iteration. As input to our network architecture, we use the same set of features, except the positions on the reference mesh. The results of "Ours w/o ref. motion" in Table 2 and Figures 4 and 6 demonstrate that this version is inferior to our original method, especially when running the network over a long time sequence. This establishes that the reference mesh is indispensable to the quality of the network's approximation.
4.3 Comparison to Previous Work
As discussed in Section 2, several recent particle-based physics and mesh-based deformation systems utilized graph convolutional networks (GCNs). In this section, we train these network models on the same training set as our method and test on our character meshes.
We implemented our version of the CFD-GCN architecture [de2020combining], adopting the convolution kernel of [kipf2016semi]. However, we ignored the remeshing part because we assume that the mesh topology remains fixed when predicting secondary motion. As input, we provide the same information as our method, namely the constraint states of the vertices, the displacements, and the material properties. We found that the network structure recommended in the paper resulted in a high training error. We then replaced the originally proposed ReLU activation function with the Tanh activation (as used in our method), which significantly improved the training performance. Even so, as shown in Table 2 and Figure 4, the rollout prediction explodes very quickly. We speculate that although the model aggregates the features from the neighbors to a central vertex via an adjacency matrix, it treats the center and the neighboring vertices equally, whereas in reality, their roles in physically-based simulation are distinct.
The recently proposed GNS [sanchez2020learning] architecture is also a graph network designed for particle systems. The model first separately encodes node features and edge features in the graph and then generalizes the GraphNet blocks in [pmlr-v80-sanchez-gonzalez18a] to pass messages across the graph. Finally, a decoder is used to extract the prediction target from the GraphNet block output. The original paper embeds the particles in a graph by adding edges between vertices under a given radius threshold. In our implementation, we instead utilized the mesh topology to construct the graph. We used two blocks in the “processor” [sanchez2020learning] to achieve a network capacity similar to ours. In contrast to CFD-GCN [de2020combining], the GraphNet block can represent the interaction between the nodes and edges more efficiently, resulting in a significant performance improvement in rollout prediction settings. However, we still observe mesh explosions after a few frames, as shown in Figure 6 and in the supplementary video.
In work concurrent to ours, MeshGraphNets [pfaff2020learning] were presented for physically-based simulation on a mesh, with an architecture similar to GNS [sanchez2020learning]. The Lagrangian cloth system presented in their paper is the approach most closely related to our work. Therefore, we followed the input formulation of their example, except that we used the reference mesh to represent the undeformed mesh space as the edge feature. In our implementation, we keep the originally proposed encoders that embed the edge and node features, but exclude the global (world) feature encoder because it is not applicable to our problem setting. Similarly, we kept the corresponding MLPs but removed the world-edge component inside the graph block. We used 15 graph blocks in the model, as suggested by their paper. The network has 10 times more parameters than ours: 2,333,187, compared to our 237,571. Training lasted 11 days, whereas our network was trained in less than a day.
We report how MeshGraphNets perform on our test character motion sequences in Table 2. The overall average rollout RMSE of MeshGraphNets is worse than GNS [sanchez2020learning]. Nevertheless, we note that out of 15 motions, this approach achieved 5 stable rollout predictions without explosions, while GNS [sanchez2020learning] failed on all of them. Our method outperforms each of the compared methods with respect to the investigated metrics.
5 Conclusion
We proposed a Deep Emulator for enhancing skinning-based animations of 3D characters with vivid secondary motion. Our method is inspired by the underlying physical simulation. Specifically, we train a neural network that operates on a local patch of a volumetric simulation mesh of the character, and predicts the updated vertex positions from the current acceleration, velocity, and positions. Being a local method, our network generalizes across 3D character meshes of arbitrary topology.
While our method demonstrates plausible secondary dynamics for various 3D characters under complex motions, there are still certain limitations we would like to address in future work. Specifically, we demonstrated that our network trained on a dataset of a volumetric sphere mesh can generalize to 3D characters with varying topologies. However, if the local geometric detail of a character differs significantly from that seen during training, e.g., the ears of the Mousey character contain many local neighborhoods not present in the sphere training data, the quality of our output decreases. One potential avenue for addressing this is to add additional primitive types to the training, beyond tetrahedralized spheres. A thorough study of the types of training primitives and motion sequences required to cover the underlying problem domain is an interesting future direction.
This research was sponsored in part by NSF (IIS-1911224), USC Annenberg Fellowship to Mianlun Zheng, Bosch Research and Adobe Research.
We sincerely request readers to refer to the link below for more visualization results: https://zhengmianlun.github.io/publications/deepEmulator.html.
A.1 Dataset Information
In this paper, we trained our network on a sphere dataset but tested it on five character meshes from Adobe's Mixamo dataset [mixamo]. Table A.1 provides detailed information about the five character meshes, including the vertex count and the edge length of the original surface mesh as well as of the corresponding uniform volumetric mesh.
In Figure A.1, we show how we set constraints for each of the meshes, from a side view. The red vertices are constrained to move based on the skinned animation and drive the free vertices to deform with secondary motion.
A.2 Full Quantitative and Qualitative Results
In Tables A.2- A.16, we provide the quantitative results of our network tested on the five character meshes and 15 motions. The corresponding error plots are given in Figures A.2-A.16. We also provide the error plots for the compared methods. Across all the test cases, our method achieves the most stable rollout prediction with the lowest error.
In DeepEmulator.html, we provide animation sequences of our results as well as other comparison methods.
A.3 Further Analysis of Baseline Performance
As introduced in Section 4.2, we adopted the implicit backward Euler approach (Equation 3) as ground truth and the faster explicit central differences integration (Equation 2) as the baseline. Although the baseline method is 10 times faster than the implicit integrator at the same timestep (1/24 second), it explodes after a few frames. In order to achieve stable simulation results, we found that it requires at least 100 sub-steps per frame (i.e., a sub-step of 1/2400 second). In Table A.17, we provide the per-frame running time of the explicit integration with 50 and 100 sub-steps.
A.4 Choice of the Training Dataset
In Section 5, we mentioned a future direction of expanding the training dataset beyond primitive-based datasets such as spheres. Here, we analyze an alternative training dataset, namely the "Ortiz Dataset", created by running our physically-based simulator on the volumetric mesh surrounding the Ortiz character (the same mesh as in Table A.1), with motions acquired from Adobe's Mixamo. Both datasets contain the same number of frames. We report our results in Tables A.18 to A.22.
Our experiments show that the network trained on the Sphere Dataset in most cases (75%) outperforms the Ortiz Dataset. We think there are two reasons for this. First, the local patches in the sphere are general and not specific to any geometry, making the learned neural network more general and therefore more suitable for characters other than Ortiz. Second, the motions in the Ortiz Dataset were created by human artists, and as such these motions follow certain human-selected artistic patterns. The motions in the Sphere Dataset, however, consist of random translations and rotations, which provides a denser sampling of motions in the possible motion space, and therefore improves the robustness of the network.
A.5 Analysis of the Local Patch Size
In the main paper, we show our network architecture for 1-ring local patches (Figure 3): the internal-force MLP learns to predict the internal forces from the 1-ring neighbors around the center vertex. Here, we present an ablation study whereby the network learns based on 2-ring and 3-ring local patches, respectively. For 2-ring local patches, we add an additional MLP that receives the inputs from the 2-ring neighbors of the center vertex. The output latent vector is concatenated to the input of the aggregation MLP. A similar operation is adopted for the 3-ring local patch network by adding another MLP for the 3-ring internal forces.
In terms of training loss, the network achieves an RMSE of 0.00257, 0.00159 and 0.00146 for 1-ring, 2-ring and 3-ring local patches, respectively. In Table A.23, we provide the corresponding test results on the five characters. Overall, we did not see obvious improvements from increasing the local patch size. This could be because 2-ring and 3-ring local patches exhibit larger structural variability, different from the sphere mesh, particularly for a center vertex close to the boundary. Therefore, we adopt 1-ring local patches in our paper.
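The k-ring neighborhoods used in this ablation can be computed by breadth-first search over the vertex adjacency. A sketch, assuming an adjacency-list layout of ours:

```python
from collections import deque

def k_ring_patch(neighbors, center, k):
    """Vertices within k edge hops of `center` (excluding the center itself),
    found by breadth-first search over a vertex adjacency list."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == k:        # do not expand past the k-th ring
            continue
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {v for v in dist if v != center}
```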