Game engines have long supported physics-based simulation of rigid body collisions, which makes it possible to build natural-looking virtual worlds that, with a certain amount of imagination, let the user dive into the experience. Nowadays, cloth and fluid simulation have become ubiquitous. Physics-based character control, however, is not yet as widespread.
Machine learning has experienced unprecedented growth in recent years. Neural networks, being universal function approximators, have shown their ability to model complex scenarios in online sales, finance, and computer vision. Building an animator controller requires considerable manual effort: such controllers rely on graphs that involve many nodes and non-trivial logic. This complexity makes them well suited to being modeled with neural networks. Machine learning techniques have already been applied successfully to animation; for example, the authors of (Holden et al., 2017) use neural networks to generate a natural-looking kinematic controller. The game developer community is interested in a robust controller that exhibits realistic behavior and interacts with a simulated environment. This is where the interests of the two communities, game developers and optimal control researchers, meet.
2 Related work
Policy-based techniques took off when it was shown that the PPO algorithm (Schulman et al., 2017) can train a physics-based MuJoCo (Todorov et al., 2012) humanoid character to run using only joint rotations and positions, without prior knowledge of inverse kinematics, physics, or the humanoid model. However, the movements the agent learned were unnatural. The humanoid model does not precisely replicate every bone and joint of a real human, so the learned optimal control is not guaranteed to look anywhere near realistic.
On the other hand, (Clavet, 2016) presents motion matching, which focuses on realistic behavior rather than on physics-based interaction. Given a description of the character's current joint positions and velocities, together with the desired trajectory for the next several seconds derived from user input, motion matching searches the motion capture dataset for the most similar motion and blends it with the character's current motion.
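The search step of motion matching can be illustrated as a nearest-neighbor query. The sketch below is not Clavet's implementation: it assumes that the pose and future-trajectory features of every database frame have already been extracted into flat, normalized vectors, uses brute-force search with a weighted squared distance, and replaces rotation blending with a plain linear cross-fade. The names `Frame`, `find_best_match`, and `blend` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One candidate frame of a (hypothetical) motion capture database."""
    clip: str
    index: int
    features: list  # normalized pose + future-trajectory features


def feature_distance(a, b, weights):
    # Weighted squared Euclidean distance between two feature vectors.
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def find_best_match(query, database, weights):
    # Brute-force nearest-neighbor search; production systems typically
    # add acceleration structures for speed.
    return min(database, key=lambda f: feature_distance(query, f.features, weights))


def blend(current_pose, target_pose, alpha):
    # Linear cross-fade toward the matched pose (alpha in [0, 1]).
    # Real systems blend joint rotations, e.g. with quaternion slerp.
    return [(1.0 - alpha) * c + alpha * t for c, t in zip(current_pose, target_pose)]


db = [
    Frame("walk", 0, [0.0, 0.0, 1.0, 0.0]),
    Frame("run", 12, [0.5, 0.1, 2.0, 0.0]),
    Frame("turn", 3, [0.0, 0.0, 0.5, 0.8]),
]
query = [0.4, 0.1, 1.8, 0.1]  # current pose + desired trajectory features
best = find_best_match(query, db, weights=[1.0] * 4)
print(best.clip, best.index)  # run 12
```

In a full system the query is rebuilt every few frames from the character state and the user input, and the matched clip becomes the new playback source.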
The DeepMimic authors (Peng et al., 2018) bridged the gap between the two approaches by introducing simultaneous motion capture tracking and physics-based simulation. The authors of (Bergamin et al., 2019) took the next step by bringing user input into play, and use a PD controller as in (Chentanez et al., 2018).
2.1 Existing Implementations
OpenAI Gym (Brockman et al., 2016) provides a standard set of environments on which PPO and an array of other learning-based algorithms are commonly trained and compared. The DeepMimic (Peng et al., 2018) authors based their research on the Bullet physics engine (Coumans, 2015) and open-sourced their project. To our knowledge, there is no open-source implementation of (Bergamin et al., 2019). Our work is based on the ML-Agents framework (Juliani et al., 2018) and extends Marathon Environments (Booth and Booth, 2019), a set of continuous control benchmarks similar to OpenAI Gym. Our code, videos, and getting-started tutorials are hosted at https://github.com/Unity-Technologies/marathon-envs.
In this paper, we focus primarily on two tasks: first, mimicking an animation, and second, following user input while maintaining similarity to an animation. Rather than using a motion capture dataset, we build upon the standard Unity animation workflow: an animation .fbx file serves as the source for an animator controller. We use a humanoid character throughout the analysis.
3.2 Character Controller with no User Input
For the task of mimicking an animation without user control input, we include the animation phase in the observation state. For looping animations, e.g. Walking or Running, the phase periodically goes from 0 to 1, while for non-looping animations, e.g. Backflip or Kick, the phase goes from 0 to 1 once.
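A minimal way to compute such a phase from the elapsed time and the clip length is sketched below. The function name and the time-based computation are assumptions, since the text does not say how the phase is read from the animator.

```python
def animation_phase(time_s: float, clip_length_s: float, looping: bool) -> float:
    """Normalized phase of an animation clip at time `time_s` (seconds)."""
    if clip_length_s <= 0.0:
        raise ValueError("clip length must be positive")
    if looping:
        # Walking, Running: the phase wraps around, sweeping [0, 1) every cycle.
        return (time_s % clip_length_s) / clip_length_s
    # Backflip, Kick: the phase sweeps from 0 to 1 once and then stays at 1.
    return min(max(time_s / clip_length_s, 0.0), 1.0)


print(animation_phase(2.5, 1.0, looping=True))   # 0.5 (third cycle, halfway)
print(animation_phase(3.0, 2.0, looping=False))  # 1.0 (clip finished)
```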
The joint rotations, positions, velocities, and angular velocities are used as input as well, and so are the center of mass velocity and position. For more complex tasks like Backflip, we also explore adding the angular momentum to the observation state.
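The center of mass features are the mass-weighted averages of the body parts' positions and velocities. The sketch assumes each body part exposes its mass, world-space position, and linear velocity; `BodyPart` and `center_of_mass` are hypothetical names, not part of the framework.

```python
from dataclasses import dataclass

Vec3 = tuple  # (x, y, z)


@dataclass
class BodyPart:
    mass: float
    position: Vec3  # world-space center of mass of this part
    velocity: Vec3  # world-space linear velocity of this part


def center_of_mass(parts):
    """Mass-weighted average position and velocity of all body parts."""
    total = sum(p.mass for p in parts)
    pos = tuple(sum(p.mass * p.position[i] for p in parts) / total for i in range(3))
    vel = tuple(sum(p.mass * p.velocity[i] for p in parts) / total for i in range(3))
    return pos, vel


parts = [BodyPart(1.0, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
         BodyPart(3.0, (4.0, 0.0, 0.0), (0.0, 2.0, 0.0))]
com_pos, com_vel = center_of_mass(parts)
print(com_pos, com_vel)  # (3.0, 0.0, 0.0) (0.0, 1.5, 0.0)
# Both vectors would then be appended to the per-joint observation features.
```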
For this task we use Unity's ConfigurableJoints, and the actions are used as target rotations for the joints. In total, the model has an observation space of size 258 and an action space of dimension 21.
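One common way to turn policy outputs into joint targets is to assume each action lies in [-1, 1] and rescale it linearly into the joint's angular limits. The sketch below follows that assumption; the rescaling scheme, the function names, and the example limits are illustrative and are not taken from the implementation.

```python
def action_to_target_angle(action: float, lower_deg: float, upper_deg: float) -> float:
    """Map one policy output to a joint target angle inside the joint limits."""
    a = max(-1.0, min(1.0, action))  # clip outputs that leave [-1, 1]
    # Linear map: -1 -> lower limit, 0 -> middle of the range, +1 -> upper limit.
    return lower_deg + (a + 1.0) * 0.5 * (upper_deg - lower_deg)


def actions_to_targets(actions, limits):
    """One action per actuated degree of freedom (21 in the setup above)."""
    if len(actions) != len(limits):
        raise ValueError("expected one action per joint degree of freedom")
    return [action_to_target_angle(a, lo, hi) for a, (lo, hi) in zip(actions, limits)]


print(actions_to_targets([0.0, 1.0, 5.0], [(-90.0, 90.0), (-45.0, 45.0), (-45.0, 45.0)]))
# [0.0, 45.0, 45.0]
```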
3.3 Character Controller with User Input
In this setup, an animated character is controlled by user input. At the same time, we spawn a second, physics-based character that aims to mimic the animated one. As in the previous task, the observation features describe the physics-based character's joint rotations, positions, velocities, and angular velocities. Here, however, we also add the difference between the animated character's joint features and the physics-based character's features. The phase input is not necessary for this task, since the model is explicitly given the animated character's joint details. We also describe the humanoid as a whole by providing its center of mass coordinates and velocity.
For this task, we keep only a subset of the body parts for tracking, so the observation state is smaller: 115 dimensions.
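The observation for this task can be sketched as follows: for each tracked body part, the physics-based character's features are followed by the animated-minus-simulated differences, and the center of mass terms are appended at the end. The dictionary representation, the feature ordering, and the body-part names are hypothetical, and the sketch does not reproduce the exact 115-dimensional layout.

```python
def build_observation(sim, ref, tracked, com_pos, com_vel):
    """Assemble a flat observation vector.

    sim, ref: dict mapping body-part name -> flat list of that part's features
              (rotation, position, velocity, angular velocity) for the
              physics-based and the animated character, respectively.
    tracked:  the subset of body-part names kept for tracking.
    """
    obs = []
    for name in tracked:
        s, r = sim[name], ref[name]
        obs.extend(s)                                # physics character's own features
        obs.extend(ri - si for ri, si in zip(r, s))  # animated minus physics-based
    obs.extend(com_pos)                              # whole-body center of mass position
    obs.extend(com_vel)                              # whole-body center of mass velocity
    return obs


sim = {"hips": [1.0, 2.0], "head": [0.0, 0.0], "left_foot": [5.0, 5.0]}
ref = {"hips": [1.5, 2.0], "head": [1.0, 0.0], "left_foot": [5.0, 5.0]}
obs = build_observation(sim, ref, ["hips", "head"], (0.0, 1.0, 0.0), (0.5, 0.0, 0.0))
print(len(obs))  # 14: 2 tracked parts x (2 + 2) features + 3 + 3
```

Dropping untracked parts (here `left_foot`) is what keeps the vector smaller than in the task without user input.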
We also switched to the recently introduced ArticulationBody component for the joints. The actions are mapped to the target positions of the joints. ArticulationBody also allows specifying stiffness and damping in addition to target positions, and we explore mapping actions to these parameters as well. Across the experiments, the action dimension is at least 21.
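The text does not specify how the extra stiffness and damping actions are laid out. As a hypothetical example, if one stiffness action and one damping action are appended per joint target, the action vector grows from 21 to 63 entries and can be split as below; `split_actions` and this layout are assumptions.

```python
def split_actions(actions, n_targets=21, learn_gains=False):
    """Split a flat action vector into joint targets and optional gain actions.

    Assumed layout when learn_gains is True:
    [targets (n) | stiffness actions (n) | damping actions (n)].
    """
    if not learn_gains:
        if len(actions) != n_targets:
            raise ValueError(f"expected {n_targets} actions, got {len(actions)}")
        return list(actions), None, None
    expected = 3 * n_targets
    if len(actions) != expected:
        raise ValueError(f"expected {expected} actions, got {len(actions)}")
    targets = list(actions[:n_targets])
    stiffness = list(actions[n_targets:2 * n_targets])
    damping = list(actions[2 * n_targets:])
    return targets, stiffness, damping


targets, stiffness, damping = split_actions([0.0] * 63, learn_gains=True)
print(len(targets), len(stiffness), len(damping))  # 21 21 21
```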
4.1 Character Controller with no User Input
We aim to obtain a feasible behavior within 128 million learning steps, which corresponds to approximately 24 hours of training on an average desktop machine. In our experience, using a GPU as the TensorFlow backend only slows training down. The reward term weights are kept unchanged across all the animations we trained. We also use an early termination condition: when the reward falls below a threshold set separately for each animation, we interrupt the episode and reset the scene. Setting the threshold correctly is crucial, for example, to prevent a humanoid that is supposed to be running from learning to run on its knees.
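The budget implies a concrete throughput target: 128 million steps in 24 hours is 128,000,000 / 86,400, roughly 1,481 steps per second. The sketch below computes that figure and expresses the early termination check; the threshold values and the function name are hypothetical, since the text does not list the per-animation thresholds.

```python
TOTAL_STEPS = 128_000_000      # learning-step budget
WALL_CLOCK_S = 24 * 60 * 60    # about one day of training

# Simulation throughput needed to fit the budget into one day.
steps_per_second = TOTAL_STEPS / WALL_CLOCK_S
print(round(steps_per_second))  # 1481

# Hypothetical per-animation thresholds, for illustration only.
THRESHOLDS = {"walk": 0.5, "run": 0.5, "backflip": 0.3}


def should_terminate_episode(animation: str, reward: float) -> bool:
    """Early termination: end the episode and reset the scene once the
    reward drops below the threshold chosen for this animation."""
    return reward < THRESHOLDS[animation]
```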
4.2 Angular Momentum in Observations
The authors of the DeepMimic paper mention a peculiarity of backflip training: the agent never learns to perform a full mid-air flip, and instead only jumps and lifts its leg. We observed the same behavior in our training. The DeepMimic paper addresses this by initializing the agent at a state sampled from the reference motion. We found, however, that we can do without it. Our trick is to add the angular momentum to the observations and to use the difference between the angular momentum of the reference motion and that of the simulated character as a reward signal. When a motion is mostly based on balancing, as is the case for the Walking and Running animations, this term is not necessary. For Backflip, however, the rotational part of the motion cannot be neglected. Figure 4 contrasts the behaviors learned with and without the angular momentum term.
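The angular momentum term can be sketched as follows. The code computes the humanoid's angular momentum about its center of mass, treating each body part as a point mass (each part's own spin term I_i ω_i is dropped for brevity), and turns the reference-versus-simulation difference into a reward. The exponential shaping and the `scale` parameter follow the common DeepMimic-style pattern and are assumptions, not the exact reward used here.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def angular_momentum(masses, positions, velocities):
    """Angular momentum about the center of mass, treating every body part
    as a point mass (each part's own spin term I_i * w_i is ignored here)."""
    total = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, positions)) / total for k in range(3)]
    com_vel = [sum(m * v[k] for m, v in zip(masses, velocities)) / total for k in range(3)]
    L = [0.0, 0.0, 0.0]
    for m, p, v in zip(masses, positions, velocities):
        r = [p[k] - com[k] for k in range(3)]      # position relative to the COM
        u = [v[k] - com_vel[k] for k in range(3)]  # velocity relative to the COM
        rxu = cross(r, u)
        for k in range(3):
            L[k] += m * rxu[k]
    return tuple(L)


def angular_momentum_reward(L_ref, L_sim, scale=0.1):
    """1.0 when the two momenta match, decaying with their squared difference.
    The exponential form and `scale` are assumptions for illustration."""
    err = sum((a - b) ** 2 for a, b in zip(L_ref, L_sim))
    return math.exp(-scale * err)


# Two unit masses circling the y axis through their midpoint.
L_sim = angular_momentum([1.0, 1.0],
                         [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
                         [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)])
print(L_sim)  # (0.0, -2.0, 0.0)
```

A walking reference has little net rotation, so this term stays near its maximum; during a backflip the reference momentum is large, and an agent that only jumps is penalized.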
4.3 Character Controller with User Input
To benchmark our implementation in the presence of user control, we set up training with a random control input that picks a direction from the range [-45, +45] degrees. The input power is sampled from the range [0, 1], where 0 corresponds to the Idle animation and 1 to Running.
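The random command used for benchmarking can be sketched as below. Uniform sampling is an assumption (the text only gives the ranges), and `Command` and `sample_command` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class Command:
    direction_deg: float  # desired heading relative to the character, in degrees
    power: float          # 0 corresponds to Idle, 1 to Running


def sample_command(rng: random.Random, max_angle_deg: float = 45.0) -> Command:
    """Draw one random training command from the benchmark ranges."""
    return Command(direction_deg=rng.uniform(-max_angle_deg, max_angle_deg),
                   power=rng.uniform(0.0, 1.0))


rng = random.Random(0)  # fixed seed for reproducible runs
commands = [sample_command(rng) for _ in range(3)]
```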
The work of (Bergamin et al., 2019) uses a PD controller to filter the actions predicted by the neural network. The Unity ArticulationBody joints we use allow specifying a target rotation along with stiffness and damping, which behaves similarly to a PD controller. We benchmarked the agent's performance when the stiffness and damping parameters are learned, initializing stiffness at 30 and damping at 100. If the neural network learns a multiplier of the initial stiffness and damping, the performance gets worse (see Figure 5). The likely reason is that a multiplier changes the parameters too steeply for the agent.
On the other hand, when the agent learns an additive offset to the initial stiffness and damping, the reward improves (see Figure 6).
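The two parameterizations can be compared against the textbook PD law τ = k_p (θ_target - θ) - k_d ω. In the sketch below, the policy either scales the initial gains (multiplicative) or shifts them by a bounded offset (additive). The (1 + a) multiplier mapping and the offset bounds are assumptions made for illustration, and Unity's ArticulationBody drive has its own exact formulation, which this code does not reproduce. With these choices, the same action moves the gains far more under the multiplicative scheme, which is consistent with the "too steep changes" explanation above.

```python
K_P0 = 30.0   # initial stiffness
K_D0 = 100.0  # initial damping


def pd_torque(target, angle, angular_velocity, kp, kd):
    """Textbook PD law: pull toward the target, resist the current velocity."""
    return kp * (target - angle) - kd * angular_velocity


def gains_multiplicative(a_kp, a_kd):
    # Action in [-1, 1] scales the initial gains by (1 + a): range [0, 2x].
    return K_P0 * (1.0 + a_kp), K_D0 * (1.0 + a_kd)


def gains_additive(a_kp, a_kd, max_delta_kp=10.0, max_delta_kd=10.0):
    # Action in [-1, 1] shifts the initial gains by a bounded amount.
    # max_delta_* are hypothetical values, not taken from the experiments.
    return K_P0 + max_delta_kp * a_kp, K_D0 + max_delta_kd * a_kd


# The same action a = +/-1 moves damping by +/-100 multiplicatively,
# but only by +/-10 additively.
print(gains_multiplicative(1.0, -1.0))  # (60.0, 0.0)
print(gains_additive(1.0, -1.0))        # (40.0, 90.0)
```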
4.4 Reference Motion teleport
In the previous experiments, we reset the scene once the reward falls below a predefined threshold, and the control direction is sampled from a fixed range of values. In this experiment, we reset the scene only when the agent falls to the ground, and the direction range is not capped. If the reward drops below a predefined threshold, we instead teleport the reference motion to the center of mass coordinates of the simulated agent. As a result, the agent learns to make a 180-degree turn, even though the animator controller we programmed for the reference motion does not include a turn animation. This is shown in Figure 7.
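The episode logic of this experiment can be sketched as a per-step decision. `handle_step`, `StepOutcome`, and `teleport_reference` are hypothetical names. Keeping the reference's own height when teleporting (assuming Unity's y-up coordinate system) is our assumption, since the text only says the reference is moved to the agent's center of mass coordinates.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    reset_scene: bool
    teleport_reference: bool


def handle_step(reward: float, agent_fell: bool, reward_threshold: float) -> StepOutcome:
    """Decide what happens after one simulation step in this experiment."""
    if agent_fell:
        # Only a fall ends the episode.
        return StepOutcome(reset_scene=True, teleport_reference=False)
    if reward < reward_threshold:
        # Low reward no longer resets the scene; the reference is moved instead.
        return StepOutcome(reset_scene=False, teleport_reference=True)
    return StepOutcome(reset_scene=False, teleport_reference=False)


def teleport_reference(reference_root, agent_com):
    """New reference root position: the agent's horizontal center of mass
    coordinates (x, z), keeping the reference's own height y (assumption)."""
    return (agent_com[0], reference_root[1], agent_com[2])
```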
In this paper, we present an implementation of realistic physics-based character control. The implementation provides an easy-to-learn framework for a widely used game engine. Game developers, optimal control researchers, and enthusiasts are welcome to introduce new animations and train agents within a reasonable time frame.
6 Future work
Several improvements can be made in future work. This work presents benchmarks for a simple controller; a more sophisticated controller and new moves beyond Walk and Run, for example Backflip and Jump, can be introduced. The reference motions we work with are animations, and we employ animator controllers. An important addition would be using motion capture data, which would enrich the behavior of the agent; this can be done using the Kinematica package. Currently, even though the agent looks natural and is able to track the reference motion, the two often do not match closely (see Figure 1). Training with a motion capture dataset, or with a non-trivial animator controller, could lead to improvements.
- Bergamin et al. (2019). DReCon: data-driven responsive control of physics-based characters. ACM Trans. Graph. 38 (6), pp. 206:1–206:11.
- Booth and Booth (2019). Marathon environments: multi-agent continuous control benchmarks in a modern video game engine. CoRR abs/1902.09097.
- Brockman et al. (2016). OpenAI gym. CoRR abs/1606.01540.
- Chentanez et al. (2018). Physics-based motion capture imitation with deep reinforcement learning. In Proceedings of the 11th Annual International Conference on Motion, Interaction, and Games, MIG 2018, Limassol, Cyprus, November 08-10, 2018, P. Charalambous, Y. Chrysanthou, B. Jones, and J. Lee (Eds.), pp. 1:1–1:10.
- Coumans (2015). Bullet physics simulation. In Special Interest Group on Computer Graphics and Interactive Techniques Conference, SIGGRAPH ’15, Los Angeles, CA, USA, August 9-13, 2015, Courses, pp. 7:1.
- Holden et al. (2017). Phase-functioned neural networks for character control. ACM Trans. Graph. 36 (4), pp. 42:1–42:13.
- Juliani et al. (2018). Unity: A general platform for intelligent agents. CoRR abs/1809.02627.
- Peng et al. (2018). DeepMimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. 37 (4), pp. 143:1–143:14.
- Schulman et al. (2017). Proximal policy optimization algorithms. CoRR abs/1707.06347.
- Todorov et al. (2012). MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2012, Vilamoura, Algarve, Portugal, October 7-12, 2012, pp. 5026–5033.