Motion Generation Considering Situation with Conditional Generative Adversarial Networks for Throwing Robots

10/08/2019 ∙ by Kyo Kutsuzawa, et al. ∙ Preferred Infrastructure

When robots work in a cluttered environment, the constraints on motions change frequently, and the required action can change even for the same task. However, planning complex motions by direct calculation risks converging to poorly performing local optima. In addition, machine learning approaches often require relearning for novel situations. In this paper, we propose a method of searching for appropriate motions by using conditional Generative Adversarial Networks (cGANs), which can generate motions based on given conditions by mimicking training datasets. By training cGANs with various motions for a task, their latent space is filled with valid motions for the task. Appropriate motions can then be found efficiently by searching the latent space of the trained cGANs instead of the motion space, while avoiding poor local optima. We demonstrate that the proposed method works successfully for an object-throwing task with given target positions in both numerical simulation and real-robot experiments. The proposed method achieved three times higher accuracy with 2.5 times faster calculation than searching the action space directly.




I Introduction

In the near future, robots are expected to work in our daily lives, which are often cluttered with objects such as furniture [4]. They should execute various tasks, such as cooking meals, cleaning rooms, and carrying dishes, while avoiding obstacles.

Many motion planning methods have been proposed to obtain robot motions for such complex tasks. For example, Model Predictive Control (MPC) [10, 9, 32] is a common approach to motion optimization. This method has been applied to self-driving vehicles [26], aerial vehicles [11], and humanoid robots [3]. For tasks in which the derivatives of the models are unknown, sampling-based methods have also been studied [35, 36]. Such optimization methods, however, must search for solutions among all possible motions. Searching all possible motions carries the risk that bad initial conditions lead to a poorly performing local optimum [19, 7].

Machine learning approaches based on deep neural networks have also been studied widely [29]. These approaches can be used for complex tasks that are difficult to solve with analytical methods [22]. In addition, once a neural network is trained, it can generate motions with lower computational cost than optimization from scratch [39]. Neural networks can be trained from various sources such as labeled images [23], human demonstrations [18, 8, 37], and other optimization results [39]. Deep neural networks, however, require a large amount of training data. It is difficult to prepare a large training dataset in which situations and their optimal motions are associated. Moreover, generalization may not be possible for novel situations that did not appear in the training dataset. In addition to supervised learning, reinforcement learning has also been applied to learn the association between situations and actions [28, 13, 38]. However, reinforcement learning is difficult to train and must often be retrained for situations not encountered during training. To achieve high generalization performance over a variety of situations, many trials are necessary. Some studies [30, 31] aim to learn policies that are robust to changes in conditions, such as physical parameters, but they cannot adapt to totally new situations.

To summarize the above discussion, it is difficult to plan motions for complex tasks in various situations. Generic motion optimization methods have a risk of poor local optima, because they search solutions from all the motions including inappropriate conditions. On the other hand, learning-based methods require relearning for novel situations.

In this paper, we propose a motion planning method based on conditional Generative Adversarial Networks (cGANs) [24], which generate various motions from given conditions and latent variables by mimicking training datasets. Unlike other methods [5], cGANs do not require designing the characteristics of the latent space by hand. The proposed method divides motion planning into two phases: 1) training cGANs on various motions regardless of their relationship to situations and 2) searching for appropriate motions for given situations in the latent space of the trained cGANs, which is associated only with valid motions. The proposed method searches for the best motion among only the valid motions represented in the latent space, avoiding possible convergence on an invalid motion. In addition, relearning is not required, because cGANs can generate motions suitable for specific situations.

While cGANs can generate various kinds of motions, in this paper we look at a task in which a robot throws objects to target positions. Throwing enables objects to be moved rapidly beyond the robot's reachable space; in living spaces cluttered with objects, there may be places that robots cannot enter or reach. However, it is more difficult to plan throwing motions than motions for ordinary object placement tasks. This is because the contact condition between the robot and the object changes discontinuously depending on the motion, which makes the dynamics complex. In this task, there exist various possible throwing motions that reach the target, and robots need to choose a motion according to the situation. Object trajectories and robot motions cannot be determined uniquely even when target positions are given. Therefore, it is not enough to consider end-effector positions and parabolas of the object. Although many studies have tackled the robotic throwing task with model-based approaches [34, 25, 33, 20] and learning-based approaches [21, 6], they were unable to deal with novel situations without replanning from scratch or relearning.

II Background

In this section, we introduce conditional Wasserstein GANs with Gradient Penalty (cWGANs-GP) and Covariance Matrix Adaptation Evolution Strategy (CMA-ES).

II-A cGANs

cGANs are generative models that can generate various samples corresponding to condition inputs by mimicking training datasets [24]. cGANs are an extension of GANs, which are composed of a generator network and a discriminator network [12]. The generator network aims to generate samples imitating a given dataset, while the discriminator network aims to distinguish generated data from actual data in the dataset. The generator is implemented as a deep neural network that maps a condition input and a random variable sampled from a uniform distribution to data. The discriminator is also implemented as a deep neural network, which distinguishes actual data in the dataset from samples generated by the generator.

Wasserstein GANs (WGANs) [2] are a kind of GAN that uses the Wasserstein distance to measure the difference between the probability distributions of the training dataset and the generated data. WGANs learn the probability distributions stably thanks to the use of the Wasserstein distance instead of the Jensen-Shannon divergence, which is used in generic GANs [1]. In addition, WGANs can be improved by applying a penalty term called the gradient penalty during training [14]. Such WGANs are called WGANs-GP. The parameters of the generator and discriminator of WGANs-GP are optimized by the following loss functions:




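\[
\mathcal{L}_D = \mathbb{E}_{z \sim U}\left[ D(G(z)) \right] - \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ D(x) \right] + \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],
\]
\[
\mathcal{L}_G = -\mathbb{E}_{z \sim U}\left[ D(G(z)) \right].
\]

Here, $G$ and $D$ are the generator and the discriminator, respectively. $\mathbb{E}_{p}[\cdot]$ denotes expectation under the probability $p$. $\mathcal{L}$ is the training loss, $x$ is a sample in the training dataset, and $U$ indicates the uniform distribution. $\hat{x}$ is sampled uniformly along straight lines between training samples and generated samples [14]. $\lambda$ is a hyperparameter for the gradient penalty.

cGANs and WGANs-GP can be combined. Such models are called cWGANs-GP. The simplest implementation of cWGANs-GP is to input the conditions to both the generator and the discriminator of WGANs-GP. The loss functions are expressed as follows:

\[
\mathcal{L}_D = \mathbb{E}_{z \sim U}\left[ D(G(z, c), c) \right] - \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ D(x, c) \right] + \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}, c) \rVert_2 - 1 \right)^2 \right],
\]
\[
\mathcal{L}_G = -\mathbb{E}_{z \sim U}\left[ D(G(z, c), c) \right].
\]

Here, $c$ denotes the condition of the training sample $x$.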
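As a rough illustration of how a conditional critic loss with a gradient penalty is evaluated, the following minimal sketch uses the standard WGAN-GP form from [14] with the condition passed to both networks. The one-dimensional toy `critic`, its analytic gradient, and the toy `generator` are hypothetical stand-ins chosen so that no automatic differentiation is needed; they are not the networks used in this paper. The default `lam=10.0` is the value suggested in [14], not necessarily the authors' setting.

```python
import random


def critic(x, c, params):
    """Toy critic D(x, c) = a*x + b*c + k*x**2 (hypothetical, for illustration)."""
    a, b, k = params
    return a * x + b * c + k * x * x


def critic_grad_x(x, c, params):
    """Analytic derivative of the toy critic with respect to x."""
    a, _, k = params
    return a + 2.0 * k * x


def generator(z, c, scale=0.1):
    """Toy generator G(z, c): the condition shifted by a scaled latent variable."""
    return c + scale * z


def cwgan_gp_losses(real, conds, params, lam=10.0, rng=None):
    """Return (critic loss, generator loss) for one batch of 1-D samples."""
    rng = rng or random.Random(0)
    n = len(real)
    d_real = d_fake = penalty = 0.0
    for x, c in zip(real, conds):
        z = rng.uniform(-1.0, 1.0)            # latent variable from a uniform distribution
        fake = generator(z, c)
        eps = rng.uniform(0.0, 1.0)           # interpolation coefficient
        x_hat = eps * x + (1.0 - eps) * fake  # point between real and generated sample
        d_real += critic(x, c, params)
        d_fake += critic(fake, c, params)
        # Gradient penalty: push the gradient norm at x_hat toward 1.
        penalty += (abs(critic_grad_x(x_hat, c, params)) - 1.0) ** 2
    loss_d = (d_fake - d_real) / n + lam * penalty / n
    loss_g = -d_fake / n
    return loss_d, loss_g
```

In practice the gradient with respect to the interpolated sample would come from an automatic differentiation framework rather than a hand-written derivative.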

II-B CMA-ES

CMA-ES is a stochastic optimization method for nonlinear, nonconvex functions [16, 17]. This method can be applied even when the objective functions are multimodal and ill-scaled. In addition, all hyperparameters have recommended values that depend only on the number of dimensions [15]. Therefore, CMA-ES is expected to work regardless of the geometry of the objective function in the search space. Moreover, because CMA-ES is a gradient-free method, it can be used even for nondifferentiable objective functions.

The search in CMA-ES progresses based on a Gaussian distribution. Its mean vector and covariance matrix are updated iteratively according to the evaluation values of the samples drawn from the Gaussian distribution.
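To make the dimension-only defaults concrete, the sketch below computes the default population size, the number of selected parents, and the positive recombination weights as given in the CMA-ES tutorial [17], and then forms a new mean as the weighted average of the best candidates. It covers only this selection and recombination step; step-size control and the covariance update are omitted, and the function names are chosen here for illustration.

```python
import math


def cma_es_defaults(n):
    """Default selection parameters for an n-dimensional search space."""
    lam = 4 + int(3 * math.log(n))   # population size
    mu = lam // 2                    # number of selected parents
    raw = [math.log((lam + 1) / 2) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]           # positive, decreasing, sum to 1
    mu_eff = 1.0 / sum(w * w for w in weights)   # variance-effective selection mass
    return lam, mu, weights, mu_eff


def recombine(sorted_candidates, weights):
    """New mean: weighted average of the best candidates (best first)."""
    dim = len(sorted_candidates[0])
    return [
        sum(w * x[d] for w, x in zip(weights, sorted_candidates))
        for d in range(dim)
    ]


if __name__ == "__main__":
    lam, mu, weights, mu_eff = cma_es_defaults(10)
    print(lam, mu, [round(w, 3) for w in weights], round(mu_eff, 2))
```

The defaults can be overridden when a task benefits from it, for example by choosing a larger population.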

III Method

This section explains the proposed method by using the throwing task as an example. The proposed method handles the issue by dividing it into two phases: 1) letting a generative model learn various motions and 2) finding a motion optimized for a given situation from the trained generative model.

III-A Generative Model of Various Motions

To train various motions, we use cWGANs-GP. The data to be generated by the cWGANs-GP is the motion of the robot for the task. Motion primitives can be used to represent motions, as explained in Section IV. The condition is the goal of the task, and its design depends on the task. For example, in placing tasks, the target positions at which objects are placed are candidates for the condition.

The generated samples of cGANs depend on the quality of training datasets. The proposed method requires diverse and valid motions. To obtain these motions for the training datasets, we generated random actions and used the actions for training if they satisfy the target task’s condition.

III-B Searching in Latent Space

After the cGANs are trained, valid motions can be found by searching the latent space. Because the latent variables express only valid motions, it is not necessary to filter out motions that are inappropriate for the task. We use CMA-ES to search the latent space because of the favorable properties explained in Section II-B. It should be noted that other search methods can be used instead of CMA-ES.

The search process consists of sampling latent variables and evaluating with an arbitrary objective function according to the situation. The algorithm is as follows:

  1. Define an objective function.

  2. Sample latent variables from the latent space according to the parameters of CMA-ES.

  3. Generate motions and evaluate them with the objective function.

  4. Update parameters of CMA-ES based on the values of the objective function.

  5. Repeat the above steps until a suitable solution is found.

While searching for latent variables, CMA-ES samples solution candidates from the whole latent space. However, the latent variables should lie within a bounded range. Therefore, we applied a bounding function to the solution candidates to limit the range of the latent variables.
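The search loop described above can be sketched as follows. This is a simplified Gaussian search that refits the mean and a per-dimension standard deviation to the best candidates; it is not CMA-ES itself, which additionally adapts a full covariance matrix and a step size. The toy `generator`, the example `objective`, the use of `tanh` as the bounding function, and the latent dimension are all assumptions made for illustration.

```python
import math
import random


def generator(z, goal):
    """Hypothetical stand-in for the trained generator: (latent, goal) -> motion."""
    return [goal * (1.0 + 0.5 * zi) for zi in z]


def objective(motion, limit=1.2):
    """Step 1: an example situation-dependent cost (hinge penalty above a limit)."""
    return sum(max(0.0, m - limit) for m in motion)


def search_latent(goal, dim=4, popsize=64, sigma=0.4, max_iters=100, seed=0):
    """Return (latent variable with zero cost, iterations used), or (None, max_iters)."""
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [sigma] * dim
    n_elite = popsize // 4
    for it in range(1, max_iters + 1):
        # Step 2: sample unbounded candidates from the current search distribution.
        raw = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(popsize)]
        scored = []
        for r in raw:
            z = [math.tanh(v) for v in r]        # assumed bounding into (-1, 1)
            cost = objective(generator(z, goal))  # Step 3: generate and evaluate
            scored.append((cost, r, z))
        scored.sort(key=lambda t: t[0])
        best_cost, _, best_z = scored[0]
        if best_cost == 0.0:                      # Step 5: stop once a solution is found
            return best_z, it
        # Step 4: update the search distribution from the best candidates.
        elite = [r for _, r, _ in scored[:n_elite]]
        for i in range(dim):
            vals = [e[i] for e in elite]
            mean[i] = sum(vals) / n_elite
            var = sum((v - mean[i]) ** 2 for v in vals) / n_elite
            std[i] = max(1e-3, var ** 0.5)
    return None, max_iters
```

Because the goal is passed to the generator as a condition, the objective here only encodes the situation-dependent constraint and contains no target-reaching term.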

IV Task Specification

This paper takes the task of robotic throwing as illustrated in Fig. 1.

IV-A Physics Model

This paper considers throwing motions by a manipulator with three degrees of freedom (DoF) in planar physics.

The state variables of the robot are defined as follows:


The state variables follow the following state equation:




Here, denotes the sampling interval. and

are the identity matrix and the zero matrix, respectively.

The object is described as a mass point. Therefore, the motion can be described as follows:






The manipulator has a bowl-shaped end-effector at the tip of the arm. The contact models are expressed as follows:


Here, is the acceleration of the end-effector and is the gravity. is the normal vector to the end-effector. Thus, the object is constrained to the end-effector until .
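Once the object has left the end-effector, a point mass under gravity follows a ballistic trajectory, so the landing position can be computed in closed form. The sketch below assumes a flat floor at height zero, no air resistance, and a release state (position and velocity) that is already known; the function name and the coordinate convention are chosen here for illustration.

```python
import math


def landing_distance(x0, y0, vx, vy, g=9.81):
    """Horizontal landing position of a point mass released at (x0, y0) [m]
    with velocity (vx, vy) [m/s], assuming a floor at y = 0."""
    # Positive root of y0 + vy*t - 0.5*g*t**2 = 0.
    t_flight = (vy + math.sqrt(vy * vy + 2.0 * g * y0)) / g
    return x0 + vx * t_flight


if __name__ == "__main__":
    # Example release state; the numbers are arbitrary.
    print(landing_distance(x0=0.5, y0=0.6, vx=1.5, vy=1.0))
```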

IV-B Representation of Motions

Although the manipulator can be controlled by specifying , it is difficult for neural networks to handle the time series of the state variables directly [27]. To reduce the burden of the neural networks, we employ the idea of motion primitives, which represent complex motions as the combinations of simple primitive motions.

Motions of the manipulator are expressed as linear combinations of motion primitives. The -th joint angle at time , , is expressed as follows:


Here, is the initial joint angle and is a weight for the primitives. is an S-curve defined as follows:




Here, is the length of the motions and is the number of primitives. Fig. 2 shows the primitives with and , which are used in the experiments.
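The following sketch illustrates the idea of expressing a joint trajectory as an initial angle plus a weighted sum of S-curve primitives. The exact S-curve and the timing of the primitives are not recoverable from the text here, so a cosine-based smooth step and evenly spaced, non-overlapping primitives are used as assumptions. Using ten primitives per joint in the example is also an assumption, although it would be consistent with a 33-dimensional action for three joints (3 initial angles plus 30 weights).

```python
import math


def s_curve(t, start, duration):
    """Assumed S-curve: rises smoothly from 0 to 1 over [start, start + duration]."""
    tau = min(1.0, max(0.0, (t - start) / duration))
    return 0.5 * (1.0 - math.cos(math.pi * tau))


def joint_angle(t, q0, weights, total_time):
    """Joint angle at time t: initial angle plus weighted S-curve primitives."""
    n = len(weights)
    duration = total_time / n  # assumed: primitives are evenly spaced in time
    return q0 + sum(
        w * s_curve(t, i * duration, duration) for i, w in enumerate(weights)
    )


if __name__ == "__main__":
    weights = [0.2, -0.1, 0.3, 0.0, 0.1, -0.2, 0.4, 0.1, 0.0, -0.1]
    for k in range(6):
        t = 0.2 * k
        print(round(t, 1), round(joint_angle(t, 0.5, weights, 1.0), 4))
```

With this parameterization, a whole trajectory is described by a short vector of weights instead of a long time series of states.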

Fig. 1: Overview of the task considered in this paper.
Fig. 2: Motion primitives.

V Results of Training cGANs

This section explains the details of the dataset, model implementation, and results of the training.

V-A Dataset

A large amount of training data is necessary to train cGANs. Because it is not easy to obtain large datasets with actual robots, we used a simulator based on the physics model explained in Section IV-A to generate throwing motions.

The dataset was obtained in a self-supervised manner with the following procedure. First, a random initial pose and a random action are generated, both from uniform distributions. The initial pose was sampled from the ranges in Table I, while each component of the action was sampled from a fixed range. Then, the action is simulated to obtain the flying distance. After that, the initial pose, the action, and the flying distance are added to the dataset if they are valid. The validity is defined as follows:

  • The flying distance is in the range 0.8–1.4 m. Because the reachable distance of the robot is 0.78 m, this range can be reached only by the throwing motions.

  • The contact between the end-effector and the object is maintained at .

  • The motion does not exceed the limitation in Table I.

Finally, we reduced the obtained data to level the occurrence of the target positions evenly. In total, we obtained 282500 throwing motions. Examples of the training data are shown in Fig. 3.
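The collection procedure amounts to rejection sampling followed by leveling the distribution of target distances. The sketch below shows that structure only: `toy_simulate` is a hypothetical stand-in for the physics simulator, and the sampling ranges, the number of action components, and the number of distance bins are placeholder assumptions rather than the authors' settings.

```python
import random


def toy_simulate(pose, action):
    """Hypothetical simulator stand-in.
    Returns (flying distance [m], contact kept, within joint limits)."""
    distance = 0.5 + 0.6 * abs(sum(action)) + 0.1 * pose
    contact_kept = action[0] > -0.8
    within_limits = max(abs(a) for a in action) <= 0.95
    return distance, contact_kept, within_limits


def collect(n_trials, seed=0):
    """Generate random motions and keep only the valid ones."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_trials):
        pose = rng.uniform(-1.0, 1.0)                         # placeholder range
        action = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # placeholder range
        distance, contact_kept, within_limits = toy_simulate(pose, action)
        if 0.8 <= distance <= 1.4 and contact_kept and within_limits:
            data.append((pose, action, distance))
    return data


def level_by_distance(data, n_bins=6, lo=0.8, hi=1.4, seed=0):
    """Subsample so that every distance bin holds the same number of motions."""
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for sample in data:
        idx = min(n_bins - 1, int((sample[2] - lo) / (hi - lo) * n_bins))
        bins[idx].append(sample)
    keep = min(len(b) for b in bins)
    leveled = []
    for b in bins:
        leveled.extend(rng.sample(b, keep))
    return leveled
```

Leveling discards data from over-represented distances, so the size of the final dataset is set by the sparsest bin.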

V-B Implementation

We used a cWGAN-GP with the architecture illustrated in Fig. 4. The generator receives a latent variable taken from a uniform distribution and a goal distance. It then outputs the initial pose and the weights of the motion primitives. The discriminator receives the initial pose, the action, and the goal distance. It then outputs a scalar used to distinguish samples in the dataset from generated samples. The hyperparameters for training are detailed in Table II.

V-C Results

The training losses of the generator and discriminator are shown in Fig. 5. We generated throwing motions with the trained cWGAN-GP by sampling the latent variable and the target position from uniform distributions. Snapshots of throwing motions generated by the trained cWGAN-GP are shown in Fig. 6. Various throwing motions were observed. In addition, the object was thrown to the target position in most cases. In some cases, however, the robot dropped the object, resulting in a large error, as shown in Fig. 6(b). This is considered to be caused by errors in the acceleration, which result in releasing the object at an undesired time.

The relationship between the target position and the flying distance by the generated throwing motions is shown in Fig. 7. We evaluated 1000 motions with the target positions 0.8–1.4 m. The generated motions were able to throw the objects near the target positions. The average error between the target positions and the flying distance was 9 % (9 cm).

(a) target position: 84.8 cm
(b) target position: 128.8 cm
(c) target position: 99.2 cm
Fig. 3: Snapshots of throwing motions in the training data. The yellow circles indicate the objects. The red circles indicate the landing positions.
joint 0 joint 1 joint 2
Min angle [deg]
Max angle [deg]
Min velocity [deg/s]
Max velocity [deg/s]
TABLE I: Limitations of the Joints
(a) Generator
(b) Discriminator
Fig. 4: Architecture of cWGAN-GP.
Item Value
# training data

# epochs

TABLE II: Hyperparameters for Training
Fig. 5: Training losses.
(a) Success cases
(b) Failure case
Fig. 6: Snapshots of generated motions. The trained cWGAN-GP generated various motions. In addition, the robot succeeded in throwing near the target positions in most cases. In some cases, the robot dropped the object and failed in the task.
Fig. 7: Relationship between the target positions and the flying distance. Although small errors remain, the trained cWGAN-GP generated motions mostly land near the target positions.

VI Simulation Results

Here the results of the proposed method are described.

VI-A Setup

We consider cases in which the robot must throw objects while staying within a given workspace. Such situational constraints are implemented as objective functions for CMA-ES. Here, two examples are verified.

The first objective function is designed as follows:


where indicates the positions of joints and the end-effector. and are the penalty for joint angles and joint angular velocity, respectively. These are hinge functions whose values increase as the states exceed the limitation. Here, the limit of the joint angle was the same as Table I, while the limit of the angular velocity was narrowed to rad, rad, and rad for joint 0, 1, and 2, respectively. That is to avoid a large control deviation when the actual robot reproduces the throwing motions. is also a hinge function whose value increases when the robot came below 0.2 m. This penalty aims to avoid the robot hitting the floor. is a limitation of the range of the motions. It increases when the robot exceeded 0.1 m behind.

The second objective function is designed as follows:


Here, is a limitation of the range of the motions. The value increases when the robot exceeded 0.5 m in front.

It should be noted that the objective functions do not require any penalty terms related to the target reaching. This is because the flying distance can be specified as a condition to the trained cWGAN-GP.
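The hinge-type penalties described above are zero while a quantity stays within its limit and grow once the limit is exceeded. The sketch below evaluates such penalties for a single time step; an objective for a whole motion would accumulate them over the trajectory. The linear form of the hinge, the coordinate convention (x pointing forward from the robot base, y pointing up), and all argument names are assumptions made for illustration.

```python
def hinge(value, limit):
    """Zero while value <= limit; grows linearly beyond it."""
    return max(0.0, value - limit)


def step_penalty(joint_angles, joint_velocities, link_positions,
                 angle_limits, velocity_limits,
                 floor_height=0.2, back_limit=-0.1):
    """Sum of hinge penalties for one time step of a candidate motion."""
    cost = 0.0
    for q, (q_min, q_max) in zip(joint_angles, angle_limits):
        cost += hinge(q, q_max) + hinge(q_min, q)  # joint angle limits
    for v, v_max in zip(joint_velocities, velocity_limits):
        cost += hinge(abs(v), v_max)               # joint velocity limits
    for x, y in link_positions:
        cost += hinge(floor_height, y)             # link lower than 0.2 m
        cost += hinge(back_limit, x)               # link more than 0.1 m behind
    return cost
```

A motion that satisfies every constraint has a total cost of exactly zero, which matches the stopping criterion of searching until the objective reaches zero.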

The number of candidates that CMA-ES samples at each iteration was set to 64. The initial parameters of the search distribution were set to a mean of 0.0 and a standard deviation of 0.4. The search was continued until the value of the objective function reached zero.

VI-B Results

The motion obtained with the first objective function is shown in Fig. 8(a). The robot did not hit the wall during the motion.

The motion obtained with the second objective function is shown in Fig. 8(b). A different motion was obtained compared with Fig. 8(a). The robot did not hit the obstacle during the motion.

In both cases, the values of the objective functions were zero.

VI-C Comparison with Direct Search in the Action Space

To verify the effectiveness of searching the latent space, we compare it with searching the action space.

Here, the action space consists of the initial pose and the weights of the motion primitives, which are the same as the output of the cWGAN-GP. Its dimension is 33, covering the initial pose and the weights of the motion primitives.

For comparison, we used the following objective function:


Here, is the landing position of the object. is set to 1.0 m in this case. This is similar to in (21), while a term for flying distance is added.

First, we searched the action space directly. In this case, the objective function did not reach zero. Its value converged to 0.28350 at the 64th update of CMA-ES, after 796 s. The landing position of the object was 80 cm, while the target position was 100 cm. Therefore, the error was 20 %. Snapshots of the obtained motion are shown in Fig. 9(a). The motion appears to simply extend the arm to its limit and drop the object. Such a motion cannot carry the object beyond the reachable space. We conducted the same evaluation five times, and each trial resulted in the same kind of failure.

On the other hand, the proposed method found a solution that makes the value of the objective function zero at the 9th update, after 295 s. The landing distance of the object was 94 cm; that is, the error was 6 %. Therefore, the proposed method achieved over three times higher accuracy in about 40 % of the calculation time of searching the action space directly. The snapshots are shown in Fig. 9(b). We conducted the same evaluation five times and obtained successful throwing motions in all trials.
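The comparison can be checked directly from the reported numbers. The short calculation below uses only the errors and calculation times quoted in this section; the variable names are chosen here for illustration.

```python
# Reported results: relative landing error and calculation time.
direct = {"error": 0.20, "time_s": 796.0}  # searching the action space
latent = {"error": 0.06, "time_s": 295.0}  # searching the latent space

accuracy_gain = direct["error"] / latent["error"]    # ratio of errors
time_fraction = latent["time_s"] / direct["time_s"]  # share of calculation time
speedup = direct["time_s"] / latent["time_s"]        # ratio of calculation times

print(f"error ratio:   {accuracy_gain:.2f}")  # 3.33
print(f"time fraction: {time_fraction:.2f}")  # 0.37
print(f"speedup:       {speedup:.2f}")        # 2.70
```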

(a) the throwing motion obtained by .
(b) the throwing motion obtained by .
Fig. 8: Snapshots of the throwing motion obtained by searching the latent space. The orange line indicates the boundary of the motion range.
(a) searching the action space (landing position: 80 cm)
(b) searching the latent space (landing position: 94 cm)
Fig. 9: Snapshots of the comparison of the searching space.

VII Real Robot Experiments

To verify that the proposed method also works on actual robots, we conducted real-robot experiments.

VII-A Setup

We used a robot arm with seven degrees of freedom, ToroboArm, supplied by Tokyo Robotics. An overview is shown in Fig. 10(a). Although it has seven joints, we used only three joints for the throwing motions. Each joint is controlled by commands for the angle, angular velocity, and angular acceleration.

The robot was equipped with an end-effector as shown in Fig. 10(b).

VII-B Results

First, we evaluated throwing motions generated by the trained cWGAN-GP on the actual robot. The results are listed in Table III, and snapshots are shown in Fig. 11. In most cases, the landing distance was almost the same as in the simulation. The largest error was about 7 %. The errors are believed to be caused by modeling errors of the end-effector and by control deviations from the generated motions.

Next, we evaluated the motions subject to movement restrictions that were obtained in Section VI. The results are listed in Table IV, and snapshots are shown in Fig. 12. The deviations from the simulation were larger than in the above results. We believe that modeling errors in the contact model of the end-effector became apparent because of the poses taken to avoid the obstacles.

(a) Manipulator
(b) End-effector
Fig. 10: Experimental setup.
No. Target Simulation Actual robot
1 cm cm cm
2 cm cm cm
3 cm cm cm
4 cm cm cm
5 cm cm cm
6 cm cm cm
TABLE III: Experimental results of throwing motions
(a) Trajectory 2 (actual flying distance: 101.8 cm)
(b) Trajectory 6 (actual flying distance: 117.4 cm)
Fig. 11: Snapshots of throwing motions by the actual robot.
No. Target Simulation Actual robot
7 cm cm cm
8 cm cm cm
TABLE IV: Experimental results of throwing motions obtained by searching the latent space
(a) Trajectory 7, obtained by (actual flying distance: 115.0 cm)
(b) Trajectory 8, obtained by (actual flying distance: 104.1 cm)
Fig. 12: Snapshots of the actual robot executing the throwing motions obtained by searching the latent space.

VIII Conclusion and Future Work

For robots to work in our daily lives, they will need to adjust their motions depending on surrounding objects even when performing the same task. In this paper, we proposed a method based on cGANs to tackle this issue. By searching the latent spaces of cGANs that have learned various motions, appropriate motions for novel situations can be obtained. We used robotic throwing as an example. We showed that the trained cWGAN-GP can generate various throwing motions. In addition, we verified through simulation and real-robot experiments that the proposed method can find appropriate throwing motions for different situations. Because the target is specified as a condition to the cWGAN-GP, appropriate throwing motions could be found without including the flying distance in the objective functions. We also observed that the proposed method could avoid poor local optima (i.e., motions not satisfying the objective) by searching the latent space, which represents only valid motions. As a result, the proposed method achieved higher accuracy with less calculation time than searching the action space directly.

In this paper, we used a two-dimensional simulator to obtain a large amount of training data. Some future work remains before the proposed method can be applied to other tasks. For tasks that are difficult to simulate, such as picking and walking, actual data should be used. Also, if three-dimensional motion planning is necessary, the motions will have more degrees of freedom, which lowers sampling efficiency. The next steps would be to look for methods that reduce the amount of training data needed, so that data obtained in the actual environment can be used.


The authors would like to thank Crissman Loomis and Kohei Hayashi for useful discussions and advice.


  • [1] M. Arjovsky and L. Bottou (2017-01) Towards Principled Methods for Training Generative Adversarial Networks. Cited by: §II-A.
  • [2] M. Arjovsky, S. Chintala, and L. Bottou (2017-01) Wasserstein GAN. Cited by: §II-A.
  • [3] H. Audren, J. Vaillant, A. Kheddar, A. Escande, K. Kaneko, and E. Yoshida (2014) Model preview control in multi-contact motion-application to a humanoid robot. In Proc. IEEE Int. Conf. Intell. Robot. Syst., pp. 4030–4035. Cited by: §I.
  • [4] G. Bugmann and S. N. Copleston (2011) What can a personal robot do for you?. In Proc. Towar. Auton. Robot. Syst., pp. 360–371. Cited by: §I.
  • [5] A. Cully, J. Clune, D. Tarapore, and J. Mouret (2015) Robots that can adapt like animals. Nature 521 (7553), pp. 503. Cited by: §I.
  • [6] B. C. Da Silva, G. Baldassarre, G. Konidaris, and A. Barto (2014) Learning parameterized motor skills on a humanoid robot. In Proc. IEEE Int. Conf. Robot. Autom., Cited by: §I.
  • [7] J. S. Dæhlen, G. O. Eikrem, and T. A. Johansen (2014) Nonlinear model predictive control using trust-region derivative-free optimization. Journal of Process Control 24 (7), pp. 1106–1120. Cited by: §I.
  • [8] Y. Duan, M. Andrychowicz, B. Stadie, J. Ho, J. Schneider, I. Sutskever, P. Abbeel, and W. Zaremba (2017) One-shot imitation learning. In Proc. Int. Conf. Neural Inf. Process. Syst., Cited by: §I.
  • [9] R. Findeisen and F. Allgöwer (2002) An introduction to nonlinear model predictive control. In 21st Benelux Meeting on Systems and Control, Vol. 11, pp. 119–141. Cited by: §I.
  • [10] C. E. Garcia, D. M. Prett, and M. Morari (1989) Model predictive control: theory and practice—a survey. Automatica 25 (3), pp. 335–348. Cited by: §I.
  • [11] G. Garimella and M. Kobilarov (2015) Towards model-predictive control for aerial pick-and-place. In Proc. IEEE Int. Conf. Robot. Autom., pp. 4692–4697. Cited by: §I.
  • [12] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative Adversarial Nets. In Proc. Neural Inf. Process. Syst., pp. 2672–2680. Cited by: §II-A.
  • [13] S. Gu, E. Holly, T. Lillicrap, and S. Levine (2017) Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In Proc. IEEE Int. Conf. Robot. Autom., pp. 3389–3396. Cited by: §I.
  • [14] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville (2017-03) Improved Training of Wasserstein GANs. In Proc. Neural Inf. Process. Syst., pp. 5767–5777. Cited by: §II-A.
  • [15] N. Hansen and A. Auger (2014) Principled design of continuous stochastic search: from theory to practice. In Theory and Principled Methods for the Design of Metaheuristics, Y. Borenstein and A. Moraglio (Eds.), pp. 145–180. Cited by: §II-B.
  • [16] N. Hansen and A. Ostermeier (2001) Completely Derandomized Self-Adaptation in Evolution Strategies. Evol. Comput. 9 (2), pp. 159–195. Cited by: §II-B.
  • [17] N. Hansen (2016) The CMA Evolution Strategy: A Tutorial. arXiv preprint: arXiv 1604.00772. Cited by: §II-B.
  • [18] J. Ho and S. Ermon (2016) Generative Adversarial Imitation Learning. In Proc. Neural Inf. Process. Syst., pp. 4565–4573. Cited by: §I.
  • [19] A. Kelman, Y. Ma, and F. Borrelli (2011) Analysis of local optima in predictive control for energy efficient buildings. In Proc. 2011 50th IEEE Conf. Decis. Control Eur. Control Conf., pp. 5125–5130. Cited by: §I.
  • [20] S. Kim and S. Doncieux (2017) Learning Highly Diverse Robot Throwing Movements through Quality Diversity Search. In Proc. Genet. Evol. Comput. Conf. 2017, pp. 1177–1178. Cited by: §I.
  • [21] J. Kober, A. Wilhelm, E. Oztop, and J. Peters (2012) Reinforcement learning to adjust parametrized motor primitives to new situations. Springer Tracts Adv. Robot.. Cited by: §I.
  • [22] I. Lenz, R. Knepper, and A. Saxena (2015) DeepMPC: Learning Deep Latent Features for Model Predictive Control. Robot. Sci. Syst. XI. Cited by: §I.
  • [23] I. Lenz, H. Lee, and A. Saxena (2015) Deep Learning for Detecting Robotic Grasps. Int. J. Rob. Res. 34 (4-5), pp. 705–724. Cited by: §I.
  • [24] M. Mirza and S. Osindero (2014) Conditional Generative Adversarial Nets. arXiv Prepr. arXiv1411.1784. Cited by: §I, §II-A.
  • [25] H. Miyashita, T. Yamawaki, and M. Yashima (2009) Control for Throwing Manipulation by One Joint Robot. In Proc. IEEE Int. Conf. Robot. Autom., pp. 1273–1278. Cited by: §I.
  • [26] B. Paden, M. Cap, S. Z. Yong, D. Yershov, and E. Frazzoli (2016) A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles. IEEE Transactions on Intelligent Vehicles 1 (1), pp. 33–55. Cited by: §I.
  • [27] R. Pascanu, T. Mikolov, and Y. Bengio (2013) On the difficulty of training recurrent neural networks. In Proc. 30th Int. Conf. Mach. Learn., pp. 1310–1318. Cited by: §IV-B.
  • [28] X. B. Peng, G. Berseth, K. Yin, and M. van de Panne (2017) DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning. ACM Trans. Graph. 36 (4), pp. 41:1–41:13. Cited by: §I.
  • [29] H. A. Pierson and M. S. Gashler (2017) Deep Learning in Robotics: A Review of Recent Research. Adv. Robot. 31 (16), pp. 821–835. Cited by: §I.
  • [30] L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta (2017) Robust adversarial reinforcement learning. arXiv preprint arXiv:1703.02702. Cited by: §I.
  • [31] A. Rajeswaran, S. Ghotra, B. Ravindran, and S. Levine (2016) Epopt: learning robust neural network policies using model ensembles. arXiv preprint arXiv:1610.01283. Cited by: §I.
  • [32] A. V. Rao (2014) Trajectory optimization: a survey. In Optimization and optimal control in automotive systems, pp. 3–21. Cited by: §I.
  • [33] A. Sintov and A. Shapiro (2015) A Stochastic Dynamic Motion Planning Algorithm for Object-Throwing. In Proc. IEEE Int. Conf. Robot. Autom., pp. 2475–2480. Cited by: §I.
  • [34] T. Tabata and Y. Aiyama (2002) Tossing Manipulation by 1 Degree of Freedom Manipulator.. J. Robot. Soc. Japan 20 (8), pp. 876–882. Cited by: §I.
  • [35] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou (2016) Aggressive driving with model predictive path integral control. In Proc. IEEE Int. Conf. Robot. Autom., pp. 1433–1440. Cited by: §I.
  • [36] G. Williams, B. Goldfain, P. Drews, K. Saigol, J. Rehg, and E. Theodorou (2018) Robust sampling based model predictive control with sparse objective information. Cited by: §I.
  • [37] P. Yang, K. Sasaki, K. Suzuki, K. Kase, S. Sugano, and T. Ogata (2017) Repeatable Folding Task by Humanoid Robot Worker Using Deep Learning. IEEE Robot. Autom. Lett. 2 (2), pp. 397–403. Cited by: §I.
  • [38] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. Funkhouser (2018) Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning. Cited by: §I.
  • [39] T. Zhang, G. Kahn, S. Levine, and P. Abbeel (2016) Learning deep control policies for autonomous aerial vehicles with mpc-guided policy search. In Proc. IEEE Int. Conf. Robot. Autom., pp. 528–535. Cited by: §I.