1 Introduction
Motion planning for robots with many Degrees of Freedom (DoFs) presents many challenges, especially in the presence of constraints such as obstacle avoidance and joint limits. To handle the high dimensionality and the various constraints, many works focus on
trajectory optimization methods that attempt to find a locally optimal solution. In this approach, the motion planning problem is formulated as the optimization problem

$$\min_{\mathbf{x}} \; c(\mathbf{x}) \quad \text{s.t.} \quad g(\mathbf{x}) \leq 0, \;\; h(\mathbf{x}) = 0, \tag{1}$$

where $\mathbf{x} = (\mathbf{q}_1, \dots, \mathbf{q}_T)$ denotes the robot's configurations from time step $1$ to $T$; $c$, $g$ and $h$ are the cost, the inequality constraints and the equality constraints. The solution of (1) is the path $\mathbf{x}$, with $n$ the dimension of $\mathbf{q}$. When the path is parameterized by time, it is called a trajectory.
As an example, consider the planning problem depicted in Fig. 3, where the PR2 robot has to move its base around an object or to perform a dual-arm motion to pick items from the shelves. If the task is to move from an initial configuration $\mathbf{q}_{\text{init}}$ to a goal configuration $\mathbf{q}_{\text{goal}}$ while minimizing the total joint velocity, the optimization problem can be written as
$$\min_{\mathbf{x}} \; \sum_{t=1}^{T-1} \|\mathbf{q}_{t+1} - \mathbf{q}_t\|^2 \quad \text{s.t.} \quad \mathbf{q}_1 = \mathbf{q}_{\text{init}}, \;\; \mathbf{q}_T = \mathbf{q}_{\text{goal}}. \tag{2}$$
Other constraints can also be added, e.g. to avoid collisions, to comply with joint limits, etc.
Such optimization problems are in general non-convex, especially due to the collision constraints, which makes finding the global optimum very difficult. Trajectory optimization methods such as TrajOpt [1], AICO [2], CHOMP [3], or STOMP [4] solve the non-convex problem by iteratively optimizing around the current solution. While this approach is very popular and yields good practical results, the convergence and the quality of the solution are very sensitive to the choice of the initial guess. If it is far from the optimal solution, the method can get stuck at a poor local optimum.
To overcome this problem, our approach builds a memory of motion that learns how to provide a good initialization (i.e., a warm-start) to the solver based on previously solved problems. Functionally, the memory of motion is expected to learn the mapping $f: \tau \mapsto \mathbf{x}$ that maps each task $\tau$ to the robot path $\mathbf{x}$ in the database. Such a mapping can be highly nonlinear and multimodal (i.e., one task $\tau$ can be associated with several robot paths $\mathbf{x}$), and the dimension of $\mathbf{x}$ is typically very high. Our proposed method relies on machine learning techniques such as function approximation and dimensionality reduction to learn this mapping effectively. Note that we use the term memory of motion to denote not only the database of motions but also the algorithms used to predict the warm-start.

The contribution of this paper is the following. First, we propose the use of function approximation methods to learn the mapping $f$. We consider three methods: Nearest Neighbor (NN), Gaussian Process Regressor (GPR) and Bayesian Gaussian Mixture Regression (BGMR), and discuss their different characteristics on various planning problems. We show in particular that BGMR handles multimodal outputs very well. Furthermore, we demonstrate that using an ensemble of function approximators to provide the warm-start outperforms any individual function approximator. Finally, we show that the resulting memory of motion can be used as a metric to choose optimally from several possible goals.
The paper is organized as follows. In Section 2 we discuss the related works that use the concept of memory of motion for various problems. Section 3 explains the methods for constructing and using the memory of motion. The experimental results are presented and discussed in Section 4. Finally, Section 5 concludes the paper.
2 Related Work
The idea of using a memory of motion to warm-start an optimization solver has previously been explored in the context of optimal control and motion planning. In [5] a trajectory library is constructed to learn a control policy: a set of trajectories is planned offline and stored as a library, then Nearest Neighbor is used online to determine the action to perform at each state. In [6], a similar approach is used to predict an initial guess for balancing a two-link robot, which is then optimized by Differential Dynamic Programming [7]. An iterative method to build a memory of motion to initialize an optimal control solver is proposed in [8]. They use neural networks to approximate the mapping from the task descriptors (initial and goal states) to the state and control trajectories. Another neural network is trained to approximate the value function, which is then used as a metric to determine how dynamically close two states are. In [9] Gaussian Process Regression (GPR) is used to predict new trajectories based on a library of demonstrated movements encoded as Dynamic Movement Primitives (DMP) [10]; GPR maps the task descriptors to the DMP parameters.

In robot motion planning, the Probabilistic Roadmap (PRM) [11] can be seen as the construction of a memory of motion by precomputing the graph connecting robot configurations. Some works exploit Rapidly-exploring Random Trees (RRT) [12], another popular sampling-based method. For example, in [13] an offline computation is used to speed up the online planning in the form of an additional bias in sampling the configurations. The Lightning framework is proposed in [14] to plan paths in high-dimensional spaces by learning from experience. The path library is constructed incrementally: given the current path library and a task to be executed, the algorithm runs two versions of the planner online, one that plans from scratch and another initialized by the library. In [15] a high-dimensional (791) task descriptor is constructed, which consists of the initial and target robot configurations and the pairwise distances between the obstacles and the robot body parts. Given a current task descriptor, they use Nearest Neighbor and Locally Weighted Regression to retrieve the initial trajectory, which is then optimized by a trajectory optimizer. The metric between the task descriptors is refined to minimize the necessary refinement of the initial trajectory using a sparsity-inducing norm, such that they obtain a sparse metric and hence sparse descriptors. In [16], sub-indexing is used to reduce the amount of memory storage and to reuse sub-trajectories of the original solutions. In addition, for a given task descriptor that is associated with a trajectory solution, the algorithm samples new task descriptors from the region near the current descriptor and runs the trajectory optimization using the current trajectory solution as the initial guess. If the optimization converges to a good solution, the new task descriptor is mapped to the current trajectory solution.
Compared to the above works, our proposed method has the following differences: (i) none of the above methods attempts to handle multimodal outputs, while we show that BGMR handles such cases well; (ii) we show that using an ensemble of methods to provide the warm-start outperforms the individual methods; and (iii) we show that using the memory to select between different goals can improve the solver performance significantly.
3 Method
Section 3.1 discusses the main idea of building the memory of motion using function approximation and dimensionality reduction techniques to learn the mapping between the task and the associated robot path. Section 3.2 then explains how the performance of the memory of motion can be improved by using an ensemble method. Finally, Section 3.3 explains how the memory of motion can be used as a metric to choose between different goals.
3.1 Function Approximations and Dimensionality Reduction Techniques
To learn the mapping $f$, we first generate a large number of tasks $\tau$ and the corresponding robot paths $\mathbf{x}$. This is done by sampling $\tau$ from the space of possible tasks and running the trajectory optimizer to obtain the robot paths until we obtain $N$ samples $(\tau_i, \mathbf{x}_i)$. Let $\mathbf{T} = (\tau_1, \dots, \tau_N)$ and $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_N)$. The mapping can then be learned by training function approximators on the database $D = (\mathbf{T}, \mathbf{X})$. In this paper we consider three function approximators: NN, GPR, and BGMR.
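The data-generation loop described above can be sketched as follows. This is only an illustrative sketch: `sample_task` and `solve` are placeholders for the task sampler and for a call to the trajectory optimizer (e.g., TrajOpt), not functions from the paper.

```python
import numpy as np

def build_memory(sample_task, solve, n_samples):
    """Build the database D = (T, X) of task/path pairs by sampling
    tasks and solving each one with the trajectory optimizer."""
    tasks, paths = [], []
    while len(tasks) < n_samples:
        tau = sample_task()          # sample a task from the task space
        x = solve(tau)               # run the optimizer; None on failure
        if x is not None:            # keep only feasible solutions
            tasks.append(tau)
            paths.append(x)
    return np.array(tasks), np.array(paths)
```

Failed optimizations are simply discarded, so the database contains only feasible task/path pairs.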
3.1.1 Nearest Neighbor (NN)
NN is a very simple nonparametric method. Given a task $\tau$, the algorithm finds the samples $(\tau_i, \mathbf{x}_i)$ in the database whose $\tau_i$ are nearest to $\tau$ according to a chosen metric (in this paper the Euclidean metric is used). It then predicts the corresponding robot path by taking the average of the associated $\mathbf{x}_i$. The method is very simple to implement and it works well if the dataset is sufficiently dense, but it suffers from the curse of dimensionality: as the dimension of $\tau$ increases, the number of data points that needs to be stored increases exponentially. This method is mainly considered as the baseline against the next two methods.

3.1.2 Gaussian Process Regressor (GPR)
Like NN, GPR [17] is a nonparametric method whose accuracy improves as the amount of data increases. While it has a higher computational complexity than NN, GPR tends to interpolate better, resulting in a higher approximation accuracy. Given the database $D$, GPR assigns a Gaussian prior to the joint probability of the outputs, i.e., $p(\mathbf{X}) = \mathcal{N}(\mathbf{m}(\mathbf{T}), \mathbf{K})$, where $\mathbf{m}$ is the mean function and $\mathbf{K}$ is the covariance matrix constructed with elements $K_{ij} = k(\tau_i, \tau_j)$, where $k$ is the kernel function that measures the similarity between the inputs $\tau_i$ and $\tau_j$. In this paper we use the Radial Basis Function (RBF) as the kernel, and the mean function $\mathbf{m}$ is set to zero, as is usually done in GPR.

To predict the output $\mathbf{x}^*$ given a new input $\tau^*$, GPR constructs the joint probability distribution of the training data and the prediction, and then conditions on the training data to obtain the predictive distribution of the output, $p(\mathbf{x}^* \mid \tau^*, D) = \mathcal{N}(\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*)$, where $\boldsymbol{\mu}^*$ is the posterior mean computed as

$$\boldsymbol{\mu}^* = \mathbf{K}_* \mathbf{K}^{-1} \mathbf{X}, \tag{3}$$

with $\mathbf{K}_*$ the kernel matrix between $\tau^*$ and the training inputs, and $\boldsymbol{\Sigma}^*$ is the posterior covariance, which provides a measure of uncertainty on the output. In this work we simply use the posterior mean as the output, i.e., $\mathbf{x}^* = \boldsymbol{\mu}^*$.
While having good approximation accuracy, one major limitation with GPR is that it does not scale well with very large datasets. There are variants of GPR that attempt to overcome this problem, e.g., sparse GPR [18] or using Stochastic Variational Inference (SVI) [19]. More details on GPR can be found in [17] and [20].
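The zero-mean posterior of eq. (3) can be sketched in a few lines of numpy. This is a minimal sketch, not the paper's implementation; the lengthscale and jitter values are illustrative placeholders.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Radial Basis Function kernel between row-stacked inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gpr_mean(T_train, X_train, T_query, jitter=1e-8):
    """Posterior mean of eq. (3): K(tau*, T) K(T, T)^{-1} X,
    with a small jitter term for numerical stability."""
    K = rbf(T_train, T_train) + jitter * np.eye(len(T_train))
    return rbf(T_query, T_train) @ np.linalg.solve(K, X_train)
```

At a training input the posterior mean reproduces the stored path (up to the jitter), while between training inputs it interpolates smoothly.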
3.1.3 Bayesian Gaussian Mixture Regression (BGMR)
When using the RBF kernel as the covariance function, GPR assumes that the mapping from $\tau$ to $\mathbf{x}$ is smooth and continuous. When this assumption is met it performs very well, but otherwise it yields poor results. For example, when there is a discontinuity in the mapping or the outputs are multimodal, GPR tends to average the solutions from both sides of the discontinuity or from both modes. This characteristic is shared by many other function approximators. To handle discontinuity and multimodality, using local models is one possible solution: each local model can be fit to one side of the discontinuity or to one mode.
Gaussian Mixture Regression (GMR) is an example of such local-model approaches [21]. It can be seen as a probabilistic mixture of linear regressions. Given the database $D$, it can be used to construct the joint probability of $(\tau, \mathbf{x})$ as a mixture of Gaussians

$$p(\tau, \mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \tag{4}$$

where $\pi_k$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$ are the $k$-th component's mixing coefficient, mean, and covariance, respectively. Given a query $\tau^*$, the conditional probability of the output, $p(\mathbf{x} \mid \tau^*)$, is also a mixture of Gaussians.
In GMR, the parameters $\pi_k$, $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ are determined from the data by the Expectation-Maximization method, while the number of Gaussians $K$ is usually determined by the user. Bayesian GMR (BGMR) [22] is a Bayesian extension of GMR that allows us to estimate the posterior distribution of the mixture parameters (instead of relying on a single point estimate as in GMR). The number of components $K$ can also be automatically determined from the data. As a Bayesian model, BGMR assigns priors to the parameters $\pi_k$, $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$, and computes the posterior distribution of those parameters given the data. The prediction $\mathbf{x}^*$, given the input $\tau^*$, is then computed by marginalizing over the posterior distribution and conditioning on $\tau^*$. The resulting predictive distribution of $\mathbf{x}^*$ is a mixture of t-distributions,

$$p(\mathbf{x}^* \mid \tau^*, D) = \sum_{k=1}^{K} \alpha_k(\tau^*) \, t_k(\mathbf{x}^* \mid \tau^*), \tag{5}$$

where $\alpha_k(\tau^*)$ is the probability of $\tau^*$ belonging to the $k$-th component of the mixture, and $t_k$ is a multivariate t-distribution whose mean is linear in $\tau^*$. We can interpret (5) as $K$ probabilistic linear regression models, each of which has probability $\alpha_k(\tau^*)$.
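The conditioning behind this kind of mixture prediction can be illustrated with plain numpy. For simplicity the sketch below uses Gaussian components in place of the multivariate t-distributions of BGMR, with hand-specified mixture parameters; all names are illustrative.

```python
import numpy as np

def most_probable_mode(tau, weights, means, covs, d):
    """Condition a joint Gaussian mixture over (tau, x) on tau and return
    the conditional mean of the component with the highest probability
    alpha_k(tau), approximating the mode of the multimodal prediction."""
    alphas, cond_means = [], []
    for w, mu, S in zip(weights, means, covs):
        mu_t, mu_x = mu[:d], mu[d:]
        Stt, Sxt = S[:d, :d], S[d:, :d]
        diff = tau - mu_t
        # responsibility: prior weight times marginal likelihood of tau
        lik = np.exp(-0.5 * diff @ np.linalg.solve(Stt, diff)) \
              / np.sqrt((2 * np.pi) ** d * np.linalg.det(Stt))
        alphas.append(w * lik)
        # conditional mean of x given tau, linear in tau
        cond_means.append(mu_x + Sxt @ np.linalg.solve(Stt, diff))
    return cond_means[int(np.argmax(alphas))]
```

Returning all conditional means instead of only the most probable one would recover the several possible solutions mentioned below.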
To obtain a point prediction from (5), there are several approaches. One of the most used is to take the mean of the predictive distribution in (5) using moment matching. While this approach can provide smooth estimates (as required in many applications), the same problems as in GPR appear in the case of discontinuity and multimodality: taking the average in those cases gives poor results. Instead, we propose to take as the point prediction the mean¹ of the component in (5) having the highest probability, which approximately corresponds to the mode of the multimodal distribution. Alternatively, we can also use the mean of each t-distribution as a separate prediction, which gives us several possible solutions. In some cases (e.g., when we would like to retrieve all possible solutions) this approach can be very useful, as will be presented in Section 4.1.

¹As with a Gaussian distribution, the mean of a multivariate t-distribution is also its mode.

3.1.4 Dimensionality reduction
In our problem, the path $\mathbf{x}$ is a vector consisting of the sequence of configurations of dimension $n$ during $T$ time steps, so its dimension $nT$ can be very high. This motivates us to use dimensionality reduction techniques to reduce the dimension of $\mathbf{x}$. For example, when $T$ is large and the time interval is small, RBFs can be used to represent the evolution of each degree of freedom as weights of the basis functions. Techniques such as Principal Component Analysis (PCA), Independent Component Analysis, Factor Analysis, and Variational Autoencoders [20], [23] can also be used. The mapping to be learned then becomes the mapping from $\tau$ to $\mathbf{y}$, where $\mathbf{y}$ is the projection of $\mathbf{x}$ onto the lower-dimensional subspace. The advantage is that the memory required to store the data is reduced significantly, while the approximation performance is maintained or even improved, because the important correlations between the variables are preserved. In this work, since the number of time steps is not large, we choose to use PCA to reduce the dimension of $\mathbf{x}$.

3.2 Using an Ensemble Method to Provide the Warm-Start
In machine learning, methods such as AdaBoost [24] and Random Forests [25] have shown that using an ensemble of methods often yields better performance than choosing a single method. We propose to use a simple ensemble method where we run multiple trajectory optimizations in parallel, each one warm-started by one of the function approximators in Section 3.1; once one of them finds a successful path, the others are terminated. Since each function approximator has different learning characteristics, combining them in this way can significantly improve the motion planning performance.

3.3 Using the Memory as a Metric
In some planning problems, there can be several alternative goals to be achieved. For example, in a robot drilling task [26], the orientation around the drilling axis is free (the number of possible goals is infinite). A simple way is to choose one of the goals randomly, plan the motion, and if it fails select another goal. While this is simple to implement, it does not exploit the benefit of having multiple goals. Another method is to plan a path to each goal and select the one with the smallest cost, but this is computationally expensive. It would be useful, then, to have a metric that estimates the cost to a given goal. Our proposed idea is to use the memory of motion as this metric.
In Section 3.1, the memory of motion learns how to predict an initial guess $\mathbf{x}$ to achieve a task $\tau$. The possible goals can then be formulated as multiple tasks $\tau_1, \dots, \tau_M$. For each task $\tau_j$, the memory of motion predicts the initial guess $\mathbf{x}_j$ corresponding to the task, and its cost can be computed. The initial guess and the corresponding task with the lowest cost are then taken as the chosen goal to be given to the trajectory optimizer. If the cost can be computed quickly relative to the time required for solving the optimization, as with the total discrete velocity in (2), this approach can improve the trajectory optimizer performance significantly.
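The goal-selection procedure can be sketched as follows. Here `predict` stands in for any of the trained function approximators of Section 3.1, and the default cost is a sum of squared configuration differences, intended as a stand-in for the discrete-velocity cost of eq. (2).

```python
import numpy as np

def choose_goal(tasks, predict, cost=None):
    """Predict a warm-start for every candidate task and return the
    task/warm-start pair with the lowest cost."""
    if cost is None:
        # total discrete velocity along the path (cf. eq. (2))
        cost = lambda path: float(np.sum(np.diff(path, axis=0) ** 2))
    guesses = [predict(tau) for tau in tasks]
    best = int(np.argmin([cost(x) for x in guesses]))
    return tasks[best], guesses[best]
```

Only the chosen pair is passed to the trajectory optimizer, so the extra work is just one prediction and one cost evaluation per candidate goal.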
4 Experiments
To evaluate the proposed method, we use several examples of motion planning for the PR2 robot (see Fig. 3). TrajOpt [1] is used as the trajectory optimizer to be warm-started. The output is the robot path that accomplishes the given task. In this paper we only work with robot paths as the output, but the method can also be applied to robot trajectories.
We consider four motion planning cases, presented in ascending order of complexity. Each case is chosen to demonstrate certain characteristics of the proposed method. For each case, we follow the same procedure. First we generate the dataset by sampling tasks and running TrajOpt to find the paths achieving the tasks. In all cases, the cost is defined as the discrete velocity of the states, as defined in (2). The number of samples $N$ is different for each case, depending on the complexity of the task. The function approximators are then trained, with or without PCA, using the dataset. The number of PCA components is set heuristically.

To validate the performance, we sample random tasks and use the various methods to warm-start TrajOpt. The solutions are compared in terms of convergence time, success rate and cost. The planning is considered successful if the solution is feasible. In the presented results, we use the label 'STD' to refer to the solution obtained by warm-starting the solver with a straight-line path (via a waypoint, if any), and the names of the function approximators for the rest. The subscript 'PCA' is added when PCA is used. The query time for predicting the warm-starts by each method is negligible w.r.t. the convergence time (less than 50 ms), so it is not included in the comparison.
Table 1: Base motion planning with a unimodal database (one waypoint).

Method  Success (%)  Conv. time (s)  Cost (rad/s)
STD     81.0         0.52            1.457
NN      97.0         0.32            1.480
GPR     95.0         0.32            1.388
BGMR    97.0         0.31            1.412
Table 2: Base motion planning with a multimodal database (two waypoints).

Method  Success (%)  Conv. time (s)  Cost (rad/s)
STD     84.0         0.45            1.542
NN      96.0         0.35            1.666
GPR     00.0         -               -
BGMR    96.0         0.27            1.477
4.1 Base motion planning
The task is to plan the motion of the PR2 mobile base from a random pose in front of the kitchen to another random pose behind the kitchen (Fig. 3(a)). In this case, the state is the 3-DoF planar pose of the base. The task parameter is then $\tau = (\mathbf{q}_{\text{init}}, \mathbf{q}_{\text{goal}})$. The database is constructed with $N$ samples and the evaluation is performed on random test tasks. Although this is an easy problem, TrajOpt actually finds it difficult to solve without a proper initialization. For example, initializing TrajOpt with a straight-line interpolation from $\mathbf{q}_{\text{init}}$ to $\mathbf{q}_{\text{goal}}$ never manages to find a feasible solution, because it results in a path that moves the robot through the kitchen while colliding, and the solver gets stuck in poor local optima due to the conflicting gradients. To obtain better initializations for building the database, we initialize TrajOpt with two manually chosen waypoints, one to the left and one to the right of the kitchen.
We consider two cases of building the database: in the first, we only use the right waypoint, while in the second we use both. We initialize TrajOpt with a straight-line motion from $\mathbf{q}_{\text{init}}$ to the waypoint and from the waypoint to $\mathbf{q}_{\text{goal}}$. With this setting we build the databases, train the function approximators, and obtain the results shown in Tables 1 and 2.
In the first case, the mapping from $\tau$ to $\mathbf{x}$ is unimodal because all movements go through the right. Table 1 shows that the performances of NN, GPR and BGMR are quite similar. In the second case, however, the output is multimodal because the database contains two possible ways (modes) to accomplish the same task. This affects GPR significantly (see Table 2): GPR averages both modes and outputs a path that goes through the kitchen, while NN and BGMR are not affected. NN does not average the modes because we use a single nearest neighbor, while BGMR overcomes the multimodality by constructing local models for each mode automatically.
Fig. 8 shows examples of warm-starts produced by each method in the second case. As expected, GPR provides a warm-start that goes through the kitchen (hence the 0% success rate). With BGMR, if we retrieve the components with the two highest probabilities, both possible solutions are obtained.
Table 3: Planning from a fixed initial configuration to random goal configurations.

Method    Success (%)  Conv. time (s)  Cost (rad/s)
STD       80.0         0.76            1.831
NN        91.2         0.57            1.928
GPR       92.0         0.62            1.840
GPR_PCA   94.0         0.62            1.826
BGMR      90.8         0.62            1.854
BGMR_PCA  90.8         0.63            1.841
Table 4: Planning from random initial configurations to random goal configurations.

Method    Success (%)  Conv. time (s)  Cost (rad/s)
STD       71.6         0.81            1.211
NN        72.4         1.05            1.420
GPR       82.4         0.77            1.234
GPR_PCA   84.4         0.76            1.239
BGMR      76.0         0.77            1.274
BGMR_PCA  74.8         0.78            1.240
Ensemble  98.0         1.00            1.317
4.2 Planning from a fixed initial configuration to a random goal configuration
In this case, the state consists of the joint angles of the two 7-DoF arms of the PR2. The task is to move from a fixed initial configuration $\mathbf{q}_{\text{init}}$ to a random goal configuration $\mathbf{q}_{\text{goal}}$. The task can then be defined as the goal configuration ($\tau = \mathbf{q}_{\text{goal}}$). The database is constructed with $N$ samples, and the evaluation results are presented in Table 3.
Since each PR2 arm is redundant, the path from $\mathbf{q}_{\text{init}}$ to $\mathbf{q}_{\text{goal}}$ can be multimodal, which may pose a problem for GPR. However, Table 3 shows that GPR and BGMR perform similarly. This is due to the fact that, although redundant robots can achieve a goal configuration in many different ways, planning using optimization results in similar motions for similar goal configurations. The use of PCA does not appear to improve the performance significantly, but it still helps to reduce the size of the data: for each path it reduces the number of variables from $nT$ to the number of PCA components, more than an 8-fold reduction, while maintaining the performance.
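The PCA compression described in Section 3.1.4 amounts to the following plain-numpy sketch; the actual implementation and the number of components used in the experiments are not reproduced here.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA to row-stacked flattened paths X of shape (N, n*T)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]          # principal axes W

def pca_encode(X, mean, W):
    return (X - mean) @ W.T                 # low-dimensional codes y

def pca_decode(Y, mean, W):
    return Y @ W + mean                     # reconstructed paths x
```

The function approximators are then trained on the codes `y`, and a predicted code is decoded back into a full path before being used as a warm-start.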
4.3 Planning from a random initial configuration to a random goal configuration
To proceed to a more complex case, the task here is to plan a path from a random initial configuration $\mathbf{q}_{\text{init}}$ to a random target $\mathbf{q}_{\text{goal}}$. The task consists of the initial and goal configurations, $\tau = (\mathbf{q}_{\text{init}}, \mathbf{q}_{\text{goal}})$. The database is constructed and evaluated as before. The results are presented in Table 4.
NN performs poorly here, similar to STD, because the dimension of the input space is much larger than in Section 4.2. To achieve good performance, NN would require a much denser dataset. GPR outperforms BGMR by a wide margin.
The last row of Table 4 shows the result of the ensemble method described in Section 3.2. Given an input $\tau$, the method uses all function approximators to provide different warm-starts, each of which is used to initialize an instance of TrajOpt in parallel. Once a valid solution is obtained, the other instances of TrajOpt are terminated. This method results in a huge boost of the success rate, with convergence time and cost comparable to the other methods. If having a lower cost is more important than the computational time, the parallel execution can also be programmed to wait until all instances of TrajOpt finish the optimization; the costs of the resulting paths are then compared and the one with the lowest cost is chosen. This alternative implementation can reduce the cost, but at the expense of a higher computational time.
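The parallel scheme above can be sketched with Python's thread pool. This is only a sketch: `approximators` stands for the trained predictors of Section 3.1, and `optimize` for a TrajOpt call that returns `None` on failure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def ensemble_solve(task, approximators, optimize):
    """Warm-start one optimizer instance per approximator in parallel
    and return the first feasible path; pending runs are cancelled."""
    with ThreadPoolExecutor(max_workers=len(approximators)) as pool:
        futures = [pool.submit(optimize, task, f(task)) for f in approximators]
        for fut in as_completed(futures):
            path = fut.result()
            if path is not None:        # feasible solution found
                for other in futures:
                    other.cancel()      # stop runs that have not started
                return path
    return None
```

For the lowest-cost variant mentioned above, one would instead wait for all futures to complete and return the feasible path with the minimum cost.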
Table 5: Planning to Cartesian goals from a fixed initial configuration.

Method      Success (%)  Conv. time (s)  Cost (rad/s)
STD         65.6         0.98            1.849
NN          73.6         1.16            1.838
GPR         65.2         1.69            1.791
GPR_PCA     66.8         1.49            1.810
BGMR        72.4         1.30            1.755
BGMR_PCA    79.2         1.12            1.817
METRIC GPR  85.6         0.63            1.425
4.4 Planning to Cartesian goals from a fixed initial configuration
In Sections 4.2 and 4.3, we used TrajOpt to plan to goals in configuration space. In practical situations, however, the task is often to reach a certain Cartesian pose with the end-effector (e.g., to pick an object on the shelf), instead of planning to a specific joint configuration. One way to solve this problem is to first compute a configuration that achieves the Cartesian pose using inverse kinematics and plan to this configuration, but this does not exploit the flexibility inherent in the task. TrajOpt has an option to plan directly to a Cartesian goal, but it typically requires a longer convergence time and has a lower success rate than planning to a joint configuration goal.
We present two approaches to use the memory of motion for this problem. In the first approach, we follow a similar procedure as in the previous cases: we formulate the task as $\tau = (\mathbf{p}_r, \mathbf{p}_l)$, where $\mathbf{p}_r$ and $\mathbf{p}_l$ are the Cartesian positions of the right and left hands of the PR2. The database is then constructed and the function approximators are trained. In this approach, TrajOpt plans to a Cartesian goal directly. The second approach relies on the fact that a Cartesian goal corresponds to multiple goals in configuration space. In Section 4.2 we have already constructed a memory of motion that can predict an initial guess $\mathbf{x}$, given a goal in configuration space. The second approach uses this memory as a metric (Section 3.3) to choose between the different goals in configuration space. First, given a Cartesian goal, we run an Inverse Kinematics (IK) solver to find joint configurations that satisfy this pose. For each joint configuration, we use the memory of motion to predict the initial guess of the robot path to reach that configuration, and we compute the cost of each path. Finally, the goal configuration and the path with the lowest cost are chosen, and TrajOpt is run to reach this goal configuration with the given path as the warm-start. Note that in this approach, TrajOpt plans to a joint configuration instead of a Cartesian goal. For this approach we use the GPR-based method from Section 4.2, labeled 'METRIC GPR' to differentiate it from the first approach.
We present the results in Table 5. Among the methods using the first approach, we note that BGMR yields better results than GPR because the mapping from the Cartesian goal to the robot path is multimodal here, as planning to a Cartesian pose has more redundancy than planning to a joint configuration. This again demonstrates that BGMR handles multimodal outputs better than GPR. However, the second approach outperforms even BGMR: the improvement over the first approach is very significant in all three criteria. This demonstrates that using the memory of motion as a metric to choose the optimal goal results in large improvements. We point out that the additional computational time required to find the IK solutions and the corresponding warm-starts is negligible compared to the convergence time.
5 Conclusion
We have presented an approach to build a memory of motion to warm-start a trajectory optimization solver, and demonstrated through experiments with the PR2 robot that the warm-start can improve the convergence time and the success rate while improving the quality of the solution. Function approximators and dimensionality reduction are used to learn the mapping between the task descriptor and the corresponding robot path. Three function approximators are considered: NN as baseline, GPR and BGMR. NN usually requires a dense dataset to obtain good performance, so it performs well when the input space is small (as shown in Sections 4.1, 4.2, and 4.4), while for a larger input space (Section 4.3) it does not yield good results. GPR performs best when the output is unimodal, while for multimodal outputs BGMR performs better than GPR. The use of PCA also improves the solution, although not significantly, while reducing the memory storage. Using the ensemble method to warm-start the trajectory optimization boosts the success rate significantly. Finally, we have shown that the memory of motion can be used as a metric to choose optimally between several alternative goals, which results in significantly improved performance for the case of Cartesian goal planning.
Acknowledgment
This work was supported by the European Union under the EU H2020 collaborative project MEMMO (Memory of Motion, http://www.memmoproject.eu/), Grant Agreement No. 780684.
References
 [1] J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel, “Finding locally optimal, collisionfree trajectories with sequential convex optimization,” in Proc. Robotics: Science and Systems (RSS), vol. 9, no. 1, 2013, pp. 1–10.
 [2] M. Toussaint, “Robot trajectory optimization using approximate inference,” in Proc. Intl Conf. on Machine Learning (ICML). ACM, 2009, pp. 1049–1056.
 [3] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “CHOMP: Covariant Hamiltonian optimization for motion planning,” Intl Journal of Robotics Research, vol. 32, no. 9-10, pp. 1164–1193, 2013.
 [4] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, “STOMP: Stochastic trajectory optimization for motion planning,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), 2011, pp. 4569–4574.
 [5] M. Stolle and C. G. Atkeson, “Policies based on trajectory libraries.” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), 2006, pp. 3344–3349.
 [6] C. Liu and C. G. Atkeson, “Standing balance control using a trajectory library,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), 2009, pp. 3031–3036.
 [7] D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming. Elsevier, 1970.
 [8] N. Mansard, A. Del Prete, M. Geisert, S. Tonneau, and O. Stasse, “Using a memory of motion to efficiently warmstart a nonlinear predictive controller,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), 2018, pp. 2986–2993.
 [9] D. Forte, A. Gams, J. Morimoto, and A. Ude, “Online motion synthesis and adaptation using a trajectory database,” Robotics and Autonomous Systems, vol. 60, no. 10, pp. 1327–1339, 2012.
 [10] A. J. Ijspeert, J. Nakanishi, and S. Schaal, “Learning rhythmic movements by demonstration using nonlinear oscillators,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), 2002, pp. 958–963.
 [11] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars, “Probabilistic roadmaps for robot path planning,” IEEE Transactions on Robotics and Automation, vol. 12, no. 4, pp. 566–580, 1996.
 [12] S. M. LaValle, “Rapidlyexploring random trees: A new tool for path planning,” Iowa State University, Computer Science Department, Tech. Rep. TR 9811, 1998.
 [13] S. R. Martin, S. E. Wright, and J. W. Sheppard, “Offline and online evolutionary bidirectional RRT algorithms for efficient replanning in dynamic environments,” in Proc. IEEE Intl Conf. on Automation Science and Engineering (CASE), 2007, pp. 1131–1136.
 [14] D. Berenson, P. Abbeel, and K. Goldberg, “A robot path planning framework that learns from experience,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), 2012, pp. 3671–3678.
 [15] N. Jetchev and M. Toussaint, “Trajectory prediction: learning to map situations to robot trajectories,” in Proc. Intl Conf. on Machine Learning (ICML), 2009, pp. 449–456.
 [16] W. Merkt, V. Ivan, and S. Vijayakumar, “Leveraging precomputation with problem encoding for warmstarting trajectory optimization in complex environments,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), 2018, pp. 5877–5884.
 [17] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning. Cambridge, MA, USA: MIT Press, 2006.
 [18] J. Quiñonero-Candela and C. E. Rasmussen, “A unifying view of sparse approximate Gaussian process regression,” Journal of Machine Learning Research, vol. 6, pp. 1939–1959, 2005.
 [19] J. Hensman, N. Fusi, and N. D. Lawrence, “Gaussian processes for big data,” in Proc. Conf. on Uncertainty in Artificial Intelligence (UAI), 2013, pp. 282–290.
 [20] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
 [21] S. Calinon, “A tutorial on task-parameterized movement learning and retrieval,” Intelligent Service Robotics, vol. 9, no. 1, pp. 1–29, 2016.
 [22] E. Pignat and S. Calinon, “Bayesian Gaussian mixture model for robotic policy imitation,” arXiv preprint arXiv:1904.10716, 2019.
 [23] C. Doersch, “Tutorial on variational autoencoders,” arXiv preprint arXiv:1606.05908, 2016.
 [24] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of online learning and an application to boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.
 [25] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
 [26] F. Suárez-Ruiz, T. S. Lembono, and Q.-C. Pham, “RoboTSP – A fast solution to the robotic task sequencing problem,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), 2018, pp. 1611–1616.
Appendix A Supplementary Materials
A.1 Bayesian Gaussian Mixture Regression (BGMR)
As discussed in the main text, given the training dataset $\mathcal{D} = \{(x_n, y_n)\}_{n=1}^N$, GMR constructs the joint probability of $(x, y)$ as a mixture of Gaussians,

$$p(x, y) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mu_k, \Sigma_k), \qquad (6)$$

where $\pi_k$, $\mu_k$, and $\Sigma_k$ are the $k$-th component’s mixing coefficient, mean, and covariance, respectively. Let $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ denote the GMR parameters to be determined from the data.
We can decompose $\mu_k$ and $\Sigma_k$ according to $x$ and $y$ as

$$\mu_k = \begin{bmatrix} \mu_k^{x} \\ \mu_k^{y} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^{xx} & \Sigma_k^{xy} \\ \Sigma_k^{yx} & \Sigma_k^{yy} \end{bmatrix}. \qquad (7)$$
Given a query $x$, the predictive distribution of $y$ can then be computed by conditioning on $x$,

$$p(y \mid x) = \sum_{k=1}^{K} \alpha_k(x) \, p_k(y \mid x), \qquad (8)$$
where $\alpha_k(x)$ is the probability of $x$ belonging to the $k$-th component,

$$\alpha_k(x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k^{x}, \Sigma_k^{xx})}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j^{x}, \Sigma_j^{xx})}, \qquad (9)$$
and $p_k(y \mid x)$ is the predictive distribution of $y$ according to the $k$-th component,

$$p_k(y \mid x) = \mathcal{N}\big(y \mid \mu_k^{y} + \Sigma_k^{yx} (\Sigma_k^{xx})^{-1} (x - \mu_k^{x}),\; \Sigma_k^{yy} - \Sigma_k^{yx} (\Sigma_k^{xx})^{-1} \Sigma_k^{xy}\big), \qquad (10)$$

which is a Gaussian distribution whose mean is linear in $x$. The resulting predictive distribution (8) is then a mixture of Gaussians.
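For concreteness, the conditioning in (8)–(10) can be sketched in a few lines of NumPy. This is an illustrative sketch; the function and variable names are ours, not the paper’s implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr_predict(x, pis, mus, sigmas, dx):
    """Condition a joint GMM p(x, y) on a query x, following Eqs. (8)-(10).

    pis:    (K,) mixing coefficients
    mus:    (K, dx+dy) joint means
    sigmas: (K, dx+dy, dx+dy) joint covariances
    dx:     dimension of the input x
    Returns the responsibilities alpha_k(x) (Eq. 9) and the
    per-component conditional means and covariances (Eq. 10).
    """
    K = len(pis)
    # Eq. (9): responsibility of each component for the query x.
    alphas = np.array([
        pis[k] * multivariate_normal.pdf(x, mus[k][:dx], sigmas[k][:dx, :dx])
        for k in range(K)
    ])
    alphas /= alphas.sum()

    cond_means, cond_covs = [], []
    for k in range(K):
        mu_x, mu_y = mus[k][:dx], mus[k][dx:]
        s_xx = sigmas[k][:dx, :dx]
        s_xy = sigmas[k][:dx, dx:]
        s_yx = sigmas[k][dx:, :dx]
        s_yy = sigmas[k][dx:, dx:]
        gain = s_yx @ np.linalg.inv(s_xx)
        cond_means.append(mu_y + gain @ (x - mu_x))  # mean of Eq. (10)
        cond_covs.append(s_yy - gain @ s_xy)         # covariance of Eq. (10)
    return alphas, np.array(cond_means), np.array(cond_covs)

# Example: 2-component joint GMM over (x, y) with dx = dy = 1.
pis = np.array([0.5, 0.5])
mus = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.stack([np.eye(2), np.eye(2)])
alphas, means, covs = gmr_predict(np.array([0.0]), pis, mus, sigmas, dx=1)
print(alphas)  # the query x = 0 lies near the first component
```

Because each conditional mean is linear in $x$, this routine makes explicit that GMR is a mixture of linear regressions weighted by the responsibilities.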
In GMR, $\theta$ is usually estimated from the dataset by the Expectation-Maximization (EM) algorithm, which outputs a single point estimate of $\theta$, and the prediction is computed using this value. Instead, Bayesian GMR (BGMR) puts priors on the parameters $\theta$ and, given the dataset $\mathcal{D}$, computes the posterior distribution $p(\theta \mid \mathcal{D})$. Hence, the prediction integrates over the posterior distribution of $\theta$,

$$p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta) \, p(\theta \mid \mathcal{D}) \, d\theta. \qquad (11)$$
The resulting predictive distribution is then a mixture of multivariate t-distributions,

$$p(y \mid x, \mathcal{D}) = \sum_{k=1}^{K} \hat{\alpha}_k(x) \, \hat{p}_k(y \mid x), \qquad (12)$$
where $\hat{\alpha}_k(x)$ is the probability of $x$ belonging to the $k$-th component and $\hat{p}_k(y \mid x)$ is the predictive distribution of $y$ according to the $k$-th component. These are calculated using formulas similar to (9) and (10), but with the Gaussian distributions replaced by t-distributions. In particular, $\hat{p}_k(y \mid x)$ is a t-distribution (with the mean linear w.r.t. $x$) instead of a Gaussian as in (10), due to the integration over the posterior distribution in (11).
Note that the predictive distributions of GMR (8) and BGMR (12) are very similar, but (8) is a mixture of Gaussians whereas (12) is a mixture of t-distributions. Both can be seen as mixtures of linear regressions, since the mean prediction according to each $k$-th component (10) is linear in $x$. The advantages of the Bayesian version of GMR are the following:
- When the dimensions of $x$ and $y$ are high, the number of parameters to be estimated in $\theta$ is large, and the estimation of $\theta$ in GMR is prone to overfitting unless the number of training data is very large. Putting priors on $\theta$ in BGMR helps mitigate overfitting and makes it possible to learn from fewer datapoints.
- By putting a Dirichlet Process prior on the parameters $\theta$, the number of components $K$ can be determined automatically.
More details about the priors and the derivations of BGMR can be found in [22].
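As a concrete (though not identical) illustration of the Dirichlet-process behavior, scikit-learn’s `BayesianGaussianMixture` can place a Dirichlet-process prior on the mixing weights, so `n_components` acts only as an upper bound and superfluous components receive negligible weight. This is a sketch of the general idea, not the exact model of [22]:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated clusters; the true number of components is 2.
data = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),
    rng.normal(5.0, 0.5, size=(100, 2)),
])

bgmm = BayesianGaussianMixture(
    n_components=10,  # upper bound only; the prior prunes unused components
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(data)

# Components with non-negligible weight approximate the effective K.
print((bgmm.weights_ > 0.05).sum())
```

After fitting, most of the ten allowed components end up with near-zero weight, so the effective number of components is read off from the weights rather than fixed in advance.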
To illustrate how BGMR can handle multimodal outputs, we show a toy example in Figure 12, which depicts a mapping from $x$ to $y$ that is discontinuous at $x = 4$ and $x = 6$ and has two modes. The training data points are shown in black. First, BGMR fits a mixture of Gaussians to the data $\{x_n, y_n\}$, shown as red ellipses in Figure 12a. We can see that the Gaussians fit each mode and each side of the discontinuity well.
Given a query point $x$, the predictive distribution of $y$ is the mixture of t-distributions given by (12). Figure 12b shows an example of this distribution for one query point. We can see that there are two peaks corresponding to the two modes of the mapping (i.e., $y = 25.2$ and $y = 46.7$), having roughly the same probability.
To obtain a single-point prediction from the predictive distribution, we can take the mean of the component that has the highest probability. Figure 12c shows the BGMR predictions over the range of $x$, obtained by taking the means of the components with the highest and second-highest probabilities in (12). We see that it predicts both modes very well and still performs well near the discontinuities (i.e., it does not average the two sides of a discontinuity).
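This mode-selection step can be sketched as follows; the function name and the reuse of the two example modes $y = 25.2$ and $y = 46.7$ are ours, for illustration only:

```python
import numpy as np

def top_mode_prediction(alphas, cond_means):
    """Return the mean of the mixture component with the highest probability."""
    k = int(np.argmax(alphas))
    return cond_means[k]

# Two components with nearly equal weight, as in the bimodal toy example.
alphas = np.array([0.52, 0.48])
cond_means = np.array([[25.2], [46.7]])
print(top_mode_prediction(alphas, cond_means))  # → [25.2]
```

Selecting a single component instead of taking the overall mixture mean is what prevents the prediction from averaging the two modes at a discontinuity.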
A.2 Algorithms
We present here the pseudocode for the methods discussed in Section 3.