I Introduction
Learning is rarely random; it typically follows an intended, often greedy curriculum. Actively seeking to fill knowledge gaps is, in fact, tantamount to faster learning. This setup is especially advantageous when data is scarce and expensive to acquire. To actively seek the missing knowledge, the learning algorithm is endowed with the ability to query an oracle for the next datum, or the next datum to label. The oracle is commonly a human labeller or a demonstrator. To this end, active learning often boils down to two components: a model that quantifies the learner’s uncertainty and a utility function. The maximiser of the utility function determines the next datum to be queried. Examples of utility functions include the Information Gain [1], Upper Confidence Bounds [2], or the Expected Improvement [3]. Active-learning approaches are efficient because they generally require fewer data to achieve high performance [4]. Therefore, in robotics, where data acquisition is considered expensive, utilising active learning is crucial. In this work, we introduce an active-learning algorithm that is primarily aimed at motion planning. Our approach uses latent-variable models and proposes a novel utility function that operates in the latent space. It leverages recent advances in metric learning for latent-variable models [5, 6] and augments them with uncertainty estimation.
Current approaches in active learning for robotics, further discussed in Section II, mainly focus on acquiring demonstrations for learning new trajectories when data points are missing, i.e., they can successfully quantify the missing information. However, they do not provide a data-driven approach for interpolating between the already acquired data points, e.g., when generalising skills or transitioning between skills. In practice, a shortest-path approach is used, but it suffers from limitations. First, the interpolation does not follow the shortest path in the data manifold and, consequently, might drive the robot away from the known state-space regions that have been acquired from demonstrations, resulting, e.g., in collisions or in reaching joint limits. Second, the smoothness of the movement is not taken into account. Yet, smooth and predictable motion is a key ingredient for safe interactions with robots.
We propose a novel active-learning algorithm that supports data-driven generalisation by allowing interpolations in the data manifold, while simultaneously detecting non-smooth, abrupt changes. Additionally, it quantifies the uncertainty during the interpolation and, hence, suggests new demonstrations to be provided by an oracle. Specifically, our approach leverages latent-variable models to infer meaningful representations of the motion trajectories and exploits the Jacobian of the data likelihood along the movement to discover abrupt motions. We demonstrate the benefits of our approach in a set of experiments by generating smooth generalisations of movements and, in addition, we demonstrate how the uncertainty of a movement can be used to implicitly avoid obstacles. For the experimental evaluation, we model the motion of a pendulum assuming that we have access only to image observations, and we use a 7-DoF anthropomorphic arm to demonstrate our approach on avoiding obstacles.
II Related work
Active learning is a common component in robotic systems, especially when it comes to efficiently acquiring new samples during learning [7, 8], or to performing exploration in the action space [9, 10]. Motivated by the problem of humans teaching robots, the authors in [11] leverage the uncertainty of the hypothesis space in order to efficiently request demonstrations from a human operator. In [12], the robot is endowed with the ability to ask questions in order to acquire new labels, new demonstrations, or new skill representations. The uncertainty estimation of Gaussian Processes is used in [13] to broaden the robot's reaching skills by querying new demonstrations whenever the uncertainty reaches a specified threshold. Active learning is also used to improve over random exploration for grasping tasks based on visual sensory input [9]. Also for grasping, active learning is combined with reactive control in order to explore interesting poses using an upper confidence bound (UCB) policy [14]. In [15], a goal-driven active learning approach is developed for learning skills in continuous sensorimotor spaces. In [16], the authors combine model-free and model-based reinforcement learning methods, and the uncertainty thereof, in order to acquire robotic manipulation skills.
Our robot experiments involve a query-based learning system. The scenario is similar to [13]; however, in the latter work the user has to manually choose the trigger.
More generally, active learning has been intensively studied in the machine learning literature [4]. In the latter survey, the author distinguishes three types of scenario, depending on how the query to be labelled by the oracle is selected: membership query synthesis, stream-based selective sampling, and pool-based sampling. In the membership query synthesis scenario (e.g., [17, 18]), the learner first generates (or synthesises) the query to be annotated, instead of sampling it from an observed pool of data. In stream-based selective sampling (e.g., [7]), the learner further filters the generated queries to be labelled and hence can decide to discard them based on a given “informativeness measure”. Finally, pool-based sampling (e.g., [19, 20]) is motivated by applications wherein a large amount of unlabelled data can be collected but labelling by the oracle is costly. In our illustrative example in Section IV-A (the pendulum experiment), pool-based sampling is used. The original problem of the experiment corresponds to an unsupervised learning task and does not require labels; however, the setup allows us to select the most useful data from the pool and evaluate our method. In Sections IV-B and IV-C, however, stream-based selective sampling is used. These experiments involve robot interaction, and it remains expensive to obtain unlabelled data for such manipulations.
In order to model the uncertainty, kernel-based methods are commonly used in the active learning literature, such as Support Vector Machines and “margin-based uncertainty” [21] when dealing with low-dimensional data. In [22], the authors combine Radial Basis Function (RBF) kernels with information density. Bayesian Neural Networks are also used for active learning, as recently shown by [20] in the context of image classification. Following [23], we use an RBF network to model data uncertainty. Our method is an alternative for modelling defect detection by measuring model sensitivity.
III Applying Riemannian geometry to latent variable models for active learning
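Latent-variable models (LVMs), defined by
$$p(\mathbf{x}) = \int p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z}, \qquad (1)$$
are widely used to find a representation of observable data $\mathbf{x}$ through latent variables $\mathbf{z}$ based on hidden, nonlinear regularities in $\mathbf{x}$.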
We use latent variables for generating a sequence of observable data points, with the condition that every generated point has a high similarity to the previous one. However, the similarity between successive data points depends on the information provided to the LVM. In case of low similarity, we apply active learning to acquire the missing information in a targeted manner.
Gauging the similarity of two data points in the latent space is one of the main topics in this paper. To solve this problem, we take the Jacobian of the likelihood into account by treating the latent space as a Riemannian manifold. The Riemannian metric defines a relationship based on the Jacobian of the likelihood, due to change of variables when moving from Riemannian (latent) to Euclidean (observation) space.
A smooth interpolation through our observable data can be obtained by following the geodesic, i.e., the length-minimising curve between two points in the Riemannian space. Here, smooth refers to a strong similarity of successive data points.
However, even when following the geodesic, finding a smooth interpolation will fail under certain circumstances, for instance when we try to interpolate between different classes. The distance between the data manifolds of the different classes in the observation space typically results in a high Jacobian value of the likelihood mean when interpolating from one class to the other. This implies that information is missing to provide a sequence of similar data points that connects the different manifolds smoothly. As a consequence, the variance of the likelihood changes as well.
Since the Jacobians of both the mean and the variance are taken into account by the Riemannian metric, this property can be turned to advantage in active learning, for instance when trying to interpolate between different robot movements: missing data can be queried specifically whenever such boundaries are crossed.
Building on that, the focus of our paper lies on applying Riemannian geometry to LVMs for active learning of robot movements.
III-A Importance-weighted autoencoder
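Since in most LVMs the integral in Eq. (1) is intractable, approximations are used that are based on sampling [24, 25] or on variational inference [26, 27]. In the latter case, the problem is reformulated as the maximisation of the evidence lower bound (ELBO). The distribution $q_\phi(\mathbf{z} \mid \mathbf{x})$ approximates the intractable posterior and $p_\theta(\mathbf{x} \mid \mathbf{z})$, defined as the generative model and parameterised by $\theta$, approximates the likelihood. Let $\mathbf{x}$ be observable data and $\mathbf{z}$ the corresponding latent variables. Then,
$$\log p(\mathbf{x}) \geq \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big] - \mathrm{KL}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\big). \qquad (2)$$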
Implementing $q_\phi(\mathbf{z} \mid \mathbf{x})$ with a neural network parameterised by $\phi$, we obtain the variational autoencoder (VAE) introduced in
[26, 27].
III-B Riemannian geometry in latent variable models
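A Riemannian space is a differentiable manifold $\mathcal{M}$ that is additionally endowed with a metric describing its geometric properties. The corresponding metric tensor $\mathbf{G}(\mathbf{z})$ assigns to each point $\mathbf{z}$ in the latent space an inner product on the tangent space $T_{\mathbf{z}}\mathcal{M}$, defined by
$$\langle \mathbf{z}', \mathbf{z}' \rangle_{\mathbf{z}} := \mathbf{z}'^{\top}\, \mathbf{G}(\mathbf{z})\, \mathbf{z}' \qquad (5)$$
with $\mathbf{z}' \in T_{\mathbf{z}}\mathcal{M}$ and $\mathbf{z} \in \mathcal{M}$.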
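Let us assume we have a curve $\gamma(t)$ in the Riemannian (latent) space that is transformed by a continuous function $f$ to an $N_x$-dimensional Euclidean (observation) space, where $t \in [0, 1]$. The length of this curve in the Euclidean space is defined as
$$L(\gamma) := \int_0^1 \left\| \frac{\partial f(\gamma(t))}{\partial t} \right\| \mathrm{d}t = \int_0^1 \sqrt{\dot{\gamma}(t)^{\top}\, \mathbf{G}(\gamma(t))\, \dot{\gamma}(t)}\; \mathrm{d}t \qquad (6)$$
with the metric tensor $\mathbf{G} = \mathbf{J}^{\top}\mathbf{J}$, where $\mathbf{J}$ is the Jacobian matrix of the likelihood and $\dot{\gamma}$ the time derivative of $\gamma$.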
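The curve length under a pulled-back metric can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical analytic decoder (a paraboloid from a two-dimensional latent space to three dimensions) in place of the likelihood mean, builds the metric as the product of the transposed Jacobian with the Jacobian, and compares the metric-based length of a straight latent line with the length of its decoded polyline. All function names are illustrative.

```python
import math

def decoder(z):
    """Hypothetical smooth decoder R^2 -> R^3 (a paraboloid), standing in for the likelihood mean."""
    z1, z2 = z
    return (z1, z2, z1 * z1 + z2 * z2)

def jacobian(z):
    """Analytic 3x2 Jacobian of the toy decoder."""
    z1, z2 = z
    return ((1.0, 0.0), (0.0, 1.0), (2.0 * z1, 2.0 * z2))

def metric(z):
    """Metric tensor G = J^T J (2x2) at latent point z."""
    J = jacobian(z)
    return tuple(
        tuple(sum(J[k][i] * J[k][j] for k in range(3)) for j in range(2))
        for i in range(2)
    )

def straight_line(a, b):
    """Latent curve gamma(t) = a + t (b - a) and its constant time derivative."""
    def gamma(t):
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    def gamma_dot(t):
        return (b[0] - a[0], b[1] - a[1])
    return gamma, gamma_dot

def curve_length(gamma, gamma_dot, steps=2000):
    """Midpoint-rule approximation of the integral of sqrt(gd^T G gd) over t in [0, 1]."""
    dt = 1.0 / steps
    total = 0.0
    for s in range(steps):
        t = (s + 0.5) * dt
        G = metric(gamma(t))
        v = gamma_dot(t)
        quad = sum(v[i] * G[i][j] * v[j] for i in range(2) for j in range(2))
        total += math.sqrt(quad) * dt
    return total

def decoded_polyline_length(gamma, steps=2000):
    """Length of the decoded curve measured directly in observation space."""
    points = [decoder(gamma(s / steps)) for s in range(steps + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

gamma, gamma_dot = straight_line((-1.0, 0.0), (1.0, 0.0))
length_from_metric = curve_length(gamma, gamma_dot)
length_in_observation_space = decoded_polyline_length(gamma)
```

Both quantities approximate the same arc length, which illustrates that the metric lets one measure observation-space distances while working only with latent coordinates.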
III-C Using geodesics for trajectory generation
To approximate the geodesic we use a neural network that is optimised by minimising
. A singular-value decomposition of
ensures that the geodesic follows the data manifold, as introduced in [5].
Although this method takes the sensitivity of the model into account, it does not capture data uncertainty. In other words, the high Jacobian values of the likelihood mean at the boundaries between different data manifolds are taken into account, but nothing tells us where our model is uncertain due to missing data. The reason is that the generative model uses a global variance. To remedy this, the neural network of the generative model is extended by radial basis function (RBF) networks so that it can also represent the likelihood variance [6, 23].
In contrast to [6], we update the weights of the RBF networks during the training of the generative model and define a rule for an autonomous hyperparameter selection after the training is finished. The RBFs
and the precision of the generative model are given by
(7)
where is the number of the radial basis functions. and are variables representing bandwidth and centres, respectively. are the weights to be optimised. The bandwidth is defined by
(8) 
where , a hyperparameter, denotes the curvature of the Riemannian metric. Since the variance is the reciprocal of the precision:
(9) 
it increases with the distance to the centres and the uncertainty of the model, respectively. It is not possible to directly compute the Jacobian of a sample . Hence, we reparameterise it by [26, 30]:
(10) 
where and
, the Jacobians of the mean and the standard deviation of the likelihood, represent the sensitivity and the data uncertainty of the model, respectively. The changes in the likelihood variance have influence on
, hence the equation introduced in [5] has to be updated. To simplify the calculation, we remove the stochasticity in by taking the expectation [6](11) 
We differ from [6] in the optimisation procedure of the model: the centres
are computed by k-means and updated at every
th iteration step during the training of the IWAE. For both the computation of the centres and the RBFs , the mean of is used. The weights are optimised by backpropagation. is treated as a hyperparameter during the IWAE training. After the training is finished, is updated to satisfy(12) 
Satisfying Eq. (12) guarantees that the mean and the variance have a similar effect on the Riemannian metric tensor.
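The qualitative behaviour of an RBF-based variance can be sketched as follows. This is a minimal illustration under stated assumptions: the precision is taken to be a positive weighted sum of Gaussian RBFs plus a small positive constant `zeta`, which is the common form in [6] but is not confirmed here as the exact parameterisation of this work; the centres, bandwidths, and weights below are made-up values.

```python
import math

def rbf_precision(z, centres, bandwidths, weights, zeta=1e-3):
    """Assumed form: precision(z) = zeta + sum_k w_k * exp(-lambda_k * ||z - c_k||^2)."""
    total = zeta
    for c, lam, w in zip(centres, bandwidths, weights):
        sq_dist = sum((zi - ci) ** 2 for zi, ci in zip(z, c))
        total += w * math.exp(-lam * sq_dist)
    return total

def rbf_variance(z, centres, bandwidths, weights, zeta=1e-3):
    """Variance is the reciprocal of the precision."""
    return 1.0 / rbf_precision(z, centres, bandwidths, weights, zeta)

# Hypothetical centres (e.g. obtained by k-means on latent means), bandwidths, and positive weights.
centres = [(0.0, 0.0), (1.0, 1.0)]
bandwidths = [2.0, 2.0]
weights = [5.0, 5.0]

variance_near_data = rbf_variance((0.0, 0.0), centres, bandwidths, weights)
variance_far_from_data = rbf_variance((10.0, 10.0), centres, bandwidths, weights)
```

Near a centre the precision is dominated by the RBF terms and the variance is small; far from all centres only `zeta` remains, so the variance saturates at roughly `1 / zeta`.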
III-D Active learning for robot trajectory generation
Active learning can be applied to reduce the uncertainty of our model in a targeted manner, which leads to smoother trajectory generation. In active learning, an acquisition function is used to detect where a model is uncertain, so that missing labels can be queried specifically:
(13) 
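In the generic pool-based setting, querying amounts to evaluating the acquisition function on every candidate and selecting the maximiser. The following minimal sketch illustrates this; the utility used here (distance to the nearest already labelled point) is a hypothetical stand-in for a model-based uncertainty measure and is not the acquisition function proposed in this work.

```python
def query_next(pool, utility):
    """Return (index, item) of the pool element with the highest utility."""
    best_index = max(range(len(pool)), key=lambda i: utility(pool[i]))
    return best_index, pool[best_index]

# Hypothetical one-dimensional example: points far from labelled data are more informative.
labelled = [0.0, 1.0]
pool = [0.1, 0.5, 0.9]

def distance_to_labelled(x):
    """Toy uncertainty proxy: distance to the closest labelled point."""
    return min(abs(x - y) for y in labelled)

index, datum = query_next(pool, distance_to_labelled)
```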
Our goal is to guarantee a smooth interpolation along the geodesic. This is only possible if there are no abrupt changes in the Jacobian of the likelihood, which is expressed by the determinant of the metric tensor for a specific pair . This leads to the following acquisition function:
$$\mathrm{MF}(\mathbf{z}) := \sqrt{\det \mathbf{G}(\mathbf{z})}, \qquad (14)$$
also defined as the magnification factor (MF) [31]. The MF can be interpreted as the scaling factor when moving from the Riemannian (latent) to the Euclidean (observation) space, due to the change of variables.
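To make the magnification factor concrete, the sketch below evaluates the square root of the determinant of the metric for a hypothetical decoder with a two-dimensional latent space. The decoder, its `sharpness` parameter, and the finite-difference Jacobian are assumptions made for illustration; a trained generative model would supply the Jacobian by automatic differentiation instead.

```python
import math

def numerical_jacobian(f, z, eps=1e-5):
    """Central-difference Jacobian of f at z (rows: outputs, columns: latent dimensions)."""
    n_out = len(f(z))
    columns = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        f_plus, f_minus = f(z_plus), f(z_minus)
        columns.append([(f_plus[k] - f_minus[k]) / (2.0 * eps) for k in range(n_out)])
    return [[columns[i][k] for i in range(len(z))] for k in range(n_out)]

def magnification_factor(f, z):
    """sqrt(det(J^T J)) for a two-dimensional latent space."""
    J = numerical_jacobian(f, z)
    g11 = sum(row[0] * row[0] for row in J)
    g12 = sum(row[0] * row[1] for row in J)
    g22 = sum(row[1] * row[1] for row in J)
    return math.sqrt(max(g11 * g22 - g12 * g12, 0.0))

def toy_decoder(z, sharpness=10.0):
    """Hypothetical decoder whose third output changes abruptly near z[0] = 0."""
    return [z[0], z[1], math.tanh(sharpness * z[0])]

mf_at_boundary = magnification_factor(toy_decoder, [0.0, 0.0])
mf_away_from_boundary = magnification_factor(toy_decoder, [2.0, 0.0])
```

The value is large where the decoder output changes abruptly and close to one where the mapping is locally an isometry, which is the behaviour the acquisition function relies on.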
In addition to the acquisition function, a threshold is necessary to tell the active learning algorithm whether an interpolation is smooth or not. The threshold is defined as
(15) 
where , , and is the cardinality of .
When applying active learning to robot movements, we use a set of pairs of start and end points in the observation space . For each pair the geodesic is computed. To decide whether the movement (interpolation) between a start and an end point is smooth, only points along the geodesic are taken into account. Thus, in contrast to the active learning approach described in Eq. (13), refers to the latent space. Based on whether the values of the magnification factor along the geodesic exceed , the active learning algorithm decides if the trajectory between and is required to be demonstrated. In case of a required demonstration, retraining the model with the new data leads to low s along .
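The decision rule described above can be sketched as follows. This is a simplified illustration: the magnification-factor values along each geodesic are assumed to be precomputed, the threshold is passed in as a plain number rather than derived from Eq. (15), and the path names are hypothetical.

```python
def points_exceeding(mf_values, threshold):
    """Indices of points along a path whose magnification factor exceeds the threshold."""
    return [i for i, mf in enumerate(mf_values) if mf > threshold]

def select_queries(mf_along_paths, threshold):
    """Names of start/end pairs whose geodesic crosses a high-MF region and needs a demonstration."""
    return [
        name
        for name, mf_values in mf_along_paths.items()
        if points_exceeding(mf_values, threshold)
    ]

# Hypothetical MF values sampled along two geodesics.
mf_along_paths = {
    "reach_a_to_b": [1.0, 1.2, 1.1],
    "reach_a_to_c": [1.0, 9.5, 1.3],
}
to_demonstrate = select_queries(mf_along_paths, threshold=3.0)
```

After the requested demonstrations are collected, the model would be retrained and the check repeated on the recomputed geodesics.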
Hence, the final result of our approach is a smooth movement, or rather a smooth combination of movements of the robot, realised by reconstructing the latent variables along the geodesic.
IV Experimental Evaluation
We evaluate our approach in multiple scenarios. First, we use an artificial two-dimensional dataset to illustrate how our approach works. Then, we demonstrate that our approach can work efficiently with high-dimensional data on a simulated pendulum where the state is given by images. Finally, we present our results on controlling a 7-DoF robotic arm where smooth reaching movements are generated. When our approach detects that a trajectory would cross regions of the state space where not enough data have been acquired, it asks for additional demonstrations. Hence, it is used to implicitly avoid collisions and joint limits. The architectural design and the hyperparameters used for our experiments are listed in the appendix.
IV-A Illustrative experiment
In the first experiment, we evaluate the efficiency of our approach in reducing the reconstruction error by actively acquiring data points from regions where the model does not have enough information. To better illustrate our approach, we generated an artificial two-dimensional dataset, where the IWAE maps the observation space to a two-dimensional latent space. The training dataset is depicted in Fig. 1(a), whereas the resulting latent space and magnification factor (MF) are shown in Fig. 1(b).
Our algorithm asks for new data points from regions where the MF is high, and therefore it selects the central region. In contrast, the Max Entropy strategy assumes that enough information from the central region is present.
As a result, our active-learning approach can efficiently reduce the reconstruction error using fewer samples than the Max Entropy or the random acquisition approach. The results are shown in Fig. 3.
IV-B Trajectory planning for pendulum
We demonstrate the trajectory planning capabilities of our approach in a simulated 1-DoF pendulum system. The simulator provides a pixel image of the current state of the pendulum, which we use as input to our algorithm. We gathered an image dataset by collecting images for two different joint angle ranges, and degrees. Subsequently, we augmented the dataset by adding Gaussian noise to each pixel, to avoid overfitting and to improve the coherence of the latent space.
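The noise augmentation step can be sketched as follows. This is a minimal illustration only: the noise standard deviation `sigma`, the assumption that pixel intensities lie in [0, 1], and the clipping to that range are all choices made here for the example and are not taken from the experiments.

```python
import random

def add_gaussian_noise(image, sigma=0.05, rng=None):
    """Return a copy of a 2-D image (list of rows) with i.i.d. Gaussian noise added to each pixel.

    Pixel values are assumed to lie in [0, 1] and are clipped back to that range.
    """
    if rng is None:
        rng = random.Random()
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]

# Hypothetical 2x3 grey image; a fixed seed makes the augmentation reproducible.
image = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
noisy_image = add_gaussian_noise(image, sigma=0.05, rng=random.Random(0))
```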
After training, we generated four trajectories between the two datasets by following the geodesic. The generated trajectories are illustrated in Fig. 3(a). The trajectories and move across regions of the state space where the MF exceeds a predetermined threshold, as not enough information has been collected from those regions. An illustration of the trajectories is shown in Fig. 3(c).
Our approach requested additional demonstrations from regions where the MF exceeds the threshold. Afterwards, the model was retrained with the new data. As a result, the MF is reduced in these regions, as shown in Fig. 4(a). The corresponding trajectories are significantly smoother after our active-learning approach was applied, as can be seen in Fig. 4(b).
IV-C Generating robot trajectories with active learning
Deciding whether the robot is able to perform a task or whether a demonstration is required is not trivial. Therefore, we evaluate our approach in a robot trajectory generation setting, where the robot should consult the human operator to avoid collisions with the environment. In addition, the generated trajectories should not contain abrupt changes, so that the robot can follow them precisely. For this experiment, we used a Panda robot from FRANKA, a lightweight 7-DoF robotic arm with joint torque sensors.
For training our model, we provided demonstrations of reaching objects that were placed at two distinct locations. We used kinaesthetic teaching, i.e., the human demonstrator could freely move the robot to acquire a dataset of five demonstrations per reaching location. The setup is depicted in Fig. 1. During the demonstrations we recorded the joint angles of the robot at a rate of . Additionally, obstacles were placed in the workspace of the robot. Naively generating movements based on the demonstration dataset is likely to result in collisions.
We generated trajectories by computing an interpolation between the two distinct locations by following the geodesic, as shown in Fig. 5(a). The geodesic trajectory crosses a region with high MF values.
Since the MF values along the proposed trajectory exceed the threshold, the algorithm asks the user for additional data. Thus, after collecting the data of the queried trajectory, we retrained our model on the new data and recomputed the geodesic. The updated latent space is shown in Fig. 5(b), where the geodesic no longer crosses high-MF regions. The end-effector trajectories before and after retraining are depicted in Fig. 5(c). As a result, the robot arm moves close to the demonstrated path and avoids collisions with obstacles. A visualisation of the robot trajectory and its environment can be found in Fig. 7.



V Conclusion
We introduced a new active-learning method based on the model sensitivity in deep generative models. We showed that our method is suitable for efficiently learning new skills from demonstrations while maintaining smoothness between known motions. In addition to triggering a query for new demonstrations, the magnification factor also indicates whether the observed data contains unrealistic postures, sudden fast movements, or previously unseen/untrained movements.
Currently, the model is retrained when new data is acquired, in order to prevent the optimisation procedure from getting stuck in local minima. We will tackle this issue in future work and investigate alternative optimisation procedures that effectively allow for an online update.
Acknowledgment
We are very grateful to Justin Bayer for valuable suggestions concerning this work.
References
 [1] N. Houlsby, F. Huszár, Z. Ghahramani, and M. Lengyel, “Bayesian active learning for classification and preference learning,” arXiv preprint arXiv:1112.5745, 2011.
 [2] R. Ganti and A. G. Gray, “Building bridges: viewing active learning from the multi-armed bandit lens,” arXiv preprint arXiv:1309.6830, 2013.
 [3] E. Brochu, V. M. Cora, and N. De Freitas, “A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,” arXiv preprint arXiv:1012.2599, 2010.
 [4] B. Settles, “Active learning literature survey,” Computer Science Technical Report, 2010.
 [5] N. Chen, A. Klushyn, R. Kurle, X. Jiang, J. Bayer, and P. van der Smagt, “Metrics for deep generative models,” in International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.
 [6] G. Arvanitidis, L. K. Hansen, and S. Hauberg, “Latent space oddity: on the curvature of deep generative models,” in International Conference on Learning Representations (ICLR), 2018.
 [7] L. E. Atlas, D. A. Cohn, and R. E. Ladner, “Training connectionist networks with queries and selective sampling,” in Advances in Neural Information Processing Systems, 1990, pp. 566–573.
 [8] D. A. Cohn, Z. Ghahramani, and M. I. Jordan, “Active learning with statistical models,” Journal of Artificial Intelligence Research, vol. 4, pp. 129–145, 1996.
 [9] M. Salganicoff, L. H. Ungar, and R. Bajcsy, “Active learning for vision-based robot grasping,” Machine Learning, vol. 23, no. 2-3, pp. 251–278, 1996.
 [10] A. Morales, E. Chinellato, A. H. Fagg, and A. P. del Pobil, “An active learning approach for assessing robot grasp reliability,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1, 2004, pp. 485–490.
 [11] C. Chao, M. Cakmak, and A. L. Thomaz, “Transparent active learning for robots,” in ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2010, pp. 317–324.
 [12] M. Cakmak and A. L. Thomaz, “Designing robot learners that ask good questions,” in ACM/IEEE International Conference on Human-Robot Interaction, 2012, pp. 17–24.
 [13] G. Maeda, M. Ewerton, T. Osa, B. Busch, and J. Peters, “Active incremental learning of robot movement primitives,” in Conference on Robot Learning (CoRL), 2017.
 [14] O. Kroemer, R. Detry, J. Piater, and J. Peters, “Combining active learning and reactive control for robot grasping,” Robotics and Autonomous Systems, vol. 58, no. 9, pp. 1105–1116, 2010.
 [15] A. Baranes and P.-Y. Oudeyer, “Active learning of inverse models with intrinsically motivated goal exploration in robots,” Robotics and Autonomous Systems, vol. 61, no. 1, pp. 49–73, 2013.
 [16] S. Hangl, V. Dunjko, H. Briegel, and J. H. Piater, “Skill learning by autonomous robotic playing using active learning and creativity,” CoRR.
 [17] D. Angluin, “Queries and concept learning,” Machine Learning, vol. 2, no. 4, pp. 319–342, 1988.
 [18] R. D. King, J. Rowland, S. G. Oliver, M. Young, W. Aubrey, E. Byrne, M. Liakata, M. Markham, P. Pir, L. N. Soldatova et al., “The automation of science,” Science, vol. 324, no. 5923, pp. 85–89, 2009.
 [19] D. D. Lewis and W. A. Gale, “A sequential algorithm for training text classifiers,” in Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1994, pp. 3–12.
 [20] Y. Gal, R. Islam, and Z. Ghahramani, “Deep Bayesian active learning with image data,” in Proceedings of the 34th International Conference on Machine Learning, 2017.
 [21] A. J. Joshi, F. Porikli, and N. Papanikolopoulos, “Multiclass active learning for image classification,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2372–2379.
 [22] X. Li and Y. Guo, “Adaptive active learning for image classification,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 859–866.
 [23] Q. Que and M. Belkin, “Back to the future: Radial basis function networks revisited,” in International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 51, 2016, pp. 1375–1383.
 [24] W. K. Hastings, “Monte Carlo sampling methods using Markov chains and their applications,” 1970.
 [25] A. E. Gelfand and A. F. Smith, “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, vol. 85, no. 410, pp. 398–409, 1990.
 [26] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” CoRR, vol. abs/1312.6114, 2013.
 [27] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” pp. 1278–1286, 2014.
 [28] Y. Burda, R. B. Grosse, and R. Salakhutdinov, “Importance weighted autoencoders,” CoRR, vol. abs/1509.00519, 2015.
 [29] C. Cremer, Q. Morris, and D. Duvenaud, “Reinterpreting importance-weighted autoencoders,” International Conference on Learning Representations Workshop Track, 2017.
 [30] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in Proceedings of the 31st International Conference on Machine Learning (ICML), 2014, pp. 1278–1286.
 [31] C. M. Bishop, M. Svensén, and C. K. Williams, “Magnification factors for the SOM and GTM algorithms,” in Proceedings Workshop on Self-Organizing Maps, 1997.
 [32] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
 [33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
A. Details of the training procedure
The Adam optimiser [32] was used for optimising the models of the three experiments. In Tables I, II, and III, we provide the parameters we used during training. We abbreviate the fully connected layers by FC. Residual networks [33] are used for the pendulum and the FRANKA experiments. Increasing the depth of the generative model led to a more sensible and smoother magnification factor. With , we refer to the number of importance-weighted samples.