I Introduction
Nowadays, most of the robots used in the industry are preprogrammed and necessitate a welldefined and controlled environment. Reprogramming these robots is often an expensive process necessitating an expert. Enabling the robot to learn tasks by demonstrating them would streamline the robot installation and task reprogramming. That is why, Robot Learning from Demonstrations (RLfD) [1] is one of the key research areas in the field of robotics. However, constructing a robot that is able to learn by observation is still a challenging problem. Although, prototype platforms for robot learning by demonstration have been around for more than a decade, many complications have restrained the robots to operate only in restricted laboratory environments. Some of the key challenges are perception, task recognition, task generalization, motion planning, and object manipulation.
RLfD refers to the technique of teaching skills to a robot by giving examples of the desired behavior through human demonstrations. It is similar to the way human beings learn a new skill from demonstrations performed by the teacher. One of the great advantage of this technique is that it eliminates the need of expert level technical knowledge to program a robot. Moreover, it captures key features from the demonstration provided by a teacher who is expert in the specific task, which a robotic programmer might not be able to program. Thus, this technique has great potential in industrial as well as home robotic applications.
Learning the mapping between world state and actions is called as a policy. This allows a robot to choose an action based on its current state. In RLfD, a robot learns a new policy from demonstrations provided by the human teacher. A demonstration is defined as sequence of state–action pairs that are recorded throughout the performance of the required robot behavior by a human teacher. RLfD uses the dataset of recorded demonstrations to form policy to reproduce the demonstrated behavior.
We focus on learning, representing, and generalizing of trajectory based skills from demonstrations. We propose a novel RLfD framework by which a collaborative robot can learn trajectory based skills from kinesthetic demonstrations performed by a human teacher. The prosed RLfD framework has three fundamental phases: data collection from demonstrations, learning behavior by deriving a policy, and robot execution or reproduction of the demonstrated behavior. It extracts key information from multiple demonstrations to learn a stateaction policy, which is used to produce a generalized trajectory based skill.
Ii Background
RLfD is a technique to enable a robot to perform new tasks autonomously. Instead of necessitating users to logically decompose and manually program a robot for a desired behavior, RLfD enables to derive robot controller by observing human’s own performance. The goal is to easily extend and adapt robot capabilities to novel situations, even by users who lag programming ability. Robot learns a model of a task based on the demonstrations performed by the teacher. In the last two decades, several RLfD approaches have been developed [2]. However, until date most of the robots are tediously hand programmed for each of the task they have to perform. RLfD tries to minimize this challenging step by allowing users to teach their robot to fit in their specific needs. RLfD techniques are user friendly and enable robots to be employed more widely in daily activities with no need of specialist humans. Moreover, by using expert knowledge of human teacher, in form of demonstrations, the overall learning process would be faster as compared to handengineered control policy and learning by trialanderror.
One of the wellknown works in RLfD is by Calinon et al. [3] [4] [5]
, in which a probabilistic representation of the demonstrations is built using Gaussian Mixture Model (GMM) and a smooth trajectory is reproduced using Gaussian Mixture Regression. In another work by Schneider
et al. introduced a Gaussian process regression model that clusters the input space from performed demonstrations into smaller subsets and work individually on these subsets [6]. Both of these Gaussian techniques generalized over a set of demonstrated trajectories, but the GMM is computationally expensive.Rozo et al. proposed an endtoend RLfD framework for teaching forcebased manipulation tasks [7]. The demonstrations are encoded using a Hidden Markov Model (HMM), and the reproduction is done using a modified version of Gaussian Mixture Regression. Asfour et al. used continuous HMM to generalize movements demonstrated to a robot multiple times. The experiments were performed using a kinematics model of the human upper body to simulate the reproduction of arm movements [8].
In a recent work, Ahmadzadeh et al. proposed an approach that generates a continuous representation of the demonstrated trajectory based on the concept of canal surfaces [9]. Their experimental results show that the proposed approach can reproduce a wide range of trajectories that achieve the goal of the task. Koskinopoulou et al. developed a framework for Human Robot Collaborative (HRC) task execution [10]. Akgun and Thomaz developed an algorithm to simultaneously learning actions and goals from naïve human teacher demonstrations [11].
Konidaris et al. presented an online algorithm for constructing skill trees from demonstration trajectories in a dynamic continuous domain [12]. They evaluated their algorithm on the uBot5 mobile manipulator and showed that it was able to learn skills from both expert demonstration and learned control sequences. Other techniques like linguistic transfer of an assembly task from human to robot have also been developed [13].
Iii Problem Statement
We can consider RLfD as a subset of Supervised Learning. In supervised learning, the agent takes the labelled training data and the algorithm learns an approximate model to fit this data. Similarly, in RLfD, the training data is collected from demonstrations provided by the teacher. Figure
2 (top) shows training data being acquired from demonstrations by teacher to derive a policy.The robot world comprises of states and actions . Mapping between the states, is defined by a probabilistic transition function called state transition matrix given by:
(1) 
The set A contains highlevel behaviors as well as lowlevel motions. We make an assumption that the states are not fullyobservable or hidden. However, the learning algorithm has access to the observed state , with the mapping . The policy selects the next action based on the current robot state. Figure 2 (bottom) shows one cycle of policy execution.
Iv Prposed Approach
The proposed RLfD system consists of three fundamental phases. Figure 3 shows a block diagram of the proposed approach. First phase is the data collection phase, in which the data is acquired from the demonstrations performed by the teacher. Realtime joint angle values and gripper state are received by the system, which applies Forward Kinematics to find the position and orientation of the manipulator. This process is repeated for all the demonstrations and the time series of position and orientation are stored.
The second phase is the task learning phase, in which the stored data from the first phase is used to learn a model of the task or the behavior. From each demonstration, keypoints of trajectories are extracted and then k
means clustering is used to cluster the keypoints from all demonstrations. Centers of each cluster is calculated and mapped to a unique symbol representing a state. BaumWelch algorithm is used to learn a policy from the demonstrations to find the probability matrix of stateaction pairs. Then, Viterbi algorithm is used to find the most probable sequence of states. Dynamic Time Warping (DTW) is used to align the time vector of the learned trajectory.
Third phase is the robot execution phase, in which the saved model is used to produce the learned behavior. Cubic smoothingspline regression is used to determine a generalized trajectory from the most probable sequence of states. Inverse kinematics is applied to obtain a generalized trajectory suitable for the robot.
V Data Collection
Data collection is the first phase of the system, where the robot collects data from the physical demonstrations performed by human teacher. In the next three subsections, we discuss the main steps of this phase.
Va Demonstrations
In our work, we have used kinesthetic teaching to demonstrate the task or trajectories to the robot. Kinesthetic teaching is a way of teaching in which, the teacher physically holds the robot arm or manipulator, and demonstrates the task by moving the robotic arm. The main advantage of this technique is that there is no need to map the real world space to robot world space, as the performed demonstrations are with respect to robot world.
One major roadblock for this type of teaching is that it requires a robot that has a feedback system to record the joint position in realtime. However, we do not face this problem as we are using one of the most advanced research robot called Baxter, which has inbuilt feedback system to measure the joint angles from all fourteen joints and manipulator state of the dualarm robot. This allows us to read these joint values from joint trajectory server in real time. To get more precision for fast movements, we have increased the default rate (100Hz) of reading current joint angles from the robot to 1000 Hz. These values are stored with a timestamp and sent to the forward kinematics function.
Figure 1 shows an example demonstration being performed using kinesthetic teaching to demonstrate a pick and place task to Baxter. In this, the teacher holds the force sensor at the manipulator of each robotic arm and guides the robot through the task. To change the state of the gripper, physical buttons near the manipulator are used. One button closes the gripper and the other opens the gripper. The system records the gripper states along with the joint angles throughout the demonstration, and for each demonstration.
VB Forward Kinematics
‘Forward kinematics refers to the use of the kinematic equations of a robot to compute the position of the endeffector from specified values for the joint parameters’. Kinematics equations are series of transformation matrices to characterize the relative movement at each joint with respect to the previous joint. Instead of going through the complex matrix multiplications for calculating the relative movements at each joint, we use DenavitHartenberg parameters or DH parameters.
Each individual homogeneous transformation is calculated as:
(2)  
where and are known as DH parameters associated with link and joint .
We use Baxter as our robot for learning, thus we calculated the DH parameters for both the limbs of the robot to perform the forward kinematics. We follow the standard convention of placing the frames at each joint to calculate the DH table. Figure 4 shows an example of placing the frames on first two revolute joints and the physical parameters used to mathematically modeling Baxter’s arm. Table I shows the derived DH parameters for all the joints and links for Baxter. These parameters are used to calculate the individual transformation matrices for each of the link.
Joint  

1  0.2703  0.069  1.571  
2  0  0  1.571  
3  0.3644  0.069  1.571  
4  0  0  1.571  
5  0.3743  0.01  1.571  
6  0  0  1.571  
7  0.2295  0  0 
Now, we use the DH parameters to calculate . The homogeneous transformation matrix that expresses the position and orientation of the manipulator is called a transformation matrix , and is calculated as:
(3) 
For each measurement, the calculated position and orientation are stored into a data file, along with the time stamp and the gripper state. This marks the end of the first phase.
Vi Task Learning
Task learning is the second phase of the system. In this phase, the data stored during the first phase is used to learn a model of the task.
Via Keypoint Extraction
We detect the characteristic features of each demonstration, and we call it keypoints. By doing this, we avoid having large number of hidden states to be trained by HMM. We represent every recorded movement during the demonstration by a set of timediscrete sequence. For each arm we have a sequence that defines the positions of manipulator, and a sequence that defines the orientations of manipulator over time, where is defined as the length of demonstration . is a three dimensional vector, whereas is a four dimensional vector. A point is selected as a keypoint in the sequence of points if:
(4)  
where, denotes the timestamp of demonstration and keypoint. This means that if the angle between the vector that goes from point to its successor and the vector that goes from point to its predecessor is less than , then point is selected as a keypoint. Figure 5 shows an example of finding angles between the two vectors in a two dimensional plane. To detect only sharp corners in the manipulator trajectory as keypoints should be high.
To detect the keypoints of the orientation angles of the manipulator, similar selection criterion for keypoints is used. A point ,i is selected as a keypoint in the sequence of points if:
(5)  
A major challenge in detection of the keypoints using this approach is to tune the values of thresholds , which will decide the number of keypoints extracted from the recorded task or trajectories. If the number of detected keypoints is low, some relevant characteristics of the task or trajectories can be missed from the generalization step. Whereas, if the detection of keypoints is high, over fitting of the recorded task or trajectories can take place.
Keeping in mind the tradeoff between generalization and overfitting, reasonable values for were experimentally determined. Figure 6 shows the result of keypoint extraction algorithm on multiple demonstrations. It shows the keypoints extracted from three demonstrations. The keypoints are marked with circles on each original trajectory of the demonstration.
ViB Clustering
In contrast to the common key point approach used by Asfour et al. [8], we cluster the keypoints derived from all the demonstrations and calculated the centroids of each cluster. To solve this problem, we use kmeans clustering algorithm. The kmeans clustering algorithm takes the number of clusters as an input to generate k clusters, from a set of observation vectors. It returns the centroids for each of the k clusters formed. For our system, we set the number of clusters k equal to average number of keypoint in each demonstration.
A keypoint vector is classified with a cluster if the centroid of the cluster is closest to it, i.e. closer to centroid than any other centroids.
kmeans clustering algorithm attempts to minimize the distortion, which is the sum of the squared distances between each keypoint vector and its dominating centroid. At each step, kmeans refines the choice of centroids and tries to reduce the distortion. When this distortion change gets below a threshold, the algorithm stops. Figure 6 shows the result of the implement kmeans clustering algorithm to cluster the keypoints and find the centroids of the clusters. In the plot, each centroid is marked with big red circles. In the next section, we discuss how these centroids are used to learn a model of the task or trajectories using a HMM.ViC Model Learning using HMM
A HMM (Hidden Markov Model) is a directed graphical model that is a statistical Markov chain of a sequence of unobserved or hidden states and a corresponding sequence of observation variables. It has been widely applied in the field of handwriting recognition, speech recognition, DNA sequence analysis, etc.
Figure 7 shows a graphical representation of HMM, where denotes the hidden states and denotes the observed variables at time instants . The probability of going to state at time , given that the current is state at time , is denoted by:
(6) 
Using these probabilities, we form the state transition matrix, given by:
(7) 
where, denotes the number of hidden states in the HMM. In this work, the number of hidden states was set equal to the number of centroids derived from the demonstrations.
The initial state probabilities is defined as the probability of model being in state at time , and is given by:
(8) 
The observation probability is defined as the probability of observing a symbol at time given the model is in state , and is denoted as:
(9) 
Using these probabilities, we form the observation probability matrix, given by:
(10) 
where, is the number of observation symbols. A complete HMM is described as:
(11) 
In our research, we used a discreteHMM for learning a model of the demonstrated trajectories, which required the recorded continuous trajectories to be mapped to discrete values. The keypoint extraction and clustering technique applied in section VIA Keypoint Extraction and section VIB Clustering were for vector quantization. As it is required to use most of the k clusters in order to preserve most of the features in the continuous trajectory, we mapped each cluster into a discrete symbol to from a codebook of symbols , where k is the number of clusters formed. The sequences of these observations form the set, and is given by:
(12) 
where, M denotes the total number of demonstrations.
The efficiency of learning with HMM depends on the number of available observations. As in RLfD, it is preferred to keep the number of observations or demonstrations low, an appropriate initialization of the model parameters is imperative. Next, we discuss the initialization process of the HMM parameters.
We implemented the Bakis left–right topology [14] to model the demonstrated task. The forwardtransition probabilities and selftransition probabilities were initialized in the state transition matrix as:
(13)  
where, is the amount of time spent in state , and is a normalizing constant to make sure .
All other state transition probabilities were set as zero, to make sure that transition to those states is impossible. Output probabilities were initialized as:
(14) 
where, denotes the number of times is observed in state of . The state probabilities , were initialized as . This ensures that the learning always starts from the starting point.
Once the model parameters were initialized, Baum–Welch algorithm [14] was used to train on all demonstrations or observations . Later, Viterbi algorithm [14] was used to determine the most probable sequence of hidden state. Due to the nature of the implemented Bakis leftright topology, some hidden states or centroid points were not present in all the observed trajectories. We called them as zeropoints and were ignored in the generalized trajectory.
ViD Dynamic Time Warping
Since the length and velocities of the demonstrated trajectories differ, the time frames of extracted keypoints of each demonstration are different. To produce the learned trajectory, we need to align the set of keypoints along a common time vector. To solve this problem, we use DTW [15], in which the temporal alignment of the clusters made by keypoints is done by aligning the entire trajectory with respect to a reference trajectory.
DTW sequence alignment technique forms a matrix that consists of distances between two timeseries, which is then used to find an optimal path that minimizes the distance between the two timeseries. For a given test sequence of length , and a reference sequence of length , the distance matrix is calculated as:
(15) 
In our research, to calculate the distance between two time‘series, Euclidean l2norm is used. The optimal alignment path is calculated as:
(16) 
For selection of the reference sequence, forward algorithm was used to find the demonstration that has the highest probability for the learned model. As our data received from demonstration was eight dimensional data, we implemented multidimensional DTW, and the distance matrix is calculated as:
(17) 
where, is the number of dimensions, and in our case, .
ViE Generalized Trajectory
After we have a common timeline for the model learned for the sequence of centroids, we need to connect these centroids to generate a generalized trajectory. To solve this problem, we used cubic smoothingspline regression to determine a generalized trajectory from the most probable sequence of centroids. This method is widely applied for fitting a smooth curve to large set of scattered data. In our case, we want to generate a smooth trajectory from the set of scattered centroid points. The spline curve was interpolated at intervals equal to the period size of clusters, which produced a generalized trajectory suitable for Baxter.
Vii Robot Execution
Once the model of the task is saved, we use the stored files to perform the task. The files contains the centroids, which are the position and orientation of the manipulator. For the robot to reproduce the learned task, we need to convert these position and orientation values into joint angles values that are suitable for the robot. To solve this problem, we use Inverse Kinematics. This technique is just the opposite of Forward Kinematics. It provides the set of joint angle values for a given the Cartesian pose of the manipulator.
We use the Baxter’s IK server to perform the inverse kinematics during the execution of the task. It is a buildin service from Baxter SDK. It takes the position and orientation values as the input and returns seven valid joint angles values to achieve it. These values are sent to the robot and it moves to that state using Joint Trajectory Action Server (JTAS).
Viii Experiments
To evaluate the approach, we performed the experiments on the research robot by Rethink Robotics called Baxter. It is a dualarm 7DoF robot. A modular approach was used to create the framework and all the algorithms were developed as separate packages in python. This made it easier to test and debug each part of the framework separately. In this section, we discuss experiments performed and the results of those experiments.
The keypoint extraction algorithm and clustering algorithm plays a vital role in the efficiency of the model. The variables for tuning the keypoint extraction algorithm were experimentally determined. To make sure this algorithm works well on wide variety of task and complex trajectories, we validated them on different tasks. In the next two subsections, we discuss the details of the two complex tasks that the robot learned through multiple demonstrations.
Viiia Stacking cup task
The first task is to place a cup on a stack of cups. In this task, the robot takes the cup from a human and puts it on a stack of cups. Figure 1 shows robot learning the stacking cup task from a human teacher through demonstration. Three demonstrations were performed to teach this task. Table II shows the data statistics for this task.
Demonstration #  # of recorded values  # of keypoints extracted 

Demonstration 1  18024  18 
Demonstration 2  19759  19 
Demonstration 3  18140  20 
The robot was able to reproduce the task successfully. This task involved learning of precise motion of the manipulator to precisely drop the cup on top of the stacked cups. Figure 9 shows the manipulator position for the demonstrated task and the learned trajectory. The small triangles on the trajectories marks the keypoints extracted from that demonstration, and the red circles show the centroids of the clusters formed. In total, 19 clusters were formed and they acted as states for the HMM. Numbers from 018 named these centroids. The cyan line shows the generalized trajectory. This plot only shows the position of the manipulator only. They complete learning process involved the learning of the orientation of the manipulator as well. Robot performing the learned stacking cup task can be seen in figure 8.
ViiiB Pick and place task
The second task is a pick and place task, in which the robot picks up a block of wood and places it at some other spot and then pick the block from that spot and places it back to the original spot. This is a much more complex task than the previous task. Table 4 shows the data statistics for this task.
It can observed in Figure 10 that the three demonstrations were a lot different from each other, though they were performing the same task. The small triangles on the trajectories marks the keypoints extracted from that trajectory, and the red circles show the centroids of the clusters formed. In total, 21 clusters were formed and they acted as states for the HMM. Numbers from 020 named these centroids. The cyan line shows the generalized trajectory. Please note that the trajectory in the graph is not the actual trajectory the robot went through. The derived trajectory in the plot is just by connecting the states with linear lines to illustrate the flow of states. The actual trajectory of the robot used JTAS spline interpolation and was much smoother than the one in the plot.
Demonstration #  # of recorded values  # of keypoints extracted 

Demonstration 1  31510  24 
Demonstration 2  28593  21 
Demonstration 3  27025  21 
In both the experiments, the robot was able to learn the complete task and successfully reproduce the learned task.
Ix Discussion
In this paper, we presented a framework for RLfD. This framework has three phases: Data collection, Task Learning and Robot Execution. The data collection module collects and stores the eight dimensional timeseries data from each demonstration. The Task Learning module then uses this data to learn a model of the task. As we can observe from the experiments, the keypoints are extracted from each of the recorded demonstration. This was done to limit the number of observed states for the HMM. As we can observe from the data statistics of the two experiments, the number of extracted keypoints is much less than the recorded values. This reduction in the number of states lead to a faster learning of the model. Initially, we tried to learn the model of the task based on the joint angles values, and it avoided the implementation of the forward and inverse kinematics. Then, we implemented FK and IK and used the position and orientation of the manipulator, which made it easier to visualize the trajectories of the task. In addition, the keypoints extracted from both the methods were the same in most of the task. Thus, we decided to move forward with the second approach using the position and orientation, as it is easier to plot and visualize the position of the manipulator.
The model learning part of our approach is similar to [16], but the key point extraction technique we implemented allowed us to precisely extract keypoints from the demonstrations. By adjusting the threshold values , we were able to finetune the key point extraction algorithm to extract only sharp corners from the demonstrated trajectory. The kmeans clustering technique that we used for clustering the key points allowed us to include all the key points from all the demonstrations performed. Thus, we were able to preserve the key features of the task that were missing from some demonstrations. Thus, we got more key features of the task as compared to Asfour et al. [8], where they used the common key points concept, i.e. they just use the key points that are found in all demonstrations. Therefore, the generalized trajectory only includes the features that are present in all demonstrations.
It was experimentally determined that three or four demonstrations for most of the task, tends to produce a generalized trajectory closest to the most probable demonstrated trajectory. The learning of the orientation played a crucial role as well. For many tasks like picking up the wooden block, the orientation of the gripper has to be perfectly aligned with the object to grab the object from the right place. The model learning also involved learning of the gripper states. The model learned should not only learn whether the gripper should be closed or open, but also the width it should open or close to grasp or release an object. This ensures that the robot does not break delicate objects by learning the grasping width of the gripper during the task.
X Conclusions
In this paper, a novel RLfD framework was introduced, which used HMM to model the trajectory skill demonstrated to a collaborative robot. Our results show that the robot can collaborate with the human teacher and learn variety of trajectory based task in short time and reproduce the task while working with a human user.
In future, other collaborative robots can be trained to learn complex task using the developed RLfD framework. With the advancements in the field of image processing and visual servoing, RLfD can be made more robust by allowing the robot to learn the position and orientation of the task relevant objects in addition to the position and orientation of the manipulator. This way, the robot will be able to adapt to the changing environment, as it will learn the environment parameters like distance from task relevant objects. This will enable the robot to learn a task and then reproduce it in a dynamic environment.
References

[1]
C. G. Atkeson and S. Schaal, “Robot Learning From Demonstration,” in
Proceedings of the Fourteenth International Conference on Machine Learning
, pp. 12–20, 1997.  [2] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A survey of robot learning from demonstration,” Robotics and autonomous systems, vol. 57, no. 5, pp. 469–483, 2009.
 [3] S. Calinon, F. Guenter, and A. Billard, “On Learning, Representing, and Generalizing a Task in a Humanoid Robot,” IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol. 37, pp. 286–298, 4 2007.
 [4] S. Calinon, I. Sardellitti, and D. G. Caldwell, “Learningbased control strategy for safe humanrobot interaction exploiting task and robot redundancies,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 249–254, IEEE, 10 2010.
 [5] S. Calinon, “Robot Learning with TaskParameterized Generative Models,” in Proc. Intl Symp. on Robotics Research (ISRR), 2015.
 [6] M. Schneider and W. Ertel, “Robot Learning by Demonstration with local Gaussian process regression,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 255–260, 10 2010.
 [7] L. Rozo, P. Jiménez, and C. Torras, “A robot learning from demonstration framework to perform forcebased manipulation tasks,” Intelligent Service Robotics, vol. 6, pp. 33–51, 1 2013.

[8]
T. Asfour, P. Azad, F. Gyarfas, and R. Dillmann, “Imitation learning of dualarm manipulation tasks in humanoid robots,”
International Journal of Humanoid Robotics, vol. 5, no. 02, pp. 183–202, 2008.  [9] S. R. Ahmadzadeh, R. Kaushik, and S. Chernova, “Trajectory learning from demonstration with canal surfaces: A parameterfree approach,” in 2016 IEEERAS 16th International Conference on Humanoid Robots (Humanoids), pp. 544–549, IEEE, 11 2016.
 [10] M. Koskinopoulou, S. Piperakis, and P. Trahanias, “Learning from Demonstration facilitates HumanRobot Collaborative task execution,” in 2016 11th ACM/IEEE International Conference on HumanRobot Interaction (HRI), pp. 59–66, IEEE, 3 2016.
 [11] B. Akgun and A. Thomaz, “Simultaneously learning actions and goals from demonstration,” Autonomous Robots, vol. 40, pp. 211–227, 2 2016.
 [12] G. Konidaris, S. Kuindersma, R. Grupen, and A. Barto, “Robot learning from demonstration by constructing skill trees,” The International Journal of Robotics Research (IJRR), vol. 31, pp. 360–375, 3 2012.
 [13] N. Dantam, I. Essa, and M. Stilman, “Linguistic transfer of human assembly tasks to robots,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 237–242, 10 2012.
 [14] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
 [15] E. Keogh and C. A. Ratanamahatana, “Exact indexing of dynamic time warping,” Knowledge and Information Systems, vol. 7, pp. 358–386, 3 2005.
 [16] A. Vakanski, I. Mantegh, A. Irish, and F. JanabiSharifi, “Trajectory learning for robot programming by demonstration using hidden Markov model and dynamic time warping,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 4, pp. 1039–1052, 2012.
Comments
There are no comments yet.