Along with advances in sensing and learning techniques, applications of robots have been extended from controlled to unstructured and complex environments [2, 3]. The presence of robots in humans’ daily life has made designing and programming them considerably harder, since they must operate in complex environments with unpredictable or time-varying dynamics and interact with humans [2, 3, 4]. Moreover, ordinary users generally do not have enough expertise to program robots for new tasks [2, 3]. In addition, to gain acceptance as intelligent companions in our everyday life, robots should be sociable: they should understand the meanings of their partners’ motions and body language and respond accordingly. These requirements and limitations call for socially interactive learning methods that enable robots to cope effectively with new environments and tasks instead of being manually pre-programmed [2, 3, 4].
Inspired by the efficient social learning methods of animals and humans (e.g. mimicry, emulation and goal emulation), researchers have proposed natural and user-friendly ways to teach robots, known as robot programming by demonstration or imitation learning [2, 3, 4]. Although all social learning methods, from high-level knowledge transfer to low-level exact regeneration of observed demonstrations, are loosely referred to as imitation, there are stark differences between them [3, 4, 5]. The high-level methods, in contrast to the low-level ones, require understanding the teacher’s intentions in addition to regenerating the actions [3, 4, 5]. At this level, also called “true imitation”, skills are abstracted into a generalized symbolic representation. Abstraction, conceptualization and symbolization are the bases of true imitation. They reduce the state space, which is one of the requirements of real applications, and they expedite knowledge transfer from one agent or situation to another [3, 5, 6, 7, 8].
In recent years, abstraction and symbolization have received a great deal of attention from researchers in the field of imitation learning [6, 7, 8, 9, 10, 11]. A considerable portion of the proposed methods are inspired by the presumed role of mirror neurons in the imitative behaviors of animals and humans [6, 7, 9, 10]. Tani et al. [9, 10, 12] proposed an offline bio-inspired method called recurrent neural networks with parametric biases (RNNPB) as a model of the mirror neuron system. In this model, the observed spatio-temporal demonstrations are learned and abstracted by the network based on their perceptual properties. Inamura et al. proposed another bio-inspired imitation learning method inspired by mirror neurons and mimesis theory [7, 13]. In this model, hidden Markov models (HMMs) are used for abstracting and symbolizing the observed human motions as well as for recognizing and generating them. Demonstrations of different motion patterns are manually grouped and encoded into distinct HMMs in an offline manner. The number of HMMs representing different behaviors must be known a priori, which is not suitable for real applications. Moreover, the method is not incremental: it does not give the robot the ability to learn concepts gradually and autonomously in cooperation with its partners in order to keep itself socially competent.
Taking these shortcomings into account, some methods have been proposed for incremental learning of human motions [8, 11]. One prominent representative algorithm was proposed by Kadone and Nakamura [8]. This model affords autonomous segmentation, abstraction, memorization and recognition of demonstrated motions using associative neural networks. Kulić et al. [11] proposed another well-known incremental and autonomous imitation learning method for the acquisition, symbolization, recognition and hierarchical organization of whole-body motion patterns using factorial HMMs.
All the mentioned studies [7, 8, 9, 10, 11, 12] address only the perceptual similarity among observed demonstrations for abstraction and symbolization. However, there are perceptually different concepts that have the same functional effects or semantic meanings, called relational concepts [6, 14, 15, 16]. These concepts cannot be specified merely from their perceptual properties, and extra information is needed to acquire them [6, 14, 15, 16]. They are highly prevalent in humans’ social interactions and everyday life; for instance, disparate gestures convey the meaning of “Hello” in different cultures. Therefore, functional categorization of observed demonstrations is also indispensable for robots coexisting with humans. However, despite the prevalence of relational concepts, not enough research has been carried out in this field.
To the best of our knowledge, only a few methods have been proposed for learning and abstracting concepts based on both their perceptual and functional properties [6, 14, 16, 17]. One of the basic models was proposed by Mobahi et al. [16]. That model is applicable only to learning concepts from single observations and is not directly extensible to continuous sequences of observations. In contrast, the methods proposed by Hajimirsadeghi et al. [6, 17] are applicable to learning concepts from spatio-temporal motion sequences using both perceptual and functional properties. In these models, each relational concept is represented by a group of distinct HMM prototypes, each of which symbolizes a different perceptual variant of that concept. Modeling the prototypes separately [6, 17] neglects their common structural relations, so each prototype has to relearn the common knowledge. Therefore, the learning speed decreases and more observations are needed for generalization. This contradicts the main idea of imitation learning, which is to expedite the autonomous training of robots using a minimum number of demonstrations.
Considering the mentioned requirements and limitations, this paper presents a gradual and incremental learning algorithm that abstracts and generalizes the observed multimodal spatio-temporal demonstrations based on both their perceptual and functional characteristics during imitation. The proposed method comprises a low-level and a high-level module. The low-level module abstracts the observed spatio-temporal demonstrations based on their perceptual properties using an RNNPB network [9, 10, 12]. The high-level module acquires relational concepts based on the formed perceptual prototypes and the perceived feedback of the teacher. The proposed memory rehearsal procedure enables the robot to gradually extract and utilize the common structural relations among concepts. Therefore, the learning process is expedited, especially at the initial stages, and both the generalization capability and the robustness against noise and variations among observed demonstrations are improved.
II ILoCI: The Proposed Method for Incremental Learning of Concepts by Imitation
In a nutshell, ILoCI has a low-level and a high-level module. The low-level module of ILoCI is a dynamic model of the mirror neuron system, called RNNPB, which abstracts the observed multimodal spatio-temporal demonstrations as perceptual concepts. For more details on RNNPB, refer to [9, 10, 12]. The RNNPB automatically assigns a PB vector to each acquired perceptual prototype. The acquired PB vectors can be exemplars or prototypes, depending on their associated information in the high-level module. An exemplar PB vector stands for only one demonstration, and a prototype PB vector is the medoid of demonstrations with sufficient perceptual similarity. All the exemplar and prototype PB vectors, along with their associated information, are stored in a memory in the high-level module, called “Mem” (see Table I). A relational concept is defined as a set of perceptually variant exemplars and prototypes in the memory that have the same functional properties. The high-level module learns the relational concepts by employing the low-level module and the teacher’s feedback acquired through interactions. Fig. 1 illustrates the relations among exemplars, prototypes and concepts. In the sequel, ILoCI is explained in more detail.
|Structure||Field||Type||Description|
|Mem||nPrototypes||Int||Number of all consolidated exemplars and prototypes in Mem.|
|Mem||TrajectoryNet||Network||The RNNPB that abstracts and symbolizes the consolidated exemplars and prototypes.|
|Mem||PBs||Set||PB vectors assigned to the learned demonstrations by TrajectoryNet.|
|Mem||PBs_rec||Set||PB vectors generated by TrajectoryNet when recognizing each learned demonstration.|
|Mem||numSamples||Set||Number of sufficiently similar observed demonstrations associated with each consolidated exemplar or prototype in Mem.|
|Mem||numSteps||Set||Number of time steps of each consolidated demonstration in Mem.|
|Mem||initialInfo||Set||Initial configuration of each consolidated demonstration in Mem.|
|Mem||conceptLabels||Set||Concept label assigned to each of the consolidated demonstrations in Mem.|
|Mem||generationError||Set||Error of regenerating each consolidated demonstration in Mem.|
Table I: Fields of the memory structure Mem.
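As an illustration only, the fields of Table I could be held in a container such as the one below. The Python names, the list-based storage and the `add_entry` helper are assumptions made for this sketch and are not part of ILoCI itself; the RNNPB is left as an opaque object.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Mem:
    """Illustrative container mirroring the fields of Table I (hypothetical layout)."""
    n_prototypes: int = 0                                          # nPrototypes
    trajectory_net: Optional[Any] = None                           # TrajectoryNet (opaque here)
    pbs: List[List[float]] = field(default_factory=list)           # PBs
    pbs_rec: List[List[float]] = field(default_factory=list)       # PBs_rec
    num_samples: List[int] = field(default_factory=list)           # numSamples
    num_steps: List[int] = field(default_factory=list)             # numSteps
    initial_info: List[List[float]] = field(default_factory=list)  # initialInfo
    concept_labels: List[str] = field(default_factory=list)        # conceptLabels
    generation_error: List[float] = field(default_factory=list)    # generationError

    def add_entry(self, pb, pb_rec, n_steps, init, label, gen_error, n_samples=1):
        """Append one consolidated exemplar or prototype; return its index."""
        self.pbs.append(list(pb))
        self.pbs_rec.append(list(pb_rec))
        self.num_steps.append(n_steps)
        self.initial_info.append(list(init))
        self.concept_labels.append(label)
        self.generation_error.append(gen_error)
        self.num_samples.append(n_samples)
        self.n_prototypes = len(self.pbs)  # keep the counter in sync with the lists
        return self.n_prototypes - 1
```

Keeping one list per field, indexed by entry, matches the "Set" type of Table I: entry i of every list describes the same consolidated exemplar or prototype.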
II-A Learning Phase
The main procedure of ILoCI is an iterative cycle triggered by the arrival of a new demonstration from the teacher. After a new demonstration is perceived, smoothing, scaling and fitting are applied consecutively. Then, the processed demonstration is fed into the inverse kinematics function to compute the corresponding motor data. Afterwards, the obtained sensory and motor data are input into the low-level module to recognize the corresponding concept.
After recognizing the concept, the robot performs an action in response to the teacher and receives a reinforcement signal accordingly. On receiving a reward, the robot uses the observed demonstration to update or develop its memory. In contrast, in the case of punishment, the robot tries other available concepts until it receives a reward. If none of the former concepts in the robot’s memory is proper for the new demonstration, a new concept is generated and consolidated in memory. In this way, the robot gradually and incrementally learns and develops the relational concepts in imitation of the teacher to increase its lifetime rewards. In the following, the steps are described in more detail.
II-A1 Perceiving a new demonstration
At first, an observation from the teacher goes through pre-processing; details are described in Section III. After the observed motion sequence has been prepared, the robot tries to find its associated concept. To do so, the observed motion sequence, in terms of sensory and motor data, is fed into Mem.TrajectoryNet and its PB vector is computed by back-propagating and minimizing the error between the target and the predicted values of the sensory and motor data. Afterwards, in order to find the most similar consolidated exemplar or prototype in memory, the computed PB vector is compared with the PB vectors associated with the not-yet-tried consolidated concepts in memory.
The concept of the most similar consolidated exemplar or prototype is selected as the guessed concept of the newly observed demonstration and is added to the set of currently tried concepts. Then, in response to the teacher, the robot executes the action with the lowest generation error among the actions of the guessed concept in its memory. After performing the selected action, the robot receives feedback (reward or punishment) from the teacher, which helps it to adjust its concepts. According to the received reinforcement signal, the robot faces three situations:
Receiving a positive reinforcement signal with high similarity between the compared PB vectors: Positive feedback shows that the robot has correctly found the concept of the newly observed demonstration. Moreover, the high similarity is evidence that a highly similar exemplar or prototype for that demonstration already exists in the robot’s memory, which removes the need for relearning. Therefore, the most similar consolidated demonstration in the robot’s memory is strengthened as a potential candidate for the newly observed demonstration.
Receiving a positive reinforcement signal with low similarity between the compared PB vectors: In this case, the concept has been found correctly, but there is no sufficiently similar exemplar or prototype for that demonstration in the memory. Therefore, the robot should learn a new prototype for that relational concept and consolidate it through memory rehearsal. After a while, the memory may become overpopulated with perceptually similar exemplars and prototypes. Therefore, these demonstrations should be abstracted and clustered in order to select the best representatives of their clusters. Thus, complete-link hierarchical agglomerative clustering is invoked when a new exemplar of a concept is added to the memory while the number of samples of both prototypes and exemplars of that concept exceeds a predefined threshold. Afterwards, the final valid clusters are selected based on two criteria. First, the number of demonstrations in the cluster should exceed a predefined threshold, with at least one exemplar in the cluster. Second, the mean of the pairwise Euclidean distances among the PB vectors within the cluster should be less than the threshold in (1). This threshold is computed from the mean and the standard deviation of the pairwise Euclidean distances across all PB vectors in the clusters of the desired concept.
In (1), a predefined parameter controls the granularity level of the algorithm. Higher values of this parameter lead to a larger number of specific prototypes, while lower values yield more general prototypes. However, all variant perceptual prototypes of a concept will be generalized as one relational concept in the high-level module. In our experiments, this parameter is set to a reasonable value selected based on prior knowledge and trial and error. However, it can be set to any value that satisfies the requirements of the application. Fig. 2 illustrates the clustering process when a new demonstration of the triangle concept is added to the memory.
Receiving a negative reinforcement signal: In the case of a negative signal, the robot uses its next most similar untried concept (i.e., one that is not yet in the set of tried concepts) until it receives positive feedback. If the robot has used all its learned concepts without receiving positive feedback, the newly observed demonstration is learned as a novel exemplar of a new concept using memory rehearsal.
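The two validity criteria for a cluster described in the second situation can be sketched as follows. This is a minimal illustration under stated assumptions: each cluster member is a hypothetical `(pb_vector, is_exemplar, n_samples)` tuple, a member is counted as `n_samples` demonstrations, and the distance threshold of (1) is passed in as `max_mean_distance` rather than recomputed, since only its inputs (mean and standard deviation of the pairwise distances) are described in the text.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pairwise_distances(pbs):
    """Euclidean distances between all unordered pairs of PB vectors."""
    return [math.dist(a, b) for a, b in combinations(pbs, 2)]


def distance_stats(pbs):
    """Mean and standard deviation of the pairwise distances (needs >= 2 vectors)."""
    d = pairwise_distances(pbs)
    return mean(d), pstdev(d)


def is_valid_cluster(members, min_demos, max_mean_distance):
    """Check the two criteria for a final valid cluster.

    members: list of (pb_vector, is_exemplar, n_samples) tuples (assumed layout).
    Criterion 1: more than min_demos demonstrations and at least one exemplar.
    Criterion 2: mean pairwise PB distance below max_mean_distance.
    """
    pbs = [pb for pb, _, _ in members]
    n_demos = sum(n for _, _, n in members)
    has_exemplar = any(is_ex for _, is_ex, _ in members)
    if len(pbs) < 2 or n_demos <= min_demos or not has_exemplar:
        return False
    return mean(pairwise_distances(pbs)) < max_mean_distance
```

A cluster with fewer than two members is rejected here because it has no pairwise distance to test; that choice is also an assumption of the sketch.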
II-A2 No other untried concept exists in the robot’s memory
This situation means that none of the previously tried concepts in the robot’s memory was proper for the novel demonstration. Therefore, a new concept is generated and the new demonstration is consolidated in the robot’s memory as an exemplar of that concept through memory rehearsal.
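The guess-and-feedback cycle of Sections II-A1 and II-A2 can be sketched as the loop below. It assumes that similarity is measured by Euclidean distance between PB vectors, that the memory is a plain list of `(pb_vector, concept_label)` pairs, and that the teacher is a callback `ask_teacher(label)` returning `True` for a reward; all of these names are hypothetical.

```python
import math


def find_concept_by_trial(new_pb, memory, ask_teacher):
    """Try stored concepts from most to least similar until the teacher rewards one.

    memory: list of (pb_vector, concept_label) pairs (assumed layout).
    ask_teacher: callable taking a concept label, True for reward, False for punishment.
    Returns (concept_label, is_new_concept, n_interactions); the label is None
    when every stored concept was punished and a new concept must be created.
    """
    tried = set()
    interactions = 0
    while True:
        # Only entries whose concept has not been tried yet remain candidates.
        candidates = [(math.dist(new_pb, pb), label)
                      for pb, label in memory if label not in tried]
        if not candidates:
            return None, True, interactions
        _, guess = min(candidates)  # closest untried entry decides the guess
        tried.add(guess)
        interactions += 1
        if ask_teacher(guess):
            return guess, False, interactions
```

The number of interactions returned by the loop is the quantity that should shrink as the memory fills with suitable prototypes.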
II-A3 Memory rehearsal
Memory rehearsal is performed to learn a novel demonstration of a new concept, or to form a novel prototype for a previously learned concept. Learning new demonstrations causes memory interference, which damages previously learned patterns in the memory. This is due to the distributed representation of all patterns in a single network (various patterns share the same synaptic weights in the network). Despite the numerous advantages of the distributed representation scheme, memory interference is one of the challenges of employing it to abstract patterns. To overcome this difficulty, rehearsing and consolidation according to a biological hypothesis [18] is employed.
In memory rehearsal, the previously consolidated prototypes and exemplars in Mem are first regenerated using Mem.TrajectoryNet as a long-term memory. To do this, the values of the PB neurons and the initial input neurons in the network’s input layer are set to the associated values of the consolidated prototypes or exemplars in Mem. Then, the corresponding patterns are regenerated. The regenerated patterns are stored in a temporary storage called the temporal memory. The new demonstration is also added to the temporal memory. Next, Mem.TrajectoryNet is trained with all the patterns in the temporal memory, starting from the previous network in order to speed up the training process. After that, Mem is updated based on the newly trained Mem.TrajectoryNet and the prior associated information of the patterns in the temporal memory (e.g. nPrototypes, numSamples, numSteps, initialInfo and conceptLabels). Finally, the temporal memory is released.
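The order of operations in memory rehearsal can be sketched as below. This is only an outline under assumptions: the network is any object with hypothetical `generate(pb, init, n_steps)` and `train(patterns)` methods, entries are dictionaries with hypothetical keys, and `ToyNet` is a trivial stand-in used to make the sketch runnable; it is not an RNNPB.

```python
class ToyNet:
    """Stand-in for Mem.TrajectoryNet (NOT an RNNPB); only the call order matters."""

    def generate(self, pb, init, n_steps):
        # Pretend regeneration: repeat the initial configuration n_steps times.
        return [list(init) for _ in range(n_steps)]

    def train(self, patterns):
        # Pretend training: return one dummy PB vector per pattern.
        return [[float(i)] for i in range(len(patterns))]


def memory_rehearsal(net, entries, new_demo, new_label):
    """Rehearse stored patterns together with a new demonstration.

    entries: list of dicts with keys 'pb', 'init', 'n_steps', 'label', 'n_samples'.
    new_demo: list of time steps (each a list of floats).
    Returns the updated list of entries, the new demonstration last.
    """
    # 1. Regenerate the consolidated patterns from the long-term memory.
    temporal_memory = [net.generate(e["pb"], e["init"], e["n_steps"]) for e in entries]
    info = [dict(e) for e in entries]  # copy the prior associated information
    # 2. Add the new demonstration to the temporal memory.
    temporal_memory.append(new_demo)
    info.append({"init": list(new_demo[0]), "n_steps": len(new_demo),
                 "label": new_label, "n_samples": 1})
    # 3. Retrain on all patterns; a real network would start from its current weights.
    new_pbs = net.train(temporal_memory)
    # 4. Update the stored PB vectors from the retrained network.
    for entry, pb in zip(info, new_pbs):
        entry["pb"] = pb
    # 5. The temporal memory is a local variable and is released on return.
    return info
```

Because the stored patterns are regenerated by the network rather than kept as raw data, only PB vectors, initial configurations and step counts need to be held between rehearsals.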
Like infants in their early years of life, a naïve robot has to spend considerable time learning a sufficient number of patterns through rehearsing and consolidation. At this stage, more interactions with the teacher are needed to learn concepts during imitation. However, as time passes, the robot accumulates a variety of learned concepts in its memory and consequently responds to the teacher more appropriately with fewer interactions. Nevertheless, whenever a new concept is observed, the robot has to spend time rehearsing and consolidating it. This is similar to the cost and practice that humans invest in learning a new skill.
II-B Inference Phase
In an incremental method, the learning process never stops. However, to assess the performance of ILoCI, an inference phase is designed. In this phase, no further feedback is provided by the teacher. When observing a new demonstration, the robot uses the knowledge acquired during the learning phase to recognize the concept of the new demonstration. The PB vector of the demonstration is computed and compared with the consolidated PB vectors in the memory. The concept of the most similar vector is taken as the concept of the demonstration, and a proper action is performed in response to the teacher.
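The lookup performed in the inference phase amounts to a nearest-neighbor search in PB space. A minimal sketch, assuming Euclidean distance as the similarity measure and parallel lists of stored PB vectors and concept labels (both hypothetical names):

```python
import math


def recognize(new_pb, stored_pbs, concept_labels):
    """Return (concept, distance) for the stored PB vector closest to new_pb."""
    if not stored_pbs:
        raise ValueError("memory is empty")
    distances = [math.dist(new_pb, pb) for pb in stored_pbs]
    best = min(range(len(distances)), key=distances.__getitem__)
    return concept_labels[best], distances[best]
```

Returning the distance alongside the label lets the caller judge how close the match was, although the confidence values reported later in the paper are not derived in this sketch.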
III Results and Discussion
To assess the generalization ability of ILoCI when facing a large number of concepts and to make it directly comparable with competing algorithms, its performance is evaluated on a standard benchmark data set called LASA [19, 20]. LASA consists of 26 handwriting motions collected from pen input using a tablet PC [19, 20] (supplementary data). Together, the motion shapes constitute 22 distinct relational concepts. It is worth noting that the shapes are demonstrated to the robot incrementally and gradually, so that it learns their relational concepts while imitating and interacting with the teacher.
To recognize and generate the observed demonstrations in the future, the robot needs to learn the motor data along with the associated observed sensory information. Therefore, the observed handwriting motion of the teacher is scaled and fitted into a selected y-z plane in the robot’s workspace. The selected workspace, depicted in a supplementary figure, ensures that the robot can execute the action given its physical limitations and valid workspace [21]. Our test platform is the Aldebaran Robotics Nao humanoid robot, version V3.2 [22]. After the scaling and fitting processes, the joint angles of Nao’s right arm are obtained by applying the built-in inverse kinematics (IK) module to the processed demonstration. To make the results invariant to possible translational and rotational transformations, the relative displacement values of the sensory and motor data are used as the inputs to the learning algorithm instead of their absolute values.
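The use of relative displacements can be illustrated with the short sketch below. It assumes that a relative displacement is the difference between consecutive time steps, which the text does not state explicitly, and it only demonstrates invariance to translation.

```python
def relative_displacements(trajectory):
    """Convert absolute samples into step-to-step differences.

    trajectory: list of time steps, each a list of coordinates or joint angles.
    Returns a list with one fewer element than the input.
    """
    return [[b - a for a, b in zip(prev, curr)]
            for prev, curr in zip(trajectory, trajectory[1:])]


# A shape and the same shape shifted in the plane give identical inputs.
shape = [[0, 0], [1, 2], [3, 3]]
shifted = [[x + 5, y - 2] for x, y in shape]
same = relative_displacements(shape) == relative_displacements(shifted)
```

Here `same` evaluates to `True`, since a constant offset cancels in every difference.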
Five-fold cross-validation is used to examine the performance of the proposed algorithm. Each fold consists of a different combination of demonstrations for training and testing. The variant perceptual representations of each shape are randomly divided into five partitions, and each partition is used once as the training set and four times as part of the testing set. The ideal situation for the robot is to learn the concepts quickly while observing only a few demonstrations and still acquiring comprehensive prototypes. Thus, only 20% of the demonstrations are used for training and the remaining 80% are used for testing in each fold. In the experiment, Mem.TrajectoryNet has 6 input/output nodes, 4 PB neurons, 25 context neurons and 60 hidden neurons. Moreover, the predefined parameters of the algorithm are set to 0.5, 3 and 0.1, respectively.
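The evaluation protocol above inverts the usual cross-validation split: one partition trains and four test. A minimal sketch of such a split, assuming demonstrations are grouped per shape in a dictionary and shuffled with a fixed seed (the function name and data layout are hypothetical):

```python
import random


def inverted_five_fold(demos_by_shape, n_parts=5, seed=0):
    """Yield (train, test) lists of (shape, demo) pairs.

    Each shape's demonstrations are split at random into n_parts partitions;
    fold k trains on partition k only and tests on all the others.
    """
    rng = random.Random(seed)
    parts = [[] for _ in range(n_parts)]
    for shape, demos in demos_by_shape.items():
        shuffled = list(demos)
        rng.shuffle(shuffled)
        for i, demo in enumerate(shuffled):
            parts[i % n_parts].append((shape, demo))
    for k in range(n_parts):
        train = parts[k]
        test = [item for j, part in enumerate(parts) if j != k for item in part]
        yield train, test
```

With ten demonstrations per shape, every fold trains on two demonstrations of each shape (20%) and tests on the remaining eight (80%).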
The average correct classification rate over all five folds is during the inference phase. Table II presents a sparse representation of the average normalized confusion and confidence matrices. The full matrices are available as supplementary data. The true positive values in Table II show that the robot can correctly recognize demonstrations of each relational concept with high confidence values. Although some motion shapes in the LASA data set are considerably similar to each other, the algorithm can discriminate them properly. These similarities also explain the false negative values for some shapes, such as Line and Saeghe as well as Angle, NShape and Worm. However, the low confidence values for the false negatives indicate that the algorithm is unsure about these results. This ability to properly judge its own outcomes gives the robot a metacognitive property.
|Actual Concept||Predicted Concept: (Normalized Confusion in %, Confidence)|
|Angle||Angle: (66.67, 1.71), Line: (3.33, 0.01), NShape: (13.33, 0.15), Trapezoid: (10, 0.12), Worm: (6.67, 0.07)|
|BendedLine||BendedLine: (100, 9.12)|
|CShape||CShape: (96.67, 7.08), Sshape: (3.33, 0.01)|
|GShape||GShape: (80, 2.98), CShape: (6.67, 0.13), Sshape: (10, 0.04), Worm: (3.33, 0.06)|
|JShape||JShape: (100, 7.99)|
|Khamesh||Khamesh: (100, 4.31)|
|LShape||LShape: (96.67, 2.51), Heee: (3.33, 0.04)|
|Leaf||Leaf: (91.67, 6.52), JShape: (1.66, 0.13), Snake: (6.67, 0.12)|
|Line||Line: (86.67, 3.13), Saeghe: (13.33, 0.21)|
|NShape||NShape: (60, 1.65), Angle: (6.67, 0.07), Worm: (33.33, 0.68)|
|PShape||PShape: (96.67, 2.87), Trapezoid: (3.33, 0.02)|
|RShape||RShape: (100, 4.05)|
|Saeghe||Saeghe: (80, 3.32), Line: (20, 0.53)|
|Sine||Sine: (100, 3.69)|
|Snake||Snake: (100, 4.34)|
|Spoon||Spoon: (95, 3.54), Heee: (5, 0.04)|
|Sshape||Sshape: (90, 2.96), GShape: (10, 0.03)|
|Trapezoid||Trapezoid: (100, 3.88)|
|WShape||WShape: (95, 3.20), Khamesh: (5, 0.03)|
|Worm||Worm: (86.67, 2.61), NShape: (13.33, 0.49)|
|ZShape||ZShape: (100, 3.56)|
|Heee||Heee: (65, 2.02), LShape: (10, 0.10), Spoon: (3.33, 0.15), ZShape: (21.67, 0.21)|
Table II: Sparse representation of the average normalized confusion matrix and the corresponding average confidence values on the LASA handwriting data set over 5-fold cross-validation. Bold text indicates true positive values.
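As a reading aid for Table II, the true positive rates on its diagonal can be averaged directly. The snippet below computes the unweighted (macro) mean over the 22 concepts; this is simple arithmetic on the table entries and need not coincide with the classification rate reported in the text, which may weight concepts by their number of demonstrations.

```python
# True positive rates (in %) copied from the diagonal of Table II.
true_positive_rates = {
    "Angle": 66.67, "BendedLine": 100.0, "CShape": 96.67, "GShape": 80.0,
    "JShape": 100.0, "Khamesh": 100.0, "LShape": 96.67, "Leaf": 91.67,
    "Line": 86.67, "NShape": 60.0, "PShape": 96.67, "RShape": 100.0,
    "Saeghe": 80.0, "Sine": 100.0, "Snake": 100.0, "Spoon": 95.0,
    "Sshape": 90.0, "Trapezoid": 100.0, "WShape": 95.0, "Worm": 86.67,
    "ZShape": 100.0, "Heee": 65.0,
}

macro_average = sum(true_positive_rates.values()) / len(true_positive_rates)
print(f"{len(true_positive_rates)} concepts, unweighted mean = {macro_average:.2f}%")
# prints: 22 concepts, unweighted mean = 90.30%
```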
Moreover, to assess the learning speed and the interaction quality of the proposed algorithm, the reinforcement signals given by the teacher during the learning phase are investigated. Fig. 3 shows the average reinforcement signals (over five folds) given by the teacher. Because of the discrete nature of the reinforcement signals (+1 for reward and -1 for punishment), the results in Fig. 3 have been smoothed with a backward moving average with a window length of seven to reflect the expected behavior clearly. The results show that the robot is capable of learning the relational concepts of the observed demonstrations very quickly, especially at the initial stages of learning. According to Fig. 3, in 85% of the experiments (on average), the robot correctly recognized the relational concepts in the first interaction after learning merely 45 demonstrations (25% of the data). Two specific reasons can be cited for this notable property. First, when a new demonstration with a novel relational concept is observed, it is consolidated, and probably updated later, in the memory as a representative of the perceived relational concept. Consequently, owing to the functional abstraction, the robot has at least one representation of each relational concept in the memory. Therefore, it can recognize new demonstrations quickly using the prototypes in its memory without relearning them from scratch. Second, all consolidated exemplars and prototypes are stored in one memory (distributed representation) through memory rehearsal, which allows their common structural relations to be utilized in order to expedite and enhance the learning process.
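The smoothing applied to the reinforcement signals can be sketched as a backward (trailing) moving average. The handling of the first samples, where fewer than seven values are available, is an assumption of this sketch: the window is simply shortened.

```python
def backward_moving_average(signals, window=7):
    """Average each sample with up to window - 1 preceding samples."""
    smoothed = []
    for i in range(len(signals)):
        start = max(0, i - window + 1)   # shorter window at the beginning
        chunk = signals[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

For a sequence of +1 and -1 signals, the smoothed value lies between -1 and +1 and rises toward +1 as rewards come to dominate the recent interactions.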
Furthermore, ILoCI unites all the different perceptual prototypes of each relational concept in the high-level module based on the teacher’s feedback. Fig. 4 shows the symbol space (PB space) of the perceptual prototypes acquired in the fifth fold, using non-metric multidimensional scaling (MDS). The figure shows a 2D visualization of the acquired 4D PB vectors. In Fig. 4, all PB vectors associated with different perceptual prototypes of one relational concept are drawn with the same marker, which shows their unity as one relational concept (e.g. both acquired perceptual prototypes of BendedLine are shown with blue square markers). The results also show that, for each relational concept, ILoCI finds almost the same number of perceptual prototypes as the number of its real perceptual variants. However, two different perceptual prototypes are acquired here for LShape, which has only one distinct perceptual representation. Since teachers can draw shapes freely, the observed demonstrations may vary, and consequently two different perceptual prototypes are formed for LShape. Nevertheless, all variant perceptual prototypes of each relational concept are unified in the high-level module through functional abstraction.
In addition, the proposed algorithm generates smooth and comprehensive prototypes for each relational concept, despite the discrepancies in the observed demonstrations, without any smoothing post-processing. Fig. 5 shows one regenerated example for each relational concept acquired by the robot. The smoothness of the generated prototypes supports the generalization ability of the algorithm. The supplementary videos show the execution of some of the presented motions by the Nao humanoid robot.
IV Conclusion
This paper introduced an incremental and gradual model for learning concepts by imitation as one of the manifestations of true imitation learning. The presented algorithm autonomously and incrementally learns concepts from observed multimodal spatio-temporal demonstrations, based on both their perceptual and functional properties during imitation. It abstracts demonstrations at both the trajectory and the symbolic levels, which is a significant challenge in integrating symbolic AI and the continuous control of robots. In this method, all perceptual concepts are incrementally learned in a single recurrent neural network through the proposed memory rehearsal. Functional similarities between concepts are also acquired through a limited number of interactions with the teacher. Learning the acquired concepts together and incrementally through memory rehearsal enables the robot to utilize the common structural relations among demonstrations. Consequently, the learning process is expedited, especially at the initial stages, and the generalization ability of the algorithm is also increased.
The performance of the proposed method was assessed using the standard LASA benchmark data set [19, 20]. The results show that, owing to abstraction and generalization in both the perceptual and functional spaces, the robot acquires comprehensive prototypes and can therefore correctly recognize the concepts of observed demonstrations during imitation. These properties make the proposed method a good choice for real-world applications in which robots should comprehend the intentions of their partners while interacting with them.
V Supplementary Material
The supplementary material is available at https://goo.gl/ojowSx.
- [1] M. Alibeigi, M. N. Ahmadabadi, and B. N. Araabi, “A fast, robust, and incremental model for learning high-level concepts from human motions by imitation,” IEEE Transactions on Robotics, vol. 33, no. 1, pp. 153–168, 2017.
- [2] C. C. Kemp, A. Edsinger, and E. Torres-Jara, “Challenges for robot manipulation in human environments,” IEEE Robotics and Automation Magazine, vol. 14, no. 1, p. 20, 2007.
- [3] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, “Handbook of robotics chapter 59: Robot programming by demonstration,” Handbook of Robotics. Springer, 2008.
- [4] M. Lopes, F. Melo, L. Montesano, and J. Santos-Victor, “Abstraction levels for robotic imitation: Overview and computational approaches,” in From Motor Learning to Interaction Learning in Robots, pp. 313–355, Springer, 2010.
- [5] J. Call and M. Carpenter, “Three sources of information in social learning,” Imitation in Animals and Artifacts, pp. 211–228, 2002.
- [6] H. Hajimirsadeghi, M. N. Ahmadabadi, and B. N. Araabi, “Conceptual imitation learning based on perceptual and functional characteristics of action,” IEEE Transactions on Autonomous Mental Development, vol. 5, no. 4, pp. 311–325, 2013.
- [7] T. Inamura, I. Toshima, H. Tanie, and Y. Nakamura, “Embodied symbol emergence based on mimesis theory,” The International Journal of Robotics Research, vol. 23, no. 4-5, pp. 363–377, 2004.
- [8] H. Kadone and Y. Nakamura, “Segmentation, memorization, recognition and abstraction of humanoid motions based on correlations and associative memory,” in 2006 6th IEEE-RAS International Conference on Humanoid Robots, pp. 1–6, IEEE, 2006.
- [9] J. Tani, M. Ito, and Y. Sugita, “Self-organization of distributedly represented multiple behavior schemata in a mirror system: reviews of robot experiments using RNNPB,” Neural Networks, vol. 17, no. 8, pp. 1273–1289, 2004.
- [10] M. Ito and J. Tani, “On-line imitative interaction with a humanoid robot using a dynamic neural network model of a mirror system,” Adaptive Behavior, vol. 12, no. 2, pp. 93–115, 2004.
- [11] D. Kulić, W. Takano, and Y. Nakamura, “Incremental learning, clustering and hierarchy formation of whole body motion patterns using adaptive hidden Markov chains,” The International Journal of Robotics Research, vol. 27, no. 7, pp. 761–784, 2008.
- [12] J. Tani and M. Ito, “Interacting with neurocognitive robots: A dynamical system view,” in Proc. 2nd Int. Workshop on Man-Machine Symbiotic Systems, Kyoto, Japan, pp. 123–134, 2005.
- [13] M. Donald, Origins of the modern mind: Three stages in the evolution of culture and cognition. Harvard University Press, 1991.
- [14] M. Mahmoodian, H. Moradi, M. N. Ahmadabadi, and B. N. Araabi, “Hierarchical concept learning based on functional similarity of actions,” in Robotics and Mechatronics (ICRoM), 2013 First RSI/ISM International Conference on, pp. 1–6, IEEE, 2013.
- [15] T. R. Zentall, M. Galizio, and T. S. Critchfield, “Categorization, concept learning, and behavior analysis: An introduction,” Journal of the Experimental Analysis of Behavior, vol. 78, no. 3, pp. 237–248, 2002.
- [16] H. Mobahi, M. N. Ahmadabadi, and B. Nadjar Araabi, “A biologically inspired method for conceptual imitation using reinforcement learning,” Applied Artificial Intelligence, vol. 21, no. 3, pp. 155–183, 2007.
- [17] H. Hajimirsadeghi, “Conceptual imitation learning based on functional effects of action,” in EUROCON-International Conference on Computer as a Tool (EUROCON), 2011 IEEE, pp. 1–6, IEEE, 2011.
- [18] L. R. Squire, N. J. Cohen, and L. Nadel, “The medial temporal region and memory consolidation: A new hypothesis,” Memory Consolidation: Psychobiology of Cognition, pp. 185–210, 1984.
- [19] S. M. Khansari-Zadeh and A. Billard, “Learning stable nonlinear dynamical systems with Gaussian mixture models,” IEEE Transactions on Robotics, vol. 27, no. 5, pp. 943–957, 2011.
- [20] A. Lemme, Y. Meirovitch, M. Khansari-Zadeh, T. Flash, A. Billard, and J. J. Steil, “Open-source benchmarking for learned reaching motion generation in robotics,” Paladyn, Journal of Behavioral Robotics, vol. 6, no. 1, 2015.
- [21] M. Alibeigi, S. Rabiee, and M. N. Ahmadabadi, “Inverse kinematics based human mimicking system using skeletal tracking technology,” Journal of Intelligent & Robotic Systems, pp. 1–19, 2016.
- [22] “Nao humanoid robot.” https://www.aldebaran.com/en, 2016.