Investigating the Effects of Robot Engagement Communication on Learning from Demonstration

05/03/2020 · Mingfei Sun et al. · The Hong Kong University of Science and Technology

Robot Learning from Demonstration (RLfD) is a technique for robots to derive policies from instructors' examples. Although the reciprocal effects of student engagement on teacher behavior are widely recognized in the educational community, it is unclear whether the same phenomenon holds true for RLfD. To fill this gap, we first design three types of robot engagement behavior (attention, imitation, and a hybrid of the two) based on the learning literature. We then conduct, in a simulation environment, a within-subject user study to investigate the impact of different robot engagement cues on humans compared to a "without-engagement" condition. Results suggest that engagement communication significantly changes humans' estimation of the robots' capability and significantly raises their expectations of the learning outcomes, even though we do not run actual learning algorithms in the experiments. Moreover, imitation behavior affects humans more than attention does on all metrics, while their combination has the most profound influence. We also find that communicating engagement via imitation or the combined behavior significantly improves humans' perception of the quality of demonstrations, even though all demonstrations are of the same quality.


1 Introduction

Robot Learning from Demonstration (RLfD) is a technique where a robot derives a mapping from states to actions, a.k.a. a policy, from instructors' demonstrations atkeson1997robot. This technique has been shown to be successful in teaching robots physical skills by imitating instructors' body movements, e.g., pole balancing atkeson1997robot, tennis swings ijspeert2002movement, air hockey maneuvers bentivegna2002humanoid, etc. A standard RLfD process takes two steps: a demonstration gathering step, which collects demonstrations from the human demonstrators, and a policy deriving step, which infers the underlying state-action mappings argall2009survey. Like a human learner, a robot in RLfD can have different strategies for gathering demonstrations, depending on its underlying policy derivation algorithm. For example, robots with the DAgger algorithm ross2011reduction learn progressively by taking incremental demonstrations from instructors, much like going through a scaffolding process jackson1998design; saunders2006teaching. A robot can also learn more proactively: if equipped with Confidence-Based Autonomy (CBA) chernova2009interactive, an interactive algorithm for RLfD, a robot can request demonstrations at the states of which it has little or no knowledge. These learning strategies have proven very effective and are thus widely adopted in RLfD laskey2017comparing.

Unlike human learners, robots in previous RLfD processes rarely show any engagement cues during the learning process. They mostly remain stationary without giving any feedback, especially when instructors are giving demonstrations (i.e., in the demonstration gathering step). In human tutelage, engagement cues play an important role in shaping instructors’ mental model of the learners skinner1993motivation . For example, learners’ attentional engagement, e.g., gaze, indicates their points of interest in the instructions. Imitation, a behavioral engagement cue, shows learners’ motivation to perform like the instructors chernova2014robot . It is reported that learner engagement cues could potentially affect instructor perceptions and behavior guthrie2001classroom . For example, in educational research, instructors are found to have the tendency to provide more support to learners of high behavioral engagement skinner1993motivation .

These effects of showing learning engagement, however, are less explored in RLfD research, partly because designing engagement cues for robots in the context of RLfD is challenging. First, most existing methods for generating engagement cues in Human-Robot Interaction (HRI) cannot be directly applied to RLfD. For example, it is common practice in HRI to simulate robots' attentional engagement by directing their gaze towards visually salient elements (e.g., color or lightness nagai2008toward), specific objects (e.g., human faces sidner2004look), or predefined events (e.g., pointing gestures breazeal2004humanoid). This practice cannot easily be set up in RLfD because the robot's allocation of attention should follow the development of the instructors' demonstrations. This is especially true in skill-oriented RLfD, where the robot needs to reproduce body skills from the human demonstrator. In this context, the attention should be driven by the demonstrations, i.e., body movements, which are less constrained and highly dynamic compared to a standard HRI process. Methods for generating other engagement cues, e.g., imitation bailenson2005digital; riek2010my; li2015observer, also need further adaptation to accommodate the dynamic nature of RLfD. Second, even if an engagement cue can be designed effectively, its deployment in RLfD must run in real-time at low computational cost.

To this end, we focus on skill-oriented RLfD and propose two novel methods (Instant attention and Approximate imitation) to enable robots to communicate their learning engagement in an RLfD process. Note that we take the demonstration gathering step as the interaction scenario since it determines the demonstration quality, which is crucial for policy optimality argall2009survey; sun2019adversarial; we do not focus on designing effective learning algorithms for the demonstration learning. The learning engagement cues are generated as follows: the Instant attention method generates robot attentional engagement by tracking instructors' body movements with particle filters; the Approximate imitation method produces behavioral engagement, i.e., imitation, by partially mapping the instructor's joint movements onto the robot's joints with approximations. We then use the proposed methods to generate three modes of engagement communication (via attention, via imitation, and via a hybrid of the two) for robots in RLfD. To investigate the effects of the three engagement modes on humans, we compare them with a fourth mode ("without-engagement", in which the robot remains stationary, as most robots do in existing RLfD studies atkeson1997robot; ijspeert2002movement; bentivegna2002humanoid) in a within-subject user study in a simulation environment. Results suggest that robots with the proposed cues are perceived to be more engaged in the learning process, and their behavior is more socially acceptable in RLfD, than robots without them. Also, having engagement cues significantly affects humans' estimation of the robots' learning capabilities: the robots which communicate engagement in RLfD are perceived to be significantly more capable of learning than the robots without, even though none of them are equipped with learning algorithms. The engagement communication also affects humans' expectations of the final learning outcomes. Furthermore, behavioral cues influence humans' perceptions significantly more than attentional cues do, while the hybrid cues significantly outperform the other two. We also find that showing behavioral or combined engagement significantly improves humans' evaluation of demonstration quality: the participants perceived the demonstrations to be significantly more appropriate for the robot to learn from when the robot communicated its engagement via imitation or the hybrid cue, even though all demonstrations were actually of the same quality.

The contributions of this paper are as follows. First, we propose two novel algorithms which allow robots to generate attention and imitation behavior to communicate their learning engagement in RLfD at low computational cost. Second, we develop a simulation platform to evaluate the effects of engagement communication in RLfD. Third, we take a first step towards evaluating the effects of three types of engagement cues (attention, imitation, and hybrid) on humans. Through an evaluation in a simulation environment with a humanoid robot learning different skills from a simulated demonstrator, we present findings on the design of robot engagement communication in RLfD. To the best of our knowledge, this paper is the first to systematically investigate how robot engagement communication affects humans' perceptions and expectations of the robot in RLfD.

2 Related work

2.1 Robot Learning from Demonstration (RLfD)

Robot Learning from Demonstration (RLfD) is also known as "Programming by Demonstration", "imitation learning", or "teaching by showing" schaal1997learning. Rather than exhaustively searching the entire policy space, RLfD enables robots to derive an optimal policy from demonstrators' (also called instructors') demonstrations atkeson1997robot. Usually, this technique does not require human instructors to have additional knowledge of programming or machine learning, and thus opens up new possibilities for common users to teach robots crick2011human. Existing studies on RLfD focus mainly on policy derivation algorithms, e.g., mapping states to actions by supervised learning chernova2009interactive, updating the policy by value iteration in Reinforcement Learning atkeson1997robot, and recovering rewards that explain demonstrations by Inverse Reinforcement Learning abbeel2004apprenticeship; sun2019adversarial. Some studies also work on designing robots' reciprocal learning feedback to communicate what the robots have learned to human teachers, e.g., demonstrating the robot's currently learned policy calinon2007active, providing verbal and/or nonverbal cues koenig2010communication; admoni2015robot; pitsch2013robot; sun2017sensing; breazeal2004humanoid; chao2010transparent, or visualizing where they succeed and fail sena2018teaching. These studies, however, largely overlook how the robots' engagement behavior affects the instructors and their demonstrations, especially during the demonstration gathering step. Hence, in this work, we consider how to generate behavior that allows robots to communicate their learning engagement to instructors, and investigate its potential effects on RLfD.

2.2 Engagement and learning engagement cues

Engagement is a broad concept in HRI with many different definitions. Some studies focus on the whole spectrum of an interaction and define engagement as the process of initiating, maintaining, and terminating the interaction between humans and robots sidner2005explorations. Others narrow the notion of engagement down to the maintenance of interactions, interpreting engagement as humans' willingness to stay in the interaction yamazaki2009revealing; szafir2012pay.

In the context of learning, engagement mainly refers to the state of being connected in the learning interaction, which can be measured in three aspects: cognition, behavior, and emotion silpasuwanchai2016developing. Cognitive engagement is closely related to the allocation of attention, as attention is one of the most important cognitive resources pekrun2012academic; failure to attend to another person indicates a lack of interest argyle1976gaze. Thus, we adopt attention as a cue to communicate cognitive engagement in RLfD. Behavioral engagement is captured by task-related behavior, e.g., task attempts, effort, and active feedback. Imitation, a common behavioral engagement signal, refers to "non-conscious mimicry of the postures, mannerisms, facial expressions, (speech), and other behavior of one's interaction partners" chartrand1999chameleon. In interpersonal communication and HRI, imitation behavior increases the likelihood of understanding chartrand2005beyond, interpersonal coordination bernieri1991interpersonal, and emotional contagion hatfield1993emotional. In the context of learning, imitation behavior also indicates the robot's internal learning status, e.g., progress and motivation chernova2014robot. Thus, we use imitation as a way for robots to communicate behavioral engagement in RLfD. Emotional engagement is associated with the affective status evoked by the interaction, including valence and arousal. Despite its importance, emotional engagement is hard to apply in RLfD since most existing RLfD robot systems lack the full ability to express emotions. In the scope of this paper, we define robot learning engagement as involvement in the learning process, with a focus on cognitive engagement, i.e., attention, and behavioral engagement, i.e., imitation. The following subsection presents related work on generating attention and imitation behavior to communicate engagement.

2.3 Robots’ communication of engagement

In HRI, a robot can communicate its attention via different physical channels, e.g., gaze lockerd2004tutelage; kuno2007museum; mutlu2006storytelling; mutlu2009footing, head orientation lockerd2004tutelage; sun2017sensing, and body posture takayama2011expressing. Regardless of the channel, robots are usually programmed to pay attention to salient elements, including but not limited to colors breazeal2004humanoid, objects with visual intensity nagai2008toward, and movements breazeal2004humanoid; nagai2008toward. For example, Nagai et al. regarded visually outstanding points in the surroundings, in terms of their colors, intensities, orientations, brightness, and movements, as points of attention nagai2008toward. Other work directs robots' attention to specific objects, e.g., human faces sidner2004look and colorful balls anzalone2015evaluating, or to predefined events, e.g., pointing gestures lockerd2004tutelage. For example, Sidner et al. designed a robot that pays attention to participants' faces most of the time sidner2004look. Lockerd et al. drove the robot attention mechanism with interaction events, such as looking at an object when it is pointed at or looking at a subject when the person takes a conversational turn lockerd2004tutelage. To accommodate multiple events, a state transition diagram is usually adopted to control attention shifts breazeal2004humanoid; lockerd2004tutelage. Though these studies provide insightful information about the design of robot attention, their approaches may not be easily applicable to skill-oriented RLfD, as the point of attention in the instructor's body movements changes dynamically.

Compared to attention, imitation behavior has been less widely adopted as a robot engagement cue. Robot imitation of a human participant's behavior in real-time is inherently challenging due to the correspondence problem argall2009survey as well as the robot's physical constraints kim2009stable; suleiman2008human; koenemann2014real. Hence, instead of generating full-body imitation behavior, some HRI researchers proposed to do partial imitation. For example, Bailenson and Yee built an immersive virtual agent that subtly mimicked people's head movements in real-time bailenson2005digital. A similar imitation strategy was applied by Riek et al. to a chimpanzee robot riek2010my. In addition to head imitation, gesture "mirroring" has also been implemented by Li et al. on a robot confederate li2015observer. Although these studies showed that partial imitation behavior improves participants' perception of robots' capabilities gonsior2011improving; fuente2015influence, they mainly used rule-based methods bailenson2005digital or predefined behavior li2015observer, which may not be transferable to RLfD scenarios. In this work, we employ the same strategy and allow robot learners to make partial imitations. Different from existing work, we take an algorithmic approach to automatically generate approximate imitations of instructors' body movements in real-time.

3 Learning Engagement Modeling

This section presents two methods for generating engagement cues. The first subsection briefly introduces human body poses and forms the basis of the proposed methods. The remaining subsections describe the methods in detail.

3.1 Representation of the body pose

In RLfD, instructors usually demonstrate skills via their body movements. Our proposed methods thus use human body poses to generate attention and imitation behavior. A body pose is usually depicted by a tree-like skeleton, with nodes as human joints and links as body bones (shown in Figure 1). Mathematically, this skeletal structure can be represented in two forms (both are usually readily available from most body-pose extracting sensors, e.g., Kinect): the position form and the transformation form.

Position form: The position form describes the body pose in a single frame of reference (usually the sensor frame), as shown in Figure 1(a). In this form, the pose skeleton is denoted as $P = \{p_1, \ldots, p_N\}$, where $p_i \in \mathbb{R}^3$ is the position vector of the $i$-th joint in the skeleton, and $N$ is the number of joints. This form gives each joint its global position, providing the potential attention point for the robot. Hence it is used by the Instant attention algorithm to generate robot attention points.

Transformation form: The transformation form describes the body pose in a series of frames of reference sun2019estimating, as shown in Figure 1(b). In particular, each joint has its own frame (a right-handed frame), and the links in the tree-like skeleton define parent-child structures between frames. The pose of a non-root joint is then described by a translation (i.e., the bone length) and a rotation (i.e., the joint movement) in its parent frame, with the root joint (often the hip joint) described in the sensor frame. This form decomposes a human body movement into joint rotations (body-independent) and joint translations (body-dependent) in a way that the movement can be easily imitated by robots: simply map the rotations onto the robot joints. We denote this form as $T = \{T_1, \ldots, T_N\}$, where $T_i$ is the transformation of the $i$-th joint in its parent frame, and use it in the Approximate imitation algorithm to obtain approximate imitation behavior.

Figure 1: (a) A body pose in the position form: all joints are described in a single frame by their positions; (b) A body pose in the transformation form: each joint has its own frame and the skeleton defines parent-child structures and translations between frames; the root frame is expressed in the sensor frame.
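To make the two representations concrete, the sketch below is our illustration (not the paper's code; names such as `PositionPose` and `to_positions` are hypothetical). It shows how a skeleton stored in the transformation form can be converted back to the position form by accumulating transforms from the root:

```python
# Minimal sketch of the two pose representations, assuming 3D joint data
# from a Kinect-style sensor; all names here are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class PositionPose:
    """Position form: every joint as a 3D point in the sensor frame."""
    positions: np.ndarray  # shape (N, 3), one row per joint

@dataclass
class JointTransform:
    """Transformation form: pose of one joint in its parent's frame."""
    parent: int              # index of the parent joint (-1 for the root/hip)
    translation: np.ndarray  # bone-length offset, shape (3,)
    rotation: np.ndarray     # joint rotation matrix, shape (3, 3)

def to_positions(transforms: list[JointTransform]) -> np.ndarray:
    """Recover the position form by accumulating transforms from the root."""
    n = len(transforms)
    global_rot = [None] * n
    positions = np.zeros((n, 3))
    for i, t in enumerate(transforms):  # assumes parents precede children
        if t.parent < 0:                # root joint lives in the sensor frame
            global_rot[i] = t.rotation
            positions[i] = t.translation
        else:                           # child: compose with the parent's pose
            global_rot[i] = global_rot[t.parent] @ t.rotation
            positions[i] = positions[t.parent] + global_rot[t.parent] @ t.translation
    return positions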

3.2 Instant attention

The attentional engagement for robots is generated based on cognitive theories of human attention. Generally speaking, the generation of human visual attention involves two stages jonides1983further: first, attention is distributed uniformly over the visual scene of interest; then, it is concentrated on a specific area (i.e., it is focused) for gaining information eriksen1972temporal. In a skill-oriented RLfD process, the instructor demonstrates skills mainly through their body joint movements. The above mechanism thus corresponds to tracking the human joints of interest uniformly at the initial stage, and then picking the one joint that provides the most information for learning as the attention point. When learning from a demonstration, the more predictable (trackable) a body joint's movement is, the less information the robot can gain from that part, and consequently the less attention the robot should pay to it. In other words, the joint that moves furthest out of expectation among all joints is the one worth paying attention to.

To this end, we use the particle filter (PF) as it is robust and effective in prediction konidaris2012robot and tracking arulampalam2002tutorial. In short, a PF is a Bayesian filter which uses a group of samples to approximate the true distribution of a state thrun2005probabilistic. Particularly, given the state observations, a PF employs many samples (called particles) to describe the possible distribution of that state. The particles are denoted as

$\mathcal{X}_t = \{\, x_t^{[1]}, x_t^{[2]}, \ldots, x_t^{[M]} \,\}$    (1)

Here $M$ is the number of particles in the particle set $\mathcal{X}_t$. Each particle $x_t^{[m]}$ (with $1 \le m \le M$) is a hypothesis as to what the true state might be at time $t$, and is first produced by a prediction model based on all history observations $z_{1:t-1}$, i.e., $x_t^{[m]} \sim p(x_t \mid z_{1:t-1})$. At each updating stage, particle $x_t^{[m]}$ is then re-sampled according to the importance weight $w_t^{[m]}$, i.e., the probability that the particle $x_t^{[m]}$ is consistent with the current observation $z_t$, i.e., $w_t^{[m]} = p(z_t \mid x_t^{[m]})$. In other words, each $x_t^{[m]}$ survives into the next stage with a probability proportional to $w_t^{[m]}$. For more details on the particle filter, refer to thrun2005probabilistic.

We apply one PF to track each relevant joint during the human demonstration. Specifically, the state $x_t$ describes the joint position in the sensor frame. We assume the state transits with additive Gaussian noise:

$x_t = x_{t-1} + \Delta z_t + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \Sigma_t)$    (2)

where $\Delta z_t$ is the observed joint shift, $\Delta z_t = z_t - z_{t-1}$, and $\mathcal{N}(0, \Sigma_t)$ is the multivariate normal distribution with zero mean and diagonal covariance matrix $\Sigma_t = \sigma_t^2 I$. The importance factor for each particle is defined to be exponential in the Euclidean distance between the predicted and observed joint positions:

$w_t^{[m]} = \frac{1}{\eta} \exp\left( -\left\| x_t^{[m]} - z_t \right\| \right)$    (3)

where $\eta$ is the normalizer. Each joint in the body pose is tracked by a particle cloud, a group of particles $\mathcal{X}_t$. In order to dynamically adjust the cloud size in accordance with the joint movement, the variance $\sigma_{t+1}^2$ is set to be proportional to the average Euclidean distance between the predicted and observed joint positions:

$\sigma_{t+1}^2 = \alpha \cdot \frac{1}{M} \sum_{m=1}^{M} \left\| x_t^{[m]} - z_t \right\|$    (4)

where $\alpha$ is a hyper-parameter and $M$ is the number of particles. The $\sigma$ indicates the cloud size: the greater $\sigma$ is, the more attention the robot should pay to the associated joint. Thus, the joint with the maximum $\sigma$ corresponds to the attention point. In the experiment, $\alpha$ is tuned for the best tracking of human joints. Note that, though the importance factor is calculated from the distance between predicted and observed joint positions, it is not equivalent to a measure of joint acceleration. In particular, the predicted joint position is just an estimate, and the difference between the predicted and observed joint positions measures how much the estimate deviates from the truth. The importance factor thus reflects unpredictability and can only be computed after the current observation is available.
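A minimal Python sketch of one per-joint filter implementing Equations (2)-(4) might look as follows. This is our illustration under assumed values for `M`, `alpha`, and the initial noise scale, not the authors' implementation:

```python
# One per-joint particle filter: predict with the observed shift (Eq. 2),
# weight by distance to the observation (Eq. 3), resample, and grow or
# shrink the cloud via sigma (Eq. 4).
import numpy as np

class JointParticleFilter:
    def __init__(self, init_pos, M=100, alpha=1.0, init_sigma=0.05):
        self.particles = np.tile(init_pos, (M, 1))  # M hypotheses, shape (M, 3)
        self.M, self.alpha, self.sigma = M, alpha, init_sigma

    def step(self, prev_obs, obs):
        delta = obs - prev_obs                      # observed joint shift
        # Prediction (Eq. 2): shift particles and add Gaussian noise.
        self.particles += delta + np.random.normal(
            0.0, self.sigma, size=self.particles.shape)
        # Importance weights (Eq. 3): exponential in distance to observation.
        dists = np.linalg.norm(self.particles - obs, axis=1)
        w = np.exp(-dists)
        w /= w.sum()                                # normalizer eta
        # Resampling: each particle survives with probability ~ its weight.
        idx = np.random.choice(self.M, size=self.M, p=w)
        self.particles = self.particles[idx]
        # Cloud-size update (Eq. 4): sigma tracks the mean prediction error.
        self.sigma = self.alpha * dists.mean()
        return self.sigma                           # larger => more attention
```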

Figure 2 illustrates how the PF works to generate attention. The particle cloud functions as the robot's prediction of the joint's future movements and changes based on the current observations. Initially, the robot predicts the movements of all body joints of interest to be the same, i.e., all clouds are of the same size. During a demonstration, when a joint moves out of its cloud region, beyond the robot's prediction, the cloud grows to catch that movement, and the robot will thereafter be likely to pay attention to that joint. Likewise, if the joint movement is small, within the robot's prediction, or absent, the cloud shrinks, and the joint is unlikely to receive attention. Overall, the cloud size indicates the predictability of the instructor's body movements as well as the level of attention the robot needs to pay. At each time step, the joint with the biggest cloud is picked as the attention point. This process loops with every new body pose, as shown in Figure 3.

Figure 2: The particle clouds evolve over time: (a) all clouds are initialized at the same size; (b) if the joint movement is small, the cloud shrinks: the picked cloud becomes smaller since the elbow did not move; (c) if the joint moves out of its cloud region, the cloud grows to catch the movement: the picked cloud becomes larger to adapt to the elbow’s movements.
Figure 3: The flow chart of the Instant attention.

We now present a practical algorithm, Instant attention, to generate attentional engagement instantly for robots (Algorithm 1). The algorithm takes the TrackedJoints and the BodyPose in the position form as input, and outputs one attention point at each time step. Specifically, TrackedJoints contains the joints to be tracked. In practice, the joints to be tracked are task-dependent and should be defined according to the possible attention points on the instructor's body; for example, a cooking robot may only need to track the instructor's upper-body movements, and the joint correspondence can be configured by developers based on the robot's physical structure. The other input, BodyPose, is the human body pose in the position form. The algorithm runs as follows: first, it initializes a particle filter with the same covariance for each tracked joint (lines 2-4). Then it estimates the distribution of the next joint position (lines 9-11), followed by a correction of the estimate given the current position observations (lines 12-13). Finally, the algorithm adjusts the covariance of the noise distribution to capture the joint movement (line 14), and the attention point is found by selecting the joint with the maximum covariance value (line 15).

Input: TrackedJointSet J; BodyPose P_t = {p_{t,1}, ..., p_{t,N}}, where p_{t,i} is the 3D position of the i-th joint at time t
Output: AttentionPoint p_{t,i*}
1  begin
2        for each joint j in J do
3              initialize the j-th particle filter for joint j;
4              initialize the covariance sigma_j for joint j;
5        end
6        for each joint i in P_t do
7              if joint i is in J then
8                    Delta z_t = p_{t,i} - p_{t-1,i};
9                    obtain particles {x_{t-1}^[m]} from the i-th particle filter;
10                   for m = 1 to M do
11                         sample x_t^[m] with probability p(x_t | x_{t-1}^[m]);   /* Equation 2 */
12                         calculate the weight w_t^[m];                           /* Equation 3 */
13                   resample M new particles, drawing each with probability w_t^[m];
14                   update sigma_i;                                               /* Equation 4 */
15       i* = argmax_i sigma_i;
16       return p_{t,i*};
17 end
Algorithm 1 Instant attention
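Given one such filter per tracked joint, the attention decision itself reduces to an argmax over cloud sizes. A hypothetical driver, reusing the `JointParticleFilter` sketch from Section 3.2 (again our illustration, not the paper's code), could be:

```python
# One step of Instant attention: update every tracked joint's filter and
# pick the joint whose particle cloud grew the largest.
def instant_attention(filters, prev_pose, pose, tracked):
    """filters: {joint: JointParticleFilter}; poses: {joint: (3,) arrays}."""
    sigmas = {j: filters[j].step(prev_pose[j], pose[j]) for j in tracked}
    return max(sigmas, key=sigmas.get)  # joint index of the attention point
```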

Once an attention point is generated, say $p^*$, it is worth mentioning that $p^*$ is located in the sensor frame. In order to obtain the accurate attention point for the robot, a further transformation is required. Figure 4 illustrates how to transform $p^*$ in the sensor frame $F_s$ into the robot head frame $F_h$, given the transformation $T_{h,s}$ from $F_s$ to $F_h$.

Figure 4: The attention point $p^*$ is located in the sensor frame $F_s$. We need to apply the transformation $T_{h,s}$ to obtain $p^*$ in the robot head frame $F_h$, where $T_{h,s}$ is the transformation from $F_s$ to $F_h$.
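In code, this is a single homogeneous-transform multiplication; a small sketch (with `T_h_s` as an assumed 4x4 matrix supplied by the robot's calibration) is:

```python
# Transform the attention point from the sensor frame F_s into the robot
# head frame F_h using a 4x4 homogeneous transform T_h_s.
import numpy as np

def to_head_frame(p_sensor: np.ndarray, T_h_s: np.ndarray) -> np.ndarray:
    """p_sensor: (3,) point in F_s; T_h_s: 4x4 transform from F_s to F_h."""
    p_h = T_h_s @ np.append(p_sensor, 1.0)  # lift to homogeneous coordinates
    return p_h[:3]
```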

The Instant attention method has several advantages. First, unlike other mechanisms (salience-based, object-based, or event-based), this method utilizes the particle cloud to track the instructor's joint movements and automatically produces attention points based on the information gained from the movements. Second, the attention point is generated and shifted smoothly because the spatial size of the cloud evolves smoothly. Specifically, the particle distribution is iteratively re-sampled from the previous distribution according to the importance weight $w_t^{[m]}$, i.e., a particle in $\mathcal{X}_{t-1}$ survives into $\mathcal{X}_t$ with probability proportional to $w_t^{[m]}$, even if the joint moves abruptly (i.e., $\Delta z_t$ is large). Also, the cloud is robust to noise and outliers, e.g., joint vibrations caused by sensors, since small turbulence (regardless of the exact speed) will not change the cloud size (the distance in Equation 4 is averaged over all predicted states), while existing speed- or spatial-position-based methods could produce gaze jerks or sudden gaze shifts due to such noise and outliers. Third, the joints to be tracked can be changed dynamically, offering a flexible and adjustable attention mechanism based on the RLfD task.

3.3 Approximate imitation

Behavior imitation in robotics is usually formulated as an optimization problem, which needs to find the joint correspondence first argall2009survey and then solve the inverse kinematics for the robot structure grochow2004style. Both processes are difficult, computationally intensive, and robot-configuration-dependent, and hence not applicable for generating imitation behavior for general robots. On the other hand, psychological studies report that people mimic behavior to communicate engagement by adopting similar postures or showing similar body configurations according to the context chartrand1999chameleon. We thus relax the behavior imitation problem as follows. First, the robot is not required to search blindly for the best joint correspondence, since the joint correspondence is task-dependent; we allow the user to explicitly specify it according to the RLfD context. Second, for those robot joints whose Degrees of Freedom (DoF) do not match the human joint, we only set the joint angles for the available robot joints to approximate the human movements. Though this approximation may not be optimal in the sense of behavior mimicry, it runs very fast (in real-time) to generate behavioral engagement, achieving a balance between simplicity and optimality.

To achieve this, we propose the Approximate imitation algorithm, which allows robots to generate motions similar to the demonstrator's for specified joints. Given the joint correspondence, the algorithm runs in two steps, frame transformation and rotation approximation, as presented in Figure 5.

Figure 5: The flow chart of the Approximate imitation

The frame transformation transforms the instructor's body pose to match the robot frames. To be specific, we leverage the transformation form of body poses to decompose the frame matching into two steps: first rotation alignment, then translation alignment. The rotation alignment rotates the human joint frames so that their axes are aligned with the robot joint frames, as shown in Figure 6(a); the translation alignment translates the human joint frames in their parent frames so that the initial skeletal structure of the demonstrator's body matches the robot's initial configuration, as shown in Figure 6(b). To sum up, we represent the rotation alignment as $R_i$ in the joint frame $F_i$, and the translation alignment as $D_i$ in the parent frame of $F_i$ (both represented as homogeneous transformations). Then for a joint movement $T_i$, its frame transformation is $T_i' = D_i R_i T_i$, where $T_i'$ is the movement expressed in the corresponding robot joint frame.

Figure 6: Frame transformation. (a) Rotation alignment: aligning the local frame $F_i$ of the human body pose with the corresponding robot joint frame by the rotation matrix $R_i$; the aligned local frame is $F_i'$. (b) Translation alignment: translating $F_i'$ in its parent frame by $D_i$ to match the corresponding robot frame, so that the human pose link is aligned with the robot link.

Since the DoF of a robot joint may not equal the DoF of its corresponding human joint, we cannot always have an exact movement mapping. Instead, we use the robot joint to approximate the human joint rotations as follows. First, a human joint rotation is converted into Euler form $(\alpha, \beta, \gamma)$, i.e., roll, pitch, and yaw. Second, if the DoF of a robot joint is 3 (roll, pitch, and yaw) and exactly matches the human DoF, then the conversion is straightforward: rotate the robot joint by roll first, then pitch, and finally yaw. If the DoF of a robot joint is 2 (e.g., roll and pitch), then the conversion can be approximated by rotating with roll first and then pitch. If the DoF of a robot joint is 1 (e.g., roll only), then rotate with roll only. For example, in Figure 7, the robot arm has the same structure as the demonstrator's but with different joint DoF, as shown in Figure 7(a) and (b). It can approximate the instructor's left arm movement by first converting the rotation into Euler angles $(\alpha, \beta, \gamma)$, and then setting the shoulder roll to $\alpha$ and the shoulder pitch to $\beta$, ignoring $\gamma$, as shown in Figure 7(c).

Figure 7: Rotation approximation: (a) the instructor's left shoulder has a DoF of 3 and its movement is a rotation $T$; (b) the robot shoulder joint has a DoF of 2: roll and pitch; (c) the robot rotates its shoulder roll joint by $\alpha$ and then its pitch joint by $\beta$, without considering $\gamma$.
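A compact sketch of this DoF-truncated conversion is shown below. It uses SciPy's rotation utilities for the Euler conversion with one common roll-pitch-yaw convention, which is our choice rather than anything prescribed by the paper:

```python
# Rotation approximation: convert a joint rotation matrix to
# (roll, pitch, yaw) and keep only as many angles as the robot joint's DoF.
import numpy as np
from scipy.spatial.transform import Rotation

def approximate_angles(rot_matrix: np.ndarray, dof: int) -> np.ndarray:
    """Return the first `dof` of (roll, pitch, yaw) for a 3x3 rotation."""
    roll, pitch, yaw = Rotation.from_matrix(rot_matrix).as_euler("xyz")
    return np.array([roll, pitch, yaw][:dof])  # e.g. dof=2 -> (roll, pitch)
```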

We now present the Approximate imitation algorithm in Algorithm 2. The algorithm takes a joint correspondence, JointCorrespondence, and the instructor's body pose, JointMovement, in the transformation form as input, and outputs the joint configurations, JointConfigs, for the robot. Specifically, JointCorrespondence defines the joint mapping $f_i \to r_i$ from human joint $f_i$ to robot joint $r_i$ for a subset of joints. JointMovement is represented as a series of transformations along the skeletal structure, $T = \{T_1, \ldots, T_K\}$ (see Section 3.1 for more details). The algorithm runs as follows: first, it calculates the frame transformations and saves the rotation alignments and translation alignments in $R$ and $D$ (lines 3-5). Then, for each joint movement $T_i$ in $T$, the algorithm transforms it into the corresponding robot frame by translation and rotation alignment, followed by a conversion into Euler form (lines 7-8). The algorithm proceeds by selecting the right angles from $\alpha_i$, $\beta_i$, and $\gamma_i$ according to the DoF of the robot joint (lines 9-16). The joint configurations are saved in $Q$ and returned as the final output.

Input: JointCorrespondence C = {f_i -> r_i}; JointMovement T = {T_1, ..., T_K}
Output: JointConfigs Q
1  begin
2        R = []; D = []; Q = [];
3        for each pair (f_i, r_i) in C do
4              R.append(rotateAlign(f_i, r_i));
5              D.append(translateAlign(f_i, r_i));
6        for each joint movement T_i in T do
7              T_i' = D_i R_i T_i;
8              (alpha_i, beta_i, gamma_i) = convertToEuler(T_i');
9              if DoF(r_i) == 3 then
10                   append (alpha_i, beta_i, gamma_i) to Q;
11             else
12                   if DoF(r_i) == 2 then
13                         append (alpha_i, beta_i) to Q;
14                   else
15                         if DoF(r_i) == 1 then
16                               append (alpha_i) to Q;
17       return Q;
18 end
Algorithm 2 Approximate imitation

The Approximate imitation method has several advantages for generating imitation behavior for robots in RLfD. First, the algorithm runs in real-time since the imitation is performed on only part of the instructor's body pose. In particular, we take advantage of local transformations of body poses to avoid solving the inverse kinematics for all robot joints, which is computationally intensive and may not have a closed-form solution. Also, instead of finding the exact mapping for the robot joint angles, we set configurations based on the DoF of each robot joint to achieve a similar motion trend. This conversion may sometimes distort movements, but the directions and trends are still captured (as reflected in Section 4). Second, this method is generic and applicable to standard skill-oriented RLfD. Depending on the RLfD scenario, we can also assign different joint correspondences to do a partial imitation. For other types of RLfD, e.g., object-related demonstrations or goal-oriented learning from demonstrations, we can also apply the proposed method to generate approximate imitation based on the object or the goal. Specifically, we can replace the joint transformations with the poses of the object or the goal, and generate the target $\alpha$, $\beta$, and $\gamma$. Then we can adopt inverse kinematics solvers to calculate a set of joint configurations that move the robot's end-site to the target pose. Based on the DoF and the space constraints of the robot's end-effectors, we can make similar approximations to have the end-effector achieve only the $\alpha$ pose, the $\alpha$ and $\beta$ pose, or the complete target pose.

4 Evaluation

This section first introduces our RLfD simulation platform, then describes a preliminary study for determining the timing of imitation behavior, and finally presents the main user study.

4.1 RLfD simulation platform

Our RLfD simulation platform is composed of a virtual human instructor and a robot, as shown in Figure 8(a) and (b). The virtual human instructor performs different yet controlled types of movement skills, while the robot (a Pepper) needs to capture the motion and learn the skills from the instructor. Both parties stand facing each other in a simulated 3D space, as shown in Figure 8(c).

Figure 8: RLfD simulation platform: (a) the simulated human instructor; (b) the virtual Pepper robot; (c) the instructor and robot are facing towards each other for teaching and learning; (d) platform composition.
Figure 9: An example to show how the platform works

The simulation platform has three major components: the demonstration component, the sensing component, and the engagement component, as shown in Figure 8(d). The demonstration component determines what movements the instructor performs. We exploit motion capture (MoCap) data to simulate real movements; MoCap data are recorded by 3D motion capture systems with high precision and are usually used for simulations and animations gleicher1998retargetting. The sensing component serves as a pose sensor, extracting body poses from the virtual instructor; it also converts body poses between the two representations (global positions and local transformations). Finally, the engagement component controls the robot's engagement communication. Based on the proposed algorithms, the robot can choose one of three ways of communicating engagement in RLfD: showing attention (A-mode), showing imitation (I-mode), or showing both (AI-mode). We further add one more mode, no engagement (N-mode), as a baseline against which to evaluate the effectiveness of these three modes. In N-mode, the robot just stands near the instructor and remains stationary without any body movements; unlike in A-mode, its gaze is fixed on the demonstrator's face and is not affected by the demonstrator's body movements.

In this simulated RLfD, the tasks for the robot to learn are sports skills performed by the virtual instructor. We chose sports skills as this type of movement has often been adopted in RLfD ijspeert2002movement; bentivegna2002humanoid. Four types of sports movements, i.e., boxing, rowing, swimming, and frisbeeing, are selected from the CMU Graphics Lab Motion Capture Database (http://mocap.cs.cmu.edu/), as these four sports involve movements of various body parts. Regarding policy deriving algorithms, even a state-of-the-art method may fail to deliver good learning outcomes, which may in turn change participants' perception of the demonstration gathering. Thus, to minimize any side effects or biases introduced by the performance of the learning algorithms, we do not utilize any learning algorithms, and the robot has no actual learning ability in the demonstration gathering process. In other words, in the following experiments and studies, the robot only communicates its engagement by showing different cues while observing the human demonstrations, and does not actually learn the sports skills.

Figure 9 presents an example of how the simulation platform works. The first row shows the human instructor's real demonstration, which is then re-targeted onto the virtual instructor, as shown in the second row. The third and fourth rows present the running of Instant attention and the robot showing attention (A-mode). The last row presents the approximate imitation behavior of the robot (I-mode). We purposely rotate the 3D scene in the last two rows for a better view of the robot communicating engagement.

We chose online simulation rather than a field test due to the following constraints and concerns. First, due to the current limitations of RLfD techniques, demonstrators are usually required to wear motion-capture devices, stay within a designated space, and repeatedly showcase the target movements. This could potentially affect their interaction with robots and their perception of the robot's behavior. Also, limited by their physical abilities, robots such as Pepper can barely move without making undesirable noises, jerks, and vibrations, which could disturb the human participants and influence their assessment of robot learning. We thus use simulation in our experiment to avoid these side effects and unexpected outcomes. Furthermore, we purposely select a viewpoint that gives participants a good view of both the robot's and the instructor's behavior, i.e., the staging effect thomas1995illusion. Second, the robot's engagement behavior can be evaluated in a more consistent and repeatable manner in simulation. In a field test, the instructor's demonstrations are usually non-repeatable and can easily be influenced by the robot's reactions; simulation allows different engagement cues to be compared without such bias. Third, the simulation provides a controllable and measurable environment to monitor and evaluate a system's performance from various perspectives, which is often a necessity before algorithms are deployed in RLfD.

This simulation platform was built upon the Gazebo simulator (http://gazebosim.org/) and the Robot Operating System (ROS). We use the Matlab Robotics System Toolbox (https://www.mathworks.com/hardware-support/robot-operating-system.html) to facilitate the algorithm implementation.

4.2 Preliminary study

In interpersonal communication, a person's imitation behavior, also called mirroring behavior, often happens after the partner's target behavior with a certain time delay chartrand1999chameleon; hove2009s. In this paper, we generate such mirroring behavior via the approximation mechanism, and we need to determine the time delay with which users correctly recognize imitation as a learning engagement cue. We therefore ran a within-subject pilot experiment to check the appropriate timing of robot imitation relative to the target action.

Manipulated variable. We set the time delay as the independent variable in this study and experimented with three intervals. Technically, we used a buffer to store the instructor's body poses to postpone any imitation behavior; after proper setup, the buffer size was set to three corresponding lengths to achieve the three target time delays.
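Such a delay can be realized with a fixed-length FIFO of poses. The sketch below is our illustration; the buffer length and the 30 Hz pose rate are assumed values, not the study's:

```python
# Delay buffer for mirroring: with poses arriving at a fixed rate, a FIFO
# of length k plays each pose back k frames (~delay_s seconds) later.
from collections import deque

class DelayBuffer:
    def __init__(self, delay_s: float, rate_hz: float = 30.0):
        self.buf = deque(maxlen=int(delay_s * rate_hz))

    def push(self, pose):
        """Store the newest pose; return the delayed one once history fills."""
        if len(self.buf) == self.buf.maxlen:
            delayed = self.buf[0]   # oldest stored pose, ~delay_s seconds old
        else:
            delayed = None          # not enough history yet; robot stays idle
        self.buf.append(pose)
        return delayed
```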

Subject allocation. We recruited 30 participants (mean age: 35.5; 12 female) via Amazon Mechanical Turk (AMT), none of whom had prior experience with physical or virtual robots. Each participant watched three simulated RLfD videos corresponding to the three delay intervals. In the videos, the instructor taught the robot a type of sports skill, and we staged the 3D scene at a fixed angle for a better view of the robot's imitations. We counterbalanced the presentation order of the different time delays.

Dependent variables. Participants watched videos showing the robot imitating the instructor with the three different time delays. They were informed that the robot was supposed to learn sports skills from the demonstrator. After each video, they rated their agreement on a 7-point Likert scale as to whether the robot in the video was actually learning.

Figure 10 presents the average ratings and the overall rating distribution for the different time delays. We ran a repeated measures ANOVA with time delay as the factor and found a significant difference in delay-induced perception of robot learning engagement. Results of the Bonferroni post-hoc test suggest that the engagement rating of one delay setting is significantly higher than those of the other two. Overall, setting the imitation time delay to this value can effectively communicate robots' learning engagement (about 70% of responses agree or strongly agree). We apply this configuration to the Approximate imitation algorithm in the main user study.
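For reference, a minimal sketch of this kind of analysis (with hypothetical file and column names) could look like:

```python
# Repeated measures ANOVA on the pilot ratings plus Bonferroni-corrected
# post-hoc pairwise comparisons; file and column names are assumptions.
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

ratings = pd.read_csv("pilot_ratings.csv")  # columns: pid, delay, rating
print(AnovaRM(ratings, depvar="rating", subject="pid", within=["delay"]).fit())

# Post-hoc paired t-tests; rows must be sorted by pid within each delay
# so that the paired samples line up subject by subject.
ratings = ratings.sort_values(["delay", "pid"])
delays = list(ratings["delay"].unique())
pairs = [(a, b) for i, a in enumerate(delays) for b in delays[i + 1:]]
for a, b in pairs:
    _, p = stats.ttest_rel(
        ratings.loc[ratings["delay"] == a, "rating"].to_numpy(),
        ratings.loc[ratings["delay"] == b, "rating"].to_numpy())
    print(a, b, "Bonferroni-corrected p =", min(1.0, p * len(pairs)))
```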

One might wonder why the rating difference between the two shorter delays is noticeably dramatic, even larger than the difference between the two longer delays. The cause is possibly the approximation mechanism adopted for generating the mirroring behavior. When the delay time is small, the approximate imitation algorithm generates the movement in a very responsive manner, almost at the same pace as the demonstrator's movement; subjects are then likely to feel that the robot is showing, rather than following, the demonstrator's movement. As the delay becomes longer, the movement-following effect becomes more obvious, and the robot appears to be learning from the demonstrator by mimicking his/her behavior. Consequently, the engagement ratings for the longer delays become higher. Such a dramatic rating difference also confirms the necessity and importance of the preliminary study for determining the appropriate delay time for the subsequent studies.

Figure 10: Results for timing: (a) average ratings; (b) rating distribution.

4.3 Main study

To evaluate the effects of engagement communication and our proposed cues on participants' perception of the robot and the demonstrations, we conducted a within-subject experiment on the RLfD simulation platform, with the "without-engagement" condition (N-mode) as the baseline.

4.3.1 Hypothesis

Our proposed methods generate different types of engagement cues for robots to express their engagement. Accordingly, we first hypothesize that:

H1. 1) Regardless of the actual cues used, robots that communicate engagement are perceived to be significantly more engaged in learning (H1a), and their learning behavior is significantly more socially acceptable (H1b), than those in the N-mode. Further, 2) the imitation cue will receive a significantly higher engagement rating than the attention cue (H1c), and the combined cues will be rated the highest (H1d). Similarly, 3) the imitation cue will be rated significantly more acceptable than the attention cue (H1e), and the combined cues will be rated the most acceptable (H1f).

According to educational theory postulating that learners’ engagement cues, especially behavioral engagement, could have reciprocal effects on instructors skinner1993motivation , we hypothesize that:

H2. Robots communicating engagement via different cues will have significantly different influences on human participants. Specifically, 1) regardless of the cues, communicating engagement will significantly influence humans' estimation of the robot's learning capability (H2a) and significantly raise humans' expectations of the learning outcomes (H2b) compared with no communication. Further, 2) imitation cues will lead to a significantly higher estimation of the robot's capabilities than attention cues (H2c), and combined cues will have the strongest influence (H2d). Similarly, 3) imitation cues will result in significantly higher expectations of the learning outcome than attention cues (H2e), and combined cues will lead to the highest expectations (H2f).

We further hypothesize that a robot showing different engagement behavior can affect humans' assessment of demonstration quality. More specifically:

H3. 1) Regardless of the exact demonstrations shown to robots, engagement cues will influence the human participants' assessment of the demonstration quality. Specifically, demonstrations for robots with attention cues, imitation cues, or hybrid cues will be rated as significantly more appropriate (in terms of the expected robot capabilities) than those for robots without engagement cues, even though they are actually the same (H3a). Further, 2) demonstrations for robots with imitation cues or hybrid cues will receive significantly higher appropriateness ratings than those for robots with attention cues (H3b).

In the study, these different aspects were measured via post-study questions with 7-point Likert scale answers, as shown in Figure 11 and Figure 12. We derived these questions from previous research on Human-Robot Interaction and robot learning. Specifically, the questions measuring the robot's engagement communication are adapted from engagement studies Strait:2015:TMH:2702123.2702415; Sun:2017:SHE:3025453.3025469; the questions measuring participants' expectations of the robot's learning capability are derived from studies on human expectations and assessments of human-robot collaboration Kwon:2018:ERI:3171221.3171276. We also took two steps to ensure the validity of the answers to all questions. First, the questions could only be answered after participants took the necessary actions to understand the experiment. For example, the questions measuring engagement only became visible after participants finished watching the full learning videos, and the questions measuring participants' expectations required both an answer and a reason (those who gave no reason could not proceed to the next question). Second, all answers were manually checked to reject invalid responses, e.g., a response with identical answers to all questions, or a response with vague and inconsistent comments.

4.3.2 User study design

The study consisted of five sessions: one introductory session and four experimental sessions. The introductory session requested demographic information and presented a background story to engage users: the participant owns a team of four robots for an Olympic game and needs to assess the robots' performance while they are under a professional coach's tutelage. In the experimental sessions, participants first watched the human instructor's movements and then monitored the robot learning process on the RLfD simulation platform. After each session, participants filled in post-study questionnaires. Each session checked one mode, and modes were counterbalanced with learning tasks. Specifically, we randomized the order of the engagement modes and the four physical skills so that each mode applied evenly across the different skills and each skill occurred evenly across the different modes. We recruited 48 participants (mean age: 30.9; 6 female; no prior experience with teaching robots and no participation in the preliminary study) from Amazon Mechanical Turk (AMT).

During the experiment, we asked the participants to rate whether they perceived the robot as paying attention or imitating based on its behavior. This served as a manipulation check for validity, ensuring that our designs indeed conveyed the intended type of engagement.

4.3.3 Analysis and results

Manipulation check. The manipulation check for the different engagement communications shows that the manipulation is effective (repeated measures ANOVA for both the attention cue and the imitation cue). Robots in A-mode and AI-mode are indeed perceived to show more attention than robots in N-mode (Bonferroni post-hoc test). Also, more imitation behavior is reported by subjects for robots in I-mode and AI-mode than for robots in N-mode (Bonferroni post-hoc test).

Figure 11: Ratings on robot engagement communications and their behavior in RLfD.

Efficacy of proposed engagement cues.

We analyze participants' ratings via a one-way repeated measures ANOVA with the mode as the independent variable. We find that both attention and imitation cues significantly improve the ratings of the robots' engagement levels and their behavior, as shown in Figure 11. Specifically, the robots in A-mode, I-mode, and AI-mode are perceived to be significantly more engaged in the learning process than the robot in N-mode; H1a accepted. Subjects also accept the robots' behavior in RLfD in A-mode, I-mode, and AI-mode significantly more than in N-mode; H1b accepted. Further, in terms of engagement, the combined cues are rated significantly higher than the single cues (Bonferroni post-hoc test); H1d accepted. In terms of acceptability, the combined cues are likewise rated significantly higher than the single cues (Bonferroni post-hoc test); H1f accepted. However, we do not observe a significant difference between the imitation cue and the attention cue, so H1c and H1e are both rejected. Therefore, H1 is partially accepted.

Based on these analyses, we therefore conclude that:

Overall, our results partially support H1: robots showing attention, imitation, or both are perceived to be significantly more engaged in learning, and their behavior is significantly more acceptable. Also, showing both cues is perceived to be significantly better than showing only one. However, no significant difference is found between showing attention and showing imitation.

Figure 12: Ratings on the effects of engagement communication on the participants’ perception and their assessment of demonstration qualities.

Effects of engagement cues on participants’ perception.

We then compare the effects of different engagement cues on subjects' perception via a one-way repeated measures ANOVA with the mode as the independent variable. In general, robot engagement communication significantly enhances participants' estimation of the robots' learning capabilities and participants' expectations of the learning outcomes, even though none of the robots in the experiment have any learning ability (no learning algorithms are adopted in the user study). Specifically, in terms of estimating the robots' learning capability, participants rated the robots in A-mode, I-mode, and AI-mode to be significantly more intelligent than the robots in N-mode; H2a accepted. Similarly, participants rated the robots with engagement behavior (A-mode, I-mode, and AI-mode) as more likely to master the skills than the robots without (N-mode); H2b accepted.

In addition, showing behavioral engagement (I-mode) has significantly more influence on the participants than showing attentional engagement (A-mode). In particular, the robots in I-mode are perceived to be significantly more capable of learning the demonstrated skills than the robots in A-mode; H2c accepted. Similarly, the robots in I-mode receive significantly higher ratings than the robots in A-mode in terms of participants' expectations of the learning outcomes; thus, H2e accepted.

Further, we also notice significant differences between the robots in AI-mode and the robots in the other modes. Specifically, robots in AI-mode are perceived as significantly more intelligent in learning than robots in N-mode, A-mode, and I-mode; H2d accepted. Also, the robots in AI-mode are estimated by the participants to be significantly more likely to master the skill than the robots in the other modes (N-mode, A-mode, and I-mode); H2f accepted. Note that across all engagement modes and skill settings, the robots are equipped with no learning algorithms and thus have no actual learning abilities.

Overall, our results support H2: communicating engagement significantly influences humans' estimation of the robots' learning capabilities and significantly changes their expectations of the final learning outcomes, even though none of the robots have any learning ability. Moreover, behavioral engagement in RLfD, i.e., imitation, has significantly more influence on the participants than attentional engagement. Furthermore, communicating engagement via both cues at the same time has significantly stronger effects on participants than communicating via a single cue.

Effects on participants’ assessment of demonstration qualities.

Finally, we analyze the participants' ratings of the appropriateness of the instructors' demonstrations. As shown in Figure 12, no significant difference is found between A-mode and N-mode; H3a rejected. However, compared with A-mode, only AI-mode significantly improves the participants' assessment of demonstration quality in RLfD (Bonferroni post-hoc test); H3b partially accepted. Note that across the different engagement modes, the skills to be learned are all generated from the same set of MoCap data, so all demonstrations are actually of the same quality.

Overall, our results partially support H3: communicating behavioral or combined engagement significantly improves participants' assessment of demonstration quality, while showing attention alone does not, even though all the demonstrations are actually of the same quality.

Further, in the comments collected from the user study, we find that most participants explicitly stated that the robots without behavioral engagement might fail in learning; accordingly, they were more likely to adjust future demonstrations when the robots communicated no engagement or only attentional engagement.

5 Discussion

5.1 Engagement communication for robots in RLfD

The choice of engagement cue should consider the nature of the learning task

Our results show that robots' behavioral engagement is preferable to attentional engagement in a physical skill-oriented RLfD, which can probably be explained by the correspondence between the practice of RLfD and the cone of learning dale1969audiovisual. The cone of learning, a.k.a. the pyramid of learning or cone of experience, depicts the hierarchy of learning through involvement in real experiences dale1969audiovisual. It proposes that visual receiving (just watching the demonstration) is a passive form of learning, and learners can only remember half of the knowledge passed through this channel two weeks later. In contrast, "doing the real thing" is a type of active learning that leads to deeper involvement and better learning outcomes dale1969audiovisual.

In RLfD, the basic task for robots is to derive a policy from demonstrations and then reproduce the instructors’ behavior argall2009survey . On the one hand, a robot’s imitation behavior resembles this “behavior reproducing” process; the robot is thus deemed actively engaged in the learning process. On the other hand, although showing attentional engagement implies that the robot is involved in the visual receiving of the instruction, it is still considered a passive way to learn. Consequently, instructors may come to the conclusion that a robot showing behavioral engagement will have a deeper understanding and better mastery of the skill than one showing attentional engagement. Moreover, by analyzing the quality gap between a robot’s imitation behavior and the demonstration (the behavior to be reproduced), instructors may form a more accurate assessment of the robot’s learning progress, as illustrated by the sketch below. In short, to design effective engagement cues for robots in RLfD, we need to take the nature of the learning task into consideration.
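As an illustration of this quality-gap idea, here is a minimal sketch that scores a robot’s imitation against the demonstrated joint trajectory. The RMSE metric and the array shapes are our assumptions; the paper does not prescribe a specific measure.

```python
import numpy as np

def imitation_quality_gap(demo: np.ndarray, imitation: np.ndarray) -> float:
    """Root-mean-square joint-angle error between a demonstration and the
    robot's imitation, both shaped (timesteps, n_joints), in radians."""
    assert demo.shape == imitation.shape
    return float(np.sqrt(np.mean((demo - imitation) ** 2)))

# Toy trajectories: 100 timesteps, 8 joints.
rng = np.random.default_rng(0)
demo = rng.normal(size=(100, 8))
imitation = demo + rng.normal(scale=0.1, size=demo.shape)  # imperfect copy
print(f"quality gap (rad RMSE): {imitation_quality_gap(demo, imitation):.3f}")
```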

Engagement communication should reflect robot’s actual capabilities

In our study, we do not equip the robot with any actual policy derivation algorithm, since we want to avoid the perception bias caused by the algorithm selection. In other words, the robot has no learning ability. Still, many participants are convinced that robots with engagement communication (attention, imitation, or both) will eventually master the skill. They hold this belief even for tasks that are technically very challenging for robots to learn because of the correspondence problem, e.g., swimming. These findings suggest that engagement communication can affect instructors’ mental model of the robot’s capability and progress, and, as shown in our study, there can be a misalignment between instructors’ expectations and the robot’s actual development. If instructors shape their teaching according to an inaccurate mental model, frustration may occur later in the RLfD process. Hence, it is critical to ensure that a robot’s communication of engagement reflects its actual capabilities (policy development in the case of RLfD).
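One possible way to operationalize this guideline, sketched below under our own assumptions, is to scale the fidelity of the engagement cue by a confidence estimate from the learner (e.g., a Confidence-Based Autonomy-style confidence value). Neither the mapping nor its parameters come from the paper; the study’s robots deliberately had no learning algorithm.

```python
def engagement_intensity(policy_confidence: float,
                         floor: float = 0.2, ceil: float = 1.0) -> float:
    """Map a policy-confidence estimate in [0, 1] to an imitation-fidelity
    level in [floor, ceil].

    A small floor keeps the robot minimally responsive so instructors can
    still tell it is attending, while low confidence visibly tempers the cue,
    keeping the communicated engagement aligned with actual progress.
    """
    c = min(max(policy_confidence, 0.0), 1.0)  # clamp to a valid range
    return floor + (ceil - floor) * c

# Early in teaching the robot imitates coarsely; later, more faithfully.
for conf in (0.0, 0.3, 0.9):
    print(f"confidence={conf:.1f} -> imitation fidelity "
          f"{engagement_intensity(conf):.2f}")
```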

5.2 Limitations

This work has several limitations. First, in our study, engagement communication is decoupled from the robot’s actual learning process, whereas in human or animal learning such communication is usually tied to learning progress; for example, a student making good progress tends to show more behavioral engagement skinner1993motivation . We will investigate how to couple the learning process with engagement communication in the future. Second, in this paper we only consider two types of learning engagement cues, i.e., attention and imitation; in practice, human learners may employ more diverse cues, e.g., spatially approaching the instructor. Third, the proposed methods, Instant attention and Approximate imitation, are both based on human body poses. They may not be applicable to learning tasks that do not necessarily involve the demonstrator’s body movements, e.g., object manipulation. For those tasks, designing a good mechanism to communicate robot engagement remains an open question. Fourth, in this work we only consider skill-oriented RLfD, in which the robot has to master a skill taught by instructors. Other types of RLfD, e.g., goal-oriented RLfD, in which the robot learns how to achieve a goal from human examples, are inherently different in task settings; though the proposed methods may still work, we need to evaluate their effects in future work. Fifth, we conduct the user study in an online simulation environment without a further real-world, real-time RLfD test. Though simulation is a common practice for evaluating ideas in RLfD, the participants do not have any control over the teaching process; how participants might reshape future demonstrations based on the robot’s engagement feedback needs further investigation.

6 Conclusion and Future work

In this work, we propose two methods (Instant attention and Approximate imitation) to generate robots’ learning engagement in RLfD: the Instant attention method automatically generates the point of attention, and the Approximate imitation method produces robot imitation behavior. Based on the two methods, we investigate the effects of three types of engagement communication (showing attention, showing imitation, and showing both) via a within-subject user study. Results suggest that the proposed cues enable robots to be perceived as significantly more engaged in the learning process and to behave significantly more acceptably in RLfD than with no engagement communication. These engagement cues also significantly affect the participants’ estimation of the robots’ learning capabilities and their expectations of the learning outcomes, even though none of the robots has any actual learning ability. In particular, the imitation cue influences instructors’ perceptions significantly more than the attention cue, while the hybrid cue significantly outperforms either single cue. We also find that showing behavioral or combined engagement significantly improves instructors’ assessment of demonstration quality. This paper takes a first step towards revealing the potential effects of communicating engagement on humans in RLfD.

References

  • (1) Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the twenty-first international conference on Machine learning, p. 1. ACM (2004)
  • (2) Admoni, H., Scassellati, B.: Robot nonverbal communication as an ai problem (and solution). In: 2015 AAAI Fall Symposium Series (2015)
  • (3) Anzalone, S.M., Boucenna, S., Ivaldi, S., Chetouani, M.: Evaluating the engagement with social robots. International Journal of Social Robotics 7(4), 465–478 (2015)
  • (4) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009)
  • (5) Argyle, M., Cook, M.: Gaze and mutual gaze. (1976)
  • (6) Arulampalam, M.S., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Transactions on signal processing 50(2), 174–188 (2002)
  • (7) Atkeson, C.G., Schaal, S.: Robot learning from demonstration. In: ICML, vol. 97, pp. 12–20 (1997)
  • (8) Bailenson, J.N., Yee, N.: Digital chameleons: Automatic assimilation of nonverbal gestures in immersive virtual environments. Psychological science 16(10), 814–819 (2005)
  • (9) Bentivegna, D.C., Ude, A., Atkeson, C.G., Cheng, G.: Humanoid robot learning and game playing using pc-based vision. In: Intelligent Robots and Systems, 2002. IEEE/RSJ International Conference on, vol. 3, pp. 2449–2454. IEEE (2002)
  • (10) Bernieri, F.J., Rosenthal, R.: Interpersonal coordination: Behavior matching and interactional synchrony. (1991)
  • (11) Breazeal, C., Brooks, A., Gray, J., Hoffman, G., Kidd, C., Lee, H., Lieberman, J., Lockerd, A., Mulanda, D.: Humanoid robots as cooperative partners for people. Int. Journal of Humanoid Robots 1(2), 1–34 (2004)
  • (12) Calinon, S., Billard, A.: Active teaching in robot programming by demonstration. In: RO-MAN 2007-The 16th IEEE International Symposium on Robot and Human Interactive Communication, pp. 702–707. IEEE (2007)
  • (13) Chao, C., Cakmak, M., Thomaz, A.L.: Transparent active learning for robots. In: 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 317–324. IEEE (2010)
  • (14) Chartrand, T.L., Bargh, J.A.: The chameleon effect: the perception–behavior link and social interaction. Journal of personality and social psychology 76(6), 893 (1999)
  • (15) Chartrand, T.L., Maddux, W.W., Lakin, J.L.: Beyond the perception-behavior link: The ubiquitous utility and motivational moderators of nonconscious mimicry. The new unconscious pp. 334–361 (2005)
  • (16) Chernova, S., Thomaz, A.L.: Robot learning from human teachers. Synthesis Lectures on Artificial Intelligence and Machine Learning 8(3), 1–121 (2014)
  • (17) Chernova, S., Veloso, M.: Interactive policy learning through confidence-based autonomy. Journal of Artificial Intelligence Research 34, 1–25 (2009)
  • (18) Crick, C., Osentoski, S., Jay, G., Jenkins, O.C.: Human and robot perception in large-scale learning from demonstration. In: Proceedings of the 6th international conference on Human-robot interaction, pp. 339–346. ACM (2011)
  • (19) Dale, E.: Audiovisual methods in teaching (1969)
  • (20) Eriksen, C.W., Hoffman, J.E.: Temporal and spatial characteristics of selective encoding from visual displays. Perception & psychophysics 12(2), 201–204 (1972)
  • (21) Fuente, L.A., Ierardi, H., Pilling, M., Crook, N.T.: Influence of upper body pose mirroring in human-robot interaction. In: International Conference on Social Robotics, pp. 214–223. Springer (2015)
  • (22) Gleicher, M.: Retargetting motion to new characters. In: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pp. 33–42. ACM (1998)
  • (23) Gonsior, B., Sosnowski, S., Mayer, C., Blume, J., Radig, B., Wollherr, D., Kühnlenz, K.: Improving aspects of empathy subjective performance for hri through mirroring emotions. In: Proc. IEEE Intern. Symposium on Robot and Human Interactive Communication, RO-MAN 2011, Atlanta, USA (2011)
  • (24) Grochow, K., Martin, S.L., Hertzmann, A., Popović, Z.: Style-based inverse kinematics. In: ACM transactions on graphics (TOG), vol. 23, pp. 522–531. ACM (2004)
  • (25) Guthrie, J.T., Cox, K.E.: Classroom conditions for motivation and engagement in reading. Educational psychology review 13(3), 283–302 (2001)
  • (26) Hatfield, E., Cacioppo, J.T., Rapson, R.L.: Emotional contagion. Current directions in psychological science 2(3), 96–100 (1993)
  • (27) Hove, M.J., Risen, J.L.: It’s all in the timing: Interpersonal synchrony increases affiliation. Social Cognition 27(6), 949–960 (2009)
  • (28) Ijspeert, A.J., Nakanishi, J., Schaal, S.: Movement imitation with nonlinear dynamical systems in humanoid robots. In: Robotics and Automation, 2002. Proceedings. ICRA’02. IEEE International Conference on, vol. 2, pp. 1398–1403. IEEE (2002)
  • (29) Jackson, S.L., Krajcik, J., Soloway, E.: The design of guided learner-adaptable scaffolding in interactive learning environments. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 187–194. ACM Press/Addison-Wesley Publishing Co. (1998)
  • (30) Jonides, J.: Further toward a model of the mind’s eye’s movement. Bulletin of the Psychonomic Society 21(4), 247–250 (1983)
  • (31) Kim, S., Kim, C., You, B., Oh, S.: Stable whole-body motion generation for humanoid robots to imitate human motions. In: Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pp. 2518–2524. IEEE (2009)
  • (32) Koenemann, J., Burget, F., Bennewitz, M.: Real-time imitation of human whole-body motions by humanoids. In: Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp. 2806–2812. IEEE (2014)
  • (33) Koenig, N., Takayama, L., Matarić, M.: Communication and knowledge sharing in human–robot interaction and learning from demonstration. Neural Networks 23(8-9), 1104–1112 (2010)
  • (34) Konidaris, G., Kuindersma, S., Grupen, R., Barto, A.: Robot learning from demonstration by constructing skill trees. The International Journal of Robotics Research 31(3), 360–375 (2012)
  • (35) Kuno, Y., Sadazuka, K., Kawashima, M., Yamazaki, K., Yamazaki, A., Kuzuoka, H.: Museum guide robot based on sociological interaction analysis. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 1191–1194. ACM (2007)
  • (36) Kwon, M., Huang, S.H., Dragan, A.D.: Expressing robot incapability. In: Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, HRI ’18, pp. 87–95. ACM, New York, NY, USA (2018). DOI 10.1145/3171221.3171276. URL http://doi.acm.org/10.1145/3171221.3171276
  • (37) Laskey, M., Chuck, C., Lee, J., Mahler, J., Krishnan, S., Jamieson, K., Dragan, A., Goldberg, K.: Comparing human-centric and robot-centric sampling for robot deep learning from demonstrations. In: Robotics and Automation (ICRA), 2017 IEEE International Conference on, pp. 358–365. IEEE (2017)
  • (38) Li, J., Ju, W., Nass, C.: Observer perception of dominance and mirroring behavior in human-robot relationships. In: Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, pp. 133–140. ACM (2015)
  • (39) Lockerd, A., Breazeal, C.: Tutelage and socially guided robot learning. In: Intelligent Robots and Systems, 2004.(IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on, vol. 4, pp. 3475–3480. IEEE (2004)
  • (40) Mutlu, B., Forlizzi, J., Hodgins, J.: A storytelling robot: Modeling and evaluation of human-like gaze behavior. In: Humanoid robots, 2006 6th IEEE-RAS international conference on, pp. 518–523. Citeseer (2006)
  • (41) Mutlu, B., Shiwa, T., Kanda, T., Ishiguro, H., Hagita, N.: Footing in human-robot conversations: how robots might shape participant roles using gaze cues. In: Proceedings of the 4th ACM/IEEE international conference on Human robot interaction, pp. 61–68. ACM (2009)
  • (42) Nagai, Y., Muhl, C., Rohlfing, K.J.: Toward designing a robot that learns actions from parental demonstrations. In: Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on, pp. 3545–3550. IEEE (2008)
  • (43) Pekrun, R., Linnenbrink-Garcia, L.: Academic emotions and student engagement. In: Handbook of research on student engagement, pp. 259–282. Springer (2012)
  • (44) Pitsch, K., Vollmer, A.L., Mühlig, M.: Robot feedback shapes the tutor’s presentation: How a robot’s online gaze strategies lead to micro-adaptation of the human’s conduct. Interaction Studies 14(2), 268–296 (2013)
  • (45) Riek, L.D., Paul, P.C., Robinson, P.: When my robot smiles at me: Enabling human-robot rapport via real-time head gesture mimicry. Journal on Multimodal User Interfaces 3(1-2), 99–108 (2010)
  • (46) Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp. 627–635 (2011)
  • (47) Saunders, J., Nehaniv, C.L., Dautenhahn, K.: Teaching robots by moulding behavior and scaffolding the environment. In: Proceedings of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction, pp. 118–125. ACM (2006)
  • (48) Schaal, S.: Learning from demonstration. In: Advances in neural information processing systems, pp. 1040–1046 (1997)
  • (49) Sena, A., Zhao, Y., Howard, M.J.: Teaching human teachers to teach robot learners. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–7. IEEE (2018)
  • (50) Sidner, C.L., Kidd, C.D., Lee, C., Lesh, N.: Where to look: a study of human-robot engagement. In: Proceedings of the 9th international conference on Intelligent user interfaces, pp. 78–84. ACM (2004)
  • (51) Sidner, C.L., Lee, C., Kidd, C.D., Lesh, N., Rich, C.: Explorations in engagement for humans and robots. Artificial Intelligence 166(1-2), 140–164 (2005)
  • (52) Silpasuwanchai, C., Ma, X., Shigemasu, H., Ren, X.: Developing a comprehensive engagement framework of gamification for reflective learning. In: Proceedings of the 2016 ACM Conference on Designing Interactive Systems, pp. 459–472. ACM (2016)
  • (53) Skinner, E.A., Belmont, M.J.: Motivation in the classroom: Reciprocal effects of teacher behavior and student engagement across the school year. Journal of educational psychology 85(4), 571 (1993)
  • (54) Strait, M., Vujovic, L., Floerke, V., Scheutz, M., Urry, H.: Too much humanness for human-robot interaction: Exposure to highly humanlike robots elicits aversive responding in observers. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, pp. 3593–3602. ACM, New York, NY, USA (2015). DOI 10.1145/2702123.2702415. URL http://doi.acm.org/10.1145/2702123.2702415
  • (55) Suleiman, W., Yoshida, E., Kanehiro, F., Laumond, J.P., Monin, A.: On human motion imitation by humanoid robot. In: Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on, pp. 2697–2704. IEEE (2008)
  • (56) Sun, M., Ma, X.: Adversarial imitation learning from incomplete demonstrations. arXiv preprint arXiv:1905.12310 (2019)
  • (57) Sun, M., Mou, Y., Xie, H., Xia, M., Wong, M., Ma, X.: Estimating emotional intensity from body poses for human-robot interaction. arXiv preprint arXiv:1904.09435 (2019)
  • (58) Sun, M., Zhao, Z., Ma, X.: Sensing and handling engagement dynamics in human-robot interaction involving peripheral computing devices. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI ’17, pp. 556–567. ACM, New York, NY, USA (2017). DOI 10.1145/3025453.3025469. URL http://doi.acm.org/10.1145/3025453.3025469
  • (60) Szafir, D., Mutlu, B.: Pay attention!: designing adaptive agents that monitor and improve user engagement. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp. 11–20. ACM (2012)
  • (61) Takayama, L., Dooley, D., Ju, W.: Expressing thought: improving robot readability with animation principles. In: Human-Robot Interaction (HRI), 2011 6th ACM/IEEE International Conference on, pp. 69–76. IEEE (2011)
  • (62) Thomas, F., Johnston, O., Thomas, F.: The illusion of life: Disney animation. Hyperion New York (1995)
  • (63) Thrun, S., Burgard, W., Fox, D.: Probabilistic robotics. MIT press (2005)
  • (64) Yamazaki, K., Yamazaki, A., Okada, M., Kuno, Y., Kobayashi, Y., Hoshi, Y., Pitsch, K., Luff, P., vom Lehn, D., Heath, C.: Revealing gauguin: engaging visitors in robot guide’s explanation in an art museum. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp. 1437–1446. ACM (2009)