The capabilities of autonomous robotic systems have increased significantly over the last few years, and such systems are used in increasingly complex environments for a wide range of applications. One such application, which we explore in this paper, is the use of autonomous robotic systems in socially challenging environments. In human-robot collaborative settings, robots require awareness of the current task as well as of the surrounding social environment. Consequently, they have to be able to make inferences at multiple levels of abstraction, and to reason and plan effectively by combining task-related actions and social interactions with humans.
In this paper we explore a social autonomous robotic system that performs a sorting game together with a small group of children, while it stays aware of the social-emotional states of the children in the same environment and keeps them emotionally engaged and positive. We aim for seamless social human-robot interaction. We consider existing theories to design socially appropriate robot strategies - high level categories for groups of behaviours - for a sustained child-robot interaction.
We use a planning-centric approach; prior to acting we use a planner to create a temporal plan that achieves the goals of the robot (e.g. sort some toys according to predefined rules), while not breaking any of the constraints (e.g. leaving children in a prolonged negative emotional state). The benefit of using a planning-based approach is that by reasoning in advance, we can find good solutions that minimise time and pre-emptively interact with children to maintain their positive emotional states. The alternative is a reactive system that needs to stop fulfilling its task whenever a child is in a negative emotional state. Such a system has no guarantee when – or indeed if – it will finish its tasks.
To create a robust plan, we require a predictive model of the emotional states of the children, in particular of how the emotions develop over time. Using this model, the planner can reason about how the children’s emotions are affected by the action (or inaction) of the robot. We present a predictive model based on the Pleasure-Arousal-Dominance (PAD) emotional state model, which was adapted to capture temporal features for the development of a dynamic interaction framework.
In Section II we discuss the related work. In Section III we present the predictive dynamic interaction framework, which is based on the PAD model. In Section IV we introduce the formalism of the planning system and some preliminary results. Then, we present initial steps towards the evaluation of our predictive model in Section V. We finish by presenting our conclusions and future directions in Section VI.
II Related work
II-A Planning-based social child-robot interaction
One of the focus points for the development of autonomous robots that interact with humans is the social intelligence of the system. Among the first attempts at developing socially intelligent robots was the work of Dautenhahn , who refers to robotic systems that collect mental and social experiences and, based on this input, mature over time. The robot considers the user’s behavioural, affective and mental states to provide appropriate responses for the evolution of the interaction loop. In this context, to optimize the interaction, robotic systems integrate planning and learning frameworks that take human abilities and preferences into consideration , . More specifically, in the area of child-robot interaction, there is an increasing interest in the development of socially intelligent robots acting as learning companions for typically developing children e.g.  as well as therapeutic social agents for children on the autism spectrum e.g. , , and . Despite the growing body of research on socially intelligent systems for child-robot interaction, the settings are usually well-defined and restricted, while the even more challenging area of child-robot interaction in dynamic play settings needs further investigation and development.
II-B Collaborative play and the importance of emotions
This project uses a dynamic play setting for child-robot collaboration, in which the child and the robot share the same goal (e.g. sorting toys according to predefined rules) in the form of a guided activity . The importance of collaborative play for children’s development has been previously highlighted in terms of children’s developing cognitive and socio-emotional skills and the establishment of their intrinsic motivation for learning . Based on the hypothesis that the development of these skills is more effective when the child is in an optimum affective state, previous research indicated a positive correlation of children’s emotional competence with their concurrent and future social competences , as well as with the development of cognitive abilities . These findings indicate the importance of maintaining an optimum affective state for children during play. Towards this direction, this project incorporates approaches that focus on continuous input  and analysis of children’s affective state.
II-C Models for affective states identification
Emotional states in humans have traditionally been described by categorical or dimensional models. Categorical models such as the Differential Emotions Theory (DET)  and Ekman’s theory of basic discrete emotional states  emphasize the existence of particular emotions that are assumed to have innate neural substrates, unique and universally recognized facial expressions and distinctive universals in antecedent events. On the other hand, according to the dimensional approaches, the emotion domain can be represented by a small number of continuous dimensions. Plutchik , for example, suggested three dimensions: the emotional state, the intensity and the degree of similarity to other emotions. Recently, Barrett  suggested the theory of constructed emotion, according to which an instance of emotion is constructed in the same way as all other perceptions, using the same neuroanatomical principles for information flow within the brain, which cancels the distinct categorical nature of emotions. The dynamic nature of our setting requires a dimensional approach to emotions in order to depict changes of affective states over time.
III Towards a dynamic interaction framework
For the purpose of this project, we adopted the dimensional model of the Pleasure - Arousal - Dominance (PAD) framework to detect, evaluate and predict users’ emotional states. We used this model to develop a dynamic interaction framework that takes into consideration temporal features of affective development.
III-A The PAD model of emotions
Given the developmental nature of this project, which aims for long-term social human-robot interaction, we adopted the PAD model. The PAD model is a dimensional model that can be used to represent changes of emotional states over time. The PAD dimensional framework assumes that the dimensions of pleasure, arousal and dominance are necessary and sufficient to represent emotional states . They are described as follows:
Pleasure-displeasure: Defined as positive versus negative affective states. Pleasure-displeasure corresponds to cognitive judgements of evaluation, with higher evaluations of stimuli being associated with greater pleasure induced by the stimuli;
Arousal-nonarousal: Defined in terms of level of mental alertness and physical activity;
Dominance-submissiveness: Defined as a feeling of control and influence over one’s surroundings and others versus feeling controlled or influenced by situations and others.
Adopting a model that considers the level of dominance, in addition to the traditionally used valence and arousal dimensions, represents children’s emotional state more accurately, especially given the collaborative nature of the settings in this project. For instance, both anger and anxiety arise from low-pleasure and high-arousal events. However, anger and anxiety are on opposite sides of the dominance dimension. The PAD framework has been previously used for the development of robotic systems in the context of social human-robot interaction . However, a recent systematic literature review  showed that only a limited number of studies focus on a developmental perspective for long-term sustained child-robot interaction in dynamic settings.
III-B Temporal considerations
Emotional processing is a dynamic phenomenon which is subject to stimuli such as external interventions from social agents. According to the generic timing hypothesis, an emotion is thought to come into being and develop through a recursive situation – attention – appraisal – response sequence . The interventions distinguish between: antecedent-focused strategies that start operating early in a given iteration of the emotion-generative process, before response tendencies are fully activated; and response-focused strategies that start operating later on, after emotion response tendencies are more fully activated . Based on this theory we hypothesize that temporal planning supports a balance of best performance in completing a task whilst maintaining appropriate emotions and engagement of the children.
In the context of this project, we focused on external interventions which are made by the robot. The robot starts with a perceived initial emotional state of the children. To maintain children’s optimum emotional level, it applies an intervention / strategy to achieve user’s reappraisal or attention deployment early in the emotion-generative trajectory, while monitoring the evolution of the user’s emotional state.
IV Planning system
We model our problem and domain files with PDDL 2.1 . This modelling language supports durative actions and temporal constraints. These features are necessary to capture the evolution of the emotional states over time. We will first define a temporal planning problem, followed by the model of the problem formulated in this paper.
Temporal Planning Problem Representation. We represent a temporal planning problem as a tuple $\langle P, I, G, O \rangle$, where $P$ is a set of atoms, $I$ is the set of clauses over $P$ representing the initial state, $G$ is a conjunction over $P$ that represents the goal that needs to be achieved, and $O$ is a set of operators that affect the world. Every operator $o \in O$ has a precondition $pre(o)$ and a set of effects $eff(o)$. Each clause in the preconditions and effects is annotated with a temporal constraint: a precondition clause must hold either at the beginning of the action, at the end of the action, or during its entire duration. Effects are applied either at the beginning or at the end of an action.
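The representation described above can be sketched as a small data structure. This is only an illustrative sketch, not the planner's internal representation: the class names, the field names and the example operator `store-toy` are assumptions introduced here, and atoms are simplified to plain strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple


class When(Enum):
    """Time points at which a condition must hold or an effect is applied."""
    AT_START = "at start"
    OVER_ALL = "over all"
    AT_END = "at end"


@dataclass(frozen=True)
class TimedClause:
    when: When
    atom: str
    positive: bool = True  # False encodes a negated condition or delete effect


@dataclass(frozen=True)
class Operator:
    name: str
    duration: float
    preconditions: Tuple[TimedClause, ...]
    effects: Tuple[TimedClause, ...]

    def __post_init__(self):
        # Effects are applied only at the beginning or at the end of an action.
        for effect in self.effects:
            if effect.when is When.OVER_ALL:
                raise ValueError(f"effect on {effect.atom!r} cannot be 'over all'")


@dataclass(frozen=True)
class TemporalProblem:
    atoms: FrozenSet[str]
    initial_state: FrozenSet[str]
    goal: FrozenSet[str]  # conjunction of atoms that must all hold
    operators: Tuple[Operator, ...]

    def goal_satisfied(self, state: FrozenSet[str]) -> bool:
        return self.goal <= state


# Hypothetical operator: the robot must hold the toy throughout the action,
# and the toy is stored once the action ends.
store_toy = Operator(
    name="store-toy",
    duration=5.0,
    preconditions=(TimedClause(When.OVER_ALL, "holding toy1"),),
    effects=(TimedClause(When.AT_END, "stored toy1"),),
)

problem = TemporalProblem(
    atoms=frozenset({"holding toy1", "stored toy1"}),
    initial_state=frozenset({"holding toy1"}),
    goal=frozenset({"stored toy1"}),
    operators=(store_toy,),
)
```

The three `When` values mirror the three kinds of temporal annotation on preconditions, while the check in `__post_init__` reflects the restriction that effects occur only at the start or end of an action.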
IV-A Modelling the planning problem
Using the PAD model described in Section III we created a planning model that encapsulates the children’s emotional state and its evolution over time. In this paper we use a case study from the EU project SQUIRREL (http://www.squirrel-project.eu/). The robot is tasked with sorting a set of toys while at the same time collaborating with three children that are active in the same area. We assume we know the initial emotional state of the children and we have an array of sensors to monitor the children’s emotional state during execution, as described in Section V.
The PDDL domain is listed in Figure 2; most actions have been abbreviated due to space constraints. Children’s emotional states are encoded using a triplet of functions that correspond to the three dimensions of the PAD model: pleasure, arousal, and dominance. All actions in the domain affect the emotional state of each child. It is assumed that the robot’s task-related actions usually have less effect on the emotional state of the children and generally tend to lower pleasure, which will eventually lead to boredom.
Social-emotional actions, like accommodate-distress, have more effect on the children’s emotional states, as they interact with the children directly and do not contribute to the overall task. We present three strategies that can be used by the robot to alter the children’s emotions. These are:
Accommodate: The robot gives the child time to familiarize him/herself with the new situation.
Maintain: The robot has an interactive role to maintain the positive state of the child.
Improve: The robot initiates and actively applies strategies to trigger a change from a neutral to a positive state.
Each strategy contains a set of various behaviours for execution. One action is both task-related and social-emotional: the action kid-give has the robot ask a child to give it an item. This action contributes to the task and improves the emotional state of the child that helps the robot. The effect of these strategies depends on the current emotional state of the child. For instance, if a child is in a very positive emotional state (e.g. Pleasure, Arousal, and Dominance are all high) then applying the Improve strategy will not affect the emotional state much. However, if the child is sad (e.g. Pleasure and Arousal are low, but Dominance is high) then applying the Improve strategy will have a noticeable, positive effect on the child’s emotion.
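One simple way to picture a state-dependent effect is to scale the gain of a strategy by the remaining headroom towards the maximum value. The sketch below is purely illustrative: the function name `improve`, the gain of 0.5 and the headroom rule are assumptions made here, and they are not the effect values used in the domain, which are those listed in Table II.

```python
def improve(pleasure: float, gain: float = 0.5) -> float:
    """Raise pleasure by a fraction of the remaining headroom to the maximum of 1.

    Assumption: the closer the child already is to the maximum, the smaller
    the change produced by the strategy.
    """
    headroom = 1.0 - pleasure
    return min(1.0, pleasure + gain * headroom)


# A very happy child barely changes, a sad child changes noticeably.
happy_change = improve(0.9) - 0.9
sad_change = improve(-0.5) - (-0.5)
```

Under this assumed rule the same strategy yields a small change for a child who is already very positive and a larger change for a child in a low state, which matches the qualitative behaviour described above.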
We have modelled four separate emotions that we can detect in children: Distress, Sadness, Boredom, and Happiness. The relevant relations between these emotions and the PAD levels are depicted in Table I.
The effects of the actions on the emotional states are listed in Table II. In our domain we model the three dimensions of the PAD model using numerical values. We limit these values to the range between -1 and 1, where 1 is the highest value and corresponds to the high level, and 0 is considered to be low. We do not want any of these dimensions to become low during plan execution, so the robot is unable to execute any task-related action while the emotional state of any child is negative.
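The numeric encoding and the guard on task-related actions can be sketched as follows. This is a minimal sketch under stated assumptions: the names `PADState`, `apply` and `task_action_allowed` are hypothetical, and "not negative" is interpreted here as every dimension being greater than or equal to 0.

```python
from dataclasses import dataclass
from typing import Iterable

PAD_MIN, PAD_MAX = -1.0, 1.0


def clamp(value: float) -> float:
    """Keep a PAD value inside the modelled range [-1, 1]."""
    return max(PAD_MIN, min(PAD_MAX, value))


@dataclass(frozen=True)
class PADState:
    pleasure: float
    arousal: float
    dominance: float

    def apply(self, d_pleasure: float = 0.0, d_arousal: float = 0.0,
              d_dominance: float = 0.0) -> "PADState":
        """Return the state after an action's effect, clamped to the range."""
        return PADState(
            clamp(self.pleasure + d_pleasure),
            clamp(self.arousal + d_arousal),
            clamp(self.dominance + d_dominance),
        )

    def is_acceptable(self) -> bool:
        # Assumption: a state is acceptable when no dimension is negative.
        return min(self.pleasure, self.arousal, self.dominance) >= 0.0


def task_action_allowed(children: Iterable[PADState]) -> bool:
    """Task-related actions are blocked while any child's state is negative."""
    return all(child.is_acceptable() for child in children)


children = [PADState(0.1, 0.3, 0.5), PADState(0.9, 0.8, 0.7), PADState(0.6, 0.1, 0.4)]
allowed_before = task_action_allowed(children)
# A hypothetical task-related action lowers the first child's pleasure.
children[0] = children[0].apply(d_pleasure=-0.2)
allowed_after = task_action_allowed(children)
```

In the PDDL model this guard would be expressed as numeric preconditions on the task-related actions; the Python version only illustrates the logic.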
An example planning problem is listed in Figure 1. We define the emotional state of each of the three children (c1, c2, and c3). The child c1 is bordering boredom, c2 is very happy, and c3 is satisfied but not very active. The goal is to store away the three toys (toy1, toy2, and toy3) in the provided box.
The evolution of Pleasure, Arousal, and Dominance is depicted in Figure 4. The planner aims to minimise the time it takes to complete the task; it allows the emotional states of the children to approach the boundary of the acceptable range and keeps them there.
V Preliminary Evaluation
We present an empirical pilot study for the evaluation of the proposed Dynamic Interaction Framework; in this study we detect and interpret children’s arousal level as an indicator of their task engagement. We define a threshold of arousal level. The robot performs a strategy to improve children’s arousal level by executing an unexpected behaviour.
We conducted two sessions. In each session two children aged 8-9 years played together with a non-humanoid robot with the aim of sorting a set of toys according to predefined rules. In order to record individual behaviours (including speech, pose, and gestures) and avoid occlusions, we used individual lapel microphones and 4 Kinects. We did not employ a speaker identification system in this study; instead, we relied on individual recordings, an approach that is not easily applicable to children in the wild. The speaker identification issue is left for future work.
V-B Speech Emotion Detection System
In a dynamic play environment, we assume that children frequently occlude one another as they move, which makes it challenging to track their faces constantly. Hence, we used a speech emotion recognition system to monitor their affective states. In this pilot study, we focused only on the measurement of children’s arousal level. To this end, we have developed a deep multi-task learning based speech emotion recognition model using aggregated corpora, which provides better generalisation. The model has two layers of Long Short-Term Memory (LSTM) with 128 cells. Details of the method and the corpora used can be found in Kim et al. The unweighted accuracy on the arousal dimension (low, high levels) was %.
The system consists of three modules: voice activity detector, feature extractor, and classifier. For robustness in a noisy environment, we adopt Gaussian Mixture Models that classify frames with a length of 20 ms into speech and non-speech frames. Then, consecutive speech frames bridged by a short silence (shorter than half a second) but segmented by a long silence (longer than half a second) form an utterance to classify. We only classify sufficiently long utterances (longer than 1 second). Next, manually engineered feature vectors are extracted from an utterance (see the details in Kim et al.). Lastly, the trained LSTM network estimates the probabilistic distribution of the two classes.
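The utterance segmentation step described above can be sketched as follows. This is a minimal sketch and not the system's implementation: it assumes the voice activity detector has already produced one boolean per 20 ms frame, the function and constant names are hypothetical, and a silence of exactly half a second (which the description leaves open) is treated here as a long silence.

```python
from typing import List, Optional, Tuple

FRAME_SEC = 0.020    # frame length used by the voice activity detector
MAX_GAP_SEC = 0.5    # silences shorter than this are bridged
MIN_UTT_SEC = 1.0    # only utterances longer than this are classified


def segment_utterances(is_speech: List[bool]) -> List[Tuple[int, int]]:
    """Group speech frames into utterances.

    Returns (start_frame, end_frame_exclusive) pairs for utterances that are
    longer than MIN_UTT_SEC, measured from first to last speech frame.
    """
    max_gap = round(MAX_GAP_SEC / FRAME_SEC)   # 25 frames
    min_len = round(MIN_UTT_SEC / FRAME_SEC)   # 50 frames
    utterances: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_speech = -1
    for i, speech in enumerate(is_speech):
        if not speech:
            continue
        if start is None:
            start = i
        elif i - last_speech - 1 >= max_gap:
            # The preceding silence was long: close the current utterance.
            utterances.append((start, last_speech + 1))
            start = i
        last_speech = i
    if start is not None:
        utterances.append((start, last_speech + 1))
    return [(s, e) for s, e in utterances if e - s > min_len]


# 0.6 s speech, 0.2 s pause, 0.6 s speech, 0.8 s silence, 0.4 s speech
frames = [True] * 30 + [False] * 10 + [True] * 30 + [False] * 40 + [True] * 20
spans = segment_utterances(frames)
```

In this example the short pause is bridged into a single 1.4 s utterance, the 0.8 s silence closes it, and the final 0.4 s burst is discarded as too short, so `spans` holds one utterance covering frames 0 to 69.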
We classified the utterances extracted from each child’s recording to analyse their arousal level. Table III summarises the classified states. As shown, we found more utterances with a high level of arousal in session 1 than in session 2, which is in line with our behavioural observations. In addition, we observed how the robot’s strategy of unexpected behaviour (i.e. the robot did not follow the verbal command of the child) affected the arousal states of the children. Figure 5 presents examples. We detected arousal states regardless of who was speaking. First, (a) shows that there was no high level of arousal when the robot behaved as expected by the children. However, (b) and (c) show a high level of arousal when the robot demonstrated unexpected behaviours, which indicates the effectiveness of the strategy in this specific context.
| Session | Child A | Child B | Average |
VI Discussion and Future Work
This paper describes the initial steps towards the design of a planning-based robotic system for social child-robot interaction in a play environment. We have proposed a Dynamic Interaction Framework based on the existing PAD model of emotions for social HRI. The robot uses a planner to create plans that complete tasks while being socially aware and executing specific strategies to keep the interacting children positive and engaged.
The temporal model we created for this scenario includes task-related actions and social-emotional actions that have the robot interact with the children directly to improve their emotional state. By creating a plan in advance we can pre-emptively improve children’s emotional states and finish the task in a good time.
Finally, we have presented a pilot study; we evaluated part of the proposed Dynamic Interaction Framework as well as the strategy of robot’s unexpected behaviour. We demonstrated that the framework is applicable in real settings and the strategy has a positive impact on children’s arousal level.
The proposed Dynamic Interaction Framework aims to support child-robot interaction in dynamic play settings, but it has some limitations. One of the major challenges relates to temporal considerations. While the framework takes into account timing aspects for the execution of a specific strategy, the complexity of the dynamic setting makes it difficult to time the strategy accurately enough during execution.
In future work, we intend to empirically investigate temporal aspects of robot’s behaviour in play environments and their effectiveness. In addition, we aim to integrate further modalities for the identification of children’s emotional states and engagement level. By further developing the Dynamic Interaction Framework for planning based robotic systems, we aim to improve its transferability in socially complex settings such as children’s play environments.
- Barrett  L. F. Barrett. The theory of constructed emotion: an active inference account of interoception and categorization. Social cognitive and affective neuroscience, 12(1):1–23, 2017.
- Bernardini and Porayska-Pomsta  S. Bernardini and K. Porayska-Pomsta. Planning-Based Social Partners for Children with Autism, pages 362–370. AAAI Press, 2013.
- Charisi et al.  V. Charisi, D. Davison, D. Reidsma, and V. Evers. Evaluation methods for user-centered child-robot interaction. In 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pages 545–550, Aug 2016. doi: 10.1109/ROMAN.2016.7745171.
- Coles et al.  A. J. Coles, A. I. Coles, M. Fox, and D. Long. Forward-chaining partial-order planning. In Proceedings of the Twentieth International Conference on Automated Planning and Scheduling (ICAPS-10), May 2010.
- Dautenhahn  K. Dautenhahn. Getting to know each other—artificial social intelligence for autonomous robots. Robotics and autonomous systems, 16(2-4):333–356, 1995.
- Davis and Levine  E. L. Davis and L. J. Levine. Emotion regulation strategies that promote learning: Reappraisal enhances children’s memory for educational information. Child Development, 84(1):361–374, 2013.
- Denham  S. A. Denham. Social-emotional competence as support for school readiness: What is it and how do we assess it? Early education and development, 17(1):57–89, 2006.
- Ekman  P. Ekman. Facial expressions. Handbook of cognition and emotion, 16:301–320, 1999.
- Esteban et al.  P. G. Esteban, P. Baxter, T. Belpaeme, et al. How to build a supervised autonomous system for robot-enhanced therapy for children with autism spectrum disorder. Paladyn, Journal of Behavioral Robotics, 8(1), 2017.
- Fox and Long  M. Fox and D. Long. PDDL2.1: an extension to PDDL for expressing temporal planning domains. CoRR, abs/1106.4561, 2011.
- Gordon et al.  G. Gordon, S. Spaulding, J. Kory Westlund, et al. Affective personalization of a social robot tutor for children’s second language skills. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
- Gunes and Schuller  H. Gunes and B. Schuller. Categorical and dimensional affect analysis in continuous input: Current trends and future directions.
- Hayes and Scassellati  B. Hayes and B. Scassellati. Autonomously constructing hierarchical task networks for planning and human-robot collaboration. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 5469–5476, May 2016. doi: 10.1109/ICRA.2016.7487760.
- Izard  C. E. Izard. Basic emotions, relations among emotions, and emotion-cognition relations. 1992.
- Kim et al.  J. Kim, G. Englebienne, K. P. Truong, and V. Evers. Towards speech emotion recognition “in the wild” using aggregated corpora and deep multi-task learning. In Proceedings of INTERSPEECH, to appear, 2017.
- Kruse et al.  T. Kruse, A. K. Pandey, R. Alami, and A. Kirsch. Human-aware robot navigation: A survey. Robotics and Autonomous Systems, 61:1726–1743, 2013.
- Lillard  A. S. Lillard. The Development of Play. John Wiley, Inc., 2015. ISBN 9781118963418. doi: 10.1002/9781118963418.childpsy211.
- Mehrabian  A. Mehrabian. Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Current Psychology, 14(4):261–292, 1996. ISSN 1936-4733. doi: 10.1007/BF02686918.
- Park et al.  J. W. Park, W. H. Kim, et al. How to completely use the pad space for socially interactive robots. In 2011 IEEE International Conference on Robotics and Biomimetics, pages 3005–3010, Dec 2011.
- Plutchik  R. Plutchik. The nature of emotions human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American scientist, 89(4):344–350, 2001.
- Schönfelder et al.  S. Schönfelder, P. Kanske, J. Heissler, and M. Wessa. Time course of emotion-related responding during distraction and reappraisal. Social Cognitive and Affective Neuroscience, 9(9):1310, 2014. doi: 10.1093/scan/nst116.
- Sheppes and Gross  G. Sheppes and J. J. Gross. Is timing everything? temporal considerations in emotion regulation. Personality and Social Psychology Review, 15(4):319–331, 2011.
- Weisberg et al.  D. S. Weisberg, A. K. Kittredge, K. Hirsh-Pasek, et al. Making play work for education. Phi Delta Kappan, 96(8):8–13, 2015. doi: 10.1177/0031721715583955.
- Zheng et al.  Z. Zheng, Z. Warren, A. Weitlauf, et al. Brief report: Evaluation of an intelligent learning environment for young children with autism spectrum disorder. Journal of Autism and Developmental Disorders, 46(11):3615–3621, 2016. doi: 10.1007/s10803-016-2896-0.