
Spoken Dialogue Strategy Focusing on Asymmetric Communication with Android Robots

by Daisuke Kawakubo, et al.

Humans are easily conscious of small differences in an android robot's (AR's) behaviors and utterances, resulting in treating the AR as not-human, while ARs treat us as humans. Thus, there exists asymmetric communication between ARs and humans. In our system at Dialogue Robot Competition 2022, this asymmetry was a considerable research target in our dialogue strategy. For example, tricky phrases such as questions related to personal matters and forceful requests for agreement were experimentally used in AR's utterances. We assumed that these AR phrases would have a reasonable chance of success, although humans would likely hesitate to use the phrases. Additionally, during a five-minute dialogue, our AR's character, such as its voice tones and sentence expressions, changed from mechanical to human-like type in order to pretend to tailor to customers. The characteristics of the AR developed by our team, DSML-TDU, are introduced in this paper.





I Introduction

In addition to a human-like appearance, smooth movements, and rich facial expressions, the intelligence of android robots (ARs) is being improved so as to become closer to the human level in spoken dialogues. AR systems developed by Ishiguro et al., such as the Geminoid [1] and ERICA [2], were made available to us as participants in the Dialogue Robot Competition (DRC). The scope of the competition was to evaluate an AR’s practicality in real situations. For the task of sightseeing-spot recommendation at a travel agency, participants tried to develop an original multimodal dialogue system using Android I, an ERICA-based platform (Fig. 1). For the 2022 competition [3], our team, DSML-TDU, which had joined the first competition in 2020 [4], updated the system, focusing on asymmetric communication [5] as explained below.

Fig. 1: Android I works as counter salesperson and has dialogue with customer. Pictures of two spots are shown on the display.

Humans are easily conscious of small differences in an AR’s behaviors and utterances. Therefore, we currently tend to treat ARs as not-human, although the ARs we develop treat us as humans. This asymmetry can be seen when we communicate with an AR.

We should try to develop dialogue strategies in consideration of this asymmetric communication between robots and humans; as Bono [6] also said, “we can only develop conversations between robots and humans on the basis of the ‘differences’ that humans unconsciously recognize as species.” We assumed that ARs would have acceptable communication styles that real humans would probably avoid using. When communicating with someone for the first time, for example, humans start with casual topics and avoid personal matters. This raises the question of whether an AR’s impoliteness or lack of consideration can be accepted. As a discussion of this asymmetric communication, our team’s system and competition results at DRC 2022 are described in this paper.

Fig. 2: Dialogue flow of proposed system.
Fig. 3: Configuration of system by Team DSML-TDU. Our developed modules are in the frame filled in salmon pink. They were implemented into android robot system prepared by competition organizers. The AR used multimodal inputs from customer to communicate with him/her.

II Our Tasks in DRC 2022

The preliminary round of DRC 2022 was held at a mock travel agency booth in Miraikan. Details of the competition are given in the overview paper [3]. Android I acts as a salesperson at a travel agency and communicates with a customer. The customer, an experimental participant who has selected two candidate spots in advance, decides on one through a 5-minute dialogue with Android I. In this situation, our challenging tasks are to give the customer enough information on both spots, to make the dialogue enjoyable for him/her, and to lead his/her decision to the recommended spot, which is randomly designated by the organizers.

III Dialogue Strategy Focusing on Asymmetric Communication

The dialogue of the proposed system consists of 7 phases (Fig. 2). The system itself consists of a dialogue transition control system, a fashion item detection module, a chit-chat dialogue generation module, a knowledge selection module, and an answer generation module (Fig. 3).

ARs have a human-like appearance, so behaviors that deviate greatly from those of humans can cause customers to feel uncomfortable. This point is important when focusing on asymmetric communication. We used tricky phrases in the proposed system. If these phrases were used by a human, the customer would feel uncomfortable, but if they are used by an AR, the customer could see them as acceptable and have a good impression.

III-A Shortening Psychological Distance from Customer

At the beginning of the dialogue, the customer expects the AR to speak and is ready to listen. Therefore, from the beginning, the AR refers to one of the customer’s fashion items and says, for example, “Your glasses are very nice.” If a human made this utterance in a first meeting with a customer, the customer would feel uncomfortable because it would be too personal, but if an AR makes this utterance, it can shorten the psychological distance to the customer from the beginning of the dialogue.

III-B Natural Leading Without Feeling Intentional

After providing information on the recommended spot to the customer, the AR asks, “Does today’s dialogue make you want to visit the recommended spot?” The purpose of this question is to give the customer the impression that he/she chose the recommended spot of his/her own will. If a human asked this question, the customer would feel mentally pressured. However, when the AR asks this question, the customer is more likely to believe that the question is not intentional, and the robot recommendation effect should increase further.

If the customer responds affirmatively to this question, the AR engages in free discussion on the topic of the recommended spot. In the case of a negative response, or if the customer wants information on the other (non-recommended) spot, the free discussion instead covers the non-recommended spot. Although providing information on the non-recommended spot would lower the robot recommendation effect, this was done to follow a competition regulation stating that the customer was to check information on both spots.
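The branching described in this subsection can be sketched as a small function. This is only an illustrative sketch under stated assumptions: the names `Response` and `choose_discussion_topic` are hypothetical, and how the actual system classifies a customer's reply is not described here.

```python
from enum import Enum


class Response(Enum):
    """Hypothetical labels for the customer's reply to the leading question."""
    AFFIRMATIVE = "affirmative"
    NEGATIVE = "negative"
    ASKS_OTHER_SPOT = "asks_other_spot"


def choose_discussion_topic(response: Response, recommended: str, other: str) -> str:
    """Return the spot that the free discussion should cover."""
    if response is Response.AFFIRMATIVE:
        # The customer agreed, so the discussion stays on the recommended spot.
        return recommended
    # A negative reply, or a request for the other spot, switches the topic
    # so that the customer still checks information on both spots.
    return other
```

For example, `choose_discussion_topic(Response.NEGATIVE, "Spot A", "Spot B")` returns `"Spot B"`.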

IV Proposed System

In the previous section, we discussed asymmetric communication, such as tricky phrases that humans would accept from an AR. In this section, we introduce the AR’s behaviors and utterances that aim at human-likeness.

IV-A Character Transforming for Pretending to Tailor to Customer

A dialogue system should change its response in accordance with the customer’s internal state [7]. By clearly changing the AR’s attributes, personality, and interests to be tailored to the customer, as shown in Fig. 4, the customer can gain a feeling of familiarity toward the AR. During the introduction of the dialogue, the AR says, “I will guide you with a character tailored to you,” and then transforms from a machine-like AR with a low voice and stiff tone to a human-like AR with a high voice and soft tone.

Although we could prepare several types of characters (shown in Fig. 4), a customer could not see other customers’ dialogues under the competition regulations. Therefore, the human-like and machine-like parameters were fixed to the above-mentioned condition throughout our preliminary round in DRC 2022 so that the experimental results could be analyzed easily. Despite this consistency across all dialogues, we think that each customer might have regarded the character change as the robot serving him/her personally.
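A minimal sketch of the two fixed characters and their order over the dialogue is given below. It assumes, purely for illustration, that the character switches at phase boundaries and that only the first and last of the 7 phases are machine-like; the exact switching points are not specified in the text. The names `Character` and `character_for_phase` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str         # "M" (machine-like) or "H" (human-like)
    voice_pitch: str  # "low" or "high"
    tone: str         # "stiff" or "soft"


MACHINE_LIKE = Character("M", voice_pitch="low", tone="stiff")
HUMAN_LIKE = Character("H", voice_pitch="high", tone="soft")


def character_for_phase(phase: int, n_phases: int = 7) -> Character:
    """Return the character for a 1-based dialogue phase.

    ASSUMPTION: the first and last phases use the machine-like character
    and every phase in between uses the human-like one.
    """
    if not 1 <= phase <= n_phases:
        raise ValueError(f"phase must be between 1 and {n_phases}")
    if phase in (1, n_phases):
        return MACHINE_LIKE
    return HUMAN_LIKE
```

Keeping both characters as fixed constants mirrors the decision to use one consistent parameter setting for every customer in the preliminary round.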

Fig. 4: Transition in robot’s internal states during a 5-minute dialogue. Robot takes M (machine-like) mode at beginning and end of dialogue, and H (human-like) mode between them. Machine-like character is transformed to human-like character so that customer can feel that AR is serving him/her.
Fig. 5: Transition of AR’s movements for smooth turn-taking.

IV-B Reducing Mental Stress in Customer

Turn-taking that is not smooth may cause the customer to feel uncomfortable. Therefore, the AR clearly indicates its listening state to reduce the customer’s mental stress. Our transition chart is shown in Fig. 5. When the AR started to ask a question, it leaned forward to indicate to the customer that turn-taking was about to occur. While the customer was answering, the AR tilted its head at regular intervals to indicate that it was listening. After the customer finished the utterance, the AR nodded its head deeply twice to indicate that it had recognized the utterance while it generated its next utterance.
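The movement rules above can be sketched as a lookup from turn-taking state to gestures. The state and gesture names are hypothetical labels, not commands of the actual robot platform, and the 2-second tilt interval is an assumed value because the text only says "regular intervals".

```python
from enum import Enum, auto
from typing import List


class TurnState(Enum):
    AR_ASKING = auto()           # the AR starts to ask a question
    CUSTOMER_ANSWERING = auto()  # the customer is speaking
    CUSTOMER_FINISHED = auto()   # the customer's utterance has ended


# Hypothetical gesture labels for each state.
GESTURES = {
    TurnState.AR_ASKING: ["lean_forward"],
    TurnState.CUSTOMER_ANSWERING: ["tilt_head"],
    TurnState.CUSTOMER_FINISHED: ["nod_deeply", "nod_deeply"],
}


def gestures_for(state: TurnState) -> List[str]:
    """Return the gesture sequence that signals the given state."""
    return list(GESTURES[state])


def tilt_times(duration_s: float, interval_s: float = 2.0) -> List[float]:
    """Return the times (in seconds) at which the head is tilted while listening.

    ASSUMPTION: interval_s = 2.0 is an illustrative default only.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = int(duration_s // interval_s)
    return [interval_s * k for k in range(1, count + 1)]
```

With these assumptions, a 5-second answer gives head tilts at 2 and 4 seconds, followed by two deep nods once the customer stops speaking.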

IV-C Building More Natural Dialogue

If the AR only asked closed questions to customers, it could give the impression of interrogating them. In the dialogue, the AR asked open questions during chit-chat, which allowed customers to speak freely and get a good impression of the AR (Table I).

The important thing in a dialogue is to share and understand each other’s thoughts, so an AR should not speak and ask questions unilaterally. The dialogue of the proposed system incorporated free discussion in which the customer’s questions are answered, so we aimed to achieve this kind of dialogue (Table II).

The AR needs to respond naturally to the customer’s free utterances. In some cases, the AR responds to the customer’s utterances using sentences generated by a model for chit-chat dialogue systems [8].

Speaker  Utterance
AR       Now, I would like to ask you a few questions so that I can recommend a spot! First, what past trips do you remember?
Cus      Germany.
AR       Germany sounds great! So, what do you remember about Germany?
Cus      I remember seeing castles, eating meals, and riding the train.
AR       I wonder what German castles look like. Who would you like to go on the trip with?
Cus      I want to go with friends.
AR       So you are traveling with friends.
TABLE I: Example of the dialogue during open questions from the AR.
Speaker  Utterance
AR       So, do you have any questions about Tokyo Trick Art Museum, such as fees or the parking area?
Cus      How much are the fees?
AR       High school students and older are 1,000 yen, 4 years old to junior high school students are 700 yen, and children under 3 years old are free. Do you have any questions?
Cus      Where is the nearest station?
AR       It is a 2-minute walk from Odaiba Marine Park Station on the Yurikamome Line or a 3-minute drive from Daiba Gateway on the Metropolitan Expressway.
TABLE II: Example of the dialogue responding to customer’s questions.


Questionnaire item   Ours (mean±SD)   Baseline (mean)
Sat/c                4.7±1.6          4.2
Inf                  5.0±1.7          4.0
Nat                  3.9±1.6          3.8
App                  4.5±1.7          4.4
Lik                  4.7±1.7          4.6
Sat/d                5.0±1.6          4.1
Tru                  4.5±1.9          4.3
Use                  4.9±1.4          4.7
Reu                  4.6±1.8          4.1
Recom                11.5±26.2        5.4


TABLE III: Impression evaluation of the dialogue and the robot recommendation effect from the questionnaire survey of 29 customers. The baseline system is a general recommendation dialogue system created by the organizers. Sat/c, Inf, Nat, App, Lik, Sat/d, Tru, Use, Reu, and Recom denote satisfaction with choice, informativeness, naturalness, appropriateness, likeability, satisfaction with dialogue, trustworthiness, usefulness, intention to reuse, and robot recommendation effect, respectively.

V Analysis of Results

Questionnaire results from the competition are shown in Table III. The detailed definitions of the questionnaire items are given in the overview paper [3]. To evaluate the asymmetric communication, we focused on three items: “satisfaction with the dialogue (Sat/d),” “trustworthiness (Tru),” and “the robot recommendation effect (Recom).” The former two scores (Sat/d and Tru) of our system were higher than those of the baseline system, at 5.0 vs. 4.1 and 4.5 vs. 4.3, respectively (Table III). The last (Recom), which indicates the degree of success in leading the customer to the recommended spot, was also higher than the baseline, at 11.5 vs. 5.4. These results suggest that our dialogue strategy may be effective. The other scores, which seemed to have no significant difference from the baseline, will be discussed in future work together with the other teams’ data.
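As a quick check of the three comparisons, the gaps between the mean scores can be computed from the values in Table III. This sketch only subtracts the reported means; it is not a significance test, and the baseline standard deviations are not given in the table. The names `TABLE_III_MEANS` and `mean_difference` are introduced here for illustration.

```python
# Mean scores from Table III as (our system, baseline).
TABLE_III_MEANS = {
    "Sat/d": (5.0, 4.1),
    "Tru": (4.5, 4.3),
    "Recom": (11.5, 5.4),
}


def mean_difference(item: str) -> float:
    """Return our mean minus the baseline mean, rounded to one decimal place."""
    ours, baseline = TABLE_III_MEANS[item]
    return round(ours - baseline, 1)


for item in TABLE_III_MEANS:
    print(item, mean_difference(item))
```

The gap is largest for Recom, although its standard deviation for our system (26.2) is also very large relative to the mean.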

We watched videos of dialogues with customers that had a low evaluation. From the videos, we confirmed cases of dialogue breakdown due to errors in the speech recognition software and the fashion item detection module, as well as cases in which the speech synthesizer misread the generated sentences.

A total of 29 dialogue evaluations of our system were conducted in one day. Looking at the questionnaire results over time, the results for 9 cases later in the day were especially low. The experiment was conducted on a holiday, so the environment in the latter half of the day was noisy. This may have reduced the customers’ ability to concentrate on the dialogue, resulting in lowered evaluations.

VI Conclusion

We developed a hybrid system of human-like and machine-like AR for DRC 2022 whose behaviors and utterances are as smooth and natural as possible while focusing on asymmetric communication. Our AR started with a machine-like character and transformed its voice tone and sentence expression in order to tailor itself to each customer. This resulted in customers reacting with surprise or laughter. Some tricky phrases used for our experimental purposes, such as questions related to customer privacy and forceful requests for agreement, had a certain effect on the customer’s feeling of pleasure, reducing psychological distance and stress. Although we still need further analyses, from an overall subjective evaluation after a 5-minute dialogue, the good scores for “satisfaction with dialogue,” “trustworthiness,” and “robot recommendation effect” suggest that our asymmetric communication system would be acceptable to humans nowadays.