Mapping the Dialog Act Annotations of the LEGO Corpus into the Communicative Functions of ISO 24617-2

12/05/2016 ∙ by Eugénio Ribeiro, et al. ∙ INESC-ID

In this paper we present strategies for mapping the dialog act annotations of the LEGO corpus into the communicative functions of the ISO 24617-2 standard. Using these strategies, we obtained an additional 347 dialogs annotated according to the standard. This is particularly important given the limited amount of existing data annotated according to the standard, a consequence of its recency. Furthermore, these dialogs come from a corpus that is widely explored in dialog-related tasks, yet whose dialog act annotations have been neglected due to their high domain dependency, which renders them useless outside the context of the corpus. Thus, through our mapping process, we both obtain more data annotated according to a recent standard and provide useful dialog act annotations for a widely explored corpus in the context of dialog research.


1 Introduction

During a conversation, dialog acts are the minimal units of linguistic communication, since they are able to reveal the intention behind the uttered words [19]. Thus, they play an important role and have been widely explored by linguists and computer scientists in the Natural Language Processing (NLP) area. In this sense, automatic dialog act recognition is particularly important in the context of dialog systems [16]. However, in order to develop systems that are able to successfully recognize dialog acts, annotated data is required. This takes us to the problem of dialog act annotation and the multiple methodologies and tag sets described in the different studies performed over the years. Dialog act annotation was typically performed in the context of projects or the development of datasets, each introducing new tag sets or modifying existing ones. This led to a wide scattering of data in terms of the annotation used, which hinders the comparison of results and conclusions obtained using different approaches. Thus, in an attempt to set the ground for more comparable research in the area, a standard, ISO 24617-2 [6], was developed. However, since the standard is relatively recent and the described annotation process is non-trivial, the amount of data annotated according to it is limited. In order to increase that amount, we defined strategies for mapping the original dialog act annotations of the LEGO corpus [18] into the communicative functions of the standard. We chose this corpus because, although it has been widely explored in dialog-related tasks, its original dialog act annotations have been neglected. This is due to the high domain dependence of the labels, which can only be used on that specific dataset. However, such specificity also simplifies the mapping of the labels into the higher-level, domain-independent communicative functions of the standard.

In the remaining sections of this paper we start by providing some insight into the ISO 24617-2 standard and the LEGO corpus in Sections 2 and 3, respectively. After that, in Sections 4 and 5, we describe our mapping strategies for both the system and the user turns of the corpus. Finally, in Section 6, we discuss the distribution of the communicative functions of the standard in the corpus and the quality of the conversion process.

2 ISO 24617-2

In an attempt to standardize dialog act annotation and, thus, set the ground for more comparable research in the area, Bunt et al. [6] defined the ISO 24617-2 standard. The first thing to notice about the standard is that annotations are performed on functional segments rather than on turns or utterances [9], since a single turn or utterance may have multiple functions, revealing different intentions. However, automatic, and even manual, functional segmentation is a complex task on its own. Furthermore, according to the standard, a dialog act annotation does not consist of a single label, but rather of a complex structure containing information about the participants, relations with other functional segments, the semantic dimension of the dialog act, its communicative function, and optional qualifiers concerning certainty, conditionality, partiality, and sentiment. In terms of semantic dimensions, the standard defines nine – Task, Auto-Feedback, Allo-Feedback, Turn Management, Time Management, Discourse Structuring, Own Communication Management, Partner Communication Management, and Social Obligations Management. Communicative functions correspond to the dialog act labels present in the multiple tag sets used to annotate data before the introduction of the standard [3, 8, 15, 2, 11, 1]. They are divided into general-purpose functions, which can occur in any semantic dimension, and dimension-specific functions, which, as the name indicates, are specific to a certain dimension. The set of general-purpose functions is hierarchically organized according to Figure 1. Dimension-specific functions are all at the same level and are distributed across dimensions according to Figure 2. Note that there are dimension-specific functions for only eight of the nine dimensions, which means that the Task dimension contains general-purpose functions only. Furthermore, although there are ten communicative functions specific to the Social Obligations Management dimension, some consist of the initial and return counterparts of the same function. These are usually paired to form higher-level communicative functions.
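To make this structure concrete, the following is a minimal sketch, in Python, of how a functional segment and its dialog acts could be represented. The class and field names are ours and purely illustrative: the standard itself specifies a richer XML-based markup (DiAML) that also covers, for instance, sender, addressee, and relations with other functional segments.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative, simplified representation of an ISO 24617-2 annotation.
# The real standard defines an XML-based markup (DiAML); names here are ours.

@dataclass
class DialogAct:
    dimension: str                        # one of the nine semantic dimensions
    communicative_function: str           # general-purpose or dimension-specific
    certainty: Optional[str] = None       # optional qualifier
    conditionality: Optional[str] = None  # optional qualifier
    partiality: Optional[str] = None      # optional qualifier
    sentiment: Optional[str] = None       # optional qualifier

@dataclass
class FunctionalSegment:
    text: str
    speaker: str
    dialog_acts: List[DialogAct] = field(default_factory=list)

# A single functional segment may carry acts in several dimensions:
segment = FunctionalSegment(
    text="Leaving from OAKLAND. Is this correct?",
    speaker="system",
    dialog_acts=[
        DialogAct("Auto-Feedback", "Auto Positive"),
        DialogAct("Task", "Check Question"),
    ],
)
```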

Figure 1: Distribution of general-purpose communicative functions according to the ISO 24617-2 standard. Adapted from [5].
Figure 2: Distribution of dimension-specific communicative functions according to the ISO 24617-2 standard.

Since the standard is relatively recent and the annotation process is non-trivial, the amount of data annotated according to it is limited. Currently, the largest amount of annotated data is provided by the Tilburg University DialogBank [7] (https://dialogbank.uvt.nl/). It features dialogs from different corpora in multiple languages. Most of the annotated data was obtained by converting previously annotated dialogs from corpora such as Switchboard [13] and HCRC Map Task [21]. However, in order to obtain a complete annotation, these conversion processes, such as the one for the Switchboard corpus [12], require manual steps, which are both time and resource consuming. This is especially noticeable for the resegmentation of turns and utterances into functional segments. Thus, in spite of these efforts, some of the dialogs in the DialogBank are not completely annotated according to the standard and the addition of new dialogs is slow. Overall, this means that in order to disseminate the standard and perform solid experiments using its annotation methodology, it is important to put some effort into obtaining larger amounts of data annotated according to it.

3 LEGO Corpus

The LEGO corpus [18] is an annotated subset of the Let’s Go corpus [17]. It features data from the Carnegie Mellon University (CMU) Let’s Go Bus Information System, which provides information about bus schedules in the city of Pittsburgh through spoken telephonic interaction with a dialog system.

A set of 347 calls recorded during 2006 was later annotated by the Dialogue Systems Group at Ulm University, Germany. This annotated subset is the one usually referred to as LEGO. It features 14186 utterances – 9083 system utterances and 5103 user utterances. The counts differ because some system turns contain multiple utterances and some system prompts are not answered by the user. There is no standard division of this corpus into training and testing sets.

Since system utterances are generated through slot filling of fixed templates, they are error-free and contain casing and punctuation information. In contrast, the transcriptions of user utterances were obtained using an Automatic Speech Recognition (ASR) system and, thus, contain neither casing nor punctuation information. Furthermore, recognition was not always correct. Unfortunately, a concrete value for the Word Error Rate (WER) is not reported. Figure 3 presents an excerpt of a transcription.

System: Welcome to the CMU Let’s Go bus information system.

System: To get help at any time, just say Help or press zero.

System: What can I do for you?

User: PENN AT BUTLER TIME IS IT IS THE CENTRE AT HIGHLAND FROM RUNS THE 61A EIGHT THIRTY 61B CAN’T OAKLAND

System: Leaving from OAKLAND . Is this correct?

User: OAKLAND

System: Leaving from OAKLAND . Is this correct?

User: YES

System: Right.

System: Where do you want to go?

User: SQUIRREL HILL OF FIFTH AVENUE

System: Going to FIFTH AVENUE . Is this correct?

User: YES

Figure 3: An excerpt of a LEGO corpus transcription.

In terms of dialog acts, the original LEGO corpus annotations relied on two distinct and domain-dependent tag sets for system and user turns. The set for system turns contains 28 labels, such as Ask Destination, Ask Bus, and Ask Time, while the set for user turns contains 22 labels, such as Place Information, Confirm Destination, and Reject Bus. When using such tag sets, context information is clearly very important both for dialog act annotation and recognition, since a given dialog act drastically reduces the number of non-disruptive possibilities for the next one, that is, those that do not break the dialog flow. Tables 1 and 2 show the distribution of the labels among the system and user turns, respectively. It is important to notice that the second most frequent label in user turns is Unqualified / Unrecognized, accounting for 15.09% of those turns, which reveals a high number of communication problems between the user and the system.

Label Count %
Confirm Understood 912 10.04
Confirm Departure 823 9.06
Ask Another Query 728 8.01
Ask Bus 714 7.86
Ask Confirm Time 497 5.47
Ask Time 493 5.43
Ask Confirm Destination 458 5.04
Explain 438 4.82
Ask Confirm Bus 425 4.68
Ask Departure 410 4.51
Deliver Result 410 4.51
Filler 410 4.51
Announce Querying 364 4.01
Offer Help 348 3.83
Greeting 347 3.82
Ask Destination 345 3.80
Inform No Schedule 343 3.78
Ask Confirm With Keys 142 1.56
Ask Confirm Neighborhood 128 1.41
Announce Restart 84 0.92
Inform Shorter Answer 75 0.82
Inform Help 67 0.74
Goodbye 36 0.40
Disambiguate Bus Stop 32 0.35
Inform No Route 25 0.28
Instruct Louder 13 0.14
Confirm Restart 10 0.11
Instruct More Quiet 6 0.07
Table 1: Original label distribution in the system turns of the LEGO corpus.
Label Count %
Place Information 879 16.94
Unqualified / Unrecognized 783 15.09
Reject 746 14.38
Line Information 440 8.48
Time Information 391 7.54
Confirm Departure 291 5.61
Confirm Destination 246 4.74
Confirm Time 225 4.34
Confirm 214 4.12
Confirm Bus 179 3.45
Request Next Bus 159 3.06
Reject Departure 135 2.60
New Query 98 1.89
Reject Time 95 1.83
Request Previous Bus 75 1.44
Reject Bus 69 1.33
Reject Destination 53 1.02
Request Help 52 1.00
Goodbye 29 0.59
Request Schedule 18 0.35
Polite 8 0.15
Inform 3 0.06
Table 2: Original label distribution in the user turns of the LEGO corpus.

Although the LEGO corpus has been used in many research tasks related to dialog and interaction with Interactive Voice Response (IVR) systems [22, 20, 4, 14], its dialog act annotations have been neglected. In fact, to our knowledge, only we have used them, in some dialog act recognition experiments in the context of the SpeDial project [10] (http://www.spedial.eu). This neglect is probably due to the domain dependence of the labels, which are not useful in any other domain. Furthermore, since the labels are so specific, even a system dealing with the same domain would only be able to benefit from them if the dialog had the same characteristics as the ones from the LEGO corpus. However, such specificity also leads to almost direct mappings of these labels into the higher-level communicative functions of the ISO 24617-2 standard. This mapping does not produce a complete annotation according to the standard for three main reasons. First, not all the communicative functions present in the turns are covered, since it is not possible to obtain information related to certain dimensions from the transcriptions and the original labels alone. Second, communicative functions are just part of the annotation. Third, according to the standard, annotations are made at the functional segment level and not at the turn level. Still, the mapping of the original labels into the communicative functions of the standard provides a large amount of data for applications in the area of dialog act recognition. In fact, as stated in Section 2, a large part of the existing data annotated according to the standard was obtained through conversion processes and does not provide all the information required to form a complete annotation according to the requirements of the standard. Thus, in the next sections we describe our strategies to map the original dialog act annotations of the LEGO corpus into the communicative functions of the ISO 24617-2 standard.

4 Mapping of System Labels

Since the system turns of the LEGO corpus are generated through slot filling, each dialog act label is attributed to a small group of sentence templates. Thus, in order to map the labels into the communicative functions of the standard, we analyzed those templates and assigned them the corresponding functions. The strategies used for each label are presented below.
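In the simplest cases, the strategies described in the following subsections reduce to a static lookup from the original label to one or more (dimension, communicative function) pairs. Below is a minimal sketch covering a few of those labels; the dictionary structure is ours, and labels with multiple sub-cases (e.g. Announce Querying, Disambiguate Bus Stop) need an additional check on the template text.

```python
# Direct mapping of a subset of the original LEGO system labels into
# ISO 24617-2 (dimension, communicative function) pairs, following the
# per-label strategies of Sections 4.1-4.28.
SYSTEM_LABEL_MAP = {
    "Announce Restart": [("Discourse Structuring", "Interaction Structuring")],
    "Ask Bus": [("Task", "Set Question")],
    "Ask Departure": [("Task", "Set Question")],
    "Ask Destination": [("Task", "Set Question")],
    "Ask Time": [("Task", "Set Question")],
    "Ask Confirm Bus": [("Auto-Feedback", "Auto Positive"),
                        ("Task", "Check Question")],
    "Confirm Understood": [("Auto-Feedback", "Auto Positive")],
    "Deliver Result": [("Task", "Inform")],
    "Goodbye": [("Social Obligations Management", "Goodbye")],
    "Greeting": [("Discourse Structuring", "Opening"),
                 ("Social Obligations Management", "Greeting")],
    "Inform No Route": [("Task", "Inform"),
                        ("Social Obligations Management", "Apology")],
    "Offer Help": [("Task", "Instruct")],
}

def map_system_label(label: str):
    """Return the ISO 24617-2 functions for an unambiguous system label."""
    return SYSTEM_LABEL_MAP[label]
```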

4.1 Announce Querying

The turns annotated with this label consist of wait requests while the system looks for the required information, such as “Just a second.” and “Hold on. I’ll look that up.”. These are techniques for pausing the dialog which fall under the Time Management dimension defined in the standard and, thus, we annotated them with the Pausing communicative function. Furthermore, in cases such as the second example, the system makes a promise to look for the required information. Thus, those cases were also annotated with the Promise label in the Task dimension.

4.2 Announce Restart

All turns annotated with this label contain the same utterance – “Okay, let’s start from the beginning.” – which reveals the intention of the system to structure the dialog towards a restart of the current interaction. Thus, we annotated these turns with the Interaction Structuring communicative function in the Discourse Structuring dimension.

4.3 Ask Another Query

The turns annotated with this label state the different options the user has at that point in an attempt to elicit one of the behaviors. Examples of such turns are “You can say, when is the next bus, when is the previous bus, start a new query, or goodbye.” and “To ask about a different trip, you can say, start a new query. If you are finished, you can say goodbye.”. On the one hand, these turns provide instructions to the user about the task and, thus, were annotated with the Instruct communicative function in the Task dimension. On the other hand, they are also an attempt to structure the dialog and, thus, were also annotated with the Interaction Structuring communicative function in the Discourse Structuring dimension.

4.4 Ask Bus

In the cases annotated with this label, the system asks what it can do for the user, or what bus schedule he or she wants to obtain information on. Examples are “What can I do for you?” and “What bus schedule information are you looking for?”. These are Set Questions in the Task dimension and, thus, were annotated accordingly.

4.5 Ask Departure

These turns are similar to the ones annotated with the Ask Bus label (Section 4.4), but asking for the departure place instead of a specific bus identifier. An example is “Where are you leaving from?”. Accordingly, we annotated them with the Set Question communicative function in the Task dimension as well.

4.6 Ask Destination

These turns are entirely similar to the ones annotated with the Ask Departure label (Section 4.5), but ask for the destination place instead of the departure place. An example is “What is your destination?”. Thus, they were also annotated with the Set Question communicative function in the Task dimension.

4.7 Ask Time

Similarly to the previous three labels, turns annotated with this label consist of the system asking for information, in this case about the time of travel. An example is “When do you wanna travel?”. Consequently, we also annotated them with the Set Question communicative function in the Task dimension.

4.8 Ask Confirm Bus

The turns annotated with this label consist of two segments. The first states the identifier of the bus, as understood by the system, which is a case of auto-feedback, leading to annotation with the Auto Positive communicative function in the Auto-Feedback dimension. The second asks for confirmation by the user, which is a Check Question in the Task dimension. Examples of such turns are “The 54C. Did I get that right?” and “The 28X. Is this correct?”.

4.9 Ask Confirm Departure

These turns are similar to the ones annotated with the previous label, but confirming information about the place of departure. An example is “Leaving from Oakland. Is this correct?”. Thus, they were also annotated with the Auto Positive and Check Question communicative functions, in the Auto-Feedback and Task dimensions, respectively. However, some of the turns seem to be wrongly annotated, as they correspond to instructions by the system on how to confirm or reject using the keys and, thus, should have the Ask Confirm With Keys label. These cases were converted using the strategy for that label, as described in Section 4.13.

4.10 Ask Confirm Destination

These turns are entirely similar to the ones annotated with the Ask Confirm Departure label (Section 4.9), but concern the destination place instead of the departure place. An example is “Going to Fifth Avenue. Is this correct?”. Consequently, they were also annotated with the Auto Positive and Check Question communicative functions, in the Auto-Feedback and Task dimensions, respectively. Once again, some of the turns should have the Ask Confirm With Keys label instead. These cases were converted using the strategy for that label, as described in Section 4.13.

4.11 Ask Confirm Neighborhood

Turns annotated with this label are similar to the ones annotated with the previous two labels, but confirming a zone instead of a specific stop. An example is “Waterworks Mall. Is this correct?”. Accordingly, they were also annotated with the Auto Positive and Check Question communicative functions, in the Auto-Feedback and Task dimensions, respectively.

4.12 Ask Confirm Time

Similarly to the previous four labels, turns annotated with this label consist of the system trying to confirm the understood information, in this case concerning the time of travel. An example is “Leaving at 5 a.m.. Did I get that right?”. Thus, we also annotated these turns with the Auto Positive and Check Question communicative functions, in the Auto-Feedback and Task dimensions, respectively.

4.13 Ask Confirm With Keys

Similarly to the previous labels, turns annotated with Ask Confirm With Keys state the understood information to provide feedback and request confirmation. However, in this case, instead of asking a question, the system instructs the user on how to answer using the keypad. An example is “If you want the schedule of the 54C say yes or press one, otherwise say no or press three.”. Thus, consistently with the previous labels, we annotated these turns with the Auto Positive communicative function in the Auto-Feedback dimension. However, in the Task dimension, we annotated them with the Instruct communicative function instead of Check Question.

4.14 Confirm Restart

All turns annotated with this label contain the same utterance – “Are you sure you want to start over?” – which is a check by the system on whether the user really wants to restart the interaction. Thus, we annotated these turns with the Check Question communicative function in the Task dimension. Furthermore, since the system states that it understood a restart request, we also annotated these turns with the Auto Positive communicative function in the Auto-Feedback dimension.

4.15 Confirm Understood

These turns signal the system’s understanding of the user’s intention, through feedback utterances such as “Right” and “Ok”. Accordingly, we annotated them with the Auto Positive communicative function in the Auto-Feedback dimension.

4.16 Deliver Result

In the turns annotated with this label the system provides the bus schedule information according to the parameters discussed along the dialog. An example is “The next 61C leaves Eighth Avenue at Ann at 7:45 p.m. and arrives at Second Street at Grant at 7:59 p.m..”. Thus, we annotated them with the Inform communicative function in the Task dimension.

4.17 Disambiguate Bus Stop

The turns annotated with this label occur when there is some confusion about the requested bus stops. Cases like “Which stop in Duquesne are you leaving from?” are questions that ask for a specific bus stop instead of a zone, while cases like “Downtown and Forbes are both the same stop. Please provide a different start or end point.” request a different start or end point. Thus, in the Task dimension, we annotated turns similar to the first example with the Set Question communicative function and ones similar to the second example with Request.
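As a rough illustration, assuming access to the turn text, the two families of templates could be told apart as sketched below; the test on the leading word is our own heuristic, not part of the original annotation.

```python
def map_disambiguate_bus_stop(turn: str):
    """Map a 'Disambiguate Bus Stop' system turn to ISO 24617-2 functions.

    The leading-word test is our own heuristic for separating the two
    template families described above.
    """
    if turn.strip().lower().startswith("which"):
        # e.g. "Which stop in Duquesne are you leaving from?"
        return [("Task", "Set Question")]
    # e.g. "... Please provide a different start or end point."
    return [("Task", "Request")]
```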

4.18 Explain

The turns annotated with this label provide instructions and examples to the user according to what he or she can do at that time. An example is “For example, you can say, Forbes and Murray, Downtown, or McKeesport.”. Accordingly, we annotated these turns with the Instruct communicative function in the Task dimension.

4.19 Filler

All turns annotated with this label contain the same utterance – “Alright” – and serve the same purpose as the ones annotated as Confirm Understood (Section 4.15). Consequently, we also annotated them with the Auto Positive communicative function in the Auto-Feedback dimension.

4.20 Goodbye

These turns correspond to the system politely ending the dialog, with utterances such as “Thank you for using the CMU Let’s Go Bus Information System. Goodbye.”. Thus, they fall under the Social Obligations Management dimension of the standard and were annotated with the Goodbye communicative function.

4.21 Greeting

All turns annotated with this label contain the same utterance – “Welcome to the CMU Let’s Go bus information system.” – which opens the interaction and greets the user. Thus, these turns have two functions in two different dimensions and, accordingly, we annotated them with the Opening communicative function in the Discourse Structuring dimension and with Greeting in the Social Obligations Management dimension.

4.22 Inform Help

All turns annotated with this label contain the same utterance – “I am an automated spoken dialogue system that can give you schedule information for bus routes in Pittsburgh’s East End. You can ask me about the following buses: 28X, 54C, 56U, 59U, 61A, 61B, 61C, 61D, 61F, 64A, 69A, and 501.” – which informs the user of the buses that the system has information about. Thus, we annotated them with the Inform communicative function in the Task dimension.

4.23 Inform No Route

The turns annotated with this label state that there are no buses satisfying the indicated parameters and apologize for that. An example is “I’m sorry, but there is no bus that goes between CMU and Squirrel Hill at that time.”. Thus, we annotated them with the Inform communicative function in the Task dimension. Furthermore, to cover the apologizing function, we also annotated them with the Apology communicative function in the Social Obligations Management dimension.

4.24 Inform No Schedule

Similarly to the previous label, the turns annotated with this label state that the system does not have the schedule for the requested bus and apologize for that. An example is “I’m sorry but I do not have the schedule for the 500. The routes I currently cover are the following: 28X, 54C, 56U, 59U, 61A, 61B, 61C, 61D, 61F, 64A, 69A, and 501.”. Thus, they were also annotated with the Inform and Apology communicative functions in the Task and Social Obligations Management dimensions, respectively. However, many turns were wrongly annotated with this label, which required manual and individual mapping.

4.25 Inform Shorter Answer

In these turns, the system asks the user to use shorter answers, using both polite requests, such as “Please use shorter answers because I have trouble understanding long sentences.”, and more assertive commands, such as “I need you to give me a short answer.”. In the Task dimension, the former were annotated with the Request communicative function, while the latter were annotated with Instruct.

4.26 Instruct Louder

In these turns the system asks the user to speak louder using a polite request – “I’m having some trouble hearing you. If you’re still there, please try to talk a little bit louder or closer to the phone.”. Thus, we annotated them with the Request communicative function in the Task dimension.

4.27 Instruct More Quiet

In these turns the system asks the user to speak more quietly using a polite request, such as “I can’t understand loud speech. Please speak more quietly.”. Similarly to the turns annotated with the previous label, we annotated these with the Request communicative function in the Task dimension.

4.28 Offer Help

All turns annotated with this label contain the same utterance – “To get help at any time, just say Help or press zero.” – which instructs the user on how to get help. Accordingly, we annotated them with the Instruct communicative function in the Task dimension.

5 Mapping of User Labels

Contrary to the system turns, the user turns are open and contain recognition errors. Thus, the mapping of their labels into the communicative functions of the standard is not as straightforward. Still, since the turns are typically short and the labels are highly domain-dependent, mapping strategies can be defined. They are presented below for each label.
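As with the system labels, the unambiguous user labels reduce to a static lookup; a partial sketch follows (the dictionary structure is ours), with the form- and context-dependent labels handled by separate functions, and Unqualified / Unrecognized turns mapped manually, as described in Section 5.22.

```python
# Partial lookup for the unambiguous user labels of Sections 5.1-5.22.
# Form- and context-dependent labels (e.g. Line Information, Request
# Next Bus) are handled separately.
USER_LABEL_MAP = {
    "Confirm": [("Task", "Confirm"), ("Allo-Feedback", "Allo Positive")],
    "Confirm Departure": [("Task", "Confirm"),
                          ("Allo-Feedback", "Allo Positive")],
    "Reject": [("Task", "Disconfirm"), ("Allo-Feedback", "Allo Negative")],
    "Place Information": [("Task", "Answer")],
    "Time Information": [("Task", "Answer")],
    "New Query": [("Task", "Instruct"),
                  ("Discourse Structuring", "Interaction Structuring")],
    "Polite": [("Social Obligations Management", "Thanking")],
    "Goodbye": [("Social Obligations Management", "Goodbye")],
    "Request Help": [("Task", "Request")],
    "Request Schedule": [("Task", "Inform")],
}
```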

5.1 Confirm

The turns annotated with this label consist of user confirmations that the system understood correctly, such as “Correct” and “Yes”. Thus, we annotated them with the Confirm communicative function in the Task dimension. Furthermore, we also annotated them with the Allo Positive communicative function in the Allo-Feedback dimension, since they serve as feedback for the system.

5.2 Confirm Bus

The turns annotated with this label are entirely similar to the ones annotated with the Confirm label (Section 5.1). They have a different label since they correspond to confirmations of the bus identifier specifically. Thus, we also annotated them with the Confirm and Allo Positive communicative functions in the Task and Allo-Feedback dimensions, respectively.

5.3 Confirm Departure

The turns annotated with this label are entirely similar to the ones annotated with the Confirm label (Section 5.1). They have a different label since they correspond to confirmations of the place of departure specifically. Thus, we also annotated them with the Confirm and Allo Positive communicative functions in the Task and Allo-Feedback dimensions, respectively.

5.4 Confirm Destination

The turns annotated with this label are entirely similar to the ones annotated with the Confirm label (Section 5.1). They have a different label since they correspond to confirmations of the destination stop specifically. Thus, we also annotated them with the Confirm and Allo Positive communicative functions in the Task and Allo-Feedback dimensions, respectively.

5.5 Confirm Time

The turns annotated with this label are entirely similar to the ones annotated with the Confirm label (Section 5.1). They have a different label since they correspond to confirmations of time information specifically. Thus, we also annotated them with the Confirm and Allo Positive communicative functions in the Task and Allo-Feedback dimensions, respectively.

5.6 Goodbye

Similarly to the system turns annotated with the Goodbye label (Section 4.20), these turns are intended to end the dialog by using the “Goodbye” keyword. Consequently, we also annotated them with the Goodbye communicative function in the Social Obligations Management dimension.

5.7 Inform

There are only three turns annotated with this label and one of them should be annotated with the Time Information label (Section 5.10) instead. The remaining two provide information unrelated to the system’s question and were annotated with the Inform communicative function in the Task dimension.

5.8 Line Information

The turns annotated with this label provide information about the bus the user is interested in. However, this information may come in the form of a question, such as “When is the next 28X from Downtown to the Airport?”, or a statement, such as “The 61A”. In the Task dimension, cases such as the first were annotated with the Set Question communicative function, while cases such as the second were annotated with Inform.
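Since ASR transcriptions carry no punctuation (Section 3), a question mark cannot be used to separate the two cases; a leading wh-word is a plausible heuristic instead. A sketch follows, with the word list being our assumption:

```python
# Leading wh-words used as a question heuristic; this list is an
# assumption of ours, not part of the original LEGO annotation.
WH_QUESTION_STARTERS = {"when", "what", "which", "where", "how"}

def map_line_information(turn: str):
    """Map a 'Line Information' user turn to ISO 24617-2 functions."""
    tokens = turn.strip().lower().split()
    if tokens and tokens[0] in WH_QUESTION_STARTERS:
        # e.g. "WHEN IS THE NEXT 28X FROM DOWNTOWN TO THE AIRPORT"
        return [("Task", "Set Question")]
    # e.g. "THE 61A"
    return [("Task", "Inform")]
```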

5.9 Place Information

The turns annotated with this label answer a system prompt for a departure or destination bus stop. Examples are “Downtown” and “Duquesne”. Since these are answers to a specific question, in the Task dimension, we annotated them with the Answer communicative function instead of Inform.

5.10 Time Information

Similarly to the previous label, the turns annotated with this one answer a system prompt, in this case related to time information. Examples are “Eleven o’clock” and “Now”. Consequently, we also annotated them with the Answer communicative function in the Task dimension.

5.11 New Query

In these turns, the user instructs the system to start a new query, with utterances such as “Start a new query”, structuring the dialog in that direction. Thus, on the one hand we annotated them with the Instruct communicative function in the Task dimension and, on the other hand, with the Interaction Structuring communicative function in the Discourse Structuring dimension.

5.12 Polite

In the turns annotated with this label, the user thanks the system for some reason. An example is “Thank you”. Thus, we annotated them with the Thanking communicative function in the Social Obligations Management dimension.

5.13 Reject

The turns annotated with this label consist of user rejections or corrections of the system’s understanding, such as “No” and “No, I need the next bus”. Thus, we annotated them with the Disconfirm communicative function in the Task dimension. Furthermore, we also annotated them with the Allo Negative communicative function in the Allo-Feedback dimension, since they serve as feedback for the system.

5.14 Reject Bus

The turns annotated with this label are entirely similar to the ones annotated with the Reject label (Section 5.13). They have a different label since they correspond to rejections of the bus identifier specifically. Thus, we also annotated them with the Disconfirm and Allo Negative communicative functions in the Task and Allo-Feedback dimensions, respectively.

5.15 Reject Departure

The turns annotated with this label are entirely similar to the ones annotated with the Reject label (Section 5.13). They have a different label since they correspond to rejections of the place of departure specifically. Thus, we also annotated them with the Disconfirm and Allo Negative communicative functions in the Task and Allo-Feedback dimensions, respectively.

5.16 Reject Destination

The turns annotated with this label are entirely similar to the ones annotated with the Reject label (Section 5.13). They have a different label since they correspond to rejections of the destination stop specifically. Thus, we also annotated them with the Disconfirm and Allo Negative communicative functions in the Task and Allo-Feedback dimensions, respectively.

5.17 Reject Time

The turns annotated with this label are entirely similar to the ones annotated with the Reject label (Section 5.13). They have a different label since they correspond to rejections of time information specifically. Thus, we also annotated them with the Disconfirm and Allo Negative communicative functions in the Task and Allo-Feedback dimensions, respectively.

5.18 Request Help

In the turns annotated with this label, the user asks for help using the keyword “Help” or by pressing the corresponding numeric key. Thus, we annotated them with the Request communicative function in the Task dimension.

5.19 Request Next Bus

The turns annotated with this label consist of the user asking for information about the next bus. However, this request may come in the form of a question, such as “When is the next bus?”, or a statement, such as “The next bus”. Furthermore, when a statement is used, it may be in response to a time request by the system. Thus, in the Task dimension, we annotated these turns with three different communicative functions according to their nature: Set Question, Inform, and Answer, respectively.
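A sketch of this three-way split follows; the specific tests, and the use of the preceding system label as context, are our assumptions rather than the original procedure:

```python
def map_request_next_bus(turn: str, prev_system_label: str):
    """Map a 'Request Next Bus' user turn to ISO 24617-2 functions.

    The concrete tests below are our assumptions: the text above
    distinguishes the cases by form and context, but does not prescribe
    an automatic procedure.
    """
    if turn.strip().lower().startswith("when"):
        # Question form, e.g. "WHEN IS THE NEXT BUS".
        return [("Task", "Set Question")]
    if prev_system_label == "Ask Time":
        # Statement answering a system time prompt, e.g. "THE NEXT BUS".
        return [("Task", "Answer")]
    # Any other statement form.
    return [("Task", "Inform")]
```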

5.20 Request Previous Bus

These turns are similar to the ones annotated with the previous label, but asking for information about the previous bus instead of the next. Thus, in a similar manner, we annotated them with the Set Question, Inform, or Answer communicative functions in the Task dimension according to their different natures.

5.21 Request Schedule

These turns consist of the user stating that he or she wants schedule information. An example is “Holiday schedule”. Thus, we annotated them with the Inform communicative function in the Task dimension.

5.22 Unqualified / Unrecognized

The turns annotated with this label typically correspond to problems in the dialog. For instance, cases when the ASR system failed to recognize most of the sentence and only output a small part of it, such as “The”; cases of self-talk or third-party talk, such as “I’m having fun”; and cases when the user produces utterances that are unrelated to the task or that disrupt the dialog flow. Many of these turns are gibberish and do not correspond to any communicative function of the standard. However, some of them actually reveal an intention and, thus, should be annotated regardless of whether they make sense according to the flow of the dialog. These cases were annotated manually and individually with different labels.

6 Discussion

Dimension / Function: System Count (%) | User Count (%) | All Count (%)

Task
  Answer: 0 (0.00) | 1462 (28.70) | 1462 (10.31)
  Check Question: 2256 (24.84) | 1 (0.02) | 2257 (15.92)
  Confirm: 0 (0.00) | 1162 (22.81) | 1162 (8.20)
  Disconfirm: 0 (0.00) | 1105 (21.69) | 1105 (7.79)
  Inform: 656 (7.22) | 600 (11.78) | 1256 (8.86)
  Instruct: 1812 (19.95) | 106 (2.08) | 1918 (13.53)
  Promise: 277 (3.05) | 0 (0.00) | 277 (1.95)
  Request: 70 (0.77) | 85 (1.67) | 155 (1.09)
  Set Question: 1987 (21.88) | 210 (4.12) | 2197 (15.50)
  Suggest: 40 (0.44) | 0 (0.00) | 40 (0.28)
  Total: 7098 (78.15) | 4731 (92.87) | 11829 (83.44)
Allo-Feedback
  Allo Negative: 0 (0.00) | 1105 (21.69) | 1105 (7.79)
  Allo Positive: 0 (0.00) | 1162 (22.81) | 1162 (8.20)
  Total: 0 (0.00) | 2267 (44.50) | 2267 (15.99)
Auto-Feedback
  Auto Negative: 0 (0.00) | 19 (0.37) | 19 (0.13)
  Auto Positive: 3814 (41.99) | 71 (1.39) | 3885 (27.40)
  Total: 3814 (41.99) | 90 (1.77) | 3904 (27.54)
Discourse Structuring
  Interaction Structuring: 852 (9.38) | 103 (2.02) | 955 (6.74)
  Opening: 347 (3.82) | 0 (0.00) | 347 (2.45)
  Total: 1199 (13.20) | 103 (2.02) | 1302 (9.18)
Social Obligations Management
  Apology: 163 (1.79) | 3 (0.06) | 166 (1.17)
  Goodbye: 36 (0.40) | 31 (0.61) | 67 (0.47)
  Greeting: 347 (3.82) | 47 (0.92) | 394 (2.78)
  Thanking: 0 (0.00) | 10 (0.20) | 10 (0.07)
  Total: 546 (6.01) | 91 (1.79) | 637 (4.49)
Time Management
  Pausing: 364 (4.01) | 0 (0.00) | 364 (2.57)
  Total: 364 (4.01) | 0 (0.00) | 364 (2.57)

Table 3: ISO 24617-2 communicative function distribution in the LEGO corpus.

The result of applying the mapping process to the LEGO corpus is presented in Table 3. We can see that communicative functions in the Task dimension are more prevalent among user segments than among system segments, with 92.87% of the user segments having communicative functions in that dimension versus 78.15% of the system segments. While for system segments the most frequent functions are Check Question (24.84%), Set Question (21.88%), and Instruct (19.95%), for user segments those are replaced by the Answer (28.70%), Confirm (22.81%), and Disconfirm (21.69%) functions. This is coherent with the nature of the dialogs, which typically consist of the system questioning the user to obtain the required information. All segments considered, 83.44% have communicative functions in the Task dimension. Since there are more system segments, the most frequent functions are the same as when considering those segments only, but with lower relative frequencies – 15.92%, 15.50%, and 13.53%, respectively.
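For reference, the per-function shares in Table 3 can be recomputed from the mapped turns with a simple tally; a sketch follows, assuming each turn is represented as a list of (dimension, function) pairs, as in the mapping sketches above:

```python
from collections import Counter

def function_distribution(mapped_turns):
    """Tally communicative function counts over mapped turns, as in Table 3.

    `mapped_turns` is an iterable of lists of (dimension, function)
    pairs, one list per turn, matching the mapping sketches above.
    Percentages are computed relative to the number of turns, following
    Table 3.
    """
    turns = list(mapped_turns)
    counts = Counter(pair for acts in turns for pair in acts)
    return {pair: (count, 100.0 * count / len(turns))
            for pair, count in counts.items()}
```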

In terms of the feedback dimensions, 44.50% of the user segments have functions in the Allo-Feedback dimension and no functions in the Auto-Feedback dimension. For system segments, the values are reversed, with 41.99% of the segments having functions in the Auto-Feedback dimension and only 1.77% in the Allo-Feedback dimension. Once again, these values are coherent with the nature of the dialogs, since the system typically uses feedback functions to check whether it understood what the user said, while the user confirms or disconfirms that.

As for the remaining semantic dimensions, it is important to note that it is difficult to find communicative functions in those dimensions by simply converting the original labels of the LEGO corpus. Thus, the identified functions are just a small part of all those that exist in the corpus and that could be found through a manual and exhaustive processing of each segment. Still, it is interesting to notice that at least 13.20% of the system segments contain communicative functions in the Discourse Structuring dimension. This reveals the rigid nature of the system’s utterances and its intention to structure the dialog according to a specific path.

Overall, by applying the mapping process described in this paper, 347 additional dialogs are annotated with the communicative functions of the ISO 24617-2 standard. Although communicative functions alone do not form a complete dialog act annotation according to the standard, the amount of generated data is important for dialog act research, especially for dialog act recognition experiments. This is particularly important given the reduced amount of existing data annotated according to the standard, which limits the conclusions that can be drawn from experiments using it.

Acknowledgements

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013, by Universidade de Lisboa, and by EU-IST FP7 project SpeDial under contract number 611396.

References

  • [1] Alexandersson, J., Buschbeck-Wolf, B., Fujinami, T., Kipp, M., Koch, S., Maier, E., Reithinger, N., Schmitz, B., Siegel, M.: Dialogue Acts in VERBMOBIL-2: Second Edition. Tech. rep., DFKI (1998)
  • [2] Allen, J., Core, M.: DAMSL: Dialogue Act Markup in Several Layers. Tech. Rep. Draft 2.1, Multiparty Discourse Group, Discourse Resource Initiative (1997)
  • [3] Allen, J., Schubert, L., Ferguson, G., Heeman, P., Hwang, C.H., Kato, T., Light, M., Martin, N., Miller, B., Poesio, M., Traum, D.: The TRAINS Project: A Case Study in Defining a Conversational Planning Agent. Tech. Rep. 532, Department of Computer Science, University of Rochester, New York (1994)
  • [4] Brester, C., Semenkin, E., Sidorov, M., Semenkina, O.: Multicriteria Neural Network Design in the Speech-Based Emotion Recognition Problem. In: Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics (ICINCO). vol. 1, pp. 621–628 (2015)
  • [5] Bunt, H., Alexandersson, J., Carletta, J., Choe, J.W., Fang, A.C.: Towards an ISO Standard for Dialogue Act Annotation. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10) (2010)
  • [6] Bunt, H., Alexandersson, J., Choe, J.W., Fang, A.C., Hasida, K., Petukhova, V., Popescu-Belis, A., Traum, D.: ISO 24617-2: A Semantically-Based Standard for Dialogue Annotation. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12) (2012)
  • [7] Bunt, H., Petukhova, V., Malchanau, A., Wijnhoven, K., Fang, A.: The DialogBank. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) (2016)
  • [8] Carletta, J., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G., Anderson, A.: HCRC Dialogue Structure Coding Manual. Tech. rep., Human Communication Research Centre, University of Edinburgh (1996)
  • [9] Carroll, J.M., Tanenhaus, M.K.: Functional Clauses and Sentence Segmentation. Journal of Speech, Language, and Hearing Research 21(4), 793–808 (1978)
  • [10] Chorianopoulou, A., Palogiannidi, E., Iosif, E., Koutsakis, P., Georgiladakis, S., Trancoso, I., Batista, F., Moniz, H., Ribeiro, E., Abad, A., Lopes, J., Meena, R., Skantze, G., Gustafson, J.: SpeDial D2.1 Interim Report on IVR Analytics and Evaluation. Tech. Rep. 2.1, SpeDial Consortium (2015)
  • [11] Eugenio, B.D., Jordan, P.W., Pylkkänen, L.: The COCONUT project: Dialogue Annotation Manual. Tech. rep., University of Pittsburgh (1998)
  • [12] Fang, A., Cao, J., Bunt, H., Liu, X.: The Annotation of the Switchboard Corpus with the New ISO Standard for Dialogue Act Analysis. In: Proceedings of ISA-8. pp. 13–18 (2012)
  • [13] Godfrey, J., Holliman, E., McDaniel, J.: SWITCHBOARD: Telephone Speech Corpus for Research and Development. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP’92. vol. 1, pp. 517–520 (1992)
  • [14] Griol, D., Iglesias, J.A., Ledezma, A., Sanchis, A.: A Two-Stage Combining Classifier Model for the Development of Adaptive Dialog Systems. International Journal of Neural Systems 26(1) (2016)
  • [15] Jurafsky, D., Shriberg, E., Biasca, D.: Switchboard SWBD-DAMSL shallow-discourse-function annotation coder manual. Tech. Rep. Draft 13, University of Colorado, Institute of Cognitive Science (1997)
  • [16] Kral, P., Cerisara, C.: Dialogue Act Recognition Approaches. Computing and Informatics 29(2), 227–250 (2010)
  • [17] Raux, A., Bohus, D., Langner, B., Black, A.W., Eskenazi, M.: Doing Research on a Deployed Spoken Dialogue System: One Year of Let’s Go! Experience. In: Proceedings of INTERSPEECH 2006. pp. 65–68 (2006)
  • [18] Schmitt, A., Ultes, S., Minker, W.: A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System. In: Calzolari, N., Choukri, K., Declerck, T., Dogan, M.U., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC’12. European Language Resources Association (ELRA), Istanbul, Turkey (2012)
  • [19] Searle, J.R.: Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London (1969)
  • [20] Sidorov, M., Brester, C., Semenkin, E., Minker, W.: Speaker State Recognition with Neural Network-Based Classification and Self-Adaptive Heuristic Feature Selection. In: Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics (ICINCO). vol. 1, pp. 699–703 (2014)
  • [21] Thompson, H.S., Anderson, A., Bard, E.G., Doherty-Sneddon, G., Newlands, A., Sotillo, C.: The HCRC Map Task Corpus: Natural Dialogue for Speech Recognition. In: Proceedings of the Workshop on Human Language Technology. pp. 25–30. HLT ’93 (1993)
  • [22] Ultes, S., Schmitt, A., Minker, W.: On Quality Ratings for Spoken Dialogue Systems-Experts vs. Users. In: Proceedings of HLT-NAACL. pp. 569–578 (2013)