An ASP Methodology for Understanding Narratives about Stereotypical Activities

by Daniela Inclezan, et al.
Drexel University
Miami University

We describe an application of Answer Set Programming to the understanding of narratives about stereotypical activities, demonstrated via question answering. Substantial work in this direction was done by Erik Mueller, who modeled stereotypical activities as scripts. His systems were able to understand a good number of narratives, but could not process texts describing exceptional scenarios. We propose addressing this problem by using a theory of intentions developed by Blount, Gelfond, and Balduccini. We present a methodology in which we replace scripts with activities (i.e., hierarchical plans associated with goals) and employ the concept of an intentional agent to reason about both normal and exceptional scenarios. We exemplify the application of this methodology by answering questions about a number of restaurant stories. This paper is under consideration for acceptance in TPLP.






1 Introduction

This paper describes an application of Answer Set Programming to the understanding of narratives. According to Schank and Abelson sa77, stories frequently narrate episodes related to stereotypical activities: sequences of actions normally performed in a certain order by one or more actors, according to cultural conventions. One example of a stereotypical activity is dining in a restaurant with table service, in which the following actions are expected to occur: the customer enters, is greeted by the waiter who leads him to a table, sits down, reads the menu, and orders a dish; the waiter brings the dish; the customer eats and then asks for the bill; the waiter places the bill on the table; the customer pays and then leaves. A story mentioning a stereotypical activity is not required to state explicitly all of the actions that are part of it, as it is assumed that the reader is capable of filling in the blanks with his own commonsense knowledge about the activity sa77. Consider, for instance, the following narrative:

Example 1 (Scenario 1, adapted from m07)

Nicole went to a vegetarian restaurant. She ordered lentil soup. The waitress set the soup in the middle of the table. Nicole enjoyed the soup. She left the restaurant.

Norms indicate, for instance, that customers are expected to pay for their meal. Readers are supposed to know such conventions, and so this information is missing from the text.

Schank and Abelson sa77 introduced the concept of a script to model stereotypical activities: a fixed sequence of actions that are always executed in a specific order. Following these ideas, Mueller conducted substantial work on texts about stereotypical activities: news about terrorist incidents m04 and restaurant stories m07. In the latter he developed a system that took as an input a restaurant story, processed it using information extraction techniques, and used a commonsense knowledge base about the restaurant domain to demonstrate an understanding of the narrative by answering questions whose answers were not necessarily stated in the text. The system performed well but the rigidity of scripts did not allow for the correct processing of scenarios describing exceptions (e.g., someone else paying for the customer’s meal). To be able to handle such scenarios, all possible exceptions of a script would have to be foreseen and encoded as new scripts by the designer of the knowledge base, which is an important hurdle.

In this paper, we propose a new representation methodology and reasoning approach, which makes it possible to answer, in both normal and exception scenarios, questions about events that did or did not take place. We overcome limitations in Mueller’s work by abandoning the rigid script-based approach. To the best of our knowledge, ours is the first scalable approach to the understanding of exceptional scenarios.

Instead, we propose to view characters in stories about stereotypical activities (e.g., the customer, waiter, and cook in a restaurant scenario) as BDI agents that intend to perform some actions in order to achieve certain goals, but may not always need to, or be able to, perform these actions as soon as intended. It is instrumental for our purpose to use a theory of intentions developed by Blount et al. bgb15 that introduces the concept of an activity: a sequence of agent actions and sub-activities that are supposed to achieve a goal. The theory of intentions is written in an action language and is translatable into Answer Set Prolog (ASP) gl91. It can be easily coupled with ASP commonsense knowledge bases about actions and their effects, and with reasoning algorithms encoded in ASP or its extensions, to build executable systems. Other implementations of BDI agents exist. Some cannot be immediately integrated in executable systems rg91; it remains to be seen whether others bhw07 may be more readily integrated in our methodology.

Blount et al. also introduced an architecture of an intentional agent, an agent that obeys its intentions. According to this architecture, at each time step, the agent observes the world, explains observations incompatible with its expectations (diagnosis), and determines what action to execute next (planning). The reasoning module implementing the architecture allows an agent to reason about a wide variety of scenarios, including the serendipitous achievement of its goal by exogenous actions and the realization that an active activity no longer has a chance to achieve its goal (i.e., that it is futile), illustrated by the texts below.

Example 2 (Serendipity)

Nicole went to a vegetarian restaurant. She ordered lentil soup. When the waitress brought her the soup, she told her that it was on the house. Nicole enjoyed the soup and then left. (The reader should understand that Nicole did not pay for the soup.)

Example 3 (Detecting Futile Activity)

Nicole went to a vegetarian restaurant. She sat down and wanted to order lentil soup, but it was not on the menu. (The reader should deduce that, at this point, Nicole abandoned her plan of eating lentil soup.)

Example 4 (Diagnosis)

Nicole went to a vegetarian restaurant. She ordered lentil soup. The waitress brought her a miso soup instead. (The reader is supposed to produce some explanations for what may have gone wrong: either the waitress or the cook misunderstood the order.)

In contrast with this architecture, which encodes an agent's reasoning process about its own goals, intentions, and ways to achieve them, we need to represent the reasoning process of a (cautious) reader who learns about the actions of an intentional agent from a narrative. For instance, while an intelligent agent creates or selects its own activity to achieve a goal, in a narrative context, the reader learns from the text about the activity that the agent selected. As a consequence, one of our goals in this work is to understand which parts of the architecture can be adapted to our purpose of modeling the reasoning of a narrative reader, and how. The main difficulty is that stereotypical activities normally involve several agents (e.g., customer, waiter, cook), not just one. We had to extend Blount et al.'s theory to be able to track the intentions of several agents at a time.

Our methodology can be applied to narratives about other stereotypical activities. This is just a first exploration of the subject (an extended abstract on this work was previously published by Inclezan et al. izbi17; in prior work by Zhang and Inclezan zi17, only the customer was modeled as a goal-driven agent), which can be further expanded by addressing, for instance, script variations. We propose a prototypical implementation of the end-to-end system, in which understanding is tested via question answering. We convert questions into logic forms that closely match their meanings in English, and encode corresponding ASP rules to retrieve answers. Mueller's collection of training and test excerpts is proprietary, and creating a benchmark of exceptional scenarios is a laborious task. As a result, we evaluate our work on a smaller collection of texts collected from the Internet.

In what follows, the paper provides a review of related and background work. It then continues with the description of the methodology and a preliminary implementation, and their exemplification on sample scenarios. It ends with conclusions and future work.

2 Related Work

Story Understanding. An extensive review of narrative processing systems can be found in Mueller’s paper m07. Newer systems exist, for example the logic-based systems discussed by Michael lm13 or Diakidoy et al. dkmm15, but do not focus specifically on stereotypical activities. The task we undertake here is a more difficult one because stories about stereotypical activities tend to omit more information about the events taking place compared to other texts, as such information is expected to be filled in by the reader.

Restaurant Narratives. Erik Mueller’s work is based on the hypothesis that readers of a text understand it by constructing a mental model of the narrative. Mueller’s system m07 showed an understanding of restaurant narratives by answering questions about temporal and spatial aspects that were not necessarily mentioned explicitly in the text. His system relied on two important pieces of background knowledge: (1) a commonsense knowledge base about actions occurring in a restaurant, and their effects and preconditions, encoded in Event Calculus s97, and (2) a script describing the sequence of actions performed by different characters in a normal unfolding of a restaurant episode. The system processed English text using information extraction techniques in order to fill out slot values in a template. In particular, it detected the last action from the restaurant script that was mentioned in the text, and constructed a logic form in which the occurrence of all script actions up to that last mentioned one was assumed and recorded via facts. Clearly, this approach cannot be applied to exceptional cases. For instance, the last script action identifiable in the scenario in Example 2 is that Nicole left the restaurant. As a result, the reasoning problem constructed by Mueller’s system for this excerpt would state as a fact that Nicole also executed a preceding action in the script, that of paying for her meal, which would be incorrect. Mueller’s system was evaluated on text excerpts retrieved from the web or the Project Gutenberg collection. Scenarios with exceptional cases were not processed correctly because of the lack of flexibility of scripts.

Activity Recognition. The task we undertake here presents some similarities to activity recognition, in that it requires observing agents and their environment in order to complete the picture about the agents’ actions and activities. However, unlike activity recognition, understanding narratives limited to a single stereotypical activity (restaurant dining) does not require identifying agents’ goals, which are always the same for each role in our case (e.g., the customer always wants to become satiated). Gabaldon Gabaldon09 performed activity recognition using a simpler theory of intentions by Baral and Gelfond bg05i that did not consider goal-driven agents. Nieves et al. ngl13 proposed an argumentation-based approach for activity recognition, applied to activities defined as pairs of a motive and a set of goal-directed actions; in contrast, in Blount et al.’s work, basic actions in an activity may, but are not required to, have an associated goal. A few decades earlier, Ng and Mooney nm92 used abduction to create a plan recognition system and tested it on a collection of short narratives that included restaurant dining. However, their system cannot reason about serendipitous achievement of an agent’s goals by someone else’s actions (Example 2), nor answer questions about an agent’s intentions.

3 Preliminary: Theory of Intentions

Blount et al. thesisblount13; bgb15 developed a theory about the intentions of a goal-driven agent by substantially elaborating on previous work by Baral and Gelfond bg05i. In their theory, each sequence of actions (i.e., plan) of an agent was associated with a goal that it was meant to achieve, and the combination of the two was called an activity. Activities could have nested sub-activities, and were encoded using four predicates, stating respectively that m is an activity; that the goal of activity m is g; that the length of activity m is n; and that x is a component of activity m, where x is either an action or a sub-activity.
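To make this encoding concrete, a simple customer activity for our running example might be represented along the following lines. This is a hedged sketch: the predicate names (activity/1, goal/2, length/2, component/3), the activity name c_act, and the specific actions are illustrative assumptions on our part, not the exact encoding of Blount et al.

```prolog
% Illustrative encoding of a customer activity (all names are assumptions).
activity(c_act).                      % c_act is an activity
goal(c_act, satiated(nicole)).        % its goal: the customer is satiated
length(c_act, 4).                     % it has four components

% Components, indexed by their position in the activity.
component(c_act, 1, enter(nicole)).
component(c_act, 2, order(nicole, lentil_soup)).
component(c_act, 3, eat(nicole, lentil_soup)).
component(c_act, 4, leave(nicole)).
```

A component could itself be the name of another activity, which is how hierarchical plans with nested sub-activities arise.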

The authors introduced the concept of an intentional agent: one that has goals that it intends to pursue, and that “only attempts to perform those actions that are intended and does so without delay.” (There are similarities between Blount et al.’s intentional agents and BDI commitment agents rg91; wm09; to the best of our knowledge, these similarities have not yet been studied precisely. Intuitively, however, a link can be drawn as follows: if one takes the view that ASP formalizations typically capture the beliefs of an agent about its environment, an intentional agent can be viewed as an open-minded commitment agent; if the ASP formalization accurately reflects physical reality, it can be viewed as a single-minded commitment agent.) As is normally done in our field, the agent is expected to possess knowledge about the changing world around it. This knowledge can be represented as a transition diagram in which nodes denote physical states of the world and arcs are labeled by physically executable actions that may take the world from one state to another. States describe the values of relevant properties of the world, where properties are divided into fluents (those that can be changed by actions) and statics (those that cannot). To accommodate the intentions and decisions of an intentional agent, Blount et al. expanded the traditional transition diagram with mental fluents and actions. Three important mental fluents in their theory are the status of an activity m (in progress, or not yet started or stopped), the fluent stating that goal g is active, and the fluent indicating that the next physical action to be executed as part of activity m is a. Axioms describe how the execution of physical actions affects the status of activities and sub-activities, activates (or inactivates) goals and sub-goals, and determines the selection of the next action to be executed (see thesisblount13 for a complete list of axioms). Mental actions include the selection and abandonment of goals, and the starting and stopping of activities.
The new transition diagram is encoded in an action language; in what follows, we refer to the ASP translation of this encoding as the theory of intentions.

Additionally, Blount et al. developed an agent architecture for an intentional agent, implemented in CR-Prolog bg03a; b07, an extension of ASP. Blount thesisblount13 adapted the agent loop proposed by Balduccini and Gelfond bg08 and outlined the control loop that governs the behavior of an intentional agent, which we reproduce in Figure 1.

Observe the world and initialize history with observations; then:
1. interpret observations;
2. find an intended action;
3. attempt to perform it and update history with a record of the attempt;
4. observe the world, update history with observations, and go to step 1.

Figure 1: The control loop of an intentional agent

For each step of the control loop, we provide a summary of the original description (see pages 43-44 of thesisblount13). In step 1, the agent uses diagnostic reasoning to explain unexpected observations, which involves determining which exogenous (i.e., non-agent) actions may have occurred without being observed. From the point of view of our approach, step 2 is arguably one of the most critical. The goal of this step is to allow the agent to find an intended action. The following intended actions are considered:

  • To continue executing an ongoing activity that is expected to achieve its goal;

  • To stop an ongoing activity whose goal is no longer active (because it has been either achieved, as in Example 2, or abandoned);

  • To stop an activity that is no longer expected to achieve its goal (as in Example 3); or

  • To start a chosen activity that is expected to achieve its goal.

Under certain conditions, there may be no way for the agent to achieve its goal, or the agent may simply have no goal. In either case, the agent’s intended action is to wait. For the case when the agent continues executing an ongoing activity, the fluent of the theory of intentions indicating the next action becomes relevant, as it points to the action in activity m that the agent would have to attempt next. In step 3, the agent acts and records its attempt to perform the intended action. In the final step 4, the agent observes the values of fluents, the result of its attempt to act from step 3, and possibly the occurrence of some exogenous actions.

Restaurant stories require reasoning simultaneously about the intentions of multiple goal-driven agents. To accommodate this, we added an extra argument ag, denoting the agent, to each mental fluent and action of the theory of intentions. We also extended the theory with three ASP axioms, needed to make explicit Blount et al.’s assumption that an agent has only one top-level goal at a time. This restriction is important when modeling an external observer and was not fully captured previously. The first two axioms say that an agent cannot select a goal if it already has an active goal, or if it selects another goal at the same time. The third axiom says that the stopping of an activity inactivates the goals of all of its sub-activities.
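The three axioms just described might be sketched in ASP roughly as follows. The predicate names here (occurs/2, holds/2, active_goal/2, select/1, stop/1, descendant/2) are hedged assumptions chosen for illustration; the exact signatures in our extended encoding may differ.

```prolog
% An agent cannot select a goal while it already has an active goal.
:- occurs(select(Ag, G), I), holds(active_goal(Ag, G1), I).

% An agent cannot select two different goals at the same time.
:- occurs(select(Ag, G1), I), occurs(select(Ag, G2), I), G1 != G2.

% Stopping an activity inactivates the goals of all of its sub-activities;
% descendant(M1, M) is assumed to hold when M1 is a sub-activity of M.
-holds(active_goal(Ag, G), I+1) :-
    occurs(stop(Ag, M), I),
    descendant(M1, M),
    goal(M1, G).
```

The first two rules are integrity constraints that eliminate answer sets violating the single-goal assumption; the third is a dynamic causal law propagating the effect of stopping down the activity hierarchy.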

4 Methodology

In this section, we outline a methodology for using the theory of intentions and parts of the agent architecture to design a program that can show an understanding of stories about stereotypical activities, exemplified on restaurant stories. We distinguish between the story time line, containing strictly the events mentioned in the text, and the reasoning time line, corresponding to the mental model that the reader constructs. We begin with assumptions and the general methodology, on which we elaborate in the next subsections.

Assumptions. We assume that a wide-coverage commonsense knowledge base written in ASP is available to us and that it contains information about a large number of actions, their effects, and their preconditions, including the actions in the stereotypical activity. How to actually build such a knowledge base is a difficult research question, but it is orthogonal to our goal; for the first steps toward building such a knowledge base, see Diakidoy et al. dkmm15. In practice, in order to be able to evaluate our methodology, we built a basic knowledge base with core information about restaurants and, whenever a scenario needed new information, we expanded the knowledge base with new actions and fluents. We operated under the assumption that all of this information would be in the knowledge base from the beginning. To simplify this first attempt to use a theory of intentions to reason about stereotypical activities, we assumed that there is only one customer who wants to dine, only one waiter, one cook, and one ordered dish.

Methodology. According to our methodology, for each input text and associated set of questions, we construct a logic program whose answer sets represent models of the narrative and answers to the questions. This logic program has two parts: one that is pre-defined, and another that depends on the input.

The pre-defined part consists of the following items:

  • The knowledge base, with a core describing sorts, fluents, actions, and some pre-defined objects relevant to the stereotypical activity of focus;

  • The ASP theory of intentions;

  • A module encoding the stereotypical activity as activities for each character; and

  • A reasoning module, encoding (i) a mapping of time points on the story time line into points on the reasoning time line; (ii) reasoning components adapted from the agent architecture to reflect a reader’s reasoning process, expected to allow reasoning about serendipitous achievement of goals, decisions to stop futile activities, and diagnosis; and (iii) a question answering component.

The input-dependent part (i.e., the logic form obtained by translating the English text and questions into ASP facts) consists of the following:

  • Facts defining objects mentioned in the text as instances of relevant sorts in the knowledge base;

  • Observations about the values of fluents and the occurrences of actions at different points on the story time line;

  • Default information about the values of fluents in the initial situation; and

  • Facts encoding each question.
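For instance, for Scenario 1 (Example 1), the input-dependent part might contain facts along the following lines. This is a sketch under stated assumptions: the predicate names (e.g., hpd/2 for observed action occurrences), the object constants, and the story time points are illustrative choices, not necessarily our exact logic form.

```prolog
% Objects mentioned in the text, as instances of sorts.
customer(nicole).
restaurant(veg_r).
food(lentil_soup).
waiter(waitress).

% Observed occurrences of actions on the story time line.
hpd(enter(nicole, veg_r), 0).            % "Nicole went to a ... restaurant."
hpd(order(nicole, lentil_soup), 1).      % "She ordered lentil soup."
hpd(put_down(waitress, lentil_soup), 2). % "The waitress set the soup ..."
hpd(eat(nicole, lentil_soup), 3).        % "Nicole enjoyed the soup."
hpd(leave(nicole), 4).                   % "She left the restaurant."
```

Note that actions the text leaves implicit (e.g., Nicole paying the bill) appear nowhere in these facts; they are inferred by the reasoning module on the reasoning time line.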

4.1 The Core of the Commonsense Knowledge Base

The core of the knowledge base defines knowledge related to the restaurant environment. It includes a hierarchy of sorts with main sorts person, thing, restaurant, and location; person has sub-sorts customer, waiter, and cook; and thing has sub-sorts food, menu, and bill. In this paper, the following pre-defined instances of sorts are used: entrance, kt (kitchen), ct (counter), outside, and t (table) are instances of location; m is a menu; and b is the customer’s bill. The core describes actions and fluents related to the restaurant environment, shown in Table 1, in which t denotes the table and we use c for a customer; w for a waiter; ck for a cook; f for a food; r for a restaurant; t1 and t2 for things; l, l1, and l2 for locations; and p, p1, and p2 for persons. We denote the agent performing each action by means of a static relation. Each action has a unique actor, except for one action in which both w and c (waiter and customer) are considered actors. All fluents are inertial (i.e., they normally maintain their previous values unless changed by an action), except the five in the last column, which are defined-positive fluents, i.e., their positive value is completely defined in terms of other fluents and defaults to false otherwise.

Table 1: Important actions and fluents in the restaurant-related core of the knowledge base
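A small fragment of such a core might be sketched in ASP as follows. The sort, fluent, and action names here (at/2, move/3) are illustrative assumptions rather than our exact encoding.

```prolog
% Sort hierarchy (fragment).
person(P) :- customer(P).
person(P) :- waiter(P).
person(P) :- cook(P).
thing(T)  :- food(T).

% Pre-defined locations.
location(entrance). location(kt). location(ct). location(outside). location(t).

% Moving between locations: direct effect on the inertial fluent at/2.
holds(at(P, L2), I+1)  :- occurs(move(P, L1, L2), I).
-holds(at(P, L1), I+1) :- occurs(move(P, L1, L2), I), L1 != L2.

% Executability condition: an agent can only move from its current location.
:- occurs(move(P, L1, L2), I), not holds(at(P, L1), I).
```

The first rules capture the sort hierarchy from the paragraph above; the last three rules illustrate the standard ASP pattern of dynamic causal laws plus an executability constraint for one representative action.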