
APIA: An Architecture for Policy-Aware Intentional Agents

by John Meyer, et al.
Miami University

This paper introduces the APIA architecture for policy-aware intentional agents. These agents, acting in changing environments, are driven by intentions and yet abide by domain-relevant policies. This work leverages the AIA architecture for intention-driven intelligent agents by Blount, Gelfond, and Balduccini. It expands AIA with notions of policy compliance for authorization and obligation policies specified in the language AOPL by Gelfond and Lobo. APIA introduces various agent behavior modes, corresponding to different levels of adherence to policies. APIA reasoning tasks are reduced to computing answer sets using the Clingo solver and its Python API.



1 Introduction

This paper introduces APIA (“Architecture for Policy-aware Intentional Agents”), an architecture for intentional agents that are aware of policies and abide by them. It leverages and bridges together research by Blount, Gelfond, and Balduccini [6, 7] on intention-driven agents (AIA, the “Architecture for Intentional Agents”), and work by Gelfond and Lobo [13] on authorization and obligation policy languages (AOPL, the “Authorization and Obligation Policy Language”). Both AIA and AOPL are expressed in action languages [12] that have a seamless translation into logic programs [11], and are implemented in Answer Set Programming (ASP) [15].

With the current rise in autonomous systems the question arises of how to best construct agents that are capable of acting in a changing environment in pursuit of their goals, while also abiding by domain-relevant policies. For instance, we would want a self-driving car to not only take us to our destination but also do so while respecting the law and cultural conventions. This work is a first step in this direction. As in Blount et al.’s work, we focus on agents:

  • whose environment (including actions and their effects) and mental states can be represented by a transition diagram. Physical properties of the environment and the agent’s mental states form the nodes of this transition diagram. Arcs from one state to another are labeled by actions that may cause these transitions.

  • who are capable of making correct observations, remembering the domain history, and correctly recording the results of their attempts to perform actions;

  • who are normally capable of observing the occurrences of exogenous actions; and

  • whose knowledge bases may contain some pre-computed plans for achieving certain goals, which we call activities, while agents can compute other plans on demand.

Additionally, these agents possess specifications of authorization and obligation policies and have access to reasoning mechanisms for evaluating the policy compliance of their actions.

Thus, in our work, we extend the AIA architecture introduced by Blount et al. [6, 7] with the notion of policy compliance. AIA agents are Belief-Desire-Intention (BDI) agents [17] who are driven by goals, have intentions that persist, and can operate with pre-computed activities. However, they are oblivious to the nuances of authorization and obligation policies (e.g., actions whose execution is permitted vs. not permitted) and to policy compliance. These intentional agents have only two behavioral modes: either ignore policies altogether (when policies are not represented in the agent’s knowledge base) or blindly obey policies that disallow certain actions, even if this may be detrimental (when policies are represented as actions that are impossible to execute). Instead, we introduce a wider range of possible behaviors that may be set by an agent’s controller (e.g., prefer plans that are certainly policy-compliant to others that are only possibly policy-compliant, but if no plans of the former type exist, accept plans of the latter kind).

In formalizing the different policy compliance behavior modes, we rely on the language AOPL [13] for specifying authorization and obligation policies, and priorities between such policies. While Gelfond and Lobo specify reasoning algorithms for determining the degree of policy compliance for a sequence of actions, this is done from the perspective of a third-person observer analyzing the actions of all agents after they have already occurred. Instead, our work focuses on an agent making decisions about its own future actions. As a result, various courses of action may be compared to determine the most policy-compliant one according to the behavior mode set by the agent’s controller. Moreover, our research addresses interactions between authorization and obligation policies that were not discussed in the work on AOPL.

There have been several other attempts to enable agents to reason over various kinds of policies. However, these involve reasoning over access control policies only [8, 18, 2], and a few utilize ASP as a reasoning tool for this purpose [4, 5]. Access control policies are more restrictive than the kinds of policies an agent using APIA can reason over. The work that is closest to our goal is the PDC-agent by Liao, Huang, and Gao [14]. The PDC architecture extends the BDI architecture with a policy and contract-aware methodology the authors call BGI-PDC logic. A PDC-agent is an event-driven multi-component framework which allows for controlled and coordinated behavior among independent cooperative agents. Liao et al. use policies to control agent behavior, and contracts as a mechanism to coordinate actions between agents. This architecture was later extended to support reasoning over social norms (the NPDC-agent) [16]. The PDC-agent architecture is defined as a 7-tuple of the following components: (Event Treating Engine, Belief Update, Contract Engine, Policy Engine, Goal Maintenance, Plan Engine, Plan Library) [14]. A major distinction of the PDC-agent architecture is that it supports coordination among multiple agents. This is beyond the scope of our work: both AIA and APIA focus on an agent with individual goals, and expanding these architectures into multi-agent frameworks by introducing communication acts remains future work. However, knowledge about the changing environment is expressed in the PDC-agent architecture in terms of a Domain Conceptualization Language (DCL) [9] and a Concept Instance Pattern (CIP). While DCL and CIP can represent plans (which are analogous to activities in the AIA architecture), there is no support for expressing the direct or indirect effects of an action. This is a disadvantage in comparison to action-language-based architectures, since plans have to be pre-computed and the goals that they accomplish must be annotated according to the designer’s intuition. Since action languages only require a description of the effects of individual actions (and plans consisting of all combinations of actions can be automatically computed), there is significantly less work for a human designer when working with APIA than with the PDC-agent architecture.

Thus, our proposed architecture is, to the best of our knowledge, the only intentional agent architecture that is capable of modeling compliance with complex authorization and obligation policies, while allowing agents to come up with policy-compliant activities on the fly.

The major contributions presented in this paper are as follows:

  1. Create a bridge between research on intentional agents and policy compliance, thus producing a policy-aware intentional agent architecture, APIA.

  2. Introduce various agent behavior modes with respect to compliance with authorization and obligation policies.

  3. Introduce mechanisms to check the consistency of a policy containing both authorization and obligation policies and reason over the interactions between these two.

  4. Implement APIA in clingo (version 5.4.1) while leveraging clingo’s Python API. (As a by-product, AIA was also updated to this clingo version.)

2 Background

In this section, we briefly present the AIA architecture and the AOPL language, which form the two pillars of our work. We direct the unfamiliar reader to outside resources on Answer Set Programming [11, 15] and action languages [10], which are also relevant to our research.

2.1 AIA: Architecture for Intentional Agents

The Architecture for Intentional Agents, AIA [7], builds upon the Observe-Think-Act control loop of the AAA architecture (AAA stands for “Autonomous Agent Architecture”) [3] and extends it in two directions. First, AIA adds the possibility of action failure: the agent attempts to perform an action during its control loop but may find that it is unable to do so. In this case, the action is deemed non-executable. Second, AIA addresses a limitation of AAA in which plans are not persisted across iterations of the control loop. In AIA, agents pursue goals by persisting in their intentions to execute action plans known to satisfy these goals (i.e., activities). Activities are represented via a set of statics [6, 7] that record, for each activity, a unique identifier, the components of the activity, and the goal that the activity achieves. Some activities are pre-computed and stored in the agent’s knowledge base, while the rest can be generated on demand.

In addition to fluents and actions describing the agent’s environment, AIA introduces mental fluents to keep track of the agent’s progress in the currently intended activity and towards its desired goal. Mental fluents are updated through mental actions. The Theory of Intentions is a collection of axioms that maintain an agent’s mental state. Elements of the agent’s mental state include the currently selected goal and the currently planned activity, each stored in its own inertial fluent. When either a goal is selected or an activity is planned, it is said to be intended. One mental action initiates the agent’s intention to execute an activity and another terminates it. Though most actions are executed by the agent itself, some must be executed by the agent’s controller: one exogenous mental action causes the agent to intend to achieve a goal, while another causes the agent to cease its intent to achieve that goal.

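An agent in the AIA architecture performs the following control loop, referred to as (1) in the rest of the paper:

  1. interpret its observations, explaining unexpected ones by assuming that undetected exogenous actions occurred;

  2. determine the action it intends to perform next;

  3. attempt to perform that action and record the result of the attempt; and

  4. observe the world, record the observations, and return to step 1.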
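A minimal Python sketch of such an observe-think-act loop may help fix ideas. It assumes the four phases that the examples in Section 4 walk through (interpret observations, determine an intended action, attempt it, observe). The names `ToyAgent`, `World`, and `run_loop` are hypothetical and are not taken from the AIA implementation, which performs these steps through answer set computation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical environment callback: (time step, action) -> (success, observations).
World = Callable[[int, str], Tuple[bool, List[str]]]


@dataclass
class ToyAgent:
    """Illustrative stand-in for an intentional agent; not the AIA code."""
    plan: List[str]  # remaining actions of the currently intended activity
    history: List[Tuple[int, str, bool]] = field(default_factory=list)
    observed: List[str] = field(default_factory=list)

    def interpret(self, observations: List[str]) -> None:
        # Step 1: a real agent would explain unexpected observations here.
        self.observed.extend(observations)

    def intended_action(self) -> str:
        # Step 2: persist in the intended activity, or wait if there is none.
        return self.plan[0] if self.plan else "wait"


def run_loop(agent: ToyAgent, world: World, steps: int) -> List[Tuple[int, str, bool]]:
    observations: List[str] = []
    for t in range(steps):
        agent.interpret(observations)
        action = agent.intended_action()
        # Step 3: attempt the action; it may turn out to be non-executable.
        succeeded, observations = world(t, action)
        agent.history.append((t, action, succeeded))
        if succeeded and agent.plan and agent.plan[0] == action:
            agent.plan.pop(0)  # the activity progresses only on success
        # Step 4: the new observations are interpreted in the next iteration.
    return agent.history
```

Because a failed attempt leaves the plan untouched, the agent retries the same action in the next iteration, which is one simple way to picture intentions that persist across iterations.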


2.2 AOPL: Authorization and Obligation Policies in Dynamic Systems

In real-world applications, an autonomous agent may be required to follow certain rules or ethical constraints, and may be penalized when acting in violation of them. Thus, it is necessary to discuss policies for agent behavior and a formalism with which agents can deduce the compliance of their actions. Gelfond and Lobo [13] introduce the Authorization and Obligation Policy Language (AOPL) for policy specification. An authorization policy is a set of conditions that denote whether an agent’s action is permitted or not. An obligation policy describes what an agent must do or must not do. AOPL works in conjunction with a dynamic system description written in an action language. An agent’s policy is the subset of the trajectories in the domain’s transition diagram that are desired by the agent’s controller.

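Policies of AOPL are specified using the predicate $permitted$ for authorization policies, the predicate $obl$ for obligation policies, and static laws similar to those of action languages:

$$
\begin{array}{l}
permitted(e) \ \ \textbf{if} \ \ cond \\
\neg permitted(e) \ \ \textbf{if} \ \ cond \\
obl(h) \ \ \textbf{if} \ \ cond \\
\neg obl(h) \ \ \textbf{if} \ \ cond
\end{array}
$$

where $e$ is an action; $h$ is a happening (i.e., an action or its negation; if $obl(\neg e)$ is true, then the agent must not execute $e$ in the current state); and $cond$ is a, possibly empty, conjunction of fluents, actions, or their negations. In addition to these strict policy statements, AOPL supports defeasible statements and priorities between them as in:

$$
\begin{array}{l}
d : \textbf{normally} \ \ permitted(e) \ \ \textbf{if} \ \ cond \\
d : \textbf{normally} \ \ \neg permitted(e) \ \ \textbf{if} \ \ cond \\
d : \textbf{normally} \ \ obl(h) \ \ \textbf{if} \ \ cond \\
d : \textbf{normally} \ \ \neg obl(h) \ \ \textbf{if} \ \ cond \\
prefer(d_i, d_j)
\end{array}
\qquad (2)
$$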

Gelfond and Lobo define policy compliance separately for authorizations vs. obligations:

Definition 1 (Policy Compliance – adapted from Gelfond and Lobo [13])

Authorization: A set of actions occurring at a state of the transition system is strongly compliant with an authorization policy if every action in the set is known to be permitted at that state using the rules of the policy. The set is non-compliant with the policy if any of its actions is known to be not permitted at that state using the rules of the policy. Otherwise, if the policy is unclear and does not specify whether an action is permitted or not at that state, then the set of actions is weakly compliant.

Obligation: A set of actions occurring at a state is compliant with an obligation policy if it contains every action that the agent is obligated to perform, and contains no action that the agent is obligated to refrain from, according to what can be derived from the rules of the policy at that state. Otherwise, it is non-compliant.

Note that a set of actions can be strongly, weakly, or non-compliant with an authorization policy, but can only be compliant or non-compliant with an obligation policy. Given an AOPL policy (with authorization and obligation policy statements), a set of actions is strongly compliant with it if the set is strongly compliant with its authorization policy and compliant with its obligation policy. Likewise for weak compliance and non-compliance. Computing policy compliance is reduced to the problem of finding answer sets of a logic program obtained by translating the policy rules into ASP. In this translation, the authorization and obligation predicates are extended to include an extra argument standing for the time step, and every rule, action literal, fluent literal, and condition is replaced by its ASP transformation.
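The two parts of Definition 1 can be illustrated with a short Python sketch. It assumes that the literals derivable from the policy at the current state are already available as plain sets of action names. The function and parameter names are hypothetical and do not come from AOPL or its ASP translation.

```python
from typing import Set


def authorization_compliance(actions: Set[str],
                             permitted: Set[str],
                             not_permitted: Set[str]) -> str:
    """Classify a set of actions against the derived authorization literals."""
    if actions & not_permitted:
        # At least one action is known to be not permitted.
        return "non-compliant"
    if actions <= permitted:
        # Every action is known to be permitted.
        return "strong"
    # Some action is neither known permitted nor known not permitted.
    return "weak"


def obligation_compliant(actions: Set[str],
                         obliged: Set[str],
                         obliged_to_refrain: Set[str]) -> bool:
    """True iff every obliged action occurs and no forbidden one does."""
    return obliged <= actions and not (actions & obliged_to_refrain)
```

Note the asymmetry that the definition builds in: authorization has three outcomes because a policy may be silent about an action, whereas an obligation is either met or not.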

AOPL does not discuss interactions between authorization and obligation policies on the same action, does not define compliance in terms of obligation policies for a trajectory in the dynamic system, and does not compare the degree of compliance of two trajectories. All of these aspects are relevant when modeling a policy-aware intentional agent and are addressed in our work.

3 APIA Architecture

We can now introduce our architecture for policy-aware intentional agents, APIA. We focus on two main aspects of APIA: reframing AOPL to fit an agent-centered architecture, and the encoding of different policy compliance behavior modes of an agent.

3.1 Re-envisioning Policies in an Agent-Centered Architecture

Gelfond and Lobo [13] conceived AOPL as a means to evaluate policy compliance in a dynamic system. This differs from AIA in the following ways:

  • AOPL evaluates trajectories in a domain’s transition diagram from a global perspective, whereas AIA distinguishes between agent actions and exogenous actions, and chooses which agent actions to attempt next.

  • AOPL evaluates histories at “the end of the day,” whereas the AIA architecture, while still reasoning over past actions in its diagnosis mode, places an emphasis on planning future actions to achieve a future desired state.

These differences prevent AOPL policies from interoperating with the AIA agent architecture out of the box. To address the first issue, we constrain AOPL policies to describe only agent actions in our architecture. For the second issue, we adjust our policy compliance rules such that only future actions affect policy compliance. Since our focus is on planning, in APIA past actions are always considered “compliant” although they might not have been at the time. For an agent that previously had no choice but a non-compliant action, this allows the agent to conceive of “turning over a new leaf” and seeking policy-compliant actions in the future.

Also, AOPL does not specify how authorization policy statements interact with obligation policy statements. For example, consider a policy that both permits the agent to perform an action and obligates the agent to refrain from that same action. Such a policy is contradictory. Appealing to common sense, if an agent is obligated to refrain from an action, one would conclude that the action is not permitted. Likewise, it makes sense to say that, if an agent is obligated to do an action, then it must be permitted. Thus, we take these intuitions and create the following non-contradiction ASP axioms (3), in which the authorization and obligation literals are expanded to include a new argument representing the time step:


These enforce that, at the very least, the authorization and obligation policies do not contradict each other, while allowing for defeasible policies to work appropriately.
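The two intuitions behind the non-contradiction axioms can be mirrored by a plain consistency check. The sketch below is written in Python rather than ASP and is not the paper's encoding. It assumes the literals derived at one time step are given as sets of action names, and the name `contradictions` is hypothetical.

```python
from typing import List, Set


def contradictions(permitted: Set[str],
                   not_permitted: Set[str],
                   obliged: Set[str],
                   obliged_to_refrain: Set[str]) -> List[str]:
    """Return the actions whose authorization and obligation literals clash."""
    # Permitted, yet the agent is obligated to refrain from the action.
    clash_refrain = permitted & obliged_to_refrain
    # Not permitted, yet the agent is obligated to do the action.
    clash_do = not_permitted & obliged
    return sorted(clash_refrain | clash_do)
```

An empty result means the authorization and obligation literals agree at that time step; any action in the result signals a policy that needs a preference or an exception to resolve the clash.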

We also extend the translation of defeasible policy statements. Suppose we have the following:


Using Gelfond and Lobo’s approach [13], the corresponding ASP translation would be:

where the action and the condition of the statement are replaced by their logic programming encodings. Based on policy (4), two literals that contradict each other would both be true at a time step when the condition is met. This violates the non-contradiction axioms in (3). So, we replace the translation of the defeasible statement in (4) with the following encoding:

This allows the presence of the conflicting literal to be an exceptional case to the defeasible rule.

In general, we propose translating the different types of defeasible statements in (2) as follows, respectively, where an auxiliary predicate facilitates dealing with possible additional (weak) exceptions:

3.2 Policy-Aware Agent Behavior

The AIA architecture, which is the underlying basis of APIA, introduces mental fluents and actions in addition to physical ones. In APIA, we additionally introduce policy fluents and actions needed to reason over policy compliance (see Table 1). The new policy action descriptions encode the effects of future agent actions on policy compliance and provide means for the control loop to deem non-compliant activities futile and execute compliant ones in their place.

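Fluents | Inertial: compliance fluents for the authorization policy (strong, weak) and for the obligation policy (actions to do, actions to refrain from)
        | Defined: a policy-compliant counterpart, for every physical fluent
Actions | Policy actions that waive compliance effects, for every physical action
Table 1: List of Policy Fluents and Actions in the APIA Architecture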

For example, the dynamic causal laws in (5) define the inertial policy fluents for strong and weak authorization compliance according to the definitions of strong and weak authorization policy compliance in AOPL seen in Definition 1:


These rules are defined for every physical action of the transition system. Should an action occur that is not known to be permitted, then the scenario ceases to be strongly compliant (i.e., it becomes weakly compliant). Since the strong compliance fluent cannot be made true again by any action, the rest of the scenario remains weakly compliant by inertia. Likewise, should an action occur that is known to be not permitted, then the scenario ceases to be weakly compliant (i.e., it becomes non-compliant) and remains in this state by inertia.
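The inertial behavior described above can be traced with a short Python sketch. It assumes each step of a scenario is labeled with what the policy says about the action taken there. The labels `"permitted"`, `"unknown"`, and `"not_permitted"` and the function name are hypothetical, and the sketch ignores the policy actions that can waive these effects.

```python
from typing import List, Tuple


def compliance_over_time(steps: List[Tuple[str, str]]) -> List[str]:
    """Track the compliance level of a scenario after each action.

    Each step is (action, status), where status is "permitted", "unknown",
    or "not_permitted". Once a level is lost it is never regained (inertia).
    """
    strong, weak = True, True
    levels = []
    for _action, status in steps:
        if status != "permitted":
            strong = False  # action not known to be permitted
        if status == "not_permitted":
            weak = False    # action known to be not permitted
        levels.append("strong" if strong else "weak" if weak else "non-compliant")
    return levels
```

For instance, a scenario whose second action is not covered by the policy stays weakly compliant from that point on, even if every later action is explicitly permitted.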

For every physical fluent, we introduce a new defined policy fluent, its policy-compliant counterpart. This allows us to reuse the control loop shown in (1) as is. When the agent controller wants to specify that the agent should achieve a goal in a policy-compliant manner, the controller simply has to initiate the goal-selecting action for the policy-compliant counterpart instead of for the original fluent. The policy fluent is true iff the physical fluent is true and the compliance fluent for some minimum compliance threshold set by the controller is true. Thus, when the policy fluent is an agent’s goal, activities below the threshold level of compliance are deemed futile and the agent works to achieve the physical fluent subject to that level of compliance.

Authorization Policies and Agent Behavior. To allow for cases when the threshold set by the controller is not the maximum possible level of compliance or, in the future, cases when an agent deliberately chooses to act below the threshold level of compliance, we add two policy actions for every physical action. By executing one of these policy actions concurrently with the physical action, our agent ignores the physical action’s effect at that time step on weak compliance or non-compliance, respectively. This enables our agent to look for activities with a lower level of compliance if no activities that achieve the goal are strongly compliant. This capability can be used to model multiple agent behaviors, based on the minimum and maximum requirements of adherence to the authorization policy. We have parameterized the agent’s behavior as seen in Table 2, and introduced names for these possible agent behaviors.

                        | Require weak | Prefer weak over non-compl. | Ok with non-compl.
Require strong          | Paranoid     | (Invalid)                   | (Invalid)
Prefer strong over weak | Cautious     | Best effort                 | (Invalid)
Ok with weak            | Subordinate  | Subordinate when possible   | Utilitarian
Table 2: Authorization Policy Modes

One behavior mode is for the agent to strictly adhere to its authorization policy such that it never chooses to perform the policy action that waives an action’s effect on non-compliance. This causes all non-compliant actions, if they are executed, to indirectly make the policy-compliant goal fluent false. Hence, only activities with weakly or strongly compliant actions are considered. Since this mode never dares to become non-compliant, it is called subordinate.

A similar behavior mode causes the agent to never perform the policy action that waives an action’s effect on weak compliance. This causes all weakly compliant and non-compliant actions, when executed, to indirectly make the policy-compliant goal fluent false. Hence, only activities with strongly compliant actions are considered. Since weakly compliant actions are actions for which the policy compliance is unknown, this mode is called paranoid, as it treats weakly compliant actions as if they were non-compliant.

Another behavior mode allows unrestricted access to the two policy actions. This mode is called utilitarian because it reduces the behavior of APIA to that of AIA, where policies are not considered at all.

An interesting feature of the two policy actions is the ability to optimize compliance. Using preference statements in ASP, we can require the control loop to minimize the use of these two policy actions. Hence, if it is possible to execute an activity that is strongly compliant, the agent will prefer it over a weakly or non-compliant one (since the use of these actions is required to allow the policy-compliant goal fluent to be true). Under this condition, the two policy actions are only used when it is impossible to achieve the goal fluent in a strongly or weakly compliant manner, respectively.

The combination of compliance optimization with the first three behavior modes allows for more possible configurations. For example, adding optimization to the subordinate option makes a cautious mode. In this mode, the agent will try to mimic the behavior of the paranoid mode (all strongly compliant actions), but ultimately it will reduce to subordinate (all weakly compliant actions) in the worst case. Likewise, adding optimization to the utilitarian mode adds two options: best effort and subordinate when possible. Best effort prefers strong compliance over weak compliance and weak compliance over non-compliance, but ultimately permits non-compliance when no better alternatives exist. Subordinate when possible prefers weak compliance over non-compliance but does not optimize from weak compliance to strong compliance.

A new feature of this approach to optimization is the ability to optimize within the weakly and non-compliant categories. Consider two weakly compliant activities, 1 and 2, where activity 1 has more weakly compliant actions than activity 2. Since every weakly compliant action requires a concurrent policy action, activity 1 will contain more policy actions than activity 2. Hence, activity 2 will be preferred to activity 1, even though they both fall in the weakly compliant category. Gelfond and Lobo [13] do not consider such a feature.
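The six authorization modes of Table 2 can be pictured as a filter followed by a minimization. The Python sketch below is only an illustration under stated assumptions: each candidate activity is summarized by its length and by how many of its actions are weakly compliant or non-compliant, waivers are minimized lexicographically (non-compliance first, then weak compliance), and remaining ties are broken by length. `Activity`, `MODES`, and `choose` are hypothetical names, and the actual architecture expresses these preferences inside the ASP program.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Activity(NamedTuple):
    name: str
    length: int          # number of physical actions
    weak: int            # actions not known to be permitted
    non_compliant: int   # actions known to be not permitted


# Mode -> (setting for strong compliance, setting for weak compliance),
# each one of "require", "prefer", or "ok" (rows and columns of Table 2).
MODES: Dict[str, Tuple[str, str]] = {
    "paranoid": ("require", "require"),
    "cautious": ("prefer", "require"),
    "subordinate": ("ok", "require"),
    "best effort": ("prefer", "prefer"),
    "subordinate when possible": ("ok", "prefer"),
    "utilitarian": ("ok", "ok"),
}


def choose(mode: str, candidates: List[Activity]) -> Optional[Activity]:
    strong, weak = MODES[mode]
    allowed = [
        a for a in candidates
        # "require strong": no waiver of any kind may be used.
        if not (strong == "require" and (a.weak or a.non_compliant))
        # "require weak": non-compliance may never be waived.
        and not (weak == "require" and a.non_compliant)
    ]
    if not allowed:
        return None  # the goal is futile under this mode

    def cost(a: Activity) -> Tuple[int, int, int]:
        # Only "prefer" settings are optimized; "ok" settings cost nothing.
        return (a.non_compliant if weak == "prefer" else 0,
                a.weak if strong == "prefer" else 0,
                a.length)

    return min(allowed, key=cost)
```

Counting waivers rather than just classifying activities is what lets the sketch prefer, within the weakly compliant category, the activity with fewer weakly compliant actions.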

Obligation Policies and Agent Behavior. So far we discussed authorization policies induced by an AOPL policy. To address obligation policies, we add two policy fluents, each with a matching policy action, as seen in Table 1. (For configurability, we consider obligations to do actions and obligations to refrain from actions separately.) We extend the definition of the policy-compliant goal fluent to require both new fluents to be true. As with authorization compliance, if the agent is obligated to do an action but the action does not occur, then the corresponding fluent becomes false and remains false by inertia. Likewise for obligations to refrain from an action. If the matching policy actions are performed, then these effects on the fluents are temporarily waived.

Obligations to do \ to refrain | Honor            | Prefer honoring    | Ignore
Honor                          | Subordinate      | Permit commissions | (Not reasonable)
Prefer honoring                | Permit omissions | Best effort        | (Not reasonable)
Ignore                         | (Not reasonable) | (Not reasonable)   | Utilitarian
Table 3: Obligation Policy Modes

There are five different configurations (or behavior modes) an agent in the APIA architecture can have regarding its obligation policy (see Table 3). When in subordinate mode, the agent will never use either of the two obligation policy actions. Hence, all activities achieving the policy-compliant goal will be compliant with both aspects of its obligation policy. When in best effort mode, the agent prefers using other actions over these policy actions. Hence, activities will be compliant if possible but may include non-compliant elements when no other goal-achieving activities exist. The permit omissions and permit commissions options are variations of these modes. Mode permit omissions is like best effort with regard to obligations to do actions, but like subordinate with regard to obligations to refrain from actions. Likewise, permit commissions is like subordinate with regard to obligations to do actions but like best effort regarding obligations to refrain from actions. Utilitarian mode, as with authorization policies, reduces the behavior of an APIA agent with respect to its obligation policy to that of an AIA agent.
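Table 3 can be read as a lookup from a pair of settings to a mode name. The Python sketch below encodes that reading, assuming the first setting governs obligations to do actions and the second governs obligations to refrain from actions. `OBLIGATION_MODES` and `obligation_mode` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# (setting for obligations to do, setting for obligations to refrain) -> mode.
# Settings are "honor", "prefer" (prefer honoring), or "ignore".
OBLIGATION_MODES: Dict[Tuple[str, str], str] = {
    ("honor", "honor"): "subordinate",
    ("honor", "prefer"): "permit commissions",
    ("prefer", "honor"): "permit omissions",
    ("prefer", "prefer"): "best effort",
    ("ignore", "ignore"): "utilitarian",
}


def obligation_mode(do_setting: str, refrain_setting: str) -> Optional[str]:
    """Return the mode name, or None for combinations deemed not reasonable."""
    return OBLIGATION_MODES.get((do_setting, refrain_setting))
```

For example, `obligation_mode("prefer", "honor")` yields permit omissions: the agent may, as a last resort, skip an action it is obligated to do, but never performs one it is obligated to refrain from.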

Behavior Mode Configurations. An agent’s combined authorization and obligation policy configuration can be represented by a 2-tuple whose first element is the authorization mode and whose second element is the obligation mode. When an agent is running in mode (utilitarian, utilitarian), its behavior reduces to that of an AIA agent (i.e., policy actions are not used in this mode). This is due to an optimization we provide internally.

For each configuration, we adjust the definition of the policy-compliant goal fluent such that excess policy actions are not required. For instance, in the case of an agent with a subordinate authorization mode, we adjust the definition such that the policy action waiving effects on weak compliance is never needed, since such an agent always disregards strong compliance.

4 Examples

To demonstrate the operations of an agent in the APIA architecture, we will introduce a series of examples that illustrate prototypical cases. For conciseness, we will focus on three configurations: (paranoid, subordinate), (best effort, best effort), and (utilitarian, utilitarian), and we limit ourselves to examples about authorization policies.

4.1 Example A: Fortunate case

To begin with a simple case, suppose that two people are in an office space that has four rooms with doors in between them. Room 1 is connected by a door to Room 2, Room 2 is connected by a door to Room 3, and so on. One of the doors has a lock and is currently in the unlocked position. Suppose our agent, Alice, wants to greet another agent, Bob. This scenario is represented by a dynamic domain description that considers:

  • Fluents: the location of each person, the state of each door, and whether one person has been greeted by another; and

  • Actions: four actions, each parameterized by the person doing the action and, where applicable, a door or a person (the direct object of the action).

Assume that agent Alice is given a policy specifying that all actions are permitted along with the following pre-computed activity that is stored in her knowledge base as the set of facts:

Before the control loop begins, Alice observes that she is in Room 1, Bob is in Room 4, the door is unlocked, and that she has not yet greeted Bob.

At timestep 0, the first iteration of the control loop begins. In this first step, Alice analyzes her observations and interprets unexpected observations by assuming that undetected exogenous actions occurred. None of her observations are unexpected, so no exogenous actions are assumed to occur. Alice then intends to wait at timestep 0. Alice attempts to wait. Alice observes that her wait action was successful and that, in the meantime, the exogenous action selecting her goal of greeting Bob happened. The time step is incremented and Alice does not observe any fluents.

The second iteration of the control loop begins. Alice analyzes her observation of the goal selection and determines that the goal is now active. Alice then starts planning to achieve the goal and determines that she intends to start activity 1. Since each action in activity 1 is strongly compliant, no policy actions are needed.

The rest of the example is straightforward and is almost identical to scenarios discussed by Blount et al. [6, 7] in the AIA architecture.

4.2 Example B: Strong compliance degrades to weak compliance

Let us consider a less fortunate example, in which a strongly compliant activity becomes weakly compliant due to an unexpected environmental observation. In the same scenario, suppose we modify Alice’s policy from Example A such that, regarding the greeting action, we have policy rule (6):


We also introduce two new fluents and two new actions, which cover Bob being busy working and Alice knocking on the door. Let Alice’s knowledge base contain two additional activities, 2 and 3, with the same goal as activity 1 and defined by the sets of facts:

Alice observes that Bob is not busy working, in addition to the initial observations of Example A. At timestep 0, the first iteration of the control loop begins. During the second iteration of the control loop (at timestep 1), Alice plans to achieve the goal. Since she believes Bob is not busy working, activity 1 is still strongly compliant and so is activity 2. Alice chooses activity 1 over activity 2 because it requires a shorter sequence of actions. She then executes activity 1 as in Example A until she enters Room 3, at which point she observes that Bob is busy working. During the next iteration (at timestep 5), the agent interprets this observation by inferring that an exogenous action making Bob busy happened at the previous timestep (4).

As a result, activity 1 becomes weakly compliant. Since Bob is busy working but Alice has not knocked on the door, no policy statement describes whether our next action, greeting Bob, is compliant or not. If Alice is operating in (utilitarian, utilitarian) mode, she continues the execution of activity 1 and greets Bob anyway. (This happens without the use of policy actions due to our internal optimizations.) Otherwise, Alice will stop the activity and then either refuse to plan another weakly compliant activity or use a concurrent policy action to dismiss this event.

If our agent is running in (paranoid, subordinate) mode, Alice will refuse to execute a weakly compliant activity. Through planning, Alice will discover that a new activity that includes knocking at the door is strongly compliant (e.g., activity 3) and begin its execution. If our agent is running in (best effort, best effort) mode, she will behave likewise because activity 3 is strongly compliant. The difference is that, if there did not exist a strongly compliant activity, she would plan a new activity that involved a policy action and greeted Bob anyway. Alice knocks on the door at timestep 7, greets Bob at timestep 8, and stops activity 3 at timestep 9.

4.3 Example C: Compliance degrades to non-compliant

Suppose we take policy rule (6), make it defeasible, and add this rule:

Now, let us imagine that Bob is Alice’s supervisor. Similar to Example B, our agent executes activity 1 until the observation that Bob is busy working. This time, we have a strict authorization statement forbidding greeting Bob since he is Alice’s supervisor. Under the (utilitarian, utilitarian) option, we proceed with activity 1 anyway. With the (paranoid, subordinate) option, our agent stops activity 1 but cannot construct a new activity that achieves the goal subject to its policy. Hence, the goal is futile and the agent waits until its environment changes such that a strongly compliant activity exists. Under the (best effort, best effort) option, however, our agent constructs a new activity that greets Bob anyway, with the help of a concurrent policy action.

4.4 Example D: Hierarchy of contradictory defeasible statements

Further extending Example C, suppose we turn all policy statements from Example A into defeasible ones (i.e., all actions are normally permitted) and add another policy statement:

and the static with as a fact. Since we have two contradictory defeasible statements, we need to add a preference between the two (without a preference our agent can non-deterministically choose between which of the two rules to apply). If we add:

then, when Alice observes that Bob is not busy working at the beginning of the scenario, an agent running in (paranoid, subordinate) mode will immediately consider the goal to be futile. Unlike in Example C, our agent knows this immediately because is a static, not an unexpected observation. If our agent is running in (best effort, best effort) mode, it creates an activity like activity 1, except that it contains and . Our utilitarian agent, as always, completely ignores our policy and executes activity 1.
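The role of the preference in Example D can be illustrated with a short Python sketch. It is not the AOPL translation itself: the rule labels `d1` and `d2` are hypothetical, and the sketch assumes that a preference pair `(x, y)` means that rule `x` overrides rule `y` whenever both apply and disagree.

```python
from typing import Dict, Set, Tuple


def possible_conclusions(rules: Dict[str, bool],
                         prefer: Set[Tuple[str, str]]) -> Set[bool]:
    """Return the conclusions about one action that can still hold.

    `rules` maps the label of each applicable defeasible rule to its
    conclusion (True = permitted, False = not permitted).
    `prefer` holds pairs (x, y): rule x overrides rule y (an assumption).
    """
    undefeated = {
        label: conclusion
        for label, conclusion in rules.items()
        # A rule is defeated when a preferred rule concludes the opposite.
        if not any((other, label) in prefer and rules[other] != conclusion
                   for other in rules)
    }
    return set(undefeated.values())


# Two contradictory defeasible rules about greeting Bob.
rules = {"d1": True, "d2": False}

no_preference = possible_conclusions(rules, set())
with_preference = possible_conclusions(rules, {("d2", "d1")})
```

Without a preference both conclusions survive, which corresponds to the agent being free to apply either rule; with the preference only the prohibition remains.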

5 Implementation

In this section, we discuss two important implementation aspects: the refactoring of the AIA implementation, including its Theory of Intentions and control loop, and the implementation of the APIA control loop using clingo’s Python API.

5.1 Theory of Intentions and Control Loop

Since APIA takes AIA as a basis, we first update Blount et al.’s [7] AIA implementation so that it runs on a state-of-the-art solver: clingo (version 5.4.1). (Footnote 8: Our updated implementation is available at and is released under the MIT open-source license.)

For this purpose, we re-implement the AIA logic program in ASP using only the description of the architecture presented by Blount et al. [6, 7]. During this process, we make minor modifications to AIA as a whole. First, we split the ASP rules into multiple files according to their purpose in the architecture (e.g., whether they are part of the Theory of Intentions, AIA’s rules for computing models of history, or the intended action rules). Second, we rename the mental fluents in the Theory of Intentions so that their names are more descriptive and self-documenting. Third, we add extensive inline comments to each ASP rule, with reference quotations and page numbers from Blount et al.’s work. Lastly, we make minor corrections to ASP rules so that the translation of particular scenarios (i.e., histories) matches the mathematical definitions proposed by Blount et al.

In addition to upgrading the logic program, we also refactor the implementation of the control loop. In his dissertation, Blount [6] introduced the Agent Manager. This is an interactive Java program that allows an end-user to assign values to agent observations in a graphical interface for each control loop iteration. Since this requires manual input, it does not easily lend itself to automation and reproducibility of execution, which are required for performance benchmarking. Furthermore, the Agent Manager is structured around interacting with an underlying solver using subprocesses and process pipes. While the Agent Manager could conceivably invoke clingo as a subprocess, clingo 5 provides a unique opportunity for more advanced integrations using its Python API.

For these two reasons, we replace the Agent Manager with a new implementation of the control loop written in Python 3.9.0. This new implementation uses a command-line interface and allows for reproducible execution through ASP input files. Since this control loop is also the basis for our APIA implementation, we discuss it further in the next subsection.

5.2 Python Component

We provide an implementation of the control loop for the APIA architecture. (Footnote 9: Our implementation is available at and is released under the MIT open-source license.) The control loop is implemented in Python 3.9.0 with clingo 5.4.1, using clingo’s Python API. We provide two modes: an automatic mode and a manual mode. The automatic mode is intended for normal execution, while the manual mode is intended to aid in debugging unexpected output in answer sets. The automatic mode uses a command-line interface to specify the ASP files of the input domain, the observations of the agent, and the policy compliance mode the agent should use. The control loop then provides human-readable output describing what happens at each control loop step (see Figure 1 in the appendix).
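A minimal sketch of how such an automatic mode could be wired together is given below. It is not the authors' code: the command-line flags, the `observed/3` and `current_step/1` predicates, and all function names are assumptions made for illustration. Only the clingo calls (`Control`, `load`, `add`, `ground`, `solve`, `Model.symbols`) are part of clingo's Python API; clingo is imported lazily so that the helper functions can be used without it.

```python
import argparse
from typing import Dict, List, Tuple

Observation = Tuple[str, bool]  # (fluent, observed value)


def history_program(observations: Dict[int, List[Observation]], step: int) -> str:
    """Render all observations up to and including `step` as ASP facts.

    The predicate names are hypothetical placeholders.
    """
    lines = [f"current_step({step})."]
    for t in sorted(observations):
        if t > step:
            break
        for fluent, value in observations[t]:
            lines.append(f"observed({fluent}, {str(value).lower()}, {t}).")
    return "\n".join(lines)


def solve(files: List[str], program: str) -> List[List[str]]:
    """Ground and solve `files` plus `program`; return shown atoms per answer set."""
    import clingo  # lazy import: only needed when actually solving

    ctl = clingo.Control(["0"])  # "0" asks clingo for all answer sets
    for path in files:
        ctl.load(path)
    ctl.add("base", [], program)
    ctl.ground([("base", [])])
    models: List[List[str]] = []
    ctl.solve(on_model=lambda m: models.append(
        sorted(str(atom) for atom in m.symbols(shown=True))))
    return models


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse a hypothetical command line for the automatic mode."""
    parser = argparse.ArgumentParser(description="Sketch of an automatic-mode CLI")
    parser.add_argument("domain", nargs="+", help="ASP files describing the domain")
    parser.add_argument("--observations", required=True,
                        help="ASP file with the agent's observations")
    parser.add_argument("--authorization", default="paranoid",
                        help="authorization compliance mode")
    parser.add_argument("--obligation", default="subordinate",
                        help="obligation compliance mode")
    return parser.parse_args(argv)


args = parse_args(["domain.lp", "--observations", "obs.lp"])
facts = history_program({0: [("busy_working(bob)", False)],
                         5: [("busy_working(bob)", True)]}, step=3)
```

At each iteration, a control loop along these lines would call `solve(args.domain + [args.observations], history_program(...))` and interpret the returned atoms; keeping all input in files is what makes a run reproducible.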

In the case of unexpected output, the manual mode allows one to examine the answer set at each step of the control loop. It also provides scripts that visually highlight differences between the answer sets of different timesteps and that step through the control loop as one would in a traditional debugger. Additionally, manual mode checks for certain violations of the underlying assumptions of AIA and AOPL. For example, it generates an invalid predicate when there exists an action that is not a physical, mental, or policy action, and likewise when an action is neither an agent action nor an exogenous action. In addition, it generates an invalid predicate when an AOPL policy statement describes an object that is not declared as an action. These rules have been very useful in debugging the implementation of the APIA architecture, and they will aid future end-users who encode and execute scenarios using this architecture. Since these rules are intended for debugging, they are not executed in automatic mode.
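The two debugging aids described above can be mimicked in plain Python. This is only a sketch: in the architecture the checks are ASP rules, and the atom shapes (`invalid(no_kind(...))` and so on), label names, and function names below are hypothetical.

```python
from typing import Dict, Iterable, List, Set

KINDS = {"physical", "mental", "policy"}
SOURCES = {"agent", "exogenous"}


def invalid_atoms(actions: Dict[str, Set[str]],
                  policy_objects: Iterable[str]) -> List[str]:
    """Return hypothetical `invalid(...)` atoms for broken declarations.

    `actions` maps each declared action to the labels attached to it.
    `policy_objects` lists the objects that policy statements describe.
    """
    found = []
    for name, labels in sorted(actions.items()):
        if not labels & KINDS:
            # Not a physical, mental, or policy action.
            found.append(f"invalid(no_kind({name}))")
        if not labels & SOURCES:
            # Neither an agent action nor an exogenous action.
            found.append(f"invalid(no_source({name}))")
    for obj in sorted(set(policy_objects)):
        if obj not in actions:
            # A policy statement describes something never declared as an action.
            found.append(f"invalid(not_an_action({obj}))")
    return found


def answer_set_diff(before: Set[str], after: Set[str]) -> Dict[str, List[str]]:
    """Atoms that disappeared or appeared between two answer sets."""
    return {"removed": sorted(before - after), "added": sorted(after - before)}


declared = {
    "greet(alice,bob)": {"physical", "agent"},
    "ring": {"exogenous"},   # missing a kind
    "wave": {"physical"},    # missing a source
}
problems = invalid_atoms(declared, ["greet(alice,bob)", "sing"])
changes = answer_set_diff({"a", "b"}, {"b", "c"})
```

Here `problems` flags `ring`, `wave`, and the undeclared `sing`, and `changes` reports which atoms differ between two timesteps.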

6 Conclusions and Future Work

In this paper, we created an architecture for a policy-aware intentional agent by bridging together previous work on intentional agents [6, 7] and reasoning algorithms for authorization and obligation policies [13]. A main difficulty was adapting the work on policy compliance so that it would be relevant for an agent deciding on which course of actions to take. While Gelfond and Lobo’s work could determine whether a trajectory (i.e., sequence of actions) was strongly compliant, weakly compliant, or non-compliant, we introduced a wider range of agent behavior modes, which additionally explore the interactions between authorization and obligation policies.

This work can be further expanded by refining the decision-making process in the planning phase of APIA: a relative ranking between activities that achieve the same goal could be introduced, based on the number of actions that are strongly, weakly, or non-compliant. Moreover, it would be interesting to allow the agent’s controller to switch behavior modes while the agent is active, in the middle of executing an activity.
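One possible shape for such a ranking is sketched below. It is an assumption for illustration only, not a design proposed in this paper: activities are compared lexicographically, first by the number of non-compliant actions, then by the number of weakly compliant actions, then by length. The labels `"strong"`, `"weak"`, and `"non"` and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_key(levels: List[str]) -> Tuple[int, int, int]:
    """Sort key for one activity, given the compliance label of each action."""
    counts = Counter(levels)
    # Fewer non-compliant actions first, then fewer weakly compliant
    # actions, then shorter activities.
    return (counts["non"], counts["weak"], len(levels))


def rank_activities(activities: Dict[str, List[str]]) -> List[str]:
    """Order activity names from most to least preferred."""
    return sorted(activities, key=lambda name: rank_key(activities[name]))


candidates = {
    "a": ["strong", "weak"],
    "b": ["strong", "strong", "strong"],
    "c": ["non"],
}
ranking = rank_activities(candidates)
```

Under this ordering the longer but fully compliant activity `b` is preferred over `a`, and the single non-compliant action in `c` ranks it last.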


  • [1]
  • [2] Sandra Alves & Maribel Fernandez (2017): A graph-based framework for the analysis of access control policies. Theoretical Computer Science 685, pp. 3–22, doi:10.1016/j.tcs.2016.10.018.
  • [3] Marcello Balduccini & Michael Gelfond (2008): The AAA Architecture: An Overview. In: Architectures for Intelligent Theory-Based Agents, Papers from the 2008 AAAI Spring Symposium, 2008, AAAI Press, pp. 1–6.
  • [4] Steve Barker (2012): Logical Approaches to Authorization Policies. In Alexander Artikis, Robert Craven, Nihan Kesim Cicekli, Babak Sadighi & Kostas Stathis, editors: Logic Programs, Norms and Action - Essays in Honor of Marek J. Sergot on the Occasion of His 60th Birthday, Lecture Notes in Computer Science 7360, Springer, pp. 349–373, doi:10.1007/978-3-642-29414-3_19.
  • [5] Steve Barker, Guido Boella, Dov Gabbay & Valerio Genovese (2014): Reasoning about delegation and revocation schemes in answer set programming. Journal of Logic and Computation 24(1), pp. 89–116, doi:10.1093/logcom/exs014. Publisher: Oxford Academic.
  • [6] Justin Blount (2013): An architecture for intentional agents. Ph.D. thesis, Texas Tech University.
  • [7] Justin Lane Blount, Michael Gelfond & Marcello Balduccini (2014): Towards a Theory of Intentional Agents. In: 2014 AAAI Spring Symposium Series, pp. 10–17.
  • [8] David Ferraiolo, Janet Cugini & D Richard Kuhn (1995): Role-based access control (RBAC): Features and motivations. In: Proceedings of the 11th Annual Computer Security Applications Conference, pp. 241–48.
  • [9] Ji Gao, Cheng-Xiang Yuan & Jing Wang (2005): SASA5: A Method System for Supporting Agent Social Activities. Chinese Journal of Computers 28(5), pp. 838–848.
  • [10] Michael Gelfond & Yulia Kahl (2014): Knowledge Representation, Reasoning, and the Design of Intelligent Agents. Cambridge University Press, doi:10.1017/CBO9781139342124.
  • [11] Michael Gelfond & Vladimir Lifschitz (1991): Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing 9(3/4), pp. 365–386, doi:10.1007/BF03037169.
  • [12] Michael Gelfond & Vladimir Lifschitz (1998): Action languages. Electronic Transactions on AI 3(16), pp. 193–210.
  • [13] Michael Gelfond & Jorge Lobo (2008): Authorization and Obligation Policies in Dynamic Systems. In Maria Garcia de la Banda & Enrico Pontelli, editors: Logic Programming, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, pp. 22–36, doi:10.1007/978-3-540-89982-2_7.
  • [14] Bei-shui Liao, Hua-xin Huang & Ji Gao (2006): An Extended BDI Agent with Policies and Contracts. In Zhong-Zhi Shi & Ramakoti Sadananda, editors: Agent Computing and Multi-Agent Systems, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, pp. 94–104, doi:10.1007/11802372_12.
  • [15] Victor W. Marek & Miroslaw Truszczynski (1999): Stable Models and an Alternative Logic Programming Paradigm. In Krzysztof R. Apt, Victor W. Marek, Mirek Truszczynski & David Scott Warren, editors: The Logic Programming Paradigm - A 25-Year Perspective, Artificial Intelligence, Springer, pp. 375–398.
  • [16] Yan-Bin Peng, Ji Gao, Jie-Qin Ai, Cun-Hao Wang & Hang Guo (2008): An Extended Agent BDI Model with Norms, Policies and Contracts. In: 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1–4, doi:10.1109/WiCom.2008.1197. ISSN: 2161-9654.
  • [17] Anand S. Rao & Michael P. Georgeff (1991): Modeling Rational Agents within a BDI-Architecture. In: Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning (KR’91), Cambridge, MA, USA, April 22-25, 1991, pp. 473–484.
  • [18] Khair Eddin Sabri & Nadim Obeid (2016): A temporal defeasible logic for handling access control policies. Applied Intelligence 44(1), pp. 30–42, doi:10.1007/s10489-015-0692-8.

Appendix A Output

Figure 1: Automatic execution of Example A using configuration