Representing and reasoning in multi-agent domains are two of the most active research areas in multi-agent system (MAS) research. The literature in this area is extensive, and it provides a plethora of logics for representing and reasoning about various aspects of MAS domains, e.g., [20, 14, 24, 22, 12].
A large number of the logics proposed in the literature have been designed to focus on particular aspects of the problem of modeling MAS, often justified by a specific application scenario. This makes them suitable to address specific subsets of the general features required to model real-world MAS domains. The task of generalizing some of these existing proposals to create a uniform and comprehensive framework for modeling several different aspects of MAS domains is an open problem. Although we do not dispute the possibility of extending several of these existing proposals in various directions, the task does not seem easy. Similarly, a variety of multi-agent programming platforms have been proposed, mostly in the style of multi-agent programming languages, such as Jason, ConGolog, 3APL, and GOAL, but with limited planning capabilities.
Our effort in this paper is focused on the development of a novel action language for multi-agent systems. The foundations of this effort can be found in the action language ; this is a flexible single-agent action language, which generalizes the action language  with support for multi-valued fluents, non-Markovian domains, and constraint-based formulations—enabling, for example, the formulation of costs and preferences. has been implemented in CLP().
In this work, we extend to support MAS domains. The perspective is that of a distributed environment, with agents pursuing individual goals but capable of interacting through shared knowledge and through collaborative actions. A first step in this direction has been described in the language , a multi-agent action language with capabilities for centralized planning. In this paper, we expand on this by moving towards a truly distributed multi-agent platform. The language is extended with Communication primitives for modeling interactions among Autonomous Agents. We refer to this language simply as . Differently from , agents in the framework proposed in this paper have private goals and are capable of developing independent plans. Agents’ plans are composed in a distributed fashion, leading to replanning and/or introduction of communication activities to enable a consistent global execution.
The design of is validated in a prototype, available from http://www.dimi.uniud.it/dovier/BAAC, that uses CLP( for the development of the individual plans of each agent and Linda for the coordination and interaction among them.
2 Syntax of the Multiagent Language
The signature of consists of:
A set of agent names, used to identify the agents in the system;
A set of action names;
A set of fluent names—i.e., predicates describing properties of objects in the world, and providing description of states of the world; such properties might be affected by the execution of actions; and
A set of values for the fluents in —we assume .
The behavior of each agent is specified by an action description theory , composed of axioms of the forms described next.
Considering the action theory of an agent , name and priority of the agent are specified by agent declarations:
where . We adopt the convention that denotes the highest priority—which also represents the default priority in absence of a declaration. As we will see, priorities can be used to resolve possible conflicts among actions of different agents.
It is possible to specify which agents are known to the agent , as follows:
Agent can explicitly communicate with any of the agents , as discussed below.
We assume the existence of a “global” set of fluents, and any agent knows and can access only those fluents that are declared in by axioms of the form:
with , , and is a set of values representing the admissible values for each (possibly represented as an interval ). These fluents describe the “local state” of the agent. We assume that the fluents accessed by multiple agents are defined consistently in each agent’s local theory.
Let us specify a domain inspired by volleyball. There are two teams, black and white, with one player in each team; let us focus on the domain for the white team (Sect. 3.8 deals with the case that involves more players). We introduce fluents to model the positions of the players and of the ball, the possession of the ball, the score, and a numerical fluent defense_time. All players know the positions of all players. Since the teams are separated by the net, the x-coordinates of a black player and a white player must differ. This can be stated by:
agent player(white,X) :- num(X).
known_agents player(black,X) :- num(X).
fluent x(player(white,X)) valued [B,E] :- num(X), net(NET), B is NET+1, linex(E).
fluent x(player(black,X)) valued [1,E] :- num(X), net(NET), E is NET-1.
fluent y(A) valued [1,MY] :- player(A), liney(MY).
fluent x(ball) valued [1,MX] :- linex(MX).
fluent y(ball) valued [1,MY] :- liney(MY).
fluent hasball(A) valued [0,1] :- agent(A).
fluent point(T) valued [0,1] :- team(T).
fluent defense_time valued [0,1].
team(black). team(white). num(1).
linex(11). net(6). liney(5).
where linex/liney are the field sizes, and net is the x-coordinate of the net.
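As a quick arithmetic check of the declarations above, the following Python sketch computes the admissible x-ranges of the two players from the constants net(6) and linex(11). The function name `x_range` is introduced here only for illustration and is not part of the language.

```python
# Constants copied from the domain description above.
NET, LINEX, LINEY = 6, 11, 5

def x_range(team):
    """Admissible x-coordinates for a player of the given team."""
    if team == "white":
        return (NET + 1, LINEX)   # fluent x(player(white,X)) valued [NET+1, linex]
    if team == "black":
        return (1, NET - 1)       # fluent x(player(black,X)) valued [1, NET-1]
    raise ValueError(f"unknown team: {team}")

white = x_range("white")
black = x_range("black")
print("white:", white, "black:", black)
```

With these values the two intervals do not overlap, and neither contains the x-coordinate of the net, which is how the fluent domains keep the two players on opposite sides.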
Fluents are used in Fluent Expressions (FE), which are defined as follows:
where , , , , and . FE is referred to as a timeless expression if it contains no occurrences of with . can be used as a shorthand of . The notation is an annotated fluent expression. The expression refers to a relative time reference, indicating the value had steps in the past. The last alternative in (4), a reified expression, requires the notion of constraint C, introduced below. represents a Boolean value that reflects the truth value of C. A Primitive Constraint (PC) is formula , where and are fluent expressions, and . A constraint C is a propositional combination of PCs. We will refer to the primitive constraints of the form , where , as a basic primitive constraint. We accept the constraint as syntactic sugar of and .
An axiom of the form in , declares that the action is executable by the agent . Observe that the same action name can be used for different actions executable by different agents. This does not cause ambiguity, since each agent’s knowledge is described by its own action theory. A special action, nop, is executable by every agent, and it does not change any of the fluents.
The actions for each player of Example 1 are:
move(d): move one step in direction d, where d is one of the eight directions: north, north-east, east, ..., west, north-west.
throw(d,f): throw the ball in direction d (same eight directions as above) with strength f varying from 1 to a maximum throw power ( in our example).
Moreover, the player of each team is in charge of checking if a point has been scored (in such case, he whistles). We write the actions as act(,action_name) and state these axioms:
action act([A],move(D)) :- whiteplayer(A), direction(D).
action act([A],throw(D,F)) :- whiteplayer(A), direction(D), power(F).
action act([player(white,1)],whistle).
where whiteplayer, power, and direction can be defined as follows:
The executability of the actions is described by axioms of the form:
where and C is a constraint. The axiom states that is executable only if C is entailed by the state of the world. We assume that at least one executability axiom is present for each action; multiple executability axioms are treated disjunctively.
In our working example, we can state executability as follows:
executable act([player(white,1)],whistle) if [S eq 0] :-
    build_sum(S).
executable act([A],move(D)) if
    [hasball(A) eq 0, defense_time gt 0,
     Net lt x(A)+DX, x(A)+DX leq MX,
     1 leq y(A)+DY, y(A)+DY leq MY] :-
    action(act([A],move(D))), delta(D,DX,DY),
    net(Net), linex(MX), liney(MY).
executable act([A],throw(D,F)) if
    [hasball(A) gt 0, defense_time eq 0,
     1 leq x(A)+DX*F, x(A)+DX*F leq MX,
     1 leq y(A)+DY*F, y(A)+DY*F leq MY] :-
    action(act([A],throw(D,F))), delta(D,DX,DY),
    linex(MX), liney(MY).
These axioms state that neither a player nor the ball can leave the field. build_sum is recursively defined to return the expression: where are the players (i.e., player(white,1) and player(black,1)). The operators =, ≠, ≤, <, etc. are concretely represented by eq, neq, leq, lt, etc., respectively.
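To make the executability condition of move concrete, here is a minimal Python sketch that evaluates the same six constraints on a state represented as a dictionary. The `DELTA` table is an assumption: the definition of delta/3 is not shown in the text, so the orientation of the axes is chosen arbitrarily for illustration.

```python
NET, LINEX, LINEY = 6, 11, 5

# Hypothetical encoding of delta(D,DX,DY); the actual definition is not given in the text.
DELTA = {
    "north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0),
    "north-east": (1, 1), "north-west": (-1, 1),
    "south-east": (1, -1), "south-west": (-1, -1),
}

def can_move(state, agent, direction):
    """Mirror of the executability axiom for act([A],move(D)) of a white player."""
    dx, dy = DELTA[direction]
    new_x = state[("x", agent)] + dx
    new_y = state[("y", agent)] + dy
    return (
        state[("hasball", agent)] == 0      # hasball(A) eq 0
        and state["defense_time"] > 0       # defense_time gt 0
        and NET < new_x <= LINEX            # Net lt x(A)+DX, x(A)+DX leq MX
        and 1 <= new_y <= LINEY             # 1 leq y(A)+DY, y(A)+DY leq MY
    )

player = "player(white,1)"
state = {("x", player): 7, ("y", player): 3, ("hasball", player): 0, "defense_time": 1}
print(can_move(state, player, "east"), can_move(state, player, "west"))
```

In this state the player stands right next to the net, so a move to the east is allowed while a move to the west would place the player on the net column and is rejected.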
The effects of an action execution are modeled by dynamic causal laws:
where , is a constraint, and Eff is a conjunction of basic primitive constraints. The axiom asserts that if is true with respect to the current state, then Eff must hold after the execution of .
Since agents share fluents, their actions may interfere and cause inconsistencies. A conflict happens when the effects of different concurrent actions are incompatible and would lead to an inconsistent state; note that we allow only consistent states to exist during the evolution of the world. A procedure has to be applied to resolve a conflict and determine a consistent subset of the conflicting actions (see Sect. 3.3).
Let us describe the effects of the actions in the volleyball domain. When the ball is thrown with force in direction , it reaches a destination cell whose distance is as follows: a) if is either north or south then ; b) if is east or west then ; c) if is any other direction, . An additional effect is to set the fluent defense_time (to in our example).
act([A],throw(D,F)) causes hasball(A) eq 0 :-
    action(act([A],throw(D,F))).
act([A],throw(D,F)) causes defense_time eq 1 :-
    action(act([A],throw(D,F))).
act([A],throw(D,F)) causes pair(x(ball),y(ball)) eq pair(x(A)+F*DX,y(A)+F*DY) :-
    action(act([A],throw(D,F))), delta(D,DX,DY).
act([A],throw(D,F)) causes hasball(B) eq 1
    if [pair(x(B),y(B)) eq pair(x(A)+F*DX,y(A)+F*DY)] :-
    action(act([A],throw(D,F))), player(B), neq(A,B), delta(D,DX,DY).
act([A],throw(D,F)) causes point(black) eq 1 if [x(A)+F*DX eq Net] :-
    action(act([A],throw(D,F))), delta(D,DX,_), net(Net).
The effects of the other two actions move and whistle can be stated by:
act([player(white,1)],whistle) causes point(white) eq 1 if [x(ball) lt NET] :-
    net(NET).
act([player(white,1)],whistle) causes point(black) eq 1 if [NET lt x(ball)] :-
    net(NET).
act([A],move(D)) causes pair(x(A),y(A)) eq pair(x(A)+DX,y(A)+DY) :-
    action(act([A],move(D))), delta(D,DX,DY).
act([A],move(D)) causes defense_time eq defense_time-1 :-
    action(act([A],move(D))).
act([A],move(D)) causes hasball(A) eq 1
    if [pair(x(ball),y(ball)) eq pair(x(A)+DX,y(A)+DY)] :-
    action(act([A],move(D))), delta(D,DX,DY).
In presence of a conflict (i.e., two agents executing actions that assign a distinct value to the same fluent), at least two perspectives can be followed, by assigning either a passive or an active role to the conflicting agents. In the first case, a supervising entity is in charge of resolving the conflict, and all the agents will comply with the supervisor’s decisions. Alternatively, the agents themselves are in charge of reaching an agreement, possibly through negotiation. In the latter case, the following declarations allow one to specify in the action theories some basic reaction policies the agents might apply:
with defined as:
where is a number of steps and is a constraint. Notice that one can also specify policies to be adopted whenever a failure occurs in executing an action.
We remark here the difference between conflict and failure. A conflict occurs whenever concurrent actions performed by different agents try to make inconsistent modifications to the state of the world. A failure occurs whenever an action cannot be executed as planned by an agent . This might happen, for instance, because after the detection of a conflict involving , the outcome of the conflict resolution phase requires to be inhibited. In this case the agent might have to reconsider its plan. Hence, reacting to a failure is a “local” activity the agent might perform after the state transition has been completed. In axioms of the form (7), one can specify different reactions to a conflict (resp. a failure) of the same action. Alternatives will be considered in their order of appearance.
Let us assume that the agents and have priority , while agent has lower priority . Let us also assume that the current state is such that actions act_a, act_b, and act_c are all executable (respectively, by agents , , and ), where their effects on fluent are of setting it to 1, 2, and 3, respectively. This indicates a situation of conflict, since the effects of the concurrent execution of the three actions are inconsistent. Assume that the following options have been defined:
action act_a on_conflict retry_after 2
action act_b on_conflict forego
action act_c on_failure retry_after 3
and that the plan of agent (resp., , ) requires the execution of action act_a (resp., act_b, act_c) in the current state. One possible conflict resolution is to focus on the priorities of the agents. This causes act_c to be removed from the execution list. Thus, agent fails in executing act_c and will retry the same action after 3 steps.
Some policy must be now chosen to resolve the conflict between and . The first possibility is that agents have passive roles in conflict resolution, and a supervisor selects, according to some criteria, a consistent subset of the actions/agents. For example, if is selected (e.g., by lexicographic order), then the state will be modified by setting , declaring act_a successful, while agent will fail.
An alternative is to allow the agents and to directly resolve the conflict, using their on_conflict options. This causes to retry the execution of act_a after 2 time steps and to forego the execution of act_b. Both of them will get a failure message, because neither act_a nor act_b is executed.
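The first stage of this example (priority filtering followed by a supervisor choice) can be sketched in Python as follows. The agent names "a", "b", "c" and the numeric priorities are assumptions made for illustration; the sketch also assumes that a smaller number means a higher priority.

```python
# Hypothetical agent names and priorities (smaller number = higher priority).
PRIORITY = {"a": 0, "b": 0, "c": 1}
# agent -> (action name, value the action assigns to the shared fluent)
PROPOSED = {"a": ("act_a", 1), "b": ("act_b", 2), "c": ("act_c", 3)}

def filter_by_priority(proposed, priority):
    """Inhibit the actions of every agent that does not hold the best priority."""
    best = min(priority[agent] for agent in proposed)
    kept = {agent: act for agent, act in proposed.items() if priority[agent] == best}
    failed = sorted(set(proposed) - set(kept))
    return kept, failed

kept, failed = filter_by_priority(PROPOSED, PRIORITY)
winner = min(kept)              # passive agents: supervisor picks by lexicographic order
new_value = kept[winner][1]     # value written to the shared fluent
print("failed by priority:", failed, "| winner:", winner, "| fluent value:", new_value)
```

Priority filtering removes act_c but leaves act_a and act_b in conflict, so a second policy is still needed; the sketch uses the lexicographic choice mentioned in the example.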
Apart from the possible communications occurring among agents during the conflict resolution phase, other forms of “planned” communication can be modeled in an action theory. An axiom of this form
describes a special static causal law that allows an agent to broadcast a request, whenever a certain condition () is encountered. By executing this action, an agent asks if there is another agent that can make the constraint true. Only an agent knowing all of the fluents occurring in is allowed to answer such request.
Instead of broadcasting a help request, an agent can send such a message directly to another agent by providing its name (any request sent to a nonexistent agent will never receive an answer):
The following communication primitive subsumes the previous ones:
If the last option is used, the requesting agent also provides a “reward” by promising to ensure in case of acceptance of the proposal. Axioms of this type allow us to model negotiations and other forms of bargaining and transactions.
In turn, agents may declare their willingness to accept requests and serve other agents using statements of the form
where is either a list of agent names —denoting that the agent in question can serve requests coming from the agents —or the keyword —denoting the fact that the agent can accept requests coming from any source. The optional condition allows the agent to select which requests to consider depending on properties of the current state of the world.
Let us consider a domain with three agents: a guitar maker, a joiner that provides wooden parts of guitars (bodies and necks), and a seller that sells strings and pickups. We assume that the maker has plenty of money (so we do not take into account what it spends), that the seller wants to be paid for its materials, and that necks and bodies can be obtained for free (e.g., the joiner has a fixed salary paid by the maker). The income of the seller is modeled by changes to the value of the fluent seller_account. In Figure 1 we report an action description that models the agent guitar_maker; analogous theories can be formulated for the other two agents. Observe that two point-to-point interactions are modeled, namely the one between the guitar_maker and the joiner, to obtain necks and bodies, and the one between the guitar_maker and the seller, to buy strings ($8) and pickups ($60). Two kinds of guitars can be made, differing in the number of pickups.
Various forms of global constraint can be exploited to impose control knowledge and maintenance goals. These constraints represent properties that must always persist in the world where the agents act. Some examples:
. This constraint is satisfied if the fluent constraint holds at the time step.
. This constraint imposes the condition that the fluent constraint holds in all the states of the evolution of the world.
The semantics of these constraints is given in Section 3.1.
An action domain description consists of a collection of axioms of the forms described so far, for each agent . Moreover it includes, for each agent , a collection of goal axioms (objectives), of the form goal C, where C is a constraint, and a collection of initial state axioms of the form: initially C, where C is a constraint involving only timeless expressions. For the sake of simplicity, we assume that all the sets are drawn from a consistent global initial state description , i.e., . A specific instance of a planning problem is a triple
3 System behavior
The behavior of can be split into two parts: the semantics of the action description language, which is parametric on the supervisor selection strategy, and the strategies themselves, which can be programmed. We present the former in Section 3.1 and the latter in Sections 3.2–3.5. Finally, some implementation notes are reported in Section 3.7.
3.1 Semantics of
The semantics of the action language is described by a transition function that operates on states. A state is identified by a total function . We assume a given horizon , within which the planning activities of all agents have to be completed.
Let be a state sequence, with . Given , , and a fluent expression , we define the concept of value of in at time , denoted by , as follows:
where . The last two cases specify the semantics of reification that relies on the notion of satisfaction, which in turn is defined by structural induction on constraints, as follows. Given a primitive constraint and a state sequence , the notion of satisfaction at time is defined as: iff . The notion is generalized to the case of propositional combinations of fluent constraints in the usual manner. For the case of pair, we have that if and only if .
We recall that a timeless fluent is a fluent expression of the form (and ).
Given a constraint and a state sequence , let be the set of timeless fluents occurring in . A function is a -solution of if . Let us observe that this definition makes use of a slight abuse of notation, since is potentially not a complete state (some fluents may not have been assigned a value by ). Nevertheless, the choice of fluents in guarantees the possibility of correctly evaluating . In other words, can be seen as a partial state contributing (with ) to the satisfaction of at time . Let us see how to complete this state using inertia: if is a -solution of a constraint , is defined as follows:
Fluents not appearing in are considered inertial (namely they maintain their previous values) and therefore the state is completed using the function ine.
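The completion by inertia can be illustrated with a short Python sketch. It assumes states and partial solutions are dictionaries from fluent names to values; the argument order of `ine` and the sample fluent values are chosen here for illustration only.

```python
def ine(partial, previous):
    """Complete a partial assignment with the values of the previous state."""
    completed = dict(previous)   # inertial fluents keep their previous values
    completed.update(partial)    # fluents fixed by the solution take their new values
    return completed

previous = {"x(ball)": 3, "y(ball)": 2, "defense_time": 0}
partial = {"x(ball)": 5, "defense_time": 1}   # fluents occurring in the constraint
next_state = ine(partial, previous)
print(next_state)
```

Here y(ball) does not occur in the solution, so it is carried over unchanged, while the two constrained fluents take the values chosen by the solution.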
An action is executable by agent in a state sequence if there is at least an axiom in and it holds that . If there is more than one executability condition, it is sufficient for one of them to apply.
Let us denote with the set of dynamic causal law axioms for action . The desired effect of executing in state sequence , denoted by , is a constraint defined as follows:
Request accomplishment actions can be used in the construction of this set.
Given a state sequence , a state , and a set of actions , a triple is a valid state transition if:
for all , the action is executable in by some agent , and
, where is a -solution of the constraint .
Observe that if , then will be a valid state transition.
Let be a sequence of states, an instance of a planning problem, and be sets of actions. We say that is a valid trajectory if:
for each agent and for each axiom of the form in , we have that ,
for all , is a valid state transition.
A valid trajectory is successful for an agent if, for each axiom of the form in , it holds that .
At each time step , each agent might propose a set of actions for execution—we assume that all the proposed actions are executable in the state sequence . Let be this set of actions. The supervisor selects a subset such that the constraint , defined as:
is satisfiable w.r.t. —i.e., there exists a complete state such that is a valid state transition. It is the job of the supervisor to determine the subset given and —as a maximal consistent subset, using agent priorities or other approaches, as discussed in Section 3.3. If an agent cannot find a plan at the time step it will ask for a nop and try again the next step.
Let us complete the semantics of the language by dealing with request and help laws. A request of the agent
is executable in a state sequence if it holds that . If the request above is executable, it can be accomplished in the successive state if there is an axiom
in and . The semantics of the help law is that of enabling a request accomplishment (after a request demand), and it can be viewed as the execution of an ordinary action by agent (we hypothetically assume that has access to all fluents of ). We can view this as if had an additional action defined in as:
Observe that, as happens for executability laws, multiple help preconditions are considered disjunctively. If the request includes also the option offering , then the action will cause as effect.
Let us add some comments on agents’ requests for action execution. Each agent wishes to execute some actions and to issue some requests. After the supervisor has decided which actions will be executed, each agent retrieves the relevant requests and analyzes them in order to possibly fulfill them in the next time step (see below for further details). These requests behave like an action , as stated above.
Two global constraints are allowed by the language . Their effect is to filter out sequences of states that do not fulfill those constraints:
imposes that any valid trajectory must satisfy
imposes that any valid trajectory must satisfy for all .
The supervisor is in charge of checking if these constraints can be satisfied while selecting as mentioned before. If the fluents involved in the constraints are all known to an agent , the set of actions proposed by are such that they will guarantee the property if all of them (and only them) are selected for application.
Each agent , at each time step , selects a set of actions it wishes to execute. For doing that, looks for a sequence of (sets of) actions to achieve its local goal, given the current state sequence . The set of actions are those to be executed at the current time step. If the new state communicated by the supervisor is different from the state it expected after the application of all the actions in the set (due either to the fact that some of these actions are not selected, or that other agents have executed actions that have unexpectedly changed some values), it will need to replan. Let us observe that, although globally the supervisor views a valid trajectory, locally this is not true (some state transitions are not justified by the actions of agent alone). However, in looking for a plan (and in replanning), it reasons on an “internal” valid trajectory from the current time to the future.
Let us focus on the problem of reacting to requests. Suppose that an agent , at time in a state sequence , receives the requests , where is of the form
and, moreover, assume that these requests are ordered (e.g., by the priorities of the requesting agent ). For , if contains an axiom
such that , the agent adds temporarily to its theory the constraint
and looks for a plan in the enlarged theory. If such a plan exists, the constraint (12) is definitely stored in , otherwise the request is ignored. In both cases, proceeds with next request (). At the end, some (possibly none) of the constraints will be fulfilled by a plan and the set of actions of the next step of this plan are passed to the supervisor.
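The request-handling loop just described can be sketched in Python as follows. The callables `can_help` and `find_plan` are hypothetical stand-ins for the check against the agent's help axioms and for the agent's planner; neither name comes from the language or the prototype.

```python
def serve_requests(requests, can_help, find_plan):
    """Try each request in order; keep it only if a plan exists in the enlarged theory."""
    accepted = []
    plan = find_plan(accepted)
    for request in requests:                          # assumed already ordered by priority
        if not can_help(request):                     # no matching help axiom: skip
            continue
        candidate = find_plan(accepted + [request])   # temporarily add the constraint
        if candidate is not None:
            accepted.append(request)                  # the constraint is stored for good
            plan = candidate
        # otherwise the request is ignored and the next one is considered
    return accepted, plan

# Toy planner (an assumption): a plan exists if the total cost of the
# accepted requests stays within a budget of 4.
def toy_planner(constraints):
    if sum(cost for _, cost in constraints) <= 4:
        return [name for name, _ in constraints]
    return None

requests = [("r1", 2), ("r2", 5), ("r3", 1)]
accepted, plan = serve_requests(requests, lambda r: True, toy_planner)
print(accepted, plan)
```

In the toy run the second request is dropped because no plan satisfies it together with the first, while the third is still accepted afterwards, which mirrors the greedy, order-dependent behavior of the procedure.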
Let us focus now on how the agent deals with the options related to a failure (this is also developed in Section 3.4). Let us assume an action submitted for execution at time has not been selected by the supervisor, and, therefore, a failure signal is returned to the agent . The current sequence of states is .
Let us analyze what happens in the three options:
: if then agent declares its failure. From this point onwards, the agent will not generate any actions, nor interact with other agents.
: if then is added in (and then the agent starts replanning)
: if then for time steps the agent requests only nop from the supervisor; at time step the action is requested again.
If the if option is missing, the condition will be assumed to be satisfied. If the add_goal option is missing, no new goal will be added.
3.2 Concurrent plan execution
The agents are autonomous and develop their activities independently, except for the execution of the actions/plans. In executing their plans, the agents must take into account the effects of concurrent actions.
We developed the basic communication mechanism among agents by exploiting a tuple space, whose access and manipulation follow the blackboard principles introduced in the Linda model. Linda is a popular model for coordination and communication among processes; it offers coordination via a shared memory, commonly referred to as a blackboard or tuple space. All the information is stored in the blackboard in the form of tuples; the shared blackboard provides atomic access and associative memory behavior (in retrieving and removing tuples). The SICStus Prolog implementation of Linda allows the definition of a server process, in charge of managing the blackboard, and client processes, which can add tuples (using the out operation), read tuples (using the rd operation), and remove tuples (using the in operation).
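For readers unfamiliar with Linda, the three operations can be illustrated with a minimal in-memory tuple space in Python. This is a sketch of the general model and not the SICStus Prolog library: the class name, the use of `None` as a wildcard, and the method name `in_` (since `in` is a Python keyword) are all assumptions made for illustration.

```python
import threading

class TupleSpace:
    """Tiny blackboard with atomic access and associative (pattern-based) lookup."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def _find(self, pattern):
        # A pattern matches a tuple of the same length; None matches any value.
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def out(self, tup):
        """Add a tuple and wake up any waiting reader."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def rd(self, pattern):
        """Block until a matching tuple exists, then return it without removing it."""
        with self._cond:
            while (tup := self._find(pattern)) is None:
                self._cond.wait()
            return tup

    def in_(self, pattern):
        """Block until a matching tuple exists, then remove and return it."""
        with self._cond:
            while (tup := self._find(pattern)) is None:
                self._cond.wait()
            self._tuples.remove(tup)
            return tup

    def __len__(self):
        with self._cond:
            return len(self._tuples)

ts = TupleSpace()
ts.out(("request", "agent_a", "act_a"))
seen = ts.rd(("request", None, None))        # reading leaves the tuple in place
taken = ts.in_(("request", "agent_a", None)) # taking removes it
print(seen, taken, len(ts))
```

The blocking behavior of `rd` and `in_` is what lets a process wait for a message that has not been posted yet, which is the coordination pattern a supervisor and its agents can rely on.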
Most of the interactions among concurrent agents, especially those aimed at resolving conflicts, are managed by a specific process, the supervisor, which also provides a global time to all agents, enabling them to execute their actions synchronously. The supervisor process stores the initial state and the changes caused by the successful executions of actions. It synchronizes the execution of actions, and controls the coordination and the arbitration in case of conflicts. It also sends a success or a failure signal to each agent at each action execution attempt, together with the list of changes to its local state.
Let us describe how the execution of concurrent plans proceeds. As mentioned, each action description includes a set of constraints describing a portion of the initial state.
At the beginning, the supervisor acquires the specification of the initial state.
At each time step the supervisor starts a new state transition:
Each agent sends to the supervisor a request to perform an action—i.e., the next action of its locally computed plan—by specifying its effects on the (local) state.
The supervisor collects all these requests and starts an analysis, aimed at determining the subsets of actions/agents that conflict (if any). A conflict occurs whenever agents require incompatible assignments of values to the same fluents. The transition takes place once all conflicts have been resolved and a subset of compatible actions has been identified by means of some policy (see below). These actions are enabled while the remaining ones are inhibited.
All the enabled actions are executed, producing changes to the global state.
These changes are then sent back to all agents, to achieve the corresponding updates of each agent’s local state. All agents are also notified about the outcome of the procedure. In particular, those agents whose actions have been inhibited receive a failure message.
The computation stops when the time is reached.
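One state transition of the loop above can be sketched in Python as follows. The sketch assumes each request is a dictionary of fluent assignments and uses a deliberately crude policy (enable exactly one of the conflicting agents) in place of the policies of Section 3.3; the function names and the `choose` parameter are hypothetical.

```python
def conflicting_fluents(requests):
    """Fluents to which at least two requests assign different values."""
    seen, clash = {}, set()
    for effects in requests.values():
        for fluent, value in effects.items():
            if fluent in seen and seen[fluent] != value:
                clash.add(fluent)
            seen.setdefault(fluent, value)
    return clash

def transition(state, requests, choose):
    """Collect requests, resolve conflicts, apply enabled effects, report outcomes."""
    clash = conflicting_fluents(requests)
    involved = {a for a, eff in requests.items() if clash.intersection(eff)}
    enabled = set(requests) - involved
    if involved:
        enabled.add(choose(sorted(involved)))   # placeholder policy: enable one agent
    new_state = dict(state)
    for agent in sorted(enabled):
        new_state.update(requests[agent])       # enabled actions change the global state
    outcome = {a: ("success" if a in enabled else "failure") for a in requests}
    return new_state, outcome

state = {"f": 0, "g": 0}
requests = {"a": {"f": 1}, "b": {"f": 2}, "c": {"g": 5}}
new_state, outcome = transition(state, requests, choose=lambda agents: agents[0])
print(new_state, outcome)
```

Agents a and b disagree on f, so only one of them is enabled, while c is unaffected by the conflict and always succeeds; every agent then receives the new state together with its own success or failure signal.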
Observe that, after each step of the local plan execution, each agent needs to check if the reached state still supports its successive planned actions. If not, the agent has to reason locally and revise its plan, i.e., initiate a replanning phase. This is due to the fact that the reached state might be different from the expected one. This may occur in two cases:
The proposed action was inhibited, so the agent actually executed a nop; this case occurs when the agent receives a failure message from the supervisor.
The interaction was successful, i.e., the planned action was executed, but the effects of the actions performed by other agents affected fluents in its local state, preventing the successful continuation of the remaining part of the local plan. For instance, the agent may have assumed that the fluent maintained its value by inertia, but another agent, say , changed such value. There is no direct conflict between the actions of and , but agent has to verify that the rest of its plan is still applicable (e.g., the next action in ’s plan may have lost its executability condition).
3.3 Conflict resolution
A conflict resolution procedure is invoked by the supervisor whenever it determines the presence of a set of incompatible actions. Different policies can be adopted in this phase and different roles can be played by the supervisor.
First of all, the supervisor exploits the priorities of the agents to attempt a resolution of the conflict, by inhibiting the actions issued by low-priority agents. If this does not suffice, further options are applied. We describe here some of the simplest viable possibilities, which we have already implemented in our prototype. The architecture of the system is modular (see Sect. 3.7) and can be easily extended to include more complex policies and protocols.
The two approaches we implemented so far differ by assigning the active role in resolving the conflict either (a) to the supervisor or (b) to the conflicting agents.
In the first case, the supervisor has an active role—it acts as a referee and decides, without any further interaction with the agents, which actions have to be inhibited. In the current prototype, the arbitration strategy is limited to:
A random selection of a single action to be executed; or
The computation of a maximal set of compatible actions to be executed. This computation is done by solving a CSP—which is dynamically generated using a CLP() encoding.
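The second strategy can be illustrated with a brute-force search in Python. This is a swapped-in technique for illustration: the prototype generates and solves a CSP, whereas the sketch below simply enumerates subsets from largest to smallest, which is only practical for a handful of agents. Requests are assumed to be dictionaries of fluent assignments.

```python
from itertools import combinations

def consistent(effects_list):
    """True if no fluent receives two different values."""
    merged = {}
    for effects in effects_list:
        for fluent, value in effects.items():
            if merged.setdefault(fluent, value) != value:
                return False
    return True

def max_compatible(requests):
    """Largest set of agents whose requested effects are jointly consistent."""
    agents = sorted(requests)
    for size in range(len(agents), 0, -1):
        for subset in combinations(agents, size):
            if consistent([requests[a] for a in subset]):
                return set(subset)
    return set()

requests = {
    "a": {"f": 1},
    "b": {"f": 2},
    "c": {"g": 3},
    "d": {"f": 1, "g": 3},
}
selected = max_compatible(requests)
print(sorted(selected))
```

A set of maximum size is in particular maximal, so inhibiting only the agents outside the returned set is enough to restore consistency.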
Note that, in this strategy, the on_conflict policies assigned to actions by axioms (7) are ignored. This “centralized” approach is relatively simple; it also has strong potential for facilitating the creation of optimal plans. On the other hand, the adoption of a centralized approach to conflict resolution might become a bottleneck in the system, since all conflicting agents must wait for the supervisor’s decisions.
In the second case, the supervisor simply notifies the set of conflicting agents about the inconsistency of their actions. The agents involved in the conflict are completely in charge of resolving it by means of a negotiation phase. The supervisor waits for a solution from the agents. In solving the conflict, each agent makes use of one of the on_conflict directives (7) specified for its conflicting action . The semantics of these directives are as follows (in all the cases [provided ] is an optional qualifier; if it is omitted it is interpreted as provided true):
The option on_conflict forego provided causes the agent to “search” among the other conflicting agents for someone, say , that can guarantee the condition . In this case, performs its action while the execution of ’s action fails, and executes a nop in place of its action . Different strategies can be implemented in order to perform such a “search for help”. A simple one is the round-robin policy described below, but many other alternatives are possible and should be considered in completing the prototype.
The option on_conflict retry_after provided , will cause to execute nop during the following time steps and then it will try again to execute its action (if the preconditions still hold).
If there is no applicable option (e.g., no option is defined or none of the agents agrees to guarantee ), the action is inhibited and its execution fails.
The way in which agents negotiate and exploit the on_conflict options can rely on several protocols, of different complexity. For instance, one possibility might be to nominate a “leader” within each of the conflicting sets of agents. The leader is in charge of coordinating the agents in to resolve the conflict without interacting with the supervisor.
Another approach consists of leaving each agent in free to proceed and to find an agreement by sending proposals to other agents (possibly by adopting some order of execution, some priorities, etc.) and receiving their proposals/answers. In the current prototype, we implemented a round-robin policy. Let us assume that the state sequence already constructed is and let us assume that the agents in the list aim at executing the set of actions , respectively. Furthermore, let us assume that the execution of all actions in will introduce a constraint that does not have a -solution. The agents are sorted, and they take turns in resolving the conflict. Suppose that at a certain round of the procedure the agent is selected. tries its next unexplored on_conflict OP provided option for its action and checks if .
If then will apply the OP option and and are removed from and , respectively.
Otherwise, the next agent is selected and the successive call to will consider the next on_conflict option.
If there are no successive options for then will be removed from and a failure for will occur. After each step, if has a -solution, then the procedure will terminate and the actions in will be executed. Observe that this procedure always terminates with a solution to the conflict, since a finite number of on_conflict options are defined for each action.
This is a relatively rigid policy, and it represents a simple example of how to realize a terminating protocol for conflict resolution. Alternative solutions can be added to the prototype thanks to its modularity.
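The round-robin policy described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the prototype's Prolog code: `consistent` stands for the test that the remaining actions admit a solution, `holds` stands for the entailment check of a `provided` condition, and applying an option is reduced to recording its name (the search for a helping agent required by forego is not modeled). All function and variable names are hypothetical.

```python
from collections import deque

def round_robin(agents, options, consistent, holds):
    """Resolve a conflict among `agents`, each proposing one action.

    options: dict agent -> list of (op, condition) pairs, in declaration order.
    consistent(remaining): True if the remaining actions can run together.
    holds(condition): True if the `provided` condition is entailed.
    Returns (agents whose actions are enabled, outcome per removed agent).
    """
    queue = deque(agents)
    next_option = {agent: 0 for agent in agents}
    outcomes = {}
    while queue and not consistent(list(queue)):
        agent = queue.popleft()
        index = next_option[agent]
        declared = options.get(agent, [])
        if index >= len(declared):
            outcomes[agent] = "fail"    # no options left: the action fails
            continue
        next_option[agent] = index + 1  # this option is now explored
        op, condition = declared[index]
        if holds(condition):
            outcomes[agent] = op        # apply OP; agent leaves the conflict
        else:
            queue.append(agent)         # try the next option on a later turn
    enabled = [agent for agent in agents if agent in queue]
    return enabled, outcomes

# Hypothetical conflict: A and B cannot act together; C is unaffected.
def no_clash(remaining):
    return not ("A" in remaining and "B" in remaining)

opts = {
    "A": [("forego", False)],
    "B": [("retry_after 2", True)],
    "C": [],
}
enabled, outcomes = round_robin(["A", "B", "C"], opts, no_clash, lambda c: c)
print(enabled, outcomes)
```

Each iteration either removes an agent from the queue or consumes one of its finitely many options, which mirrors the termination argument given in the text.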
Once all conflicts have been addressed, the supervisor applies the enabled actions, and obtains the new global state. Each agent receives a communication containing the outcome of its action execution and the changes to its local state. Moreover, further information might be sent to the participating agents, depending on the outcome of the coordination procedure. For instance, when two agents agree on an on_conflict option, they “promise” to execute specific actions (e.g., the fact that one agent has to execute consecutive nop).
3.4 Failure policies
Agents receive a failure message from the supervisor whenever their requested actions have been inhibited. In such a case, the original plan of the agent has to be revised to detect if the local goal can still be reached, possibly by replanning. Also in this case different approaches can be applied. For instance, one agent could avoid developing an entire plan at each step, but limit itself to producing a partial plan for the very next step. Alternatively, an agent could attempt to determine the “minimal” modifications to the existing plan in order to make it valid with respect to the newly encountered state. (At this time, the prototype includes only replanning from scratch at each step.)
In this replanning phase, the agent can exploit the on_failure options associated with the corresponding inhibited action. The intuitive semantics of these options can be described as follows.
retry_after [if ]: the agent first evaluates the constraint ; if holds, then it executes the action nop times and then tries again the failed action (provided that its executability conditions still hold).
replan [if ] [add_goal ]: the agent first evaluates ; if it holds, then in the following replanning phase the goal is added to the current local goal. The option add_goal is optional; if it is not present then nothing is added to the goal, i.e., it is the same as add_goal true.
fail [if ]: this is analogous to replan [if ] add_goal false. In this case the agent declares that it is impossible to reach its goal. It quits and does not participate in the subsequent steps of the concurrent plan execution.
If none of the above options is applicable, then the agent will proceed as if the option replan if true is present.
All the options declared for the inhibited action are considered in the given order, executing the first applicable one.
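The selection rule for on_failure options (first applicable option in declaration order, with replan if true as the default) can be sketched in Python. The dictionary representation of options, the string conditions, and the names `make_holds`, `select_failure_option`, and `react` are assumptions made for illustration only; they do not reflect the prototype's internal data structures.

```python
def make_holds(facts):
    """Assumed condition check: a condition holds if it is True or a known fact."""
    def holds(condition):
        return condition is True or condition in facts
    return holds

def select_failure_option(options, holds):
    """Return the first applicable option; default to `replan if true`."""
    for option in options:
        if holds(option.get("if", True)):  # an omitted `if` means `if true`
            return option
    return {"kind": "replan"}

def react(option, goal):
    """Map the chosen option to (decision, steps to wait, goal to pursue)."""
    kind = option["kind"]
    if kind == "retry_after":
        return ("wait_then_retry", option["steps"], goal)
    if kind == "replan":
        extra = option.get("add_goal")     # absent means add_goal true
        new_goal = goal if extra is None else goal + [extra]
        return ("replan", 0, new_goal)
    return ("quit", 0, goal)               # kind == "fail"

# Hypothetical declaration for one inhibited action.
declared = [
    {"kind": "retry_after", "steps": 2, "if": "ball_near"},
    {"kind": "replan", "add_goal": "has_ball"},
    {"kind": "fail"},
]

chosen = select_failure_option(declared, make_holds(set()))
decision = react(chosen, ["score"])
print(decision)
```

With no facts known, the first option is not applicable, so the agent replans with the extra goal; if `ball_near` were known to hold, the agent would instead wait two steps and retry.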
It might be the case that some global constraints (such as holds_at and always, cf., Sect. 2) involve fluents that are not known by any of the agents. Therefore, none of the agents can consider such constraints while planning. Consequently, these constraints have to be enforced while merging the individual plans. In doing this, the supervisor adopts the same strategies introduced to deal with conflicts and failures among actions, as described earlier. Namely, whenever a global constraint would be violated by the concurrent execution of actions (taken from different agents’ plans) a conflict is generated and a conflict resolution procedure executed. Thus, some of the conflicting actions will be inhibited causing their failure.
3.5 Broadcasting and direct requests
Let us describe a simple protocol for implementing the point-to-point and broadcast communications among agents, following an explicit request of the form (10). In particular, let us assume that the current state is the -th one of the plan execution—hence, the supervisor is coordinating the transition to the -th state by executing the -th action of each local plan. The handling of requests is interleaved with the agent-supervisor interactions that realize plan execution; nevertheless, the supervisor does not intervene on the requests, and the requests and offers are directly exchanged among agents. We can sketch the main steps involved in a state transition, from the point of view of an agent , as follows:
Agent tries to execute its action and sends this information to the supervisor (Sect. 3.2).
Possibly after a coordination phase, receives from the supervisor the outcome of its attempt to execute the action (failure or success, the changes in the state, etc.)
If the action execution is successful, before declaring the current transition completed, the agent starts an interaction with the other agents to handle pending requests. All the communications associated with such interactions are realized using Linda’s tuple-space (requests and offers are posted and retrieved by agents).
Agent fetches the collection of all the requests still pending and generated until step . For each request of help , originating from some agent , agent decides whether to accept or not. Such a decision might involve planning activities, in order to determine if the requested condition can be achieved by , possibly by modifying its original plan. In the positive case, posts its offer into the tuple-space and waits for a rendez-vous with .
Agent checks whether there are replies to the requests it previously posted. For each request for which replies are available, collects the set of offers/agents that expressed their willingness to help . By using some strategy, selects one of the responding agents, say . The policy for choosing the responding agent can be programmed (e.g., by exploiting priorities, agent’s knowledge on other agents, random selection, trust criteria, utility and optimality considerations). Once the choice has been made, establishes a rendez-vous with the selected agent and
declares its availability to ,
communicates the fulfillment of the request to the other agents.
The request and the obsolete offers are removed from the tuple space.
At that point in time, the transition can be considered completed for the agent . By taking into account the information about the outcome of the coordination phase in solving conflicts (point (2)), the agreement reached in handling requests (point (3)), might need to modify its plan. If the replanning phase succeeds, then will proceed with the execution of the next action in its local plan.
Note that we provided separate descriptions for steps (3.a) and (3.b). In a concrete implementation, these two steps have to be executed in an interleaved manner, so that a fixed order in sending requests and offers does not cause deadlocks or starvation. Furthermore, if an agent fails in executing an action, then it will skip step (3) and proceed with step (4) in order to replan its activity.
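The exchange of requests and offers through the tuple space can be illustrated with an in-memory sketch. It is written in Python and does not use the SICStus Linda library; the `TupleSpace` class, the tuple layouts `("request", id, requester, condition)` and `("offer", id, helper)`, and the selection strategy are all assumptions chosen for illustration. The rendez-vous between the two agents is not modeled.

```python
def matches(pattern, tup):
    """A pattern matches a tuple of the same length; None is a wildcard."""
    return len(pattern) == len(tup) and all(
        p is None or p == v for p, v in zip(pattern, tup)
    )

class TupleSpace:
    """Minimal single-process stand-in for a Linda tuple space."""
    def __init__(self):
        self._tuples = []

    def out(self, tup):
        self._tuples.append(tup)

    def rd_all(self, pattern):
        return [t for t in self._tuples if matches(pattern, t)]

    def in_all(self, pattern):
        taken = self.rd_all(pattern)
        self._tuples = [t for t in self._tuples if not matches(pattern, t)]
        return taken

def post_offers(space, helper, can_achieve):
    """Step (3.a): answer the pending requests the helper can satisfy."""
    for _tag, req_id, requester, condition in space.rd_all(
        ("request", None, None, None)
    ):
        if requester != helper and can_achieve(condition):
            space.out(("offer", req_id, helper))

def settle_request(space, req_id, choose=min):
    """Step (3.b): pick one offering agent, then remove request and offers."""
    offers = space.rd_all(("offer", req_id, None))
    if not offers:
        return None
    helper = choose(h for _tag, _id, h in offers)
    space.in_all(("request", req_id, None, None))
    space.in_all(("offer", req_id, None))
    return helper

space = TupleSpace()
space.out(("request", 1, "a", "door_open"))        # agent a asks for help
post_offers(space, "b", lambda c: c == "door_open")
post_offers(space, "c", lambda c: True)
post_offers(space, "a", lambda c: True)            # a never answers itself
selected = settle_request(space, 1)
print(selected)
```

The `choose` parameter is where a programmable selection policy (priorities, trust, utility) would plug in; `min` is used here only to make the example deterministic.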
3.6 The languages , , and
The language , and its implementation, relies heavily on its foundations and . In this section we briefly compare these three languages to clarify which parts of the solvers of the previous languages can be used for the implementation of presented in Subsection 3.7.
Let us focus first on . This is a single-agent framework. Therefore, considering a given action theory, all fluents and actions are known to the single agent, and the language does not permit the specification of private fluents or actions. Moreover, allows one to specify static causal laws. The syntax of fluent expressions and constraints is exactly the same as in . The syntax for executability and action effects is analogous to that of . More precisely, in , these laws take the forms:
causes(,,), where is the constraint that will hold in the next state if the action is executed in a state where holds.
These are just syntactical variants of (5) and (6), respectively. The semantics of is given via a transition system analogous to that introduced for . In particular, one might note that if a action description involves a single agent that knows all the fluents (and no communication laws are included), then its semantics coincides with the one of the corresponding program obtained by an immediate syntactical translation. The Prolog interpreter for is proved to be correct and complete (for soundness the absence of static laws is needed, but this is the case of , as presented here) with respect to the semantics in .
Let us consider now . It is a multiagent, centralized language, where collective actions, namely actions that require more than one agent for being executed, are allowed. For instance, a law of the form
action executable by
specifies that agents may execute together the action . In , instead, in the domain of an agent , an action definition implicitly states that the action is executed by (hence, this is a particular case of the law). On the other hand, since the reasoner is centralized, conflicts among effects never occur and all (concomitant) planned actions are always successfully executed. The declaration of fluents in is analogous to that in , whereas has a different syntax for dynamic laws, since they can refer directly to action-occurrences. A dynamic law has the form Prec causes Eff, where Prec and Eff are constraints and at least one reference to an action x must explicitly occur in Prec. Such references are specified by exploiting action flags of the form actocc().
The semantics of is given via the same notion of transition system used for and for . If a multi-agent action description in , together with initial state and goal, is such that during the plan, no conflict occurs, then the action description obtained by a simple (mostly one-to-one) translation, has exactly the same behaviour on the transition system. Let us observe that in this translation, collective actions are not generated.
3.7 Implementation issues
A first prototype of the system has been implemented in SICStus Prolog, using the library clpfd for the agents’ reasoning (by exploiting the interpreters for Action Description Languages described in [10, 11]), and the libraries system, linda/server, and linda/client for handling process communication.
The system is structured in modules. Figure 2 displays the modules composing the Prolog prototype and their dependencies. The modules spaceServer (via lindaServer) and lindaClient implement the interfaces with the Linda tuple-space. These modules support all the communications among agents.
Each autonomous agent corresponds to an instance of the module plan_executor, which, in turn, relies on a planner (the module sicsplan/bmap in Figure 2) for planning/replanning activities, and on client for interacting with other agents in the system. As explained previously, a large part of the coordination is guided by the module supervisor. Notice that both the supervisor and client act as Linda-clients. Conflict resolution functionalities are provided to the modules client and supervisor by the modules ConflictSolver_client and ConflictSolver_super, respectively. Finally, the arbitration_opt module implements the arbitration protocol(s). In the current code distribution, we provide an arbitration strategy that maximizes the number of actions performed at each step.
Let us remark that all the policies exploited in coordination, arbitration, and conflict handling can be customized by simply providing a different implementation of individual predicates exported by the corresponding modules. For instance, to implement a conflict resolution strategy different from the round-robin described earlier, it suffices to add to the system a new implementation of the module ConflictSolver_super (and for ConflictSolver_client, if the specific strategy requires an active role of the conflicting agents). Similar extensions can be done for arbitration_opt.
The system execution is rooted in the server process runner (written either for Linux (.sh) or for Windows (.bat) platforms), which is in charge of generating the connection address that must be used by the client processes.
The file settings.pl describes the planning problem to be solved. In particular, the user must specify in this file, through Prolog facts, the number and the names of the files containing the action descriptions, a bound on the maximum length of the plan, and the selected strategies for conflict resolution and arbitration (default choices can be used).
As far as the reasoning/planning module is concerned, we slightly modified the interpreters of the and the languages [10, 11] to accept the extended syntax presented here. However, the system is open to further extensions and different planners (even not necessarily based on Prolog technology) can be easily integrated thanks to the simple interface with the module plan_executor, which consists of a few Prolog predicates.
Currently, two planners have been integrated in the system: sicsplan is the constraint logic programming planner for the single-agent action language ; bmap is instead a constraint logic programming engine that supports centralized planning for multi-agent systems (capable, e.g., of collaborating in pursuing a common goal). Thus, the implementation allows each individual agent (according to the discussion from the previous sections) to be itself a complex system composed of multiple agents (operating in a cooperative fashion and planning in a centralized manner).
To accommodate this perspective, the design of the supervisor has been modified. The framework allows each concurrent planner that executes a multiple-action step to specify the desired granularity of the conflict resolution phase. This is done by specifying (for each step in a plan) a partition of the set of actions composing the step into those subsets of actions that have to be considered independently and as a whole.
For instance, in the next section we describe a specification of a coordination problem between two multi-agent systems. Each multi-agent system develops a plan in a centralized manner. Each step of such plans consists of a set of, possibly complex, actions (instead of a single action, as happens for the planner sicsplan). The conflicts between the multi-agent plans occurring during the -th state transition are identified/resolved by considering a single action of each -th step proposed by each planner.
Let us make some considerations about the soundness of the implementation. Let us consider one step in the construction of the trajectory. The state sequence already constructed is . The agents propose some actions for execution; the overall set of all actions proposed by all agents is . Agents propose for execution actions that are executable in . At the implementation level, the soundness property is guaranteed by the correctness of the sicsplan/bmap module—see Section 3.6.
Let us denote with the constraint that captures the effects of action ; i.e., if the action has dynamic causal laws for , then
Let be a Boolean variable, intuitively denoting whether the supervisor has selected action for execution at time .
The arbitration_opt implements an arbitration protocol producing a substitution for such that the constraint
has a -solution .
For example, in the current code distribution, the protocol is defined as a substitution that maximizes .
From these definitions and from the properties of sicsplan/bmap, we have that is a valid state transition.
If the conflict resolution is left to the agents, then the protocol is the outcome of the conflict resolution procedure, e.g., the round-robin analysis of the conflicting actions described in Section 3.3, which is currently implemented. It is immediate to check that the round-robin procedure produces a protocol that satisfies the properties shown above.
Due to the generality of the language for agent-based on-conflict resolution, the correctness of any conflict resolution procedure must be independently proved. Correctness is not an immediate consequence of the language itself but depends on the specific on-conflict declarations used in the specific procedure.
3.8 The volleyball domain
Let us describe a specification in of a coordination problem between two multi-agent systems—an extension of the domains described in Examples 1–4. There are two teams: black and white whose objective is to score a point, i.e., to throw the ball in the field of the other team (passing over the net) in such a way that no player of the other team can reach the ball before it touches the ground. Each team is modeled as a multi-agent system that elaborates its own plan in a centralized manner (thus, each step in the plan consists of a set of actions).
The playing field is discretized by fixing a rectangular grid that determines the positions where the players (and the ball) can move (see Fig. 3). The leftmost (rightmost) cells are those of the black (white) team, while the net () separates the two subfields. There are players per team ( in Fig. 3)—concretely, the fact num(2) is added to the theory. The allowable actions are: move, throw, and whistle. During the defense time, the players can move to catch the ball and/or to re-position themselves on the court. When a player reaches the ball (s)he will have the ball and will throw the ball again. A team scores a point either if it throws the ball to a cell in the opposite subfield that is not reached by any player of the other team in the defense time, or if the opposite team throws the ball in the net. The captain (first player) of each team is in charge of checking if a point has been scored. In this case, (s)he whistles.
Each team (either black or white) is modeled as a centralized multi-agent system, which acts as a single agent in the interaction with the other team. Alternative options in modeling are also possible; for instance, one could model each single player as an independent agent that develops its own plan and interacts with all other players. The two teams have the goal of scoring a point: goal(point(black) eq 1). for blacks and goal(point(white) eq 1). for whites.
At the beginning of the execution every team has a winning strategy, developed as a local plan; these are possibly revised after each play to accommodate the new state of the world reached. An execution (as printed by the system) is reported in Fig. 3, for a plan length of . The symbol 0 (respectively, Y) denotes the white (respectively, black) players, and Q (respectively, X) denotes a white (respectively, black) player with the ball. The throw moves applied are:
[player(black,1)]:throw(ne,3) (time 1)
[player(black,2)]:throw(se,3) (time 3)
[player(white,1)]:throw(w,5) (time 5)
[player(black,1)]:throw(e,5) (time 7)
Let us observe that, although it would be in principle possible for the white team to reach the ball and throw it within the time allowed, it would be impossible to score a point. Therefore, the players prefer not to perform any move.
The complete description of the encoding of this domain is available at http://www.dimi.uniud.it/dovier/BAAC. The repository also includes additional domains, e.g., a domain inspired by games involving one ball and two goals, as found in soccer. Although the encoding might seem similar to that of volleyball, the possibility of contact between two players makes this encoding more complex. Indeed, thanks to the fact that the net separates the two teams, in the volleyball domain rules like the following one suffice to avoid collisions:
always(pair(x(A),y(A)) neq pair(x(B),y(B))) :- A=player(black,N),B=player(black,M), num(N), num(M), N<M.
In a soccer world this is not true because only the supervisor can be aware, in advance, of possible contacts between players of different teams originating from concurrent actions. This generates interesting concurrency problems, e.g., concerning the ball possession after a contact. A simple way to address this problem consists in assigning a fluent to each field cell, whose value can be (free), (resp., ) if a white (resp. black) player is in the cell. The supervisor identifies a conflict when two opponent players move to the same cell, thus assigning to that fluent a different value. In this case, the supervisor arbitrarily enables one action, while the other agent waits a turn to retry the action:
action act([A],move(D)) on_failure retry_after 1 on_conflict arbitrate :- agent(A), direction(D).
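The behaviour induced by the declaration above (arbitration by the supervisor, followed by a one-step delayed retry for the losing player) can be sketched in Python. The function `arbitrate_moves`, the representation of players as (team, number) pairs, and the sample moves are hypothetical; the sketch only covers contacts between opponents, under the assumption that collisions among teammates are already excluded by each team's own planner through constraints such as the always rule shown earlier.

```python
import random
from collections import defaultdict

def arbitrate_moves(moves, rng=random):
    """Detect opponents moving to the same cell and enable one of them.

    moves: dict (team, number) -> target cell (x, y).
    Returns (enabled moves, dict of losing players -> steps to wait).
    """
    by_cell = defaultdict(list)
    for player, cell in moves.items():
        by_cell[cell].append(player)

    enabled, retry = {}, {}
    for cell, players in by_cell.items():
        teams = {team for team, _number in players}
        if len(players) > 1 and len(teams) > 1:
            winner = rng.choice(players)   # arbitrary choice by the supervisor
            enabled[winner] = cell
            for player in players:
                if player != winner:
                    retry[player] = 1      # corresponds to retry_after 1
        else:
            for player in players:
                enabled[player] = cell
    return enabled, retry

# Hypothetical step: two opponents target cell (3, 2).
step = {
    ("white", 1): (3, 2),
    ("black", 1): (3, 2),
    ("black", 2): (1, 1),
}
enabled_moves, waiting = arbitrate_moves(step, random.Random(0))
print(enabled_moves, waiting)
```

Exactly one of the two contenders is enabled; which one depends on the random choice, so no particular winner should be assumed.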
4 Conclusions and future work
In this paper, we illustrated the design of a high-level action description language for the description of multi-agent domains. The language enables the description of agents with individual goals operating in a shared environment. The agents can explicitly interact (by requesting help from other agents in achieving their own goals) and implicitly cooperate in resolving conflicts that may arise during execution of their individual plans. The main features of the framework we described in this paper have been realized into an implementation, based on SICStus Prolog. The implementation is fully distributed, and uses Linda to enable communication among agents. Such a prototype is currently being refined and extended with further features.
There have been many agent programming languages, such as the BDI agent programming language AgentSpeak (as implemented in Jason), JADE (and its extension Jadex), ConGolog, IMPACT, 3APL, and GOAL. A good comparison of many of these languages can be found in . The emphasis of the effort presented in this paper is to expand our original work on constraint-based modeling of agents based on action languages. The generalization to a constraint-based multi-agent action language has been presented in . In this paper we demonstrate a further extension to encompass distributed reasoning and distributed planning. Thus, the focus of the proposal remains on the level of creating an action language and demonstrating the suitability of constraint-based technology to support it. As such, we do not propose here a new agent programming language; rather, we push an action language perspective and explore how action languages scale to multi-agent domains. Our work could be used as the underlying formalism for the development of new agent programming languages. In this sense, our proposal is different from many of the MAS development platforms, which focus on programming languages for MAS and on complex protocols for advertising and interaction among agents (e.g., FIPA).
The choice of Linda came about for simplicity; we required the use of a CLP platform, and SICStus provides support for both Linda and constraint handling, as do few other distributed communication platforms (e.g., OAA). In the long term, we envision mapping our agent design on a MAS infrastructure that enables discovery and addition of agents, handles network-wide distribution of agents, and maps the exchange of constraints to a standard agent communication language (e.g., FIPA-ACL/FIPA-SL). This will require non-trivial engineering work to map the reasoning with action languages (e.g., planning) to a platform that is not constraint-based; we are currently exploring the problem in the context of Jason.
The work is an initial proposal that already shows strong potential and several avenues of research. The immediate goal in the improvement of the system consists of adding refined strategies and coordination mechanisms, involving, for instance, payoff, trust, etc. Then, we intend to evaluate the performance and quality of the system in several multi-agent domains (e.g., game playing scenarios, modeling of auctions, and other domains requiring distributed planning). We also plan to investigate strategies to enhance performance by exploiting features provided by the constraint solving libraries of SICStus (e.g., the use of the table constraint).
We will investigate the use of future references in the fluent constraints (as fully supported in )—we believe this feature may provide a more elegant approach to handle the requests among agents, and it is necessary to enable the expression of complex interactions among agents (e.g., to model forms of negotiation with temporal references). In particular, we view this platform as ideal to experiment with models of negotiation (e.g., as discussed in ) and to deal with commitments  (which often require temporal references).
We will also explore the implementation of different strategies associated to conflict resolution; in particular, we are interested in investigating how to capture the notion of “trust” among agents, as a dynamic property that changes depending on how reliable agents have been in providing services to other agents (e.g., accepting to provide a property but failing to make it happen). Also concerning trust evaluation, different approaches can be integrated in the system. For instance, a “controlling entity” (e.g., either the supervisor or a privileged/elected agent) could be in charge of assigning the “degree of trust” of each agent. Alternatively, each single agent could develop its own opinion on other agents’ reliability, depending on the behavior they manifested in past interactions.
Finally, work is needed to expand the framework to enable greater flexibility in several aspects, such as:
Allow deadlines for requests—e.g., by allowing axioms of the form
request if until
indicating that the request is valid only if accomplished within time steps.
Allow constraint-based delays for requests:
request if while
indicating that the request is still valid while constraint is entailed.
Allow dynamic changes in the agents’ knowledge about other agents (e.g., an action might make an agent aware of the existence of other agents), or about the world (e.g., an action might change the rights another agent has to access/modify some fluents).
The authors wish to thank the anonymous reviewers for their insightful comments.
- [1] Barták, R. and Toropila, D. 2008. Reformulating constraint models for classical planning. In Int. Florida AI Research Society Conference. AAAI Press, 525–530.
- [2] Bellifemine, F., Caire, G., and Greenwood, D. 2007. Developing Multi-Agent Systems with JADE. John Wiley & Sons.
- [3] Bordini, R., Hübner, J., and Wooldridge, M. 2007. Programming Multi-agent Systems in AgentSpeak using Jason. John Wiley & Sons.
- [4] Braubach, L., Pokahr, A., and Lamersdorf, W. 2005. Jadex: a BDI-Agent System Combining Middle-ware and Reasoning. In Software Agent-based Applications, Platforms and Development Kits. Springer Verlag.
- [5] Carriero, N. and Gelernter, D. 1989. Coordination Languages and their Significance. Communications of the ACM 32, 4.
- [6] Cheyer, A. and Martin, D. 2001. The Open Agent Architecture. Journal of Autonomous Agents and Multi-Agent Systems 4, 1, 143–148.
- [7] Dastani, M., Dignum, F., and Meyer, J.-J. 2003. 3APL: A programming language for cognitive agents. ERCIM News 53, 28–29.
- [8] de Boer, F., Hindriks, K., van der Hoek, W., and Meyer, J. 2005. A Verification Framework for Agent Programming with Declarative Goals. JAL 5, 277–302.
- [9] De Giacomo, G., Lespérance, Y., and Levesque, H. 2000. ConGolog, a concurrent programming language based on the situation calculus. AIJ 121, 1–2, 109–169.
- [10] Dovier, A., Formisano, A., and Pontelli, E. 2009. Representing multi-agent planning in CLP. In LPNMR, Lecture Notes in Computer Science, vol. 5753. Springer, 423–429.
- [11] Dovier, A., Formisano, A., and Pontelli, E. 2010. Multivalued action languages with constraints in CLP(FD). Theory and Practice of Logic Programming 10, 2, 167–235.
- [12] Fagin, R. et al. 1995. Reasoning about knowledge. The MIT Press.
- [13] Gelfond, M. and Lifschitz, V. 1998. Action languages. Electronic Transactions on Artificial Intelligence 2, 193–210.
- [14] Gerbrandy, J. 2006. Logics of propositional control. In [18], 193–200.
- [15] Hayzelden, A. and Bourne, R. 2001. Agent Technology for Communication Infrastructures. John Wiley & Sons.
- [16] Mallya, A. and Huhns, M. 2003. Commitments among agents. IEEE Internet Computing 7, 4, 90–93.
- [17] Mascardi, V., Martelli, M., and Sterling, L. 2004. Logic-based specification languages for intelligent agents. Theory and Practice of Logic Programming 4, 4, 495–537.
- [18] Nakashima, H., Wellman, M. P., Weiss, G., and Stone, P., Eds. 2006. International Joint Conference on Autonomous Agents and Multiagent Systems. ACM.
- [19] Rao, A. 1996. AgentSpeak: BDI Agents Speak Out in a Logical Computable Language. In European Workshop on Modeling Autonomous Agents in a Multi-Agent World.
- [20] Sauro, L., Gerbrandy, J., van der Hoek, W., and Wooldridge, M. 2006. Reasoning about action and cooperation. See [18], 185–192.
- [21] Son, T., Pontelli, E., and Sakama, C. 2009. Logic programming for multiagent planning with negotiation. In Int. Conference on Logic Programming. Springer, 99–114.
- [22] Spaan, M. T. J., Gordon, G. J., and Vlassis, N. A. 2006. Decentralized planning under uncertainty for teams of communicating agents. In AAMAS. ACM Press, 249–256.
- [23] Subrahmanian, V. S., Bonatti, P., Dix, J., Eiter, T., Kraus, S., Ozcan, F., and Ross, R. 2000. Heterogeneous Agent Systems: Theory and Implementation. MIT Press.
- [24] van der Hoek, W., Jamroga, W., and Wooldridge, M. 2005. A logic for strategic reasoning. In AAMAS. ACM Press, 157–164.