Vocabulary Alignment in Openly Specified Interactions

by Paula Chocron, et al.

The problem of achieving common understanding between agents that use different vocabularies has mainly been addressed by designing techniques that explicitly negotiate mappings between their vocabularies, which requires agents to share a meta-language. In this paper we consider the case of agents that use different vocabularies and have no meta-language in common, but share the knowledge of how to perform a task, given by the specification of an interaction protocol. For this situation, we present a framework that lets agents learn a vocabulary alignment from the experience of interacting. Unlike previous work in this direction, we use open protocols that constrain possible actions instead of defining procedures, which makes our approach more general. We present two techniques that can be used either to learn an alignment from scratch or to repair an existing one, and we experimentally evaluate their performance.








1 Introduction

Addressing the problem of vocabulary heterogeneity is necessary for the common understanding of agents that use different languages, and therefore crucial for the success of multi-agent systems that act jointly by communicating. This problem has been tackled several times in the past two decades, in general from two different perspectives. Some approaches [20, 8] consider the existence of external contextual elements that all agents perceive in common, and explore how those can be used to explain the meaning of words. A second group of techniques [15, 18] (and also [8], where both approaches are combined) considers the situation, reasonable for agents that communicate remotely, in which this kind of context is not available. They do so by providing explicit ways of learning or agreeing on a common vocabulary (or on an alignment between vocabularies). These techniques require agents to share a common meta-language that they can use to discuss the meaning of words and their alignments. The complex question of how to communicate with heterogeneous interlocutors when neither a physical context nor a meta-language is available remains practically unexplored.

Figure 1: Protocols for Waiter (W) and Customer (C). Customer's protocol (English): W: to drink?  C: wine | beer.  W: color?  C: red | white. Waiter's protocol (Italian): W: da bere?  C: vino | birra.  W: tipo?  C: rosso | bianco.

The work by Atencia and Schorlemmer [1] approaches this situation by considering a different kind of context, given by the interactions in which agents engage. Agents are assumed to share the knowledge of how to perform a task, or, more concretely, the specification of an interaction, given by a finite state automaton. For example, they consider the interaction in Figure 1, in which an English-speaking customer and an Italian-speaking waiter communicate to order drinks. The authors show how agents can progressively learn, from the experience of performing the task, which mappings lead to successful interactions. After several interactions, agents converge to an alignment that they can use to always succeed at ordering and delivering drinks with that particular interlocutor. The interesting aspect of this idea is that the only shared element, namely the interaction specification, is already necessary to communicate. However, using finite state automata as specifications implies that agents need to agree on the exact order in which messages should be sent, which is unnecessarily restrictive. In addition, agents learn an alignment that is only useful for one task, and how it can be extrapolated to further interactions is not explored.

In this paper we present a general version of the interaction-as-context approach to vocabulary alignment in multi-agent systems. Instead of automata, we consider agents that specify interactions with constraint-based open protocols that define rules about what can be said instead of forcing one particular execution. In particular, we use ConDec protocols [14] that rely on linear temporal logic constraints. One ConDec protocol may not completely define the meaning of all words in a vocabulary, so we study how agents can learn mappings from performing different tasks. For this reason, the learning process is substantially different from the one in [1], where a mapping that works in one task is considered correct. If enough experiences are available, agents that use the techniques we present converge to an alignment that is useful in the general case, even for future interactions that they do not yet know. Since learning is done gradually as different interaction opportunities appear, agents can use what they learned early, even if they still do not know the alignment completely.

After presenting open interaction protocols, we define a framework for interacting with partners that use different vocabularies. We later present two different techniques to learn an alignment from the experience of interacting. The first relies only on analysing whether a message is allowed at a particular moment, and it can be used with any constraint-based protocol. The second technique uses the semantics of the protocols we present to improve the performance of the first one. Both methods can be used to learn alignments from scratch when there is no information, as well as to repair alignments obtained with other methods. We evaluate the techniques experimentally for both purposes, and show how different factors affect their performance.

2 Open Interaction Protocols

The question of what should be expressed by an interaction protocol has been extensively discussed in the multi-agent systems community. Traditional approaches such as finite state automata and Petri nets are simple to design and read, but also too rigid. More flexible alternatives constrain possible actions instead of defining a fixed procedure. Constraint Declarative Protocols (commonly known as ConDec protocols) are an example of these approaches. ConDec protocols were first proposed by Pesic and van der Aalst [14] as a language to describe business protocols, and then used as specifications for agent interactions by Montali [13], who presented an extension of ConDec and tools for its verification, and by Baldoni et al. [2, 3], who integrated ConDec constraints and commitment protocols, a framework for specifying interactions with social semantics first proposed by Singh [19]. An important advantage of ConDec protocols is that they use linear temporal logic, a well-known logic for which many tools are available.

Linear temporal logic (LTL from now on) is a natural choice to express constraints about actions that occur in time. The syntax of LTL includes that of propositional logic, plus the temporal operators □ (always), ◇ (eventually), ○ (next), and U (until), which can be applied to any LTL formula. LTL formulae are interpreted over paths in Kripke structures, which are sequences of states, each associated with a truth-valuation of the propositional variables. The temporal operators are interpreted over these paths as follows: □φ means that φ must be true in the truth-valuation of all states, ◇φ means that φ must be true eventually, ○φ means that φ must be true in the next state, and φ U ψ means that φ must be true until ψ becomes true. A set of LTL formulae, called a theory, is satisfiable if there exists a path for which all the formulae are true; in that case, the path is a model of the theory. The satisfiability problem for LTL is decidable, as is the model checking problem, which consists in deciding whether a given path is a model of a theory.
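To make these semantics concrete, the following is a minimal sketch (not from the paper) of an LTL evaluator over finite paths. Each state is given as the set of propositional variables true in it, and formulas are nested tuples; the finite-path reading matches the encoding of interactions used later, where states after the last message make every variable false. Names and representation are our own assumptions.

```python
# Minimal LTL evaluation over a finite path of states.
# A state is a set of true propositional variables; a formula is either a
# variable name or a tuple such as ("always", "p") or ("until", "p", "q").

def holds(formula, path, i=0):
    """Check an LTL formula at position i of a finite path."""
    if isinstance(formula, str):                  # propositional variable
        return i < len(path) and formula in path[i]
    op = formula[0]
    if op == "not":
        return not holds(formula[1], path, i)
    if op == "and":
        return holds(formula[1], path, i) and holds(formula[2], path, i)
    if op == "always":                            # box: true at every state from i on
        return all(holds(formula[1], path, j) for j in range(i, len(path)))
    if op == "eventually":                        # diamond: true at some state from i on
        return any(holds(formula[1], path, j) for j in range(i, len(path)))
    if op == "next":                              # circle: true at state i+1
        return holds(formula[1], path, i + 1)
    if op == "until":                             # f U g: g eventually, f holds before
        for j in range(i, len(path)):
            if holds(formula[2], path, j):
                return True
            if not holds(formula[1], path, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op}")
```

For example, on the path [{"p"}, {"p"}, {"q"}], the formula ("until", "p", "q") holds, while ("always", "p") does not.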

A ConDec protocol is a tuple ⟨A, C⟩, where A is a set of action names and C is a set of constraints on how actions can be performed. Constraints are LTL sentences renamed conveniently. We use a minimal version of the constraints introduced originally by Pesic and van der Aalst, shown in Table 1, where n ∈ ℕ and a, b ∈ A. (Since we are not interested here in the usability of the protocols, we omit constraints that work only as syntactic sugar and can be expressed by combining the ones in Table 1.) These constraints are generally divided into three classes: existential constraints predicate over the number of times an action can be performed, relational constraints describe binary relations between two actions, and negation constraints (preceded by a '!' sign) are relations that do not hold. Given a set of actions A, C(A) is the set of all possible constraints over A; in a protocol ⟨A, C⟩, necessarily C ⊆ C(A).

Table 1: LTL definitions of constraints
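As an illustration of this constraint style (our selection, not the paper's exact table), a few ConDec-like templates can also be checked directly on a finite trace of action names, bypassing the LTL machinery. The template names follow common ConDec usage.

```python
# Illustrative ConDec-style constraint templates, each returning a predicate
# over a finite trace (a list of action names).

def existence(action):
    """The action occurs at least once."""
    return lambda trace: action in trace

def absence2(action):
    """The action occurs at most once."""
    return lambda trace: trace.count(action) <= 1

def response(a, b):
    """Every occurrence of a is eventually followed by an occurrence of b."""
    return lambda trace: all(b in trace[i + 1:]
                             for i, x in enumerate(trace) if x == a)

def not_coexistence(a, b):
    """a and b never both occur in the trace (a negation constraint)."""
    return lambda trace: not (a in trace and b in trace)

def satisfies(trace, constraints):
    """A trace satisfies a protocol if it satisfies every constraint."""
    return all(c(trace) for c in constraints)
```

For instance, with the constraints response("to drink", "wine") and not_coexistence("wine", "beer"), the trace ["to drink", "wine"] is allowed, while ["to drink", "beer", "wine"] is not.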

2.1 Open Protocols as Interaction Protocols

In the rest of this section we present the technical notions that are necessary to use ConDec protocols as specifications of interactions between agents that may use different vocabularies. We first define interaction protocols, which constrain the way in which agents can utter messages. We introduce the notion of a bound to capture agents' knowledge of how many steps they have to finish the interaction. If this knowledge is not available, the bound can be omitted and replaced by the requirement that the interaction finish in finitely many steps.

Definition 1

Given a set of agent IDs A and a propositional vocabulary V, an interaction protocol is a tuple ⟨M, C, b⟩, where the set of actions M = {a:v | a ∈ A, v ∈ V} is a set of messages formed by the ID of the sender agent and the term it utters, C ⊆ C(M) is a set of constraints, and b ∈ ℕ is the bound.

The semantics of an interaction protocol is defined over interactions that represent a sequence of uttered messages.

Definition 2

Given a set of messages M, an interaction i is a finite sequence of messages ⟨m₁, …, mₖ⟩ with each mⱼ ∈ M. The length of an interaction (|i|) and the append operation (i ⊕ m) are defined in the same way as for sequences.

An interaction i can be encoded into a Kripke structure path by including one propositional variable for each message and considering the following sequence of states with truth-valuations: in the state with index k ≤ |i|, the propositional variable for the k-th message in i is true and all other propositional variables are false; in every state after |i|, all propositional variables are false. With this construction, we can define the semantics of interaction protocols.
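The encoding above can be sketched in a few lines; here a state is represented as the set of message variables true in it, and the path is padded with empty states up to the bound (an illustrative finite rendering, not the paper's implementation).

```python
def encode(interaction, bound):
    """Encode an interaction (a list of messages such as "c:vino") as a
    finite Kripke-style path: state k makes only the k-th uttered message
    true, and the padding states up to the bound make every variable false."""
    path = [{msg} for msg in interaction]
    path += [set() for _ in range(bound - len(interaction))]
    return path
```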

Definition 3

An interaction i is a model of an interaction protocol P = ⟨M, C, b⟩ (noted i ⊨ P) if |i| ≤ b and the Kripke path that encodes i is a model of C. An interaction is a partial model of P (noted i ⊨ₚ P) if it is a prefix of a model of P.

Definition 3 implies that checking satisfiability of an interaction protocol is equivalent to checking LTL satisfiability.

As already mentioned, we are interested in agents that use different vocabularies, but share the knowledge of how to perform a task. In the rest of this section we define more precisely what that means. From now on, V₁ and V₂ are two possibly different vocabularies. Let us start by defining the notion of vocabulary alignment.

Definition 4

An alignment is a function between vocabularies α : V₁ → V₂. Given a set of agents A, α can be extended homomorphically to a function between:

  • messages (α(a:v) = a:α(v)) and sets of messages

  • constraints and sets of constraints

  • interactions and sets of interactions

To capture the idea of sharing the knowledge of how to perform a task, we define the notion of compatibility between protocols, which consists simply in having the same models modulo an alignment. Given a protocol P, we define the set of its models as Mod(P) = {i : i ⊨ P}.

Definition 5

Two interaction protocols P₁ and P₂ with sets of messages M₁ and M₂ are compatible if there exists an alignment α such that α(Mod(P₁)) = Mod(P₂).

If we know the alignment α for which the condition holds, we say they are compatible under α.

In this paper, for simplicity, we restrict ourselves to bijective alignments over vocabularies of the same size.

An Example

Let us extend the example of Figure 1 by considering again a waiter W and a customer C in the drink-ordering situation. The vocabulary of the customer is {to drink, beer, wine, water, size, pint, half pint}, while the waiter uses {da bere, birra, vino, acqua, tipo, media, piccola}. Consider the bijective alignment obtained by mapping each word in the customer's vocabulary to the word in the same position in the waiter's. We consider the following protocols to specify the drink-ordering interactions:

The two protocols above are not compatible under any bijective alignment, since the customer protocol has as a model an interaction in which the customer orders both wine and beer, while the waiter protocol only has models in which a single beverage is ordered. If a constraint forbidding the customer from ordering more than one beverage is added to the customer protocol, the resulting protocols are compatible, in particular under the positional alignment defined above.

3 Communicating with Heterogeneous Partners

We focus on interactions between two agents, a and b, with vocabularies V₁ and V₂ respectively. During an interaction, an agent can send messages composed of words in its vocabulary and receive others from its interlocutor, or finish the communication if certain conditions hold. We assume messages are never lost and always arrive in order and, more strongly, that each message is received before the following one is uttered. This requires that agents agree on who speaks at each time, although we do not force them to follow any particular turn-taking pattern.

Agents interact to perform some task together, which each of them specifies with an interaction protocol. We assume a and b agree on a set of tasks that they can perform and on how these are performed: their respective protocols for each task are compatible. Our agents use their vocabularies consistently throughout different tasks, i.e., the protocols for all tasks are compatible under the same alignment.

From now on, we will use sets of messages M₁ and M₂. We assume there exists a bijective alignment α such that, for each task that a and b can perform, their respective protocols are compatible under α. We assume that whenever agents interact, they use protocols that correspond to the same task.

We present a general approach to learning the alignment from the experience of interacting to perform different tasks sequentially. The methods we present are used by one agent alone, and do not require that its interlocutor use them as well. To explain the techniques, we adopt the perspective of agent a.

Briefly, we propose to learn by taking into account the coherence of messages. When a receives a message, it learns by analysing which interpretations are allowed by the protocol and which are not. This information, however, is not definitive. An allowed word can be an incorrect interpretation of a received message, because protocols do not completely restrict all possible meanings. Moreover, a forbidden message can still be a correct interpretation: since some rules express relations between messages, a previously misinterpreted word (by the agent or by its interlocutor) can make a correct mapping appear impossible.

To handle this uncertainty, we propose a simple probabilistic learning technique. Agents maintain an interpretation distribution over possible meanings for foreign words, and update its values with the experience of interacting. Formally, for each foreign word f that a knows about (because it received it in a message at some point) and for each word v in its own vocabulary, agent a maintains a weight ω(f, v) that represents its confidence in v being the correct interpretation of f. Using the bijectivity, we keep the weights of each foreign word summing to 1, but we do not require the same in the other direction, since the foreign vocabulary is unknown a priori.

The techniques we propose can incorporate existing alignments, in the same way as is done in [5] for the case when protocols are automata. These alignments represent a priori knowledge about the foreign vocabulary, which can have been obtained from previous experience, from a matching tool, or with other techniques such as a syntactic similarity measure. Since such techniques are never fully correct, previous alignments should not be trusted completely, and the techniques we propose can work as methods for repairing them. A previous alignment for agent a is a partial function from a set of foreign terms to V₁. This function is not necessarily defined over V₂, since the vocabulary that b uses may be unknown a priori; however, the information it provides can be used even if its domain is a subset of V₂ or merely overlaps with it. We always interpret the confidences positively, meaning that a mapping with low confidence still has more confidence than one that does not exist. If a has a previous alignment, it initializes the interpretation distribution using that information: it first assigns raw values that favour the mappings in the previous alignment, and then normalizes them with an exponential method such as softmax. This is necessary so that the technique starts with the values for all mappings strictly between 0 and 1, since the alignment can be wrong. If there is no previous alignment, the agent initializes the interpretation values with a uniform distribution over its own vocabulary.
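The initialization step above can be sketched as follows. The function name and the boost parameter are illustrative assumptions; the key property is that softmax keeps every weight strictly between 0 and 1 even when a prior alignment is present, and yields a uniform distribution when it is not.

```python
import math

def init_distribution(foreign_word, own_vocab, prior=None, boost=1.0):
    """Initialize interpretation weights for one received foreign word.
    A mapping suggested by a previous (possibly wrong) alignment gets a
    raw boost before softmax normalization; with no prior, the result is
    uniform. 'boost' is an illustrative parameter, not the paper's."""
    raw = {v: 0.0 for v in own_vocab}
    if prior and foreign_word in prior and prior[foreign_word] in raw:
        raw[prior[foreign_word]] += boost
    total = sum(math.exp(x) for x in raw.values())
    return {v: math.exp(x) / total for v, x in raw.items()}
```

With a prior mapping "vino" to "wine", the weight of "wine" starts highest but below 1, so interaction evidence can still overturn it.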

We explain in detail the learning techniques that we propose in Section 4. In the rest of this section we focus on the dynamics of the interaction: how agents choose which messages to say, finish the interaction, and decide interpretations for the messages they receive.

Choosing messages to send

The choice of messages to utter is internal to each agent and depends on its interests while interacting. We do not impose any restriction on this, beyond respecting the constraints in the protocol. Formally, an agent can utter a message only if it is a possible message as defined below.

Definition 6

Consider an agent following protocol P and suppose interaction i has happened so far (with i ⊨ₚ P). The possible messages at that point are all messages m such that the interaction remains a partial model when they are performed, that is, i ⊕ m ⊨ₚ P.

Finishing the interaction

An interaction can finish in three situations. First, if there is a bound, reaching it implies the automatic end of the conversation. The interaction also finishes if an agent receives a message that has no possible interpretation. Finally, an agent can finish the conversation whenever it wants if it considers the interaction successful, i.e., if the interaction is a model of the protocol when it ends. In this case, it simply stops talking, producing a time-out that lets the other agent realize the interaction has finished.

Choosing interpretations

When agent a receives a message from b, it needs to interpret it in its own vocabulary to be able to continue the interaction. To this aim, agents use the information in the interpretation distribution. Since agents assume there exists a bijective alignment, they always choose the same interpretation for a word during one interaction, and they do not choose words that have already been chosen as interpretations of other words.

Consider that agent a receives word f after interaction i. Let μ be the mappings-made function, where μ(f) = v if and only if v was chosen as the mapping for f earlier in the same interaction. Its domain is the set of foreign words that a has received since the current task started, and its image is the set of words that were chosen as mappings. The set of possible interpretations for f is the set of words v in a's vocabulary such that the corresponding message is a possible message after i, and such that v is not already in the image of μ.

Among the possible interpretations, the agent chooses one of those with the highest weight in the interpretation distribution, breaking ties at random.

If f was already mapped but its mapping is no longer a possible message, or if the set of possible interpretations is empty, the interaction is in one of the failure cases, and it finishes because a stops talking.
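The choice rule can be sketched as below; the function name is ours, and the failure case is signalled by returning None rather than by the time-out mechanism described in the text.

```python
import random

def choose_interpretation(weights, possible, used):
    """Pick an interpretation for a received foreign word: among words that
    are possible messages now and not already used as mappings in this
    interaction, take one with the highest weight (random tie-break).
    Returns None in the failure case, where the interaction ends."""
    candidates = [v for v in possible if v not in used]
    if not candidates:
        return None
    best = max(weights[v] for v in candidates)
    return random.choice([v for v in candidates if weights[v] == best])
```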

4 Learning Alignments from Interactions

To learn an alignment from the experience of interacting, agents make the following well-behaviour assumptions about their interlocutor. Essentially, these assumptions imply that the dynamics of the interaction described in the previous section are respected.

  1. Compliance: An agent will not utter a message that violates the constraints, or that makes it impossible to successfully finish the interaction in the steps determined by the bound (or in finitely many steps if there is no bound).

  2. Non-abandonment: An agent will not intentionally finish the conversation unless the constraints are fulfilled.

Agents learn when they decide how to interpret a received word. The overall method is simple: when a receives a foreign word f, it updates the weights of the words in a set U of candidate interpretations. Unlike the set of possible interpretations used to decide the mapping, U can contain interpretations that are not possible messages at that particular moment, and the values in the interpretation distribution are updated accordingly.

A first decision is how to choose the set U. Since checking satisfiability is computationally expensive, this choice can considerably impact the overall performance of the methods. A first option is to update the values of all words in the agent's vocabulary, which can be slow for large vocabularies. Another option is to update only the options that are considered until an interpretation is chosen, that is, the words with strictly higher weight than the chosen interpretation, together with the interpretation itself. A third possibility is a midpoint between these two, which also updates the words with the same weight as the chosen interpretation. If the distribution is uniform when the learning starts, this updates many possible interpretations in the beginning and fewer when there is more information. This last option is the approach we choose, and the one used in the evaluation.

We present two methods to update the values in the interpretation distribution. The first one is a general technique that only takes into account whether messages are possible. This method is not exclusively designed for protocols that use LTL, and can be used for constraints specified in any logic for which agents have a satisfiability procedure. In a second method we show how the particular semantics of the protocols we introduced can be taken into account to improve the learning process.

4.1 Simple Strategy

The first approach consists in updating the weights of the interpretations of a received message according to whether each interpretation is coherent with the protocol and the interaction performed so far. When a new foreign word is received, the values of its interpretations are initialized as described in the last section. We use a simple learning method in which updates are a punishment or reward proportional to the value being updated. We chose this method because it incorporates previous alignments naturally, it is very simple to formulate, and the proportion updated can easily be manipulated to consider different situations, something that will be relevant for the second technique. However, it is not necessarily the most efficient choice, and other methods, such as one that records a history, can converge faster. Consider a reward parameter r and a punishment parameter p. Using the notion of possible messages in Definition 6, the weight of each candidate interpretation is increased proportionally by r if the interpretation is a possible message, and decreased proportionally by p otherwise.

After all the updates for a given message are made, the values are normalised so that the weights of all interpretations of the received word sum to 1. Either a simple sum-based or a softmax normalization can be used for this.
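A minimal sketch of this update, with illustrative parameter values (the ones used in the paper's experiments are not reproduced here):

```python
def simple_update(weights, candidates, possible, reward=0.3, punishment=0.3):
    """Simple strategy sketch: reward candidate interpretations that are
    possible messages and punish the rest, both proportionally to their
    current weight, then renormalize the whole distribution to sum to 1."""
    for v in candidates:
        if v in possible:
            weights[v] += reward * weights[v]
        else:
            weights[v] -= punishment * weights[v]
    total = sum(weights.values())
    for v in weights:
        weights[v] /= total
    return weights
```

The proportional form means confident mappings gain or lose more mass per observation than weak ones, which is the knob the reasoning strategy later manipulates.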

4.2 Reasoning Strategy

Table 2: Updates for each violated constraint when a message is received

The simple strategy does not use some easily available information, namely which constraints were violated when a message is considered impossible. To illustrate the importance of this information, consider an interaction in which the customer said water but the waiter interpreted it as vino. If the customer then says beer, the waiter may think that interpreting it as birra is impossible, since ordering two alcoholic beverages is not allowed (a constraint relating the two would be violated). However, this is only due to the misinterpretation of water in the first place. In the reasoning approach, agents use the semantics of violated rules to perform a more fine-grained update of the values. Before presenting the method, we need to divide constraints into two kinds, which determine when agents can learn from them.

Definition 7

A constraint c is semantically non-monotonic if there exist interactions i and i′ such that i is a prefix of i′, and i is a model of {c} but i′ is not. A constraint is semantically monotonic if this cannot happen: if i is a model of {c}, then so are all its extensions.

The following proposition divides the constraints in our protocols between monotonic and non-monotonic.

Proposition 1

The constraints , and are semantically monotonic. All the rest of the constraints defined are semantically non-monotonic.

Proof 1

To prove non-monotonicity, it is enough to exhibit, for each constraint, an interaction that models it together with an extension that violates it. Monotonicity can be proven by contraposition, showing that for a constraint c, if an extension i′ of i violates c, then necessarily i violates c as well. For example, for an existential constraint requiring a minimum number of occurrences of an action a, let nᵢ and nᵢ′ be the numbers of occurrences of a in i and i′ respectively. Since i is a prefix of i′, nᵢ ≤ nᵢ′, so if i′ has too few occurrences of a, then so does i.

Non-monotonic constraints can be used to learn while interacting, using the principle of compliance (if a constraint is violated, something must be wrong in the interpretations). Monotonic constraints could be used when the interaction ends, using the principle of non-abandonment (if not all constraints are satisfied, there must be an error). However, since our agents cannot communicate in any other way, they do not know why the interaction ended, and therefore do not know whether their interlocutor considers the constraints satisfied, making it very difficult to decide how to update the values. In this work we focus on handling non-monotonic constraints, leaving monotonic ones for future work in which an ending signal is introduced. To start, let us define formally when a non-monotonic constraint is violated.

Definition 8

Given an interaction i and an interpretation m for a received message, a constraint c is violated if i is a partial model of {c} but i ⊕ m is not.

If agent a decides that v is not a possible interpretation for a received word, let Viol be the set of constraints violated by the corresponding message. It is possible for Viol to be empty. This can happen when no individual constraint is violated but a subset of constraints becomes jointly unsatisfiable: for example, if the waiter has not said size and the protocol includes rules that jointly require size to be uttered before wine, the customer cannot say wine even though Viol would be empty. Another situation in which a message can be impossible without any constraint being violated is when it makes the interaction impossible to finish successfully in the number of utterances indicated by the bound, or in finite time if there is no bound.

To explain the update in the reasoning strategy, consider again a reward parameter r and a punishment parameter p. If an interpretation is possible for a received message, its weight is rewarded as in the simple strategy. If it is not, it is updated as indicated in Table 2, according to which rule was violated; these actions are performed iteratively for each broken rule. After the updates, all values are normalized so that the weights of the interpretations of the received word sum to 1. In this case, we need a normalization method that preserves the values that are 0, so softmax is not an option.

Let us explain the motivation behind each update. Violating the existential non-monotonic constraint is different from violating a relational constraint, because it expresses a condition over only one action, so if the chosen interpretation violates it, that interpretation must be wrong. For this reason, the value of the mapping is set to 0.

Negation constraints express a relation over two actions: if one is broken, necessarily both actions happened in the interaction, and the violation could be due to a misinterpretation of either of them. To take this into account, we propose an update inspired by Jeffrey's rule of conditioning [17], which modifies the Bayes rule of conditional probability for cases where the evidence is uncertain. If P is a probability distribution and E₁, …, Eₙ is a partition of the possible evidence, then Jeffrey's rule states that the posterior probability of an event A is computed as:

P(A) = Σⱼ P(A | Eⱼ) P(Eⱼ)

Back to our problem, let f be a received word that was interpreted as u. Suppose a receives a new word f′ and discovers that mapping it to v violates a negation constraint. This means that v and u cannot both be correct interpretations, so the probability that v is correct can be decomposed, following Jeffrey's rule, as a sum over the possible interpretations of f; equivalently, since the conditional terms for incompatible interpretations are 0, the confidence in v should decrease by an amount proportional to the confidence in the incompatible mapping u.

We implement this by subtracting, for each mapping, a value proportional to the confidence in the other mapping. Since agents do not explore all possible interpretations, this value is not computed exactly, but approximated by considering only the interpretations with the highest confidence.
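For a single pair of clashing mappings, the correction above can be sketched as follows; the function name is ours, and the symmetric punishment of both mappings is one way of reading the text's "subtracting for each mapping a value proportional to the confidence in the other mapping".

```python
def punish_incompatible(w_new, v, w_old, u):
    """Jeffrey-inspired sketch: interpreting the new word as v violates a
    negation constraint with the earlier interpretation u of another word,
    so both mappings are suspect. Each loses weight in proportion to the
    confidence held in the other one."""
    pv, pu = w_new[v], w_old[u]      # capture both before mutating
    w_new[v] -= pu * pv              # trust v less if u was trusted
    w_old[u] -= pv * pu              # and vice versa
    return w_new, w_old
```

So a clash with a near-certain earlier mapping nearly zeroes the new candidate, while a clash with a doubtful one barely moves it.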

When an interpretation depends on the interpretations of many other mappings, there are too many possibilities to reason about which one may be wrong. In this case, the agents use a default punishment. This occurs when positive relational constraints are violated, and whenever a violated constraint depends on a message that a itself sent, since agents have no information about how their own messages were interpreted by b.

Lastly, let us discuss the value of the default punishment. We choose it so that the technique subtracts a larger value when the agent is more confident in the correctness of the previous interpretations involved; this is the value we use.

5 Experimental Evaluation

In this section we present the results of experiments that show how the methods we propose perform experimentally when used by agents with different vocabularies. Before presenting the results, let us discuss the generation of data for experimentation.

Figure 2: Results for a vocabulary of size 10

Due to the lack of existing datasets, we performed all the experimentation with randomly generated data. We created protocols with different sizes of vocabulary and of constraint set. To generate a protocol from a vocabulary, we repeatedly chose a random constraint and added it to the protocol if the protocol remained satisfiable with the new rule. In the implementation, we express constraints over words instead of over messages, so a constraint is valid for any combination of senders. We used bounds in our experimentation, both to limit the possible constraints and to eventually finish the interactions between agents; in all cases the bound was equal to the vocabulary size. We created pairs of compatible protocols with vocabularies V₁ and V₂ by building a bijective alignment and then using it to translate the constraints of the first protocol, obtaining the second. We used the NuSMV model checker [6] to perform all the necessary satisfiability checks.

To evaluate how agents learn from the experience of interacting, we first need a measure of how well an agent knows an alignment. Since agents always choose the possible mapping with the highest weight, we can easily extract from the interpretation distribution an alignment that represents the first interpretation choice for each foreign word in the following interaction. This alignment, which we call the best-mapping alignment of an agent, assigns to each received foreign word one of its interpretations with the highest weight; when several interpretations share the highest weight, a random one among them is taken.

To compare this alignment with the reference alignment α, we used the standard precision and recall measures, and their harmonic mean, commonly known as F-score. To use the standard definitions directly, we only need to consider alignments as relations instead of functions: given an alignment α, the corresponding relation is the set of pairs (f, v) such that α(f) = v.

Definition 9

Given two alignments A and R expressed as relations, the precision of A with respect to R is the fraction of the mappings in A that are also in R:

    precision(A, R) = |A ∩ R| / |A|

while its recall is the fraction of the mappings in R that were found by A:

    recall(A, R) = |A ∩ R| / |R|

Given an alignment A and a reference alignment R, the F-score of the alignment is computed as their harmonic mean:

    F(A, R) = 2 · precision(A, R) · recall(A, R) / (precision(A, R) + recall(A, R))
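Treating alignments as sets of word pairs, these measures amount to the following straightforward sketch (the guards for empty alignments are our own convention):

```python
def precision(a, r):
    # Fraction of mappings in alignment a that also appear in reference r.
    return len(a & r) / len(a) if a else 0.0

def recall(a, r):
    # Fraction of mappings in reference r that alignment a found.
    return len(a & r) / len(r) if r else 0.0

def f_score(a, r):
    # Harmonic mean of precision and recall.
    p, q = precision(a, r), recall(a, r)
    return 2 * p * q / (p + q) if p + q else 0.0
```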

We performed experiments parametrized by a protocol and a vocabulary size. In all experiments, agents are sequentially given pairs of compatible protocols, and after each interaction we measure the F-score of their alignments with respect to the reference alignment. The same protocols can appear repeatedly in the sequence, but we assume that new ones keep appearing, which implies that at some point the meaning of every word is exercised. Our agents also used the NuSMV model checker, both for checking satisfiability and for finding violated constraints. We first intended to perform experiments with the same vocabulary sizes that are used for testing in [1]. However, the interactions for the largest sizes were too slow to perform a reasonable number of repetitions for each technique. This problem is intrinsic to the protocols we use (since agents have to decide whether the messages they want to send are possible), and should be taken into account in future work. Since varying the vocabulary size did not provide particularly interesting results, we show here the experiments for a vocabulary of 10 words and four different protocol sizes. For the simple strategy we used the punishment and reward parameters that performed best in a preliminary test. Each experiment was repeated 10 times.

In a first experiment, we let agents interact 200 times and measured how they learned the alignment for different protocol sizes, shown in Figure 2. The F-score is computed by averaging the F-score of each agent that participates in the interaction. The curves show that the smart strategy is always better than the simple one, but the difference is more dramatic when the protocols are smaller. This is because there is less information available, so using it intelligently makes a greater difference. The curves are steep in the beginning, when agents do not yet have much information about the alignment: they make many mistakes, from which they learn fast. Once agents reach a reasonable level of precision and recall, mistakes become less frequent, and therefore the pace of learning slows considerably. Although this delays convergence to an F-score of 1, it also means that, after a certain number of interactions, agents have learned enough to communicate successfully most of the time. The sparse mistakes they still make in future conversations let them learn the remaining mappings, albeit more slowly. Table 3 shows how fast agents reach a reasonable F-score, which we set at 0.8 based on [10].

Protocol size    6      8      10     12
simple           -      187    101    47
smart            139    87     57     33

Table 3: Number of interactions needed to converge to an F-score of 0.8

In a second experiment, we studied the performance of our methods when used to repair existing alignments. To this end, we created alignments with different values of precision and recall with respect to the reference alignment, divided into three categories: low quality alignments had precision and recall 0.2, medium quality 0.5, and high quality 0.8. We only studied alignments with equal precision and recall in order to reduce the number of combinations and obtain a clear division into quality levels. We made agents with these alignments interact, using both the simple and the smart technique. The results can be seen in Figure 3; the number next to each agent class in the legend is the precision and recall of the alignment that was given to that agent. Again, the reasoning strategy performs particularly well when only little information is available: for alignments of low quality, the repair is much better than with the simple strategy.

Figure 3: Results for different alignment qualities

6 Related Work

As we already mentioned, a large portion of the existing work that tackles the problem of vocabulary alignment for multi-agent communication consists of techniques that let agents discuss and agree on a common vocabulary or alignment. Some of these approaches use argumentation or negotiation techniques [15, 12], performing an offline negotiation that takes place before actual interactions can start. Other approaches, such as the three-level schema developed by van Diggelen [8], discuss the meaning of words on demand, that is, only when it is necessary to use them to perform some task.

Other approaches use different kinds of grounding to learn alignments or the meaning of words from interactions. The well-established research by Steels [20] considers a situation in which learners share a physical environment and can point at things to check whether they are talking about the same objects. Goldman et al. [11] investigated how agents can learn to communicate in a way that maximises rewards in an environment that can be modelled as a Markov decision process. In [4], the authors study a version of the multi-agent, multi-armed bandit problem in which agents can communicate with each other in a common language, but message interpretations are not known. Closer to our approach is the work by Euzenat on cultural alignment repair [9], and particularly the already mentioned work by Atencia and Schorlemmer [1], in which agents communicate with interaction protocols represented as finite state automata and use a shared notion of task success as grounding. Our work, instead, uses only the coherence of the interaction as grounding. To the best of our knowledge, the work presented in this paper is the first approach in which agents learn languages dynamically from temporal interaction rules, taking into account only the coherence of the utterances.

A well-studied problem is that of learning different structures from experience, such as grammars [7] or norms [16]. Our approach can be seen as the reverse: some structure is considered shared and is used to learn from.

7 Conclusions and Future Work

The techniques we propose allow agents to learn alignments between their vocabularies only by interacting, assuming that their protocols share the same set of models, without requiring any common meta-language or procedure. The assumption of shared models can be removed, but learning will then be slower because an extra component of uncertainty is added. Although our methods converge slowly, they use only very general information, and they can easily be combined with other methods that provide more information about possible mappings, as we show with the integration of previous alignments. Our agents learn dynamically while interacting, and the techniques quickly reach a reasonable level of knowledge of the alignment. A possible way of using them would therefore be to go through a training phase first, in which agents learn many mappings quickly, and then start performing the real interactions while continuing to learn the remaining mappings. The technique that uses the particular semantics of the protocols improves the learning, as expected; such techniques can be developed specifically for each kind of protocol. Agents do not need to use the same techniques, or even the same logic, as long as they share the models.

There exist many possible directions of research derived from this work. First, we could relax the assumption that agents share no meta-language, considering agents that can in some way exchange information about the interaction. For example, merely letting agents communicate whether they finished a task successfully would make it possible to reason about monotonic rules when an interaction ends. Such an approach would relate our work to that of Santos [15] on dialogues for meaning negotiation. Another possible extension is to consider agents that prioritize language learning over performing the task. In this case, agents would have to decide which utterances are most convenient to make in order to obtain more information about the alignment. An aspect that should be improved in future work is the runtime per interaction, which was very slow for larger vocabularies.


  • [1] M. Atencia and M. Schorlemmer. An interaction-based approach to semantic alignment. Journal of Web Semantics, 12-13:131–147, 2012.
  • [2] M. Baldoni, C. Baroglio, and E. Marengo. Behavior-oriented commitment-based protocols. In ECAI 2010 - 19th European Conference on Artificial Intelligence, Lisbon, Portugal, August 16-20, 2010, Proceedings, pages 137–142, 2010.
  • [3] M. Baldoni, C. Baroglio, E. Marengo, and V. Patti. Constitutive and regulative specifications of commitment protocols: A decoupled approach (extended abstract). In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 4143–4147, 2015.
  • [4] S. Barrett, N. Agmon, N. Hazon, S. Kraus, and P. Stone. Communicating with unknown teammates. In T. Schaub, G. Friedrich, and B. O’Sullivan, editors, ECAI 2014 — 21st European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic — Including Prestigious Applications of Intelligent Systems (PAIS 2014), volume 263 of Frontiers in Artificial Intelligence and Applications, pages 45–50. IOS Press, 2014.
  • [5] P. Chocron and M. Schorlemmer. Attuning ontology alignments to semantically heterogeneous multi-agent interactions. In ECAI 2016 - 22nd European Conference on Artificial Intelligence, The Hague, The Netherlands, pages 871–879, 2016.
  • [6] A. Cimatti, E. Clarke, E. Giunchiglia, F. Giunchiglia, M. Pistore, M. Roveri, R. Sebastiani, and A. Tacchella. NuSMV 2: An open source tool for symbolic model checking. In International Conference on Computer Aided Verification, pages 359–364. Springer, 2002.
  • [7] C. de la Higuera. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, New York, NY, USA, 2010.
  • [8] J. V. Diggelen, R. Beun, F. Dignum, R. M. V. Eijk, and J. J. Meyer. Ontology negotiation: Goals, requirements and implementation. Int. J. Agent-Oriented Softw. Eng., 1(1):63–90, Apr. 2007.
  • [9] J. Euzenat. First Experiments in Cultural Alignment Repair. In Semantic Web: ESWC 2014 Satellite Events, volume 8798, pages 115–130, 2014.
  • [10] J. Euzenat, C. Meilicke, H. Stuckenschmidt, P. Shvaiko, and C. Trojahn. Ontology alignment evaluation initiative: Six years of experience. Journal on Data Semantics XV, pages 158–192, 2011.
  • [11] C. V. Goldman, M. Allen, and S. Zilberstein. Learning to communicate in a decentralized environment. Autonomous Agents and Multi-Agent Systems, 15(1):47–90, Aug. 2007.
  • [12] L. Laera, I. Blacoe, V. A. M. Tamma, T. R. Payne, J. Euzenat, and T. J. M. Bench-Capon. Argumentation over ontology correspondences in MAS. In E. H. Durfee, M. Yokoo, M. N. Huhns, and O. Shehory, editors, 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), Honolulu, Hawaii, USA, May 14-18, 2007, page 228, 2007.
  • [13] M. Montali. Specification and verification of declarative open interaction models: A logic-based approach, volume 56 of Lecture Notes in Business Information Processing. Springer-Verlag New York Inc, 2010.
  • [14] M. Pesic and W. M. P. van der Aalst. A declarative approach for flexible business processes management. In Proceedings of the International Conference on Business Process Management Workshops, BPM, pages 169–180, Berlin, Heidelberg, 2006. Springer-Verlag.
  • [15] G. Santos, V. Tamma, T. R. Payne, and F. Grasso. A dialogue protocol to support meaning negotiation. (extended abstract). In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’16, pages 1367–1368, Richland, SC, 2016. International Foundation for Autonomous Agents and Multiagent Systems.
  • [16] S. Sen and S. Airiau. Emergence of norms through social learning. In IJCAI, volume 1507, page 1512, 2007.
  • [17] G. Shafer. Jeffrey’s rule of conditioning. Philosophy of Science, 48(3):337–362, 1981.
  • [18] N. Silva, G. I. Ipp, and P. Maio. An approach to ontology mapping negotiation. In Proceedings of the Workshop on Integrating Ontologies, pages 54–60, 2005.
  • [19] M. P. Singh. A social semantics for agent communication languages. In Issues in Agent Communication, pages 31–45, London, UK, 2000. Springer-Verlag.
  • [20] L. Steels. The origins of ontologies and communication conventions in multi-agent systems. Autonomous Agents and Multi-Agent Systems, 1(2):169–194, Oct. 1998.