Translating Neuralese

Several approaches have recently been proposed for learning decentralized deep multiagent policies that coordinate via a differentiable communication channel. While these policies are effective for many tasks, interpretation of their induced communication strategies has remained a challenge. Here we propose to interpret agents' messages by translating them. Unlike in typical machine translation problems, we have no parallel data to learn from. Instead we develop a translation model based on the insight that agent messages and natural language strings mean the same thing if they induce the same belief about the world in a listener. We present theoretical guarantees and empirical evidence that our approach preserves both the semantics and pragmatics of messages by ensuring that players communicating through a translation layer do not suffer a substantial loss in reward relative to players with a common language.




1 Introduction

Several recent papers have described approaches for learning deep communicating policies (DCPs): decentralized representations of behavior that enable multiple agents to communicate via a differentiable channel that can be formulated as a recurrent neural network. DCPs have been shown to solve a variety of coordination problems, including reference games Lazaridou et al. (2016b), logic puzzles Foerster et al. (2016), and simple control Sukhbaatar et al. (2016). Appealingly, the agents’ communication protocol can be learned via direct backpropagation through the communication channel, avoiding many of the challenging inference problems associated with learning in classical decentralized decision processes Roth et al. (2005).

Figure 1: Example interaction between a pair of agents in a deep communicating policy. Both cars are attempting to cross the intersection, but cannot see each other. By exchanging message vectors, the agents are able to coordinate and avoid a collision. This paper presents an approach for understanding the contents of these message vectors by translating them into natural language.

But analysis of the strategies induced by DCPs has remained a challenge. As an example, Figure 1 depicts a driving game in which two cars, which are unable to see each other, must both cross an intersection without colliding. In order to ensure success, it is clear that the cars must communicate with each other. But a number of successful communication strategies are possible—for example, they might report their exact coordinates at every timestep, or they might simply announce whenever they are entering and leaving the intersection. If these messages were communicated in natural language, it would be straightforward to determine which strategy was being employed. However, DCP agents instead communicate with an automatically induced protocol of unstructured, real-valued recurrent state vectors—an artificial language we might call “neuralese,” which superficially bears little resemblance to natural language, and thus frustrates attempts at direct interpretation.

We propose to understand neuralese messages by translating them. In this work, we present a simple technique for inducing a dictionary that maps between neuralese message vectors and short natural language strings, given only examples of DCP agents interacting with other agents, and humans interacting with other humans. Natural language already provides a rich set of tools for describing beliefs, observations, and plans—our thesis is that these tools provide a useful complement to the visualization and ablation techniques used in previous work on understanding complex models Strobelt et al. (2016); Ribeiro et al. (2016).

While structurally quite similar to the task of machine translation between pairs of human languages, interpretation of neuralese poses a number of novel challenges. First, there is no natural source of parallel data: there are no bilingual “speakers” of both neuralese and natural language. Second, there may not be a direct correspondence between the strategy employed by humans and DCP agents: even if it were constrained to communicate using natural language, an automated agent might choose to produce a different message from humans in a given state. We tackle both of these challenges by appealing to the grounding of messages in gameplay. Our approach is based on one of the core insights in natural language semantics: messages (whether in neuralese or natural language) have similar meanings when they induce similar beliefs about the state of the world.

Based on this intuition, we introduce a translation criterion that matches neuralese messages with natural language strings by minimizing statistical distance in a common representation space of distributions over speaker states. We explore several related questions:

  • What makes a good translation, and under what conditions is translation possible at all? (Section 4)

  • How can we build a model to translate between neuralese and natural language? (Section 5)

  • What kinds of theoretical guarantees can we provide about the behavior of agents communicating via this translation model? (Section 6)

Our translation model and analysis are general, and in fact apply equally to human–computer and human–human translation problems grounded in gameplay. In this paper, we focus our experiments specifically on the problem of interpreting communication in deep policies, and apply our approach to the driving game in Figure 1 and two reference games of the kind shown in Figure 2. We find that this approach outperforms a more conventional machine translation criterion both when attempting to interoperate with neuralese speakers and when predicting their state.

Figure 2: Overview of our approach—best-scoring translations generated for a reference game involving images of birds. The speaking agent’s goal is to send a message that uniquely identifies the bird on the left. From these translations it can be seen that the learned model appears to discriminate based on coarse attributes like size and color.

2 Related work

A variety of approaches for learning deep policies with communication were proposed essentially simultaneously in the past year. We have broadly labeled these as “deep communicating policies”; concrete examples include Lazaridou et al. (2016b), Foerster et al. (2016), and Sukhbaatar et al. (2016). The policy representation we employ in this paper is similar to the latter two of these, although the general framework is agnostic to low-level modeling details and could be straightforwardly applied to other architectures. Analysis of communication strategies in all these papers has been largely ad hoc, obtained by clustering states from which similar messages are emitted and attempting to manually assign semantics to these clusters. The present work aims at developing tools for performing this analysis automatically.

Most closely related to our approach is that of Lazaridou et al. (2016a), who also develop a model for assigning natural language interpretations to learned messages; however, this approach relies on supervised cluster labels and is targeted specifically towards referring expression games. Here we attempt to develop an approach that can handle general multiagent interactions without assuming a prior discrete structure in the space of observations.

The literature on learning decentralized multi-agent policies in general is considerably larger Bernstein et al. (2002); Dibangoye et al. (2016). This includes work focused on communication in multiagent settings Roth et al. (2005) and even communication using natural language messages Vogel et al. (2013b). All of these approaches employ structured communication schemes with manually engineered messaging protocols; these are, in some sense, automatically interpretable, but at the cost of introducing considerable complexity into both training and inference.

Our evaluation in this paper investigates communication strategies that arise in a number of different games, including reference games and an extended-horizon driving game. Communication strategies for reference games were previously explored by Vogel et al. (2013a), Andreas and Klein (2016) and Kazemzadeh et al. (2014), and reference games specifically featuring end-to-end communication protocols by Yu et al. (2016). On the control side, a long line of work considers nonverbal communication strategies in multiagent policies Dragan and Srinivasa (2013).

Another group of related approaches focuses on the development of more general machinery for interpreting deep models in which messages have no explicit semantics. This includes both visualization techniques Zeiler and Fergus (2014); Strobelt et al. (2016), and approaches focused on generating explanations in the form of natural language Hendricks et al. (2016); Vedantam et al. (2017).

3 Problem formulation


Consider a cooperative game with two players $a$ and $b$ of the form given in Figure 3. At every step $t$ of this game, player $a$ makes an observation $x_a^{(t)}$ and receives a message $z_b^{(t-1)}$ from $b$. It then takes an action $u_a^{(t)}$ and sends a message $z_a^{(t)}$ to $b$. (The process is symmetric for $b$.) The distributions $p(u_a \mid x_a, z_b)$ and $p(z_a \mid x_a)$ together define a policy $\pi$ which we assume is shared by both players, i.e. $p(u_a \mid x_a, z_b) = p(u_b \mid x_b, z_a)$ and $p(z_a \mid x_a) = p(z_b \mid x_b)$. As in a standard Markov decision process, the actions $(u_a^{(t)}, u_b^{(t)})$ alter the world state, generating new observations for both players and a reward shared by both.

The distributions $p(z \mid x)$ and $p(u \mid x, z)$ may also be viewed as defining a language: they specify how a speaker will generate messages based on world states, and how a listener will respond to these messages. Our goal in this work is to learn to translate between pairs of languages generated by different policies. Specifically, we assume that we have access to two policies for the same game: a “robot policy” $\pi_r$ and a “human policy” $\pi_h$. We would like to use the representation of $\pi_h$, the behavior of which is transparent to human users, in order to understand the behavior of $\pi_r$ (which is in general an uninterpretable learned model); we will do this by inducing bilingual dictionaries that map message vectors $z_r$ of $\pi_r$ to natural language strings $z_h$ of $\pi_h$ and vice-versa.

Figure 3: Schematic representation of communication games. At every timestep $t$, players $a$ and $b$ make an observation $x^{(t)}$ and receive a message $z^{(t-1)}$, then produce an action $u^{(t)}$ and a new message $z^{(t)}$.

Learned agents

Figure 4: Cell implementing a single step of agent communication (compare with Sukhbaatar et al. (2016) and Foerster et al. (2016)). MLP denotes a multilayer perceptron; GRU denotes a gated recurrent unit Cho et al. (2014). Dashed lines represent recurrent connections.

Our goal is to present tools for interpretation of learned messages that are agnostic to the details of the underlying algorithm for acquiring them. We use a generic DCP model as a basis for the techniques developed in this paper. Here each agent policy is represented as a deep recurrent Q network Hausknecht and Stone (2015). This network is built from communicating cells of the kind depicted in Figure 4. At every timestep, this agent receives three pieces of information: an observation of the current state of the world, the agent’s memory vector from the previous timestep, and a message from the other player. It then produces three outputs: a predicted Q value for every possible action, a new memory vector for the next timestep, and a message to send to the other agent.
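The three-input, three-output step described above can be sketched in a few lines. This is a minimal illustration in plain Python with untrained random weights; the class name `CommCell`, the layer sizes, the ReLU encoder, and the exact gate layout are assumptions made for the example, not details taken from the paper.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init(n_out, n_in, rng):
    """Small random weights and zero bias (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


class CommCell:
    """One step: (observation, memory, message_in) -> (q_values, memory, message_out)."""

    def __init__(self, n_obs, n_msg, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.enc = init(n_hidden, n_obs + n_msg, rng)   # MLP layer over [obs; message_in]
        self.gates = {g: init(n_hidden, 2 * n_hidden, rng)
                      for g in ("update", "reset", "cand")}
        self.q_head = init(n_actions, n_hidden, rng)    # one Q value per action
        self.msg_head = init(n_msg, n_hidden, rng)      # outgoing message vector

    def step(self, obs, memory, message_in):
        e = [max(0.0, v) for v in linear(*self.enc, obs + message_in)]
        z = sigmoid(linear(*self.gates["update"], e + memory))
        r = sigmoid(linear(*self.gates["reset"], e + memory))
        gated = [ri * m for ri, m in zip(r, memory)]
        cand = [math.tanh(v) for v in linear(*self.gates["cand"], e + gated)]
        # GRU-style update: interpolate between old memory and candidate
        new_memory = [(1 - zi) * m + zi * c for zi, m, c in zip(z, memory, cand)]
        return (linear(*self.q_head, new_memory), new_memory,
                linear(*self.msg_head, new_memory))


# Two agents sharing one set of weights and exchanging messages for three steps.
cell = CommCell(n_obs=3, n_msg=2, n_hidden=4, n_actions=5)
mem_a, mem_b = [0.0] * 4, [0.0] * 4
msg_a, msg_b = [0.0] * 2, [0.0] * 2
for _ in range(3):
    q_a, mem_a, out_a = cell.step([1.0, 0.0, 0.0], mem_a, msg_b)
    q_b, mem_b, out_b = cell.step([0.0, 1.0, 0.0], mem_b, msg_a)
    msg_a, msg_b = out_a, out_b
```

Sharing a single `CommCell` between both agents mirrors the assumption in this section that the policy is shared by both players; the message vectors `msg_a` and `msg_b` are the objects that the rest of the paper sets out to translate.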

Sukhbaatar et al. (2016) observe that models of this form may be viewed as specifying a single RNN in which weight matrices have a particular block structure. Such models may thus be trained using the standard recurrent Q-learning objective, with the communication protocol learned end-to-end.

Human agents

The translation model we develop requires a representation of the distribution over messages employed by human speakers (without assuming that humans and agents produce equivalent messages in equivalent contexts). We model the human message generation process as categorical, and fit a simple multilayer perceptron model to map from observations to words and phrases used during human gameplay.

4 What’s in a translation?

What does it mean for a message $z_h$ to be a “translation” of a message $z_r$? In standard machine translation problems, the answer is that $z_h$ is likely to co-occur in parallel data with $z_r$; that is, $p(z_h \mid z_r)$ is large. Here we have no parallel data: even if we could observe natural language and neuralese messages produced by agents in the same state, we would have no guarantee that these messages actually served the same function. Our answer must instead appeal to the fact that both natural language and neuralese messages are grounded in a common environment. For a given neuralese message $z_r$, we will first compute a grounded representation of that message’s meaning; to translate, we find a natural-language message whose meaning is most similar. The key question is then what form this grounded meaning representation should take. The existing literature suggests two broad approaches:

Semantic representation

The meaning of a message $z_a$ is given by its denotations: that is, by the set of world states of which $z_a$ may be felicitously predicated, given the existing context available to a listener. In probabilistic terms, this says that the meaning of a message $z_a$ is represented by the distribution $p(x_a \mid z_a, x_b)$ it induces over speaker states. Examples of this approach include Guerin and Pitt (2001) and Pasupat and Liang (2016).

Pragmatic representation

The meaning of a message $z_a$ is given by the behavior it induces in a listener. In probabilistic terms, this says that the meaning of a message $z_a$ is represented by the distribution $p(u \mid z_a, x_b)$ it induces over actions given the listener’s observation $x_b$. Examples of this approach include Vogel et al. (2013a) and Gauthier and Mordatch (2016).

These two approaches can give rise to rather different behaviors. Consider the following example:

The top language (in blue) has a unique name for every kind of shape, while the bottom language (in red) only distinguishes between shapes with few sides and shapes with many sides. Now imagine a simple reference game with the following form: player $a$ is covertly assigned one of these three shapes as a reference target, and communicates that reference to $b$; $b$ must then pull a lever labeled large or small depending on the size of the target shape. Blue language speakers can achieve perfect success at this game, while red language speakers can succeed at best two out of three times.

How should we translate the blue word hexagon into the red language? The semantic approach suggests that we should translate hexagon as many: while many does not uniquely identify the hexagon, it produces a distribution over shapes that is closest to the truth. The pragmatic approach instead suggests that we should translate hexagon as few, as this is the only message that guarantees that the listener will pull the correct lever large. So in order to produce a correct listener action, the translator might have to “lie” and produce a maximally inaccurate listener belief.
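The contrast between the two criteria can be worked through numerically. The sketch below assumes a hypothetical inventory of three shapes and sizes (a large square, a large hexagon, a small circle) and a listener that pulls the lever matching a shape drawn from its belief; none of these specifics come from the paper, and they are chosen only so that the example reproduces the behavior described in the text.

```python
import math

# Hypothetical world (assumption): three shapes, each with a size.
SIZE = {"square": "large", "hexagon": "large", "circle": "small"}

# Red language: belief over shapes induced by each word (uniform over covered shapes).
RED_BELIEF = {
    "few": {"square": 1.0},
    "many": {"hexagon": 0.5, "circle": 0.5},
}
# The blue word "hexagon" identifies exactly one shape.
BLUE_HEXAGON = {"hexagon": 1.0}


def kl(p, q):
    """KL(p || q) over shapes; infinite when q assigns zero mass where p does not."""
    total = 0.0
    for shape, p_shape in p.items():
        q_shape = q.get(shape, 0.0)
        if q_shape == 0.0:
            return math.inf
        total += p_shape * math.log(p_shape / q_shape)
    return total


def p_correct_lever(word, target):
    """Chance that a listener acting on the belief for `word` pulls the target's lever."""
    return sum(p for shape, p in RED_BELIEF[word].items() if SIZE[shape] == SIZE[target])


# Semantic criterion: closest induced belief. Pragmatic criterion: best listener action.
semantic = min(RED_BELIEF, key=lambda w: kl(BLUE_HEXAGON, RED_BELIEF[w]))
pragmatic = max(RED_BELIEF, key=lambda w: p_correct_lever(w, "hexagon"))
print(semantic, pragmatic)  # many few
```

Under these assumptions the semantic translation keeps the listener's belief as close as possible to the truth (a divergence of log 2), while the pragmatic translation secures the right lever by inducing a belief that excludes the hexagon entirely.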

If we were exclusively concerned with building a translation layer that allowed humans and DCP agents to interoperate as effectively as possible, it would be natural to adopt a pragmatic representation strategy. But our goals here are broader: we also want to facilitate understanding, and specifically to help users of learned systems form true beliefs about the systems’ computational processes and representational abstractions. The example above demonstrates that “pragmatically” optimizing directly for task performance can sometimes lead to translations that produce inaccurate beliefs.

We instead build our approach around semantic representations of meaning. By preserving semantics, we allow listeners to reason accurately about the content and interpretation of messages. We might worry that by adopting a semantics-first view, we have given up all guarantees of effective interoperation between humans and agents using a translation layer. Fortunately, this is not so: as we will see in Section 6, it is possible to show that players communicating via a semantic translator perform only boundedly worse (and sometimes better!) than pairs of players with a common language.

5 Translation models

In this section, we build on the intuition that messages should be translated via their semantics to define a concrete translation model: a procedure for constructing a dictionary between natural language and neuralese given agent and human interactions.

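We understand the meaning of a message to be represented by the distribution it induces over speaker states given listener context. We can formalize this by defining the belief distribution $\beta$ for a message $z$ and context $x_b$ as:

$$\beta(z, x_b) = p(x_a \mid z, x_b) = \frac{p(z \mid x_a)\, p(x_a \mid x_b)}{\sum_{x_a'} p(z \mid x_a')\, p(x_a' \mid x_b)}$$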

Here we have modeled the listener as performing a single step of Bayesian inference, using the listener state and the message generation model (by assumption shared between players) to compute the posterior over speaker states. While in general neither humans nor DCP agents compute explicit representations of this posterior, past work has found that both humans and suitably-trained neural networks can be modeled as Bayesian reasoners Frank et al. (2009); Paige and Wood (2016).
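For a small discrete game, the single Bayesian step described above can be computed exactly. The following is a minimal sketch assuming tabular distributions; the function name `belief`, the dictionary layouts, and the toy numbers are illustrative choices, not part of the paper. It works from the joint prior over world states, which gives the same posterior as conditioning on the listener state first.

```python
def belief(z, x_b, p_msg, p_joint):
    """Posterior over speaker states x_a given message z and listener state x_b.

    p_msg[x_a][z]       : speaker model p(z | x_a), shared between players
    p_joint[(x_a, x_b)] : prior over world states p(x_a, x_b)
    """
    scores = {
        x_a: p_msg[x_a].get(z, 0.0) * prior
        for (x_a, other_x_b), prior in p_joint.items()
        if other_x_b == x_b
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("message has zero probability in this context")
    return {x_a: score / total for x_a, score in scores.items()}


# Toy game (assumed numbers): the speaker is "near" or "far", the listener sees c0 or c1.
p_joint = {("near", "c0"): 0.25, ("far", "c0"): 0.25,
           ("near", "c1"): 0.40, ("far", "c1"): 0.10}
p_msg = {"near": {"m1": 0.9, "m2": 0.1},
         "far": {"m1": 0.2, "m2": 0.8}}

b = belief("m1", "c0", p_msg, p_joint)
```

In context `c0` the two speaker states are equally likely a priori, so the posterior after hearing `m1` is driven entirely by the message model; in `c1` the same message yields a different belief, which is why a translation score has to average over contexts.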

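This provides a context-specific representation of belief, but for messages $z$ and $z'$ to have the same semantics, they must induce the same belief over all contexts in which they occur. In our probabilistic formulation, this introduces an outer expectation over contexts, providing a final measure $q$ of the quality of a translation from $z$ to $z'$:

$$q(z, z') = \mathbb{E}\big[ D_{\mathrm{KL}}\big(\beta(z, X_b) \,\|\, \beta(z', X_b)\big) \mid z, z' \big] = \sum_{x_b} p(x_b \mid z, z')\, D_{\mathrm{KL}}\big(\beta(z, x_b) \,\|\, \beta(z', x_b)\big) \tag{1}$$

recalling that in this setting

$$D_{\mathrm{KL}}(\beta \,\|\, \beta') = \sum_{x_a} p(x_a \mid z, x_b) \log \frac{p(x_a \mid z, x_b)}{p(x_a \mid z', x_b)} \tag{2}$$

which is zero when the messages $z$ and $z'$ give rise to identical belief distributions and increases as they grow more dissimilar. To translate, we would like to compute $\mathit{tr}(z_r) = \arg\min_{z_h} q(z_r, z_h)$ and $\mathit{tr}(z_h) = \arg\min_{z_r} q(z_h, z_r)$. Intuitively, Equation 1 says that we will measure the quality of a proposed translation $z \mapsto z'$ by asking the following question: in contexts where $z$ is likely to be used, how frequently does $z'$ induce the same belief about speaker states as $z$?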

While this translation criterion directly encodes the semantic notion of meaning described in Section 4, it is doubly intractable: the KL divergence and outer expectation involve a sum over all observations $x_a$ and $x_b$ respectively; these sums are not in general possible to compute efficiently. To avoid this, we approximate Equation 1 by sampling. We draw a collection of samples $(x_a, x_b)$ from the prior over world states, and then generate for each sample a sequence of distractors $(x_a', x_b)$ from $p(x_a' \mid x_b)$ (we assume access to both of these distributions from the problem representation). The KL term in Equation 1 is computed over each true sample and its distractors, and the resulting terms are then weighted, normalized, and averaged to compute the final score.

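Algorithm 1 Translating messages
given: a phrase inventory L
function translate(z)
     return argmin over z' in L of q̂(z, z')
function q̂(z, z')
     // sample contexts and distractors
     x_ai, x_bi ~ p(X_a, X_b) for i = 1..n
     x'_ai ~ p(X_a | x_bi)
     // compute context weights
     w̃_i ← p(z | x_ai) · p(z' | x_ai)
     w_i ← w̃_i / Σ_j w̃_j
     // compute divergences
     k_i ← Σ over x in {x_ai, x'_ai} of p(x | z, x_bi) log [p(x | z, x_bi) / p(x | z', x_bi)]
     return Σ_i w_i k_i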

Sampling accounts for the outer expectation in Equation 1 and the inner sum in Equation 2. The only quantities remaining are message probabilities of the form $p(z \mid x_a)$. In the case of neuralese, these are determined by the agent policy $\pi_r$. For natural language, we use transcripts of human interactions to fit a model that maps from world states to a distribution over frequent utterances as discussed in Section 3. Details of these model implementations are provided in Appendix B, and the full translation procedure is given in Algorithm 1.
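The sampling procedure can be sketched as runnable code on a toy game. This is an illustrative reading of the description above, not the authors' implementation: the helper names (`q_hat`, `translate`, `sample_world`, `sample_distractor`), the use of a single distractor per sample, the small smoothing constant `eps`, and the toy robot and human message models are all assumptions.

```python
import math
import random


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def q_hat(z, z2, p_src, p_tgt, sample_world, sample_distractor, n=500, eps=1e-9, rng=None):
    """Sampled estimate of the divergence between the beliefs induced by z and z2."""
    rng = rng or random.Random(0)  # fixed seed: every candidate sees the same samples
    weights, divergences = [], []
    for _ in range(n):
        # sample a context and one distractor speaker state for it
        x_a, x_b = sample_world(rng)
        candidates = [x_a, sample_distractor(x_b, rng)]
        # context weight: how likely both messages are in the sampled speaker state
        weights.append(p_src(z, x_a) * p_tgt(z2, x_a))
        # beliefs restricted to the true state and its distractor
        b1 = normalize([p_src(z, x) + eps for x in candidates])
        b2 = normalize([p_tgt(z2, x) + eps for x in candidates])
        divergences.append(sum(p * math.log(p / q) for p, q in zip(b1, b2)))
    total = sum(weights)
    if total == 0.0:
        return math.inf
    return sum(w * k for w, k in zip(weights, divergences)) / total


def translate(z, inventory, **kwargs):
    """Pick the inventory entry whose induced beliefs are closest to those of z."""
    return min(inventory, key=lambda z2: q_hat(z, z2, **kwargs))


# Toy game (assumed): speaker state is a position 0..3; the listener context is unused.
def sample_world(rng):
    return rng.randrange(4), None


def sample_distractor(x_b, rng):
    return rng.randrange(4)


def p_robot(z, x_a):   # "r0" is mostly sent from positions 0-1, "r1" from 2-3
    return 0.95 if (z == "r0") == (x_a < 2) else 0.05


def p_human(z, x_a):   # "low" is mostly said at positions 0-1, "high" at 2-3
    return 0.95 if (z == "low") == (x_a < 2) else 0.05


toy = dict(p_src=p_robot, p_tgt=p_human,
           sample_world=sample_world, sample_distractor=sample_distractor)
print(translate("r0", ["low", "high"], **toy))  # low
```

Reusing the same random seed for every candidate is a design choice for this sketch: it means candidates are compared on identical sampled contexts, so differences in score reflect the messages rather than sampling noise.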

6 Belief and behavior

The translation criterion in the previous section makes no reference to listener actions at all. The shapes example in Section 4 shows that some model performance might be lost under translation. It is thus reasonable to ask whether this translation model of Section 5 can make any guarantees about the effect of translation on behavior. In this section we explore the relationship between belief-preserving translations and the behaviors they produce, by examining the effect of belief accuracy and strategy mismatch on the reward obtained by cooperating agents.

Figure 5: Simplified game representation used for analysis in Section 6. A speaker agent sends a message to a listener agent, which takes a single action and receives a reward.

To facilitate this analysis, we consider a simplified family of communication games with the structure depicted in Figure 5. These games can be viewed as a subset of the family depicted in Figure 3, and consist of two steps: a speaker makes an observation $x_a$ and sends a single message $z$ to a listener, which makes its own observation $x_b$, takes a single action $u$, and receives a reward. We emphasize that the results in this section concern the theoretical properties of idealized games, and are presented to provide intuition about high-level properties of our approach. Section 8 investigates empirical behavior of this approach on real-world tasks where these ideal conditions do not hold.

Our first result is that translations that minimize semantic dissimilarity cause the listener to take near-optimal actions (proof is provided in Appendix A):


Proposition 1.

Semantic translations reward rational listeners. Define a rational listener as one that chooses the best action in expectation over the speaker’s state:

$$U(z, x_b) = \arg\max_u \sum_{x_a} p(x_a \mid x_b, z)\, r(x_a, x_b, u)$$

for a reward function $r \in [0, 1]$ that depends only on the two observations and the action. (This notion of rationality is a fairly weak one: it permits many suboptimal communication strategies, and requires only that the listener do as well as possible given a fixed speaker, a first-order optimality criterion likely to be satisfied by any richly-parameterized model trained via gradient descent.) Now let $a$ be a speaker of a language $r$, $b$ be a listener of the same language $r$, and $b'$ be a listener of a different language $h$. Suppose that we wish for $a$ and $b'$ to interact via the translator $\mathit{tr}: z_r \mapsto z_h$ (so that $a$ produces a message $z_r$, and $b'$ takes an action $U(z_h = \mathit{tr}(z_r), x_{b'})$). If $\mathit{tr}$ respects the semantics of $z_r$, then the bilingual pair $a$ and $b'$ achieves only boundedly worse reward than the monolingual pair $a$ and $b$. Specifically, if $q(z_r, z_h) \le D$, then

$$\mathbb{E}\, r\big(X_a, X_b, U(\mathit{tr}(Z), X_b)\big) \ge \mathbb{E}\, r\big(X_a, X_b, U(Z, X_b)\big) - \sqrt{2D}$$

So as discussed in Section 4, even by committing to a semantic approach to meaning representation, we have still succeeded in (approximately) capturing the nice properties of the pragmatic approach.
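The flavor of this guarantee can be checked numerically in a single context. The sketch below is not the proof from Appendix A: it assumes rewards in [0, 1] and uses Pinsker's inequality, under which a rational listener acting on a mismatched belief loses at most the square root of twice the KL divergence between the true and mismatched beliefs. All names and the random test setup are illustrative.

```python
import math
import random


def kl(p, q):
    """KL(p || q) for two distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_reward(belief, reward, u):
    return sum(b * reward[x][u] for x, b in enumerate(belief))


def best_action(belief, reward):
    """Rational listener: maximize expected reward under its own belief."""
    return max(range(len(reward[0])), key=lambda u: expected_reward(belief, reward, u))


def random_dist(n, rng):
    raw = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(0)
worst_gap = -math.inf
for _ in range(1000):
    true_belief = random_dist(4, rng)   # belief induced by the original message
    translated = random_dist(4, rng)    # belief induced by its translation
    reward = [[rng.random() for _ in range(3)] for _ in range(4)]  # r(x_a, u) in [0, 1]
    u_star = best_action(true_belief, reward)
    u_tr = best_action(translated, reward)
    loss = (expected_reward(true_belief, reward, u_star)
            - expected_reward(true_belief, reward, u_tr))
    bound = math.sqrt(2 * kl(true_belief, translated))
    worst_gap = max(worst_gap, loss - bound)

print(worst_gap <= 1e-12)  # True: the loss never exceeds the bound
```

The check draws beliefs at random rather than from a trained model, so it illustrates only the inequality itself: the smaller the divergence a translation leaves between beliefs, the less reward a rational listener can lose.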

Section 4 examined the consequences of a mismatch between the set of primitives available in two languages. In general we would like some measure of our approach’s robustness to the lack of an exact correspondence between two languages. In the case of humans in particular we expect that a variety of different strategies will be employed, many of which will not correspond to the behavior of the learned agent. It is natural to want some assurance that we can identify the DCP’s strategy as long as some human strategy mirrors it. Our second observation is that it is possible to exactly recover a translation of a DCP strategy from a mixture of humans playing different strategies:


Proposition 2.

Semantic translations find hidden correspondences. Consider a fixed robot policy $\pi_r$ and a set of human policies $\{\pi_{h1}, \pi_{h2}, \ldots\}$ (recalling from Section 3 that each $\pi$ is defined by distributions $p(u \mid x, z)$ and $p(z \mid x)$). Suppose further that the messages employed by these human strategies are disjoint; that is, if $p_{hi}(z \mid x_a) > 0$, then $p_{hj}(z \mid x_a) = 0$ for all $j \neq i$. Now suppose that $q(z_r, z_h) = 0$ for all messages in the support of some $p_{hi}(z \mid x_a)$ and $q(z_r, z_h) > 0$ for all $j \neq i$. Then every message $z_r$ is translated into a message produced by $\pi_{hi}$, and messages from other strategies are ignored.


This observation follows immediately from the definition of $q(z_r, z_h)$, but demonstrates one of the key distinctions between our approach and a conventional machine translation criterion. Maximizing $p(z_h \mid z_r)$ will produce the natural language message most often produced in contexts where $z_r$ is observed, regardless of whether that message is useful or informative. By contrast, minimizing $q(z_h, z_r)$ will find the $z_h$ that corresponds most closely to $z_r$ even when $z_h$ is rarely used.
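A toy calculation makes the distinction concrete. The sketch below assumes four equally likely speaker states, a single robot message that is almost only sent from state 0, and three hypothetical human phrases: a frequent but uninformative one and two rare but specific ones. It uses a context-free simplification of the belief-matching score (one KL term, no listener observation); all numbers are invented for illustration.

```python
import math

# Assumed toy setup: four speaker states with a uniform prior.
P_ROBOT = [0.97, 0.01, 0.01, 0.01]           # p(z_r | x_a) for one robot message
P_HUMAN = {                                   # p(z_h | x_a) for each human phrase
    "ok": [0.9, 0.9, 0.9, 0.9],               # frequent, carries no information
    "at zero": [0.1, 0.002, 0.002, 0.002],    # rare, but specific to state 0
    "elsewhere": [0.0, 0.098, 0.098, 0.098],  # rare, specific to the other states
}


def posterior(likelihood):
    """Belief over speaker states under a uniform prior."""
    total = sum(likelihood)
    return [v / total for v in likelihood]


def kl(p, q):
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


robot_belief = posterior(P_ROBOT)

# Direct criterion: the phrase most likely to be produced where the robot message occurs.
direct = max(P_HUMAN, key=lambda z_h: sum(p * b for p, b in zip(P_HUMAN[z_h], robot_belief)))

# Belief matching: the phrase whose induced belief is closest to the robot message's.
belief_match = min(P_HUMAN, key=lambda z_h: kl(robot_belief, posterior(P_HUMAN[z_h])))

print(direct, "|", belief_match)  # ok | at zero
```

The direct criterion selects the filler phrase because it is common in every state, while belief matching recovers the rarely used phrase that actually carries the same information as the robot message.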

The disjointness condition, while seemingly quite strong, in fact arises naturally in many circumstances: for example, players in the driving game reporting their spatial locations in absolute vs. relative coordinates, or speakers in a color reference game (Figure 6) discriminating based on lightness vs. hue. It is also possible to relax the above condition to require that strategies be only locally disjoint (i.e. with the disjointness condition holding for each fixed $x_a$), in which case overlapping human strategies are allowed, and the recovered robot strategy is a context-weighted mixture of these.

7 Evaluation

Figure 6: Tasks used to evaluate the translation model. (a–b) Reference games: both players observe a pair of reference candidates (colors or images); player $a$ is assigned a target (marked with a star), which player $b$ must guess based on a message from $a$. (c) Driving game: each car attempts to navigate to its goal (marked with a star). The cars cannot see each other, and must communicate to avoid a collision.

7.1 Tasks

In the remainder of the paper, we evaluate the empirical behavior of our approach to translation. Our evaluation considers two kinds of tasks: reference games and navigation games. In a reference game (e.g. Figure 6a), both players observe a pair of candidate referents. A speaker is assigned a target referent; it must communicate this target to a listener, who then performs a choice action corresponding to its belief about the true target. In this paper we consider two variants on the reference game: a simple color-naming task, and a more complex task involving natural images of birds. For examples of human communication strategies for these tasks, we obtain the XKCD color dataset McMahan and Stone (2015); Monroe et al. (2016) and the Caltech–UCSD Birds dataset Welinder et al. (2010) with accompanying natural language descriptions Reed et al. (2016). We use standard train / validation / test splits for both of these datasets.

The final task we consider is the driving task (Figure 6c) first discussed in the introduction. In this task, two cars, invisible to each other, must each navigate between randomly assigned start and goal positions without colliding. This task takes a number of steps to complete, and potentially involves a much broader range of communication strategies. To obtain human annotations for this task, we recorded both actions and messages generated by pairs of human Amazon Mechanical Turk workers playing the driving game with each other. We collected close to 400 games, with a total of more than 2000 messages exchanged, from which we held out 100 game traces as a test set.

7.2 Metrics

A mechanism for understanding the behavior of a learned model should allow a human user both to correctly infer its beliefs and to successfully interoperate with it; we accordingly report results of both “belief” and “behavior” evaluations.

To support easy reproduction and comparison (and in keeping with standard practice in machine translation), we focus on developing automatic measures of system performance. We use the available training data to develop simulated models of human decisions; by first showing that these models track well with human judgments, we can be confident that their use in evaluations will correlate with human understanding. We employ the following two metrics:

Belief evaluation

This evaluation focuses on the denotational perspective in semantics that motivated the initial development of our model. We have successfully understood the semantics of a message $z_r$ if, after translating $z_r \mapsto z_h$, a human listener can form a correct belief about the state in which $z_r$ was produced. We construct a simple state-guessing game where the listener is presented with a translated message and two state observations, and must guess which state the speaker was in when the message was emitted.

When translating from natural language to neuralese, we use the learned agent model to directly guess the hidden state. For neuralese to natural language we must first construct a “model human listener” to map from strings back to state representations; we do this by using the training data to fit a simple regression model that scores (state, sentence) pairs using a bag-of-words sentence representation. We find that our “model human” matches the judgments of real humans 83% of the time on the colors task, 77% of the time on the birds task, and 77% of the time on the driving task. This gives us confidence that the model human gives a reasonably accurate proxy for human interpretation.

Behavior evaluation

This evaluation focuses on the cooperative aspects of interpretability: we measure the extent to which learned models are able to interoperate with each other by way of a translation layer. In the case of reference games, the goal of this semantic evaluation is identical to the goal of the game itself (to identify the hidden state of the speaker), so we perform this additional pragmatic evaluation only for the driving game. We found that the most reliable way to make use of human game traces was to construct a speaker-only model human. The evaluation selects a full game trace from a human player, and replays both the human’s actions and messages exactly (disregarding any incoming messages); the evaluation measures the quality of the natural-language-to-neuralese translator, and the extent to which the learned agent model can accommodate a (real) human given translations of the human’s messages.


We compare our approach to two baselines: a random baseline that chooses a translation of each input uniformly from messages observed during training, and a direct baseline that directly maximizes $p(z' \mid z)$ (by analogy to a conventional machine translation system). This is accomplished by sampling from a DCP speaker in training states labeled with natural language strings.

8 Results

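(a) Colors task

Listener   Speaker   Translation         Accuracy
R          R         (shared language)   1.00
R          H         random              0.50
R          H         direct              0.70
R          H         belief (ours)       0.73
H*         R         random              0.50
H*         R         direct              0.72
H*         R         belief (ours)       0.86
H*         H         (shared language)   0.83

(b) Birds task

Listener   Speaker   Translation         Accuracy
R          R         (shared language)   0.95
R          H         random              0.50
R          H         direct              0.55
R          H         belief (ours)       0.60
H*         R         random              0.50
H*         R         direct              0.57
H*         R         belief (ours)       0.75
H*         H         (shared language)   0.77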

Table 1: Evaluation results for reference games. (a) The colors task. (b) The birds task. Whether the model human is in a listener or speaker role, translation based on belief matching outperforms both random and machine translation baselines.
Figure 7: Best-scoring translations generated for color task.

In all tables, “R” indicates a DCP agent, “H” indicates a real human, and “H*” indicates a model human player.

Reference games

Results for the two reference games are shown in Table 1. The end-to-end trained model achieves nearly perfect accuracy in both cases, while a model trained to communicate in natural language achieves somewhat lower performance. Regardless of whether the speaker is a DCP and the listener a model human or vice-versa, translation based on the belief-matching criterion in Section 5 achieves the best performance; indeed, when translating neuralese color names to natural language, the listener is able to achieve a slightly higher score than it is natively. This suggests that the automated agent has discovered a more effective strategy than the one demonstrated by humans in the dataset, and that the effectiveness of this strategy is preserved by translation. Example translations from the reference games are depicted in Figure 2 and Figure 7.

| as listener \ as speaker | R | H |
|---|---|---|
| R | 0.85 | 0.50 random; 0.45 direct; 0.61 belief (ours) |
| H* | 0.5 | 0.77 |
Table 2: Belief evaluation results for the driving game. Driving states are challenging to identify based on messages alone (as evidenced by the comparatively low scores obtained by single-language pairs). Translation based on belief achieves the best overall performance in both directions.
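| Players | Translation | Reward / completion rate |
|---|---|---|
| R / R | none | 1.93 / 0.71 |
| H / H | none | n/a / 0.77 |
| R / H | random | 1.35 / 0.64 |
| R / H | direct | 1.49 / 0.67 |
| R / H | belief (ours) | 1.54 / 0.67 |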
Table 3: Behavior evaluation results for the driving game. Scores are presented in the form “reward / completion rate”. While less accurate than either humans or DCPs with a shared language, the models that employ a translation layer obtain higher reward and a greater overall success rate than baselines.

Driving game

Behavior evaluation of the driving game is shown in Table 3, and belief evaluation is shown in Table 2. Translation of messages in the driving game is considerably more challenging than in the reference games, and scores are uniformly lower; however, a clear benefit from the belief-matching model is still visible. Belief matching leads to higher scores on the belief evaluation in both directions, and allows agents to obtain a higher reward on average (though task completion rates remain roughly the same across all agents). Some example translations of driving game messages are shown in Figure 8.

9 Conclusion

We have investigated the problem of interpreting message vectors from deep networks by translating them. After introducing a translation criterion based on matching listener beliefs about speaker states, we presented both theoretical and empirical evidence that this criterion outperforms a conventional machine translation approach at recovering the content of message vectors and facilitating collaboration between humans and learned agents.

Figure 8: Best-scoring translations for the driving task, generated from the given speaker state.

While our evaluation has focused on understanding the behavior of deep communicating policies, the framework proposed in this paper could be much more generally applied. Any encoder–decoder model (Sutskever et al., 2014) can be thought of as a kind of communication game played between the encoder and the decoder, so we can analogously imagine computing and translating “beliefs” induced by the encoding to explain what features of the input are being transmitted. The current work has focused on learning a purely categorical model of the translation process, supported by an unstructured inventory of translation candidates, and future work could explore the compositional structure of messages, and attempt to synthesize novel natural language or neuralese messages from scratch. More broadly, the work here shows that the denotational perspective from formal semantics provides a framework for precisely framing the demands of interpretable machine learning (Wilson et al., 2016), and particularly for ensuring that human users without prior exposure to a learned model are able to interoperate with it, predict its behavior, and diagnose its errors.


JA is supported by a Facebook Graduate Fellowship and a Berkeley AI / Huawei Fellowship. We are grateful to Lisa Anne Hendricks for assistance with the Caltech–UCSD Birds dataset, and to Liang Huang and Sebastian Schuster for useful feedback.


Appendix A Proofs

Proof of Proposition 1

We know that

and that for all translations

Applying Pinsker’s inequality:
and Jensen’s inequality:

The next step relies on the following well-known property of the total variation distance: for distributions and and a function bounded by ,

For convenience we will write

A listener using the speaker’s language expects a reward of

via (*). From the assumption of player rationality:
using (*) again:

So the true reward achieved by a -speaker receiving a translated code is only additively worse than the native -speaker reward:
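For reference, the standard facts invoked in this proof (Pinsker's inequality, Jensen's inequality for the square root, and the total variation bound on expectations) can be stated in generic notation as follows. The symbols $p$, $q$, $f$, $Y$, and $\delta$ are assumptions chosen for illustration and are not necessarily those used in the body of the paper; the expectation bound is stated for a function taking values in $[0, 1]$.

```latex
% Total variation distance between distributions p and q on a space X
\delta(p, q) = \sup_{A \subseteq X} |p(A) - q(A)|

% Pinsker's inequality (KL divergence measured in nats)
\delta(p, q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(p \,\|\, q)}

% Jensen's inequality for the concave square root, with Y a nonnegative random variable
\mathbb{E}\big[\sqrt{Y}\big] \le \sqrt{\mathbb{E}[Y]}

% Expectation bound for any f : X \to [0, 1]
\big| \mathbb{E}_{p}[f] - \mathbb{E}_{q}[f] \big| \le \delta(p, q)
```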

Appendix B Implementation details

b.1 Agents

Learned agents have the following form:

where is a hidden state, is a message from the other agent, is a distribution over actions, and is an observation of the world. A single hidden layer with 256 units and a nonlinearity is used for the MLP. The GRU hidden state is also of size 256, and the message vector is of size 64.
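The layer sizes above can be made concrete with a rough NumPy sketch of a single agent timestep. The exact wiring is an assumption (observation and incoming message concatenated, passed through the MLP, then the GRU, then two linear heads), as are the tanh nonlinearity, the omission of bias terms, and the hypothetical `OBS_DIM` and `N_ACTIONS` values. Only the sizes 256 and 64 come from the text.

```python
import numpy as np

OBS_DIM = 10      # hypothetical observation size (task dependent)
N_ACTIONS = 5     # hypothetical number of discrete actions
HIDDEN = 256      # MLP hidden units and GRU state size, per the text
MSG_DIM = 64      # message vector size, per the text


def init_params(rng):
    """Randomly initialise all weight matrices (biases omitted for brevity)."""
    def w(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))
    return {
        "mlp_in": w(OBS_DIM + MSG_DIM, HIDDEN),
        "mlp_out": w(HIDDEN, HIDDEN),
        # GRU weights for the update gate (z), reset gate (r), candidate (n).
        "Wz": w(HIDDEN, HIDDEN), "Uz": w(HIDDEN, HIDDEN),
        "Wr": w(HIDDEN, HIDDEN), "Ur": w(HIDDEN, HIDDEN),
        "Wn": w(HIDDEN, HIDDEN), "Un": w(HIDDEN, HIDDEN),
        "act": w(HIDDEN, N_ACTIONS),
        "msg": w(HIDDEN, MSG_DIM),
    }


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(a):
    e = np.exp(a - a.max())  # subtract the max for numerical stability
    return e / e.sum()


def agent_step(params, obs, msg_in, h_prev):
    """One timestep: returns (action distribution, outgoing message, new state)."""
    x = np.concatenate([obs, msg_in])
    hidden = np.tanh(x @ params["mlp_in"])  # tanh is an assumption
    feat = hidden @ params["mlp_out"]
    # One common formulation of the GRU cell update.
    z = sigmoid(feat @ params["Wz"] + h_prev @ params["Uz"])
    r = sigmoid(feat @ params["Wr"] + h_prev @ params["Ur"])
    n = np.tanh(feat @ params["Wn"] + (r * h_prev) @ params["Un"])
    h_new = (1.0 - z) * n + z * h_prev
    action_probs = softmax(h_new @ params["act"])
    msg_out = h_new @ params["msg"]
    return action_probs, msg_out, h_new
```

In a two-agent rollout, each agent's `msg_out` would be fed to the other agent as `msg_in` on the following step.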

Agents are trained via interaction with the world as in Hausknecht and Stone (2015) using the Adam optimizer (Kingma and Ba, 2014) and a discount factor of 0.9. The step size was chosen as for reference games and for the driving game. An ε-greedy exploration strategy is employed, with the exploration parameter for timestep given by:

As in Foerster et al. (2016), we found it useful to add noise to the communication channel: in this case, isotropic Gaussian noise with mean 0 and standard deviation 0.3. This also helps smooth

when computing the translation criterion.
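The exploration and channel-noise steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library. The exploration parameter is passed in as an argument because its schedule is not reproduced here, and the function names and the use of per-action scores are assumptions.

```python
import random

NOISE_STD = 0.3  # standard deviation of the channel noise, per the text


def epsilon_greedy(action_scores, epsilon, rng):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_scores))
    return max(range(len(action_scores)), key=lambda a: action_scores[a])


def noisy_channel(message, rng, std=NOISE_STD):
    """Add independent zero-mean Gaussian noise to every message coordinate."""
    return [m + rng.gauss(0.0, std) for m in message]
```

Because the noise is independent and identically distributed across coordinates, it is isotropic; setting `std=0.0` recovers a noiseless channel, which may be useful at evaluation time.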

b.2 Representational models

As discussed in Section 5, the translation criterion is computed based on the quantity . The policy representation above actually defines a distribution , additionally involving the agent’s hidden state from a previous timestep. While in principle it is possible to eliminate the dependence on by introducing an additional sampling step into Algorithm 1, we found that it simplified inference to simply learn an additional model of directly. For simplicity, we treat the term

as constant, though these could be more accurately approximated with a learned density estimator.

This model is trained alongside the learned agent to imitate its decisions, but does not get to observe the recurrent state, like so:

Here the multilayer perceptron has a single hidden layer with nonlinearities and size 128. It is also trained with Adam and a step size of 0.0003.

We use exactly the same model and parameters to implement representations of for human speakers, but in this case the vector is taken to be a distribution over messages in the natural language inventory, and the model is trained to maximize the likelihood of labeled human traces.

b.3 Tasks


Colors

We use the version of the XKCD dataset prepared by McMahan and Stone (2015). Here the input feature vector is simply the LAB representation of each color, and the message inventory is taken to be all unigrams that appear at least five times.
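The frequency-thresholded inventory described above can be sketched as follows. Lowercasing and whitespace tokenization are assumptions, since the preprocessing is not specified here, and the function name is hypothetical.

```python
from collections import Counter


def unigram_inventory(descriptions, min_count=5):
    """Return the sorted list of unigrams seen at least `min_count` times."""
    counts = Counter(
        token for text in descriptions for token in text.lower().split()
    )
    return sorted(token for token, n in counts.items() if n >= min_count)
```

For example, given five copies of "light blue", three of "dark blue", and four of "neon green", only "blue" and "light" meet the threshold of five occurrences.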


Birds

We use the dataset of Welinder et al. (2010) with natural language annotations from Reed et al. (2016). The model’s input feature representations are a final 256-dimensional hidden feature vector from a compact bilinear pooling model (Gao et al., 2016) pre-trained for classification. The message inventory consists of the 50 most frequent bigrams to appear in natural language descriptions; example human traces are generated by for every frequent (bigram, image) pair in the dataset.


Driving

Driving data was collected from pairs of human workers on Mechanical Turk. Workers received the following description of the task:

Your goal is to drive the red car onto the red square. Be careful! You’re driving in a thick fog, and there is another car on the road that you cannot see. However, you can talk to the other driver to make sure you both reach your destinations safely.

Players were restricted to messages of 1–3 words and were required to send at least one message per game. Each player was paid $0.25 per game. In total, 382 games were collected across 5 different road layouts, each represented as an 8×8 grid presented to players as in Figure 8; these were divided into a 282-game training set and a 100-game test set. The action space is discrete: players can move forward, move back, turn left, turn right, or wait. The message inventory consists of all messages sent more than 3 times. Input features consist of indicators on the agent’s current position and orientation, goal position, and map identity. Data is available for download at .
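The indicator features for the driving game can be sketched as a concatenation of one-hot vectors. The grid size and number of layouts come from the text; the assumption of four orientations, the (row, column) encoding of cells, and the ordering of the blocks are illustrative choices rather than details confirmed by the source.

```python
GRID = 8            # 8x8 road grid, per the text
N_LAYOUTS = 5       # number of road layouts, per the text
N_ORIENTATIONS = 4  # assumption: one id per compass direction


def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def driving_features(pos, orientation, goal, layout):
    """Concatenate indicators for position, orientation, goal, and map identity.

    `pos` and `goal` are (row, col) cells; `orientation` and `layout` are
    integer ids. The ordering of the blocks is an arbitrary choice.
    """
    cells = GRID * GRID
    return (
        one_hot(pos[0] * GRID + pos[1], cells)
        + one_hot(orientation, N_ORIENTATIONS)
        + one_hot(goal[0] * GRID + goal[1], cells)
        + one_hot(layout, N_LAYOUTS)
    )
```

Under these assumptions the feature vector has 64 + 4 + 64 + 5 = 137 entries, exactly four of which are nonzero.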