Quantization Games on Social Networks and Language Evolution

05/31/2020 ∙ by Ankur Mani, et al. ∙ University of Minnesota, MIT, University of Illinois at Urbana-Champaign

We consider a strategic network quantizer design setting where agents must balance fidelity in representing their local source distributions against their ability to successfully communicate with other connected agents. We study the problem as a network game and show existence of Nash equilibrium quantizers. For any agent, under Nash equilibrium, the word representing a given partition region is the conditional expectation of the mixture of local and social source probability distributions within the region. Since having knowledge of the original source of information in the network may not be realistic, we show that under certain conditions, the agents need not know the source origin and yet still settle on a Nash equilibrium using only the observed sources. Further, the network may converge to equilibrium through a distributed version of the Lloyd-Max algorithm. In contrast to traditional results in the evolution of language, we find several vocabularies may coexist in the Nash equilibrium, with each individual having exactly one of these vocabularies. The overlap between vocabularies is high for individuals that communicate frequently and have similar local sources. Finally, we argue that error in translation along a chain of communication does not grow if and only if the chain consists of agents with shared vocabulary. Numerical results are given.







I Introduction

When representing information, resource constraints often necessitate the quantization of real-valued signals into a finite codebook [3, 4, 5]. For example, people may use a discrete vocabulary to internally represent the external world due to cognitive constraints [6]. In the traditional approach for performing such quantization, there is a single source of information to be represented with high fidelity. Indeed, the apocryphal story of the Eskimoan language having a large number of words for snow points to the general principle of focal vocabularies: specialized sets of distinctions particularly important for a particular focus of experience. Adapting language to match local statistics allows a person to represent her environment with greater fidelity.

In network settings where several agents interact to represent their local information and also communicate information with each other, however, strategic effects arise in the design of quantizers. The confusio linguarum (the fragmentation of languages) described in biblical stories points to the need for shared vocabularies to enable coordination, collaboration, and exchange in social groups, whether to build a city or to conduct business. There needs to be a shared understanding of what words and phrases mean; otherwise groups may become bogged down in ambiguous interpretations of even the most basic concepts. Indeed, linguistic distance is associated with less international trade [7].

Here we study how agents, embedded in a social network, choose a vocabulary to balance interpretation loss in communication with others in a social group against loss in individualized representation. The formalism we develop to study this problem introduces a conflict of interest among agents in the choice of their quantization codebooks and leads to a novel quantization network game. Our formulation is quite different from other work in network quantization [8] in that competition among agents is critical and there is no centralized designer of codes, but some local optimality conditions that emerge are similar. A quantization game for only two agents was developed and studied in [9, 10, 11, 12], where Voronoi partitions were shown to have equilibrium properties; like our approach, the solution concept is based on Nash equilibrium. We are aware of only one prior work on game-theoretic quantization with more than two agents [13], which was concerned with group decision-making in a parallel topology.

Language formation games have also been studied for two agents in the presence of noisy communication channels [14, 15]. The central results establish generic conditions under which communication may be successful. These results can be regarded as contributing to an understanding of signaling games and strategic communication that have been developed in the economics literature (see [16] and references therein). We also consider some level of communication noise.

Besides developing a novel mathematical question in quantizer design with strategic interaction among agents, our work provides insight into the evolution of language [17, 18, 19, 20]. Most prior theoretical results point to convergence of language to a single shared vocabulary. In contrast, we find that several vocabularies may coexist under Nash equilibrium in a variety of settings. Overlap between vocabularies is high for individuals who communicate frequently and have similar natural environments, as compared to agents (like the Eskimos) who communicate less frequently and have dissimilar local environments. This is consistent with the current state of human language, where several languages coexist [21], with the nature of historical language evolution [22], and with synthetic experimental results [23].

In engineering, there is growing interest in studying the evolution of language for applications in robotics [24]. One approach is through agent-based models, where agents engage in routinized turn-taking interactions to develop a common language. Although agents may be software that operates in virtual worlds, they are primarily physical robotic agents that interact with each other in the real world as experienced through a sensory-motor system. Some classical findings are the evolutionary emergence of perceptually grounded category lexicons, such as colors [25].

In describing how color categorization evolves, three main constraint types are described: constraints from embodiment, in the sense of how the sensing apparatus affects what is perceived; constraints from the environment, in the sense of the statistical structure of the environment; and constraints from culture, where collective decisions are made by a population.

The formalism we propose mathematizes these qualitative notions and finds best schemes for categorical communication among agents, whether human or robot.

In our model, agents observe signals in their physical environment as well as in their social environment; their codebooks are grounded in their physical and social experiences [26, 27]. Signals in an agent’s physical environment are randomly generated from a local continuous source distribution, whereas signals in an agent’s social environment are received from peers in the network. Since agents have finite vocabularies, there is a finite number of possible signals in the social environment of each agent. Pairs of peers in the network may communicate with differing frequencies.

Each agent chooses a vocabulary such that distortion due to quantization of signals in her physical environment (focal vocabulary) and social environment (shared vocabulary) is minimized. We characterize the equilibrium in which each agent draws signals from a mixture of the distributions from her physical and social environments, choosing a vocabulary to minimize loss for this mixture. We find that for any agent, under Nash equilibrium, the word that represents a given partition region is the conditional expectation of the mixture probability distribution within the region. In this sense, just a local view on the network is sufficient to develop a good quantization scheme.

Further characterization is given in the case of cycle-free communication networks among agents. Since the network looks like a forest in this case, for any tree in the forest, the agents may sequentially optimize their vocabularies from root to leaves and still converge to Nash equilibrium. We provide a more general result that says that the agents will converge to the equilibrium irrespective of the sequence in which they optimize their vocabularies and may even use Lloyd-Max iterations instead of full optimization.

Even when individuals have the same codebook, each individual in the social network may have slightly different partition regions for the words in the codebook. This provides an explanation for why the original meaning is often lost in translation when communication happens through a long chain of individuals, see Fig. 1. We find that the error in translation along a chain of communication does not grow if and only if the chain consists of agents with shared vocabulary.

Fig. 1: A schematic depiction of the loss in translation phenomenon through a chain of communication, as the word small ends up being interpreted as grand.

Further characterization of the Nash equilibrium shows agents in the network may converge to equilibrium through a distributed Lloyd-Max algorithm [28, 29]. Dynamics under the distributed Lloyd-Max algorithm have relations to prior investigations of language evolution [30], but we consider the balance of focal and shared vocabularies rather than seeing the convergence to shared vocabularies. Numerical examples are given.

The present paper expands on the first presentation of this work [1] by generalizing technical assumptions and clarifying the proofs of results. It also formalizes results on translation chains and includes numerical results to provide insight.

II Problem Statement

Consider a set $\mathcal{N} = \{1, \ldots, n\}$ of agents. Each agent $i \in \mathcal{N}$ observes signals in an uncertain environment. The agents either receive signals directly from their own physical environment or indirectly from other agents via their social environments. The random vector $\mathbf{z}_i \in \{0, 1\}^n$, with the constraint that exactly one element of $\mathbf{z}_i$ is $1$, captures the source of the signal for agent $i$. If the signal is coming from agent $j \neq i$, then the $j$th element $z_{ij} = 1$, while the other elements are zero. On the other hand, if the agent receives the signal from her physical environment, then $z_{ii} = 1$ while the rest of the elements are zero. The relative frequency of communication between agents is captured by the stochastic communication matrix $P = [p_{ij}]$, where $p_{ij} = \Pr(z_{ij} = 1)$ is the relative frequency with which agent $i$ receives a signal from agent $j$, and $p_{ii}$ is the relative frequency with which she observes her physical environment. The matrix $P$ may be sparse, since agents often communicate only with a small set of neighbors due to limited time and the desire to avoid information overload [31], but our results do not require sparsity.

The physical environment of any agent $i$ may be unique and is represented by a random variable $X_i$ that takes values in the alphabet $\mathcal{X} = (0, 1)$ and has a continuous probability distribution with density function $f_i$. The true environment of agent $i$ is a mixture of her physical environment and social environment and is represented by the random variable

$$Y_i = z_{ii} X_i + \sum_{j \neq i} z_{ij} Y_j.$$

The density function $g_i$ of $Y_i$ is a mixture that satisfies the following:

$$g_i(y) = p_{ii} f_i(y) + \sum_{j \neq i} p_{ij}\, g_j(y)$$

for all $y \in \mathcal{X}$ and $i \in \mathcal{N}$. One might wonder why signals from different sources are not handled separately, e.g. with individualized decoders, but this would be taxing for agents with bounded ability.
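The true-environment densities are defined only implicitly by the mixture relation $g_i = p_{ii} f_i + \sum_{j \neq i} p_{ij} g_j$. As a minimal sketch, assuming the densities are sampled on a common midpoint grid of $(0, 1)$ and that every $p_{ii} > 0$ (so the off-diagonal part of $P$ is a contraction), the $g_i$ can be computed by fixed-point iteration. The function names and the two-agent Beta example are hypothetical and serve only as an illustration.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), evaluated via log-gamma for stability."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def true_environment_densities(P, f, tol=1e-12, max_iter=10_000):
    """Iterate g_i <- p_ii f_i + sum_{j != i} p_ij g_j until it stops changing.

    P: row-stochastic n x n matrix (list of lists); P[i][i] is the frequency of
       physical observations, P[i][j] the frequency of hearing from agent j.
    f: list of n physical densities sampled on a common grid.
    """
    n, m = len(P), len(f[0])
    g = [list(fi) for fi in f]  # start from the physical environments
    for _ in range(max_iter):
        new_g = [
            [P[i][i] * f[i][t] + sum(P[i][j] * g[j][t] for j in range(n) if j != i)
             for t in range(m)]
            for i in range(n)
        ]
        change = max(abs(new_g[i][t] - g[i][t]) for i in range(n) for t in range(m))
        g = new_g
        if change < tol:
            break
    return g


# Hypothetical two-agent example on a midpoint grid of (0, 1).
grid = [(t + 0.5) / 200 for t in range(200)]
f = [[beta_pdf(x, 2, 5) for x in grid], [beta_pdf(x, 5, 2) for x in grid]]
P = [[0.8, 0.2], [0.3, 0.7]]
g = true_environment_densities(P, f)
```

Because each row of $P$ sums to one, every $g_i$ still integrates to approximately one on the grid; with $P = I$ the function simply returns the physical densities.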

Since agents have bounded ability, they operate with a vocabulary of at most $K$ words. An agent $i$ maps the observed signals in her environment onto a vocabulary $\mathcal{W}_i = \{w_{i1}, \ldots, w_{iK}\}$ ($w_{ik} \in [0, 1]$ for all $k$) through her word map $q_i$. The agent’s decision consists of the chosen vocabulary and the word map, i.e. a reproduction alphabet and a quantization strategy. Note that the choice of words is in the closed set $[0, 1]$ rather than the open set $(0, 1)$; this is purely for technical reasons and, as we will show later, rational agents will never choose $0$ or $1$ as a word.

Agents transmit to other agents using words from their own vocabulary. We assume that the communication is noisy and the transmitting agent’s words are distorted by a zero-mean additive noise with absolutely continuous distribution.

Since the transmitter and receiver may have different vocabularies, the receiver maps the transmitter’s words into her own vocabulary through her word map. One might wonder how the receiver can do the word mapping, since in typical studies of quantization systems, abstract indices are transmitted. Here we think of representation alphabets that are subsets of the real line, and so a real-valued letter from the transmitter’s representation alphabet is passed through the mapping to produce a letter in the receiver’s representation alphabet, cf. Fig. 1 for a depiction. This seems reasonable for communicating concepts like distance, size, degree, concentration, color, time duration, and temperature [10], which motivate our study.

The communicated signals and physical observations together form the observed environment $\tilde{Y}_i$, with density function $h_i$, for each agent $i$, see Fig. 2. If $z_{ii} = 1$, then $\tilde{Y}_i = X_i$. Given $z_{ij} = 1$ and $q_j$, the observed signal is $\tilde{Y}_i = q_j(\tilde{Y}_j) + \eta_{ij}$, where $\eta_{ij}$ is a zero-mean communication noise. The density of the observed signal given $z_{ij} = 1$ and $q_j(\tilde{Y}_j) = w_{jk}$ is $\phi_{jk}$, which is continuous with mean $w_{jk}$ and standard deviation $\sigma_{jk}$. Thus observed signals in the social environments of agents are noisy versions of true signals (corrupted by both quantization and additive noise). These observed signals depend on the path of communication to the receiver from the agent that originally observed the signal in her physical environment, see Fig. 3. Because of this path dependence, the same true signal may be mapped onto more than one word. The density $h_i$ is also a mixture. The probability that agent $j$ uses the word $w_{jk}$ is

$$\pi_{jk} = \int_{R_{jk}} h_j(y)\, dy,$$

where $R_{jk} = q_j^{-1}(w_{jk})$ is the partition region that agent $j$ maps onto $w_{jk}$, and therefore the density of the observed signal is

$$h_i(y) = p_{ii} f_i(y) + \sum_{j \neq i} p_{ij} \sum_{k=1}^{K} \pi_{jk}\, \phi_{jk}(y).$$
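Since $\pi_{jk}$ is the mass of $h_j$ on agent $j$'s cell $R_{jk}$, the observed densities $h_i = p_{ii} f_i + \sum_{j \neq i} p_{ij} \sum_k \pi_{jk} \phi_{jk}$ are again defined implicitly. A minimal sketch, assuming (hypothetically) noise uniform on $[-s, s]$ so that $\phi_{jk}$ is uniform around $w_{jk}$, quantizers stored as boundary lists with half-open cells, and densities sampled on a midpoint grid, iterates the mixture formula a fixed number of times. The noise model and all names are illustrative assumptions.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def uniform_noise_pdf(y, center, half_width):
    """Hypothetical noise model: phi_jk is uniform on [w_jk - s, w_jk + s]."""
    return 1.0 / (2 * half_width) if abs(y - center) <= half_width else 0.0


def observed_densities(P, f, quantizers, grid, half_width, iters=100):
    """Iterate h_i = p_ii f_i + sum_{j != i} p_ij sum_k pi_jk phi_jk.

    quantizers[j] = (boundaries, words); pi_jk is the mass of the current h_j
    on the half-open cell [boundaries[k], boundaries[k + 1]).
    """
    n, dx = len(P), grid[1] - grid[0]
    h = [list(fi) for fi in f]
    for _ in range(iters):
        new_h = []
        for i in range(n):
            h_new_i = [P[i][i] * f[i][t] for t in range(len(grid))]
            for j in range(n):
                if j == i or P[i][j] == 0:
                    continue
                bounds, words = quantizers[j]
                for k, w in enumerate(words):
                    pi_jk = dx * sum(h[j][t] for t, y in enumerate(grid)
                                     if bounds[k] <= y < bounds[k + 1])
                    for t, y in enumerate(grid):
                        h_new_i[t] += P[i][j] * pi_jk * uniform_noise_pdf(y, w, half_width)
            new_h.append(h_new_i)
        h = new_h
    return h


grid = [(t + 0.5) / 200 for t in range(200)]
f = [[beta_pdf(x, 2, 5) for x in grid], [beta_pdf(x, 5, 2) for x in grid]]
quantizers = [([0.0, 0.5, 1.0], [0.25, 0.75]), ([0.0, 0.5, 1.0], [0.25, 0.75])]
h = observed_densities([[0.8, 0.2], [0.3, 0.7]], f, quantizers, grid, half_width=0.05)
```

The social part of each $h_i$ appears as bumps of width $2s$ around the neighbors' words, while the total mass stays close to one.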

Fig. 2: A diagram of how signals are modified as they flow through the network.
Fig. 3: A depiction of how signals take paths through the network.

Since the vocabulary is uniquely identified by $q_i$, the strategy of agent $i$ is simply her choice of mapping $q_i$. The strategy profile is $\mathbf{q} = (q_1, \ldots, q_n)$.

The agent’s loss due to distortion is measured using $d(y, w)$, where $d$ is strictly convex and coercive in $w$ for all $y$, and symmetric around the true signal $y$. Therefore, the distortion is minimized when $w = y$ and strictly increases as $w$ moves farther from $y$. For ease of exposition in the sequel, we assume that the distortion is quadratic, i.e. $d(y, w) = (y - w)^2$. However, all our results hold for any strictly convex, coercive, and symmetric distortion measure, e.g. Bregman divergences [32].

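The loss of risk-neutral agent $i$ under the strategy profile $\mathbf{q}$ is the expected distortion in the representation of the true signal:

$$L_i(\mathbf{q}) = \mathbb{E}\left[d\big(Y_i, q_i(\tilde{Y}_i)\big)\right].$$

Notice that this expected distortion depends on the quantizers of all agents, since $\tilde{Y}_i$ is present in the expression. The problem of risk-neutral agent $i$ is to choose the $q_i$ that minimizes the above loss. A strategy profile $\mathbf{q}^* = (q_i^*, \mathbf{q}_{-i}^*)$ is a Nash equilibrium if

$$L_i(q_i^*, \mathbf{q}_{-i}^*) \le L_i(q_i, \mathbf{q}_{-i}^*) \qquad \text{for all } q_i \text{ and all } i \in \mathcal{N}.$$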

In this paper, we restrict attention to equilibria in regular quantizers, where partition regions are intervals and words lie within their partition regions.

Definition 1.

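A quantizer $q_i$ for agent $i$ is defined to be regular if the letters $w_{ik}$ in its representation alphabet have corresponding partition cells that are each intervals

$$R_{ik} = [b_{i,k-1}, b_{ik}), \qquad 0 = b_{i0} < b_{i1} < \cdots < b_{iK} = 1,$$

such that $w_{ik} \in R_{ik}$ for all $k$.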

We will only consider strategy profiles consisting of regular quantizers for each agent. We will show that such an equilibrium in regular quantizers exists and therefore it is reasonable to study equilibria in regular quantizers and their properties.
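Definition 1 can be made concrete with a minimal sketch, assuming (hypothetically) that a quantizer is stored as sorted cell boundaries $b_0 < \cdots < b_K$ together with its $K$ words, and that cells are half-open intervals $[b_{k-1}, b_k)$. The helper names are illustrative, not taken from the paper.

```python
import bisect


def is_regular(boundaries, words):
    """Definition 1: cells are nonempty intervals and each word lies in its own cell."""
    if len(boundaries) != len(words) + 1:
        return False
    cells = list(zip(boundaries, boundaries[1:]))
    return all(lo < hi for lo, hi in cells) and all(
        lo <= w < hi for w, (lo, hi) in zip(words, cells)
    )


def quantize(y, boundaries, words):
    """Word map q_i: return the word of the half-open cell [b_{k-1}, b_k) containing y."""
    k = bisect.bisect_right(boundaries, y) - 1
    return words[min(max(k, 0), len(words) - 1)]  # clamp signals outside [b_0, b_K)
```

The clamp in `quantize` is a design choice for noisy social signals that may fall slightly outside $[0, 1)$; they are assigned to the nearest end cell.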

We note that the loss of any agent under a strategy profile has three components.

  • The first component, the expected quantization loss, is due to the quantization by agent $i$ and is the expected error in representation of the observed signal: $\mathbb{E}[d(\tilde{Y}_i, q_i(\tilde{Y}_i))]$.

  • The second component, the expected communication loss, is due to the noise introduced by the communication of words in the network and is the expected error in the observed signal: $\mathbb{E}[d(Y_i, \tilde{Y}_i)]$.

  • The third component is due to correlation between quantization loss and communication loss; for quadratic distortion it equals $2\,\mathbb{E}[(Y_i - \tilde{Y}_i)(\tilde{Y}_i - q_i(\tilde{Y}_i))]$. In the well-behaved communication networks we consider (see Assumption 3), this component is much smaller and in most cases absent, so we ignore it later. This is much like mixed distortion in joint source-channel coding, which is often zero [33].
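For quadratic distortion the three components come from the identity $(Y_i - w)^2 = (Y_i - \tilde{Y}_i)^2 + (\tilde{Y}_i - w)^2 + 2(Y_i - \tilde{Y}_i)(\tilde{Y}_i - w)$ with $w = q_i(\tilde{Y}_i)$. The sketch below estimates each term by Monte Carlo on a single hypothetical link in which agent $i$ hears only from agent $j$, whose physical environment is Beta(2, 5). The two quantizers and the uniform noise are illustrative assumptions, not values from the paper.

```python
import bisect
import random


def quantize(y, boundaries, words):
    """Return the word of the half-open cell [b_{k-1}, b_k) that contains y."""
    k = bisect.bisect_right(boundaries, y) - 1
    return words[min(max(k, 0), len(words) - 1)]


def loss_components(n_samples=100_000, half_width=0.02, seed=0):
    """Estimate (total, quantization, communication, cross) losses for agent i."""
    rng = random.Random(seed)
    sender = ([0.0, 0.3, 1.0], [0.15, 0.45])    # hypothetical q_j
    receiver = ([0.0, 0.35, 1.0], [0.2, 0.5])   # hypothetical q_i
    total = quant = comm = cross = 0.0
    for _ in range(n_samples):
        y = rng.betavariate(2, 5)  # true signal Y_i, observed physically by j
        y_obs = quantize(y, *sender) + rng.uniform(-half_width, half_width)
        w = quantize(y_obs, *receiver)
        total += (y - w) ** 2
        quant += (y_obs - w) ** 2
        comm += (y - y_obs) ** 2
        cross += 2 * (y - y_obs) * (y_obs - w)
    return tuple(s / n_samples for s in (total, quant, comm, cross))


total, quant, comm, cross = loss_components()
```

The total always equals the sum of the three estimates up to floating-point round-off; how large the cross term is depends on how the receiver's cells sit relative to the sender's words.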

Having specified the players, their possible strategies, and their interlinked payoffs, in the next section we discuss how agents choose strategies. We also make further technical assumptions.

Assumption 1.

The semi-elasticity of $f_i$, i.e. $f_i'(x)/f_i(x)$, is decreasing in $x$ for all $i \in \mathcal{N}$.

This assumption is equivalent to requiring that the density function $f_i$ be log-concave, and it is widely used in quantization theory to establish uniqueness of locally optimal quantizers [34]. Many beta distributions and most thin-tailed distributions, such as those in the exponential family, satisfy this condition.
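For the beta densities used in Sec. IV, the semi-elasticity has the closed form $f'(x)/f(x) = (\alpha - 1)/x - (\beta - 1)/(1 - x)$, which is nonincreasing on $(0, 1)$ whenever $\alpha, \beta \ge 1$. A minimal numerical check of Assumption 1 on a grid follows; the helper names are hypothetical.

```python
def beta_semi_elasticity(x, a, b):
    """d/dx log f(x) for the Beta(a, b) density on (0, 1)."""
    return (a - 1) / x - (b - 1) / (1 - x)


def satisfies_assumption_1(a, b, n=999):
    """Check on a grid that the semi-elasticity never increases (log-concavity)."""
    xs = [(t + 1) / (n + 1) for t in range(n)]
    vals = [beta_semi_elasticity(x, a, b) for x in xs]
    return all(v_next <= v for v, v_next in zip(vals, vals[1:]))


print(satisfies_assumption_1(2, 5))      # True: log-concave
print(satisfies_assumption_1(0.5, 0.5))  # False: U-shaped, not log-concave
```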

Assumption 2.

The semi-elasticities of $g_i$ and $h_i$ are decreasing for all choices of strategy profiles $\mathbf{q}$, i.e. $g_i'(y)/g_i(y)$ and $h_i'(y)/h_i(y)$ are decreasing in $y$ for all $i \in \mathcal{N}$.

This assumption holds when the communication matrix $P$ is diagonally dominant and the agents with high communication probability have similar physical environments. Indeed, it is common for people to spend some time interacting with their environment and some time in communication with others [35], and moreover to have several conversation partners with whom they speak regularly [36]. These assumptions ensure that the loss function is continuous and that the best response $BR_i(\mathbf{q}_{-i})$ of agent $i$ to the strategies $\mathbf{q}_{-i}$ of the other agents is unique.

III Main Results

The first step in characterizing quantization strategies is to show the existence of a Nash equilibrium in regular quantizers, using continuity and fixed point arguments.

Proposition 1.

There exists a pure strategy Nash equilibrium in which all agents have regular quantization strategies.


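Proof. We will first show that the loss is continuous in the strategy profile. For each agent $i$ and each $m \ge 0$, we define $\rho_{im}$ as the probability that agent $i$ observes a signal that travelled through a communication path with $m$ cycles containing agent $i$. Therefore, $\sum_{m=0}^{\infty} \rho_{im} = 1$. The loss can be represented as

$$L_i(\mathbf{q}) = \sum_{m=0}^{\infty} \rho_{im}\, L_i^{(m)}(\mathbf{q}),$$

where $L_i^{(m)}(\mathbf{q})$ is the expected distortion of the signals that reach $i$ through a path with $m$ cycles containing $i$.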

When only agent $i$ changes her vocabulary within an infinitesimal ball of radius $\delta$, the change in loss for agent $i$ is also infinitesimal. To observe this, denote the new mapping for $i$ as $q_i'$ and the new vocabulary for $i$ as $\mathcal{W}_i'$.

The change in expected distortion of the signals that come to $i$ for the first time consists of two terms. The first term, of order $\delta$, is introduced to all signals by the shift of the words, and to the physical environment signals of agent $i$ at the boundaries of the partitions that move to the neighboring partitions. The second term involves the probability that signals in the physical environment are mapped onto neighboring partitions; since the density is continuous, this probability is of order $\delta$, and the error in the representation of any such signal is bounded. Clearly, this change approaches $0$ as $\delta$ approaches $0$.

The probability of not introducing changes in error due to translation when the signal travels through one cycle containing $i$ approaches $1$ as $\delta \to 0$. For any $M$, the change in the contribution to the expected distortion of the signals that come to $i$ through paths with at most $M$ cycles containing $i$ consists of a term for signals in which the cycles do not introduce any error due to translation along the path, and two terms that bound the error introduced by translation in the cycles; all three vanish as $\delta \to 0$. The remaining signals, which traverse more than $M$ cycles, have total probability $\sum_{m > M} \rho_{im}$, which can be made arbitrarily small by choosing $M$ large. Hence the total change in the loss for agent $i$ approaches $0$ as $\delta \to 0$.

Therefore the loss function of $i$ is continuous in her own strategy. Similarly, the loss function of every other agent is continuous in the strategy of $i$. Combining these arguments over the finite network, the loss functions of all agents are continuous in the strategy profile $\mathbf{q}$. Assumptions 1 and 2 imply that the best response exists and is unique [34].

Therefore, following the theorem of the maximum [37], the best response function $BR(\mathbf{q}) = \big(BR_1(\mathbf{q}_{-1}), \ldots, BR_n(\mathbf{q}_{-n})\big)$ is non-empty and continuous. Therefore, following Brouwer’s fixed point theorem [38], a fixed point of the best response function exists.

We also need to show that no words belong to $\{0, 1\}$ at any fixed point of the best response correspondence, for any agent $i$. This follows since words do not lie on boundaries of local minima partitions [28, p. 355]. Therefore there exists a Nash equilibrium consisting of regular quantization strategies for all agents. ∎

For the sequel, we make an assumption that eliminates the correlation between communication loss and quantization loss. In particular, we restrict the set of rational quantizers, i.e. those that are not strictly dominated, to a well-behaved set. We first consider the optimal quantizers $q_i^0$ of all agents in the absence of the communication network, i.e.,

$$q_i^0 = \arg\min_{q_i} \mathbb{E}\left[d\big(X_i, q_i(X_i)\big)\right],$$

with words $w_{ik}^0$ and partition boundaries $b_{ik}^0$.

Assumption 3.

We assume the following about the optimal quantizers and the best responses in the network. There exists $\epsilon > 0$ such that

  • for any two agents $i, j$ with $p_{ij} > 0$ and any $k$, the word $w_{jk}^0$ of $j$ under the quantizer $q_j^0$ is sufficiently far away, relative to $\epsilon$, from the partition boundaries of $i$ in $q_i^0$;

  • for any agent $i$ and any strategies of the other agents, the best response of agent $i$ has limited perturbation, relative to $\epsilon$, from $q_i^0$;

  • the support of the additive noise in the communication for all agents and all words is less than $\epsilon$, i.e. $|\eta_{ij}| < \epsilon$ almost surely.

These assumptions ensure that the zero probability boundary condition [28, p. 355] is satisfied and that the interaction between quantization loss and communication loss is zero. They also imply that, for any agent $i$, any quantizer whose perturbation from $q_i^0$ exceeds the bound in Assumption 3 is strictly rationally dominated, and the agent will never choose it. Therefore, in the sequel we restrict attention to the set $\mathcal{Q}_i$ of remaining quantization strategies for agent $i$. These quantization strategies are stable with respect to the mapping of signals in the social environment, or socially stable, i.e. for any two agents $i, j$ with $p_{ij} > 0$ and any two indices $k, l$,

$$w_{jk} \in R_{il} \implies \Pr\left(w_{jk} + \eta_{ij} \in R_{il}\right) = 1 \qquad (6)$$

for all quantization strategies $q_i \in \mathcal{Q}_i$ and $q_j \in \mathcal{Q}_j$. Under these assumptions, we now characterize the equilibrium. We first show that the best response functions of all agents satisfy the centroid condition with respect to the true signals in the environment.

Lemma 1.

A quantizer $q_i$ is the local minimum of the loss function for agent $i$, for fixed strategies of all other agents, if and only if it satisfies the centroid condition: $w_{ik} = \mathbb{E}[Y_i \mid \tilde{Y}_i \in R_{ik}]$ for all $k$.


Proof. Assumptions 1 and 2 imply that there is a unique local minimum of the loss function of agent $i$. Assumption 3 also implies that all quantizers in $\mathcal{Q}_i$ are socially stable, see (6). Then, since the change in the error is due only to the distortion of the agent’s own signals, the local centroid condition is satisfied. ∎

An immediate consequence of the lemma is that the centroid conditions are also satisfied in equilibrium.

Theorem 1.

In any Nash equilibrium, the quantizers of all agents satisfy the centroid condition with respect to the distribution of the signals in the true environment.


Proof. From Lemma 1, all local minima of the loss functions, and hence the best responses, satisfy the centroid condition. Therefore the Nash equilibrium satisfies it too, because each agent’s strategy in the Nash equilibrium is a best response to the strategies of all other agents. ∎

Using Assumption 3, we now show that the centroid conditions are also satisfied with respect to the observed signals in the equilibrium.

Corollary 1.

In any Nash equilibrium, the quantizers of all agents satisfy the centroid condition with respect to the distribution of the signals in the observed environment.


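Proof. Pick any Nash equilibrium $\mathbf{q}^*$. Following Theorem 1, $w_{ik}^* = \mathbb{E}[Y_i \mid \tilde{Y}_i \in R_{ik}^*]$ for any agent $i$ and any word index $k$. Therefore,

$$\mathbb{E}[\tilde{Y}_i \mid \tilde{Y}_i \in R_{ik}^*] = \mathbb{E}[Y_i \mid \tilde{Y}_i \in R_{ik}^*] = w_{ik}^*,$$

where the first equality uses the zero-mean noise, social stability (6), and the centroid condition of Theorem 1 for the transmitting agents, and the second equality follows from Theorem 1 for agent $i$.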

This completes the proof. ∎
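Corollary 1 suggests a practical check that uses only the observed density $h_i$: each word should equal the conditional mean of $\tilde{Y}_i$ over its cell, $w_{ik} = \mathbb{E}[\tilde{Y}_i \mid \tilde{Y}_i \in R_{ik}]$. A minimal sketch, assuming $h_i$ is sampled on a uniform midpoint grid and cells are half-open (function names hypothetical):

```python
def cell_centroids(density, grid, boundaries):
    """Conditional means E[Y | Y in [b_{k-1}, b_k)] for a density sampled on a grid."""
    centroids = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        cell = [(y, p) for y, p in zip(grid, density) if lo <= y < hi]
        mass = sum(p for _, p in cell)
        # The uniform grid spacing cancels in the ratio, so no dx factor is needed.
        centroids.append(sum(y * p for y, p in cell) / mass if mass > 0 else (lo + hi) / 2)
    return centroids


def centroid_gap(density, grid, boundaries, words):
    """Largest deviation of a word from its cell centroid (0 when the condition holds)."""
    return max(abs(c - w) for c, w in zip(cell_centroids(density, grid, boundaries), words))


grid = [(t + 0.5) / 200 for t in range(200)]
uniform = [1.0] * 200
gap_ok = centroid_gap(uniform, grid, [0.0, 0.5, 1.0], [0.25, 0.75])   # about 0
gap_bad = centroid_gap(uniform, grid, [0.0, 0.5, 1.0], [0.30, 0.75])  # about 0.05
```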

This result implies that designing for the objective $\mathbb{E}[d(Y_i, q_i(\tilde{Y}_i))]$ and for the objective $\mathbb{E}[d(\tilde{Y}_i, q_i(\tilde{Y}_i))]$ could be equivalent, thereby paralleling traditional results in remote source coding, where there is separation between estimation and quantization [39]. We now show that the two objectives are indeed equivalent in cycle-free networks.

Theorem 2.

When there are no cycles in the communication network, then a strategy profile is a Nash equilibrium if and only if the quantizers minimize the expected distortion in the representation of the observed signal for each agent given the quantizers for all other agents.


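Proof. When there are no cycles in the communication network, the observed environments of all agents that send messages to any agent $i$ are independent of $i$’s quantizer. The communication frequency matrix $P$ imposes a directed acyclic network, with edges directed from transmitters to receivers. In this case,

$$L_i(\mathbf{q}) = \mathbb{E}\left[d\big(\tilde{Y}_i, q_i(\tilde{Y}_i)\big)\right] + \mathbb{E}\left[d\big(Y_i, \tilde{Y}_i\big)\right],$$

where the correlation term is absent under Assumption 3.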

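Assume that $\mathbf{q}^*$ is a Nash equilibrium. Then $q_i^*$ minimizes $L_i(q_i, \mathbf{q}_{-i}^*)$ and, by Theorem 1, satisfies the centroid condition. Since the second term is independent of agent $i$’s action, $q_i^*$ must minimize $\mathbb{E}[d(\tilde{Y}_i, q_i(\tilde{Y}_i))]$ because it minimizes $L_i$ for $\mathbf{q}_{-i}^*$.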

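To prove the other direction, assume a strategy profile $\mathbf{q}^*$ is such that, for every agent $i$, $q_i^*$ minimizes $\mathbb{E}[d(\tilde{Y}_i, q_i(\tilde{Y}_i))]$ given $\mathbf{q}_{-i}^*$. Without loss of generality, we divide the set of agents in the network as follows. Let $\mathcal{N}_0$ be the agents who only observe their physical environment, i.e. $p_{ij} = 0$ for all $i \in \mathcal{N}_0$ and all $j \neq i$. Let $\mathcal{N}_1$ be the agents who observe social signals only from the agents in $\mathcal{N}_0$, and so on. We say that agents in $\mathcal{N}_0$ are upstream from agents in $\mathcal{N}_1$. Then for any agent $i \in \mathcal{N}_0$, $\tilde{Y}_i = Y_i = X_i$, and $q_i^*$ is the best response of $i$ to $\mathbf{q}_{-i}^*$. Now, extending the argument to agents in $\mathcal{N}_1$, we find that for any agent $i \in \mathcal{N}_1$, $q_i^*$ is a best response of $i$ to $\mathbf{q}_{-i}^*$, since the communication loss of $i$ depends only on the quantizers of upstream agents.

Continuing this way, the same follows for all agents, and hence $\mathbf{q}^*$ is a Nash equilibrium. ∎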

This result holds when there are no loops in the network and therefore agents do not reflect messages back to the transmitting agents; it remains to prove the equivalent result for loopy graphs, where intricacies are reminiscent of analyzing loopy belief propagation [40]. (See also numerical results in Sec. IV.)

III-A Myopic Dynamics: Distributed Lloyd-Max Algorithm

Having established the equivalence between quantizing based on observed signals and quantizing based on true signals, let us see whether a dynamic process based only on observed signals can lead to equilibrium strategies. Indeed, sequential application of the Lloyd-Max algorithm (itself iterative) may lead to a Nash equilibrium of the original game.

Corollary 2.

Assume there are no cycles in the communication network. If all agents cyclically use the Lloyd-Max algorithm to minimize the expected distortion in the representation of the observed signal as a response to the quantizers of other agents, then there exists a non-trivial set of initial conditions for which these myopic dynamics converge to a Nash equilibrium.


Proof. Since the network is acyclic, if the Lloyd-Max algorithm converges to the global minimum every time, the myopic updating of individual strategies by the agents in $\mathcal{N}_0$, as defined in the construction in Theorem 2, will converge to a strategy that minimizes expected distortion in the observed signal in the first iteration. Thence agents in $\mathcal{N}_1$ will converge, in the second iteration, to strategies that minimize their expected error in their observed signals as a response to the strategies of all upstream agents. Following this, in a finite number of iterations, all agents will converge to strategies that minimize their expected error in their observed signals as a response to the strategies of all upstream agents. Then the result follows from Theorem 2.

Applying a deterministic annealing step [41] allows an initialization of the Lloyd-Max algorithm so that it is guaranteed to converge to the global minimum for any source distribution. (Computationally simpler approaches to initialization are also often effective [42] but do not provide theoretical guarantees.) ∎
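A minimal sketch of these myopic dynamics on an acyclic network follows, under simplifying assumptions that are ours rather than the paper's: noise-free transmission (so the social environment of agent $i$ is a set of point masses $p_{ij}\pi_{jk}$ at the words $w_{jk}$), beta densities discretized on a grid, and a hypothetical three-agent chain $0 \to 1 \to 2$ updated in topological order. Consistent with the construction in Theorem 2, after one sweep a second sweep leaves every quantizer unchanged. The names `lloyd_max` and `sweep` are illustrative.

```python
import math


def beta_pdf(x, a, b):
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def lloyd_max(points, words, iters=500, tol=1e-12):
    """Squared-error Lloyd-Max on weighted points [(y, mass), ...] in (0, 1).

    Alternates nearest-neighbor cells (midpoint boundaries) and centroid words.
    Returns (boundaries, words, cell masses).
    """
    words = sorted(words)
    for _ in range(iters):
        bounds = [0.0] + [(u + v) / 2 for u, v in zip(words, words[1:])] + [1.0]
        new_words = []
        for k in range(len(words)):
            cell = [(y, m) for y, m in points if bounds[k] <= y < bounds[k + 1]]
            mass = sum(m for _, m in cell)
            new_words.append(sum(y * m for y, m in cell) / mass if mass > 0 else words[k])
        shift = max(abs(u - v) for u, v in zip(new_words, words))
        words = new_words
        if shift < tol:
            break
    bounds = [0.0] + [(u + v) / 2 for u, v in zip(words, words[1:])] + [1.0]
    masses = [sum(m for y, m in points if bounds[k] <= y < bounds[k + 1])
              for k in range(len(words))]
    return bounds, words, masses


def sweep(P, physical, state, order):
    """One myopic pass: each agent re-optimizes against its current neighbors."""
    for i in order:
        points = [(y, P[i][i] * m) for y, m in physical[i]]
        for j, p_ij in enumerate(P[i]):
            if j != i and p_ij > 0:
                _, words_j, masses_j = state[j]
                points += [(w, p_ij * pk) for w, pk in zip(words_j, masses_j)]
        state[i] = lloyd_max(points, state[i][1])  # warm start from current words
    return state


grid = [(t + 0.5) / 200 for t in range(200)]
params = [(2, 5), (3, 3), (5, 2)]
physical = [[(y, beta_pdf(y, a, b) / 200) for y in grid] for a, b in params]
P = [[1.0, 0.0, 0.0], [0.4, 0.6, 0.0], [0.0, 0.4, 0.6]]  # chain 0 -> 1 -> 2
state = [lloyd_max(physical[i], [0.2, 0.4, 0.6, 0.8]) for i in range(3)]
first = [list(s[1]) for s in sweep(P, physical, state, order=[0, 1, 2])]
second = [list(s[1]) for s in sweep(P, physical, state, order=[0, 1, 2])]
```

On a discretized density, Lloyd-Max stops once the partition no longer changes, which is why the warm-started second sweep reproduces the first exactly.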

Sec. IV shows numerical examples for these dynamics.

III-B Characterizing the Equilibrium Vocabularies

Omitting detailed derivation, let us describe equilibrium vocabularies. Contrary to extant results in the evolution of language that point to the convergence of language to a single shared vocabulary [17], we find that under different settings several vocabularies may coexist in equilibrium. The overlap between vocabularies is high for individuals who communicate more frequently and have similar natural environments as compared to the agents who communicate less frequently and have dissimilar local environments. Balance must be achieved between the error in local representation and social communication.

The loss in translation phenomenon that arises in long chains of communication, recall Fig. 1, is often troubling, since it can lead to large distortions. Further, the loss could be different for different chains of communication. As such, equilibria that lead to path-dependent translation loss could be undesirable. We now characterize the equilibria that introduce path-dependent translation loss and manifest the phenomenon of loss in translation chains. We say that a set of agents $\mathcal{S}$ has a shared vocabulary if, for every word index $k$, the intersection of the individual partition regions $\bigcap_{i \in \mathcal{S}} R_{ik}$ contains exactly one word of each agent in $\mathcal{S}$, namely her $k$th word, i.e.,

$$w_{jk} \in \bigcap_{i \in \mathcal{S}} R_{ik} \qquad \text{for all } j \in \mathcal{S} \text{ and } k = 1, \ldots, K.$$
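The shared-vocabulary condition, that every agent's $k$th word lies in the intersection $\bigcap_{i \in \mathcal{S}} R_{ik}$ of all agents' $k$th cells, can be tested mechanically. A minimal sketch, assuming (hypothetically) that each quantizer is stored as boundaries plus $K$ words with half-open cells $[b_{k-1}, b_k)$:

```python
def has_shared_vocabulary(quantizers):
    """True if, for every k, all agents' k-th words lie in the intersection of their k-th cells."""
    K = len(quantizers[0][1])
    if any(len(words) != K for _, words in quantizers):
        return False
    for k in range(K):
        lo = max(bounds[k] for bounds, _ in quantizers)      # intersection of the
        hi = min(bounds[k + 1] for bounds, _ in quantizers)  # half-open k-th cells
        if not all(lo <= words[k] < hi for _, words in quantizers):
            return False
    return True


a = ([0.0, 0.5, 1.0], [0.25, 0.75])
b = ([0.0, 0.45, 1.0], [0.3, 0.7])  # slightly different cells, same vocabulary as a
c = ([0.0, 0.2, 1.0], [0.1, 0.6])   # a's first word falls in c's second cell
```

Agents `a` and `b` have different partition regions yet share a vocabulary in this sense, while `a` and `c` do not.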

The loss in translation along a chain of communication does not grow if and only if the chain consists of agents with shared vocabulary.

Theorem 3.

For any set of agents $\mathcal{S}$, there is no path-dependent translation loss, and the loss in translation is bounded by the maximum partition size for all communication chains, if and only if all agents in $\mathcal{S}$ have a shared vocabulary.


Proof. The example in Fig. 1 provides a counterexample: a set of agents that do not share a vocabulary, so that the loss in translation is path-dependent and grows with the length of the communication chain. We note that if John had communicated directly to Jean, then Jean would have interpreted moyen instead of grand and would have had a smaller translation loss. Such examples can be created for any set of agents that do not have a shared vocabulary, demonstrating that in the absence of a shared vocabulary there is path-dependent translation loss. For the other direction, consider a set of agents $\mathcal{S}$ with a shared vocabulary. For any communication chain from $i \in \mathcal{S}$ to $j \in \mathcal{S}$, a signal $x$ observed by $i$ that is mapped onto the word $w_{ik}$ by agent $i$ is also mapped onto the word $w_{lk}$ by any agent $l$ along the chain, and is mapped onto the word $w_{jk}$ by agent $j$. Hence the loss in translation along the chain is $d(x, w_{jk})$, and since both $x$ and $w_{jk}$ lie in $R_{ik}$, the error $|x - w_{jk}|$ is bounded by the length of that partition region. By the arbitrary choice of the communication chain, the loss in translation is the same along all communication chains from $i$ to $j$. By the arbitrary choice of the words, this is true for all words and, correspondingly, for the signals represented by those words. ∎
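The mechanism in the proof can be simulated by passing a signal through a chain of word maps. In this minimal sketch transmission noise is omitted, and the three-word quantizers `A`, `B`, `C`, and `B_shared` are hypothetical, loosely mirroring Fig. 1. Without a shared vocabulary the word drifts upward along the chain and the outcome depends on the path; with a shared vocabulary the word index never changes.

```python
import bisect


def quantize(y, boundaries, words):
    """Word of the half-open cell [b_{k-1}, b_k) containing y."""
    k = bisect.bisect_right(boundaries, y) - 1
    return words[min(max(k, 0), len(words) - 1)]


def translate_along_chain(x, chain):
    """The first agent quantizes signal x; each later agent re-quantizes the word it hears."""
    history = [quantize(x, *chain[0])]
    for boundaries, words in chain[1:]:
        history.append(quantize(history[-1], boundaries, words))
    return history


A = ([0.0, 0.3, 0.7, 1.0], [0.25, 0.5, 0.85])
B = ([0.0, 0.2, 0.6, 1.0], [0.1, 0.55, 0.8])           # no shared vocabulary with A
C = ([0.0, 0.3, 0.5, 1.0], [0.2, 0.4, 0.75])
B_shared = ([0.0, 0.32, 0.68, 1.0], [0.2, 0.52, 0.8])  # shares A's vocabulary

print(translate_along_chain(0.1, [A, B, C]))  # [0.25, 0.55, 0.75]: "small" ends up "large"
print(translate_along_chain(0.1, [A, C]))     # [0.25, 0.2]: the direct path stays "small"
print(translate_along_chain(0.1, [A, B_shared, A, B_shared]))  # [0.25, 0.2, 0.25, 0.2]
```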

IV Numerical Examples

We described a cyclic and distributed Lloyd-Max algorithm in Corollary 2. In this section, we present numerical experiments for an implementation of the algorithm. These experiments include loopy networks, and we observe convergence even there.

Consider a set of five agents with a communication matrix $P$ whose sparsity pattern, agent labels, and agent physical environments are depicted in Fig. 4. Clearly the network is loopy rather than tree-structured, and agents communicate with some peers more than others. The physical environments of the agents are all governed by beta distributions with probability density function

$$f_i(x) = \frac{x^{\alpha_i - 1}(1 - x)^{\beta_i - 1}}{B(\alpha_i, \beta_i)}, \qquad x \in (0, 1),$$

where $B(\cdot, \cdot)$ is the beta function and each agent has different parameters $(\alpha_i, \beta_i)$, as given in Fig. 4. One can directly verify that all five sets of parameters yield log-concave density functions, cf. [43]. Notice that some pairs of physical environments are more similar to one another than others.

Fig. 4: An example network of five agents with communication matrix having sparsity pattern as depicted on the left and local physical environment statistics with beta distribution pdfs with parameters as depicted on the right.

We require each agent to use a quantizer with $K$ levels. The initial quantizer design for each agent is performed using her physical environment alone, with the Lloyd-Max algorithm. Since these beta distributions are log-concave, the Lloyd-Max algorithm will find the global optimum for any initialization [34]. In cycling through each agent $i$, the next step is to design the agent’s quantizer based on the current evaluation of the mixed physical and social density $h_i$, where the communication matrix $P$ is used to combine the physical environment $f_i$ with the current social environment of agent $i$ (which initially reflects the other agents’ first-iteration quantizers). To initialize the Lloyd-Max algorithm, we use the quantizer design from the previous time step, which should already be close to the optimal quantizer.

This procedure continues cyclically and iteratively, with each new quantizer designed for an updated mixed density that depends on the fixed physical environment and mixing matrix but on the updated social environment. When the quantizers change by less than a small tolerance from one iteration to the next, we terminate the design and declare convergence.
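The cyclic procedure can be sketched as follows, reusing a weighted Lloyd-Max step. Several modeling details here are assumptions made for illustration only: the hypothetical mixing weights `W[i][j]` form a row-stochastic matrix whose diagonal weights each agent's own physical environment; the social environment contributed by agent `j` is modeled as point masses at `j`'s representation points (in the spirit of the stems in Fig. 5), weighted by the probabilities of `j`'s cells under `j`'s physical density; and the two-agent example uses illustrative parameters. A round ends with convergence when no representation point moves by more than `tol`.

```python
import math


def beta_atoms(a, b, n=400):
    """Discretize Beta(a, b) on [0, 1] into n equally spaced (location, mass) atoms."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    xs = [(k + 0.5) / n for k in range(n)]
    ws = [math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_B) for x in xs]
    total = sum(ws)
    return [(x, w / total) for x, w in zip(xs, ws)]


def cell_stats(atoms, points):
    """Mass and first moment of each nearest-neighbour cell; atoms and points must be sorted."""
    bounds = [(p + q) / 2 for p, q in zip(points, points[1:])]
    mass, moment, cell = [0.0] * len(points), [0.0] * len(points), 0
    for x, w in atoms:
        while cell < len(bounds) and x > bounds[cell]:
            cell += 1
        mass[cell] += w
        moment[cell] += w * x
    return mass, moment


def lloyd_max(atoms, points, tol=1e-9, max_iter=500):
    """Weighted 1-D Lloyd-Max, warm-started at `points`."""
    atoms, points = sorted(atoms), sorted(points)
    for _ in range(max_iter):
        mass, moment = cell_stats(atoms, points)
        new = [m1 / m0 if m0 > 0 else p for m0, m1, p in zip(mass, moment, points)]
        shift = max(abs(p - q) for p, q in zip(new, points))
        points = new
        if shift < tol:
            break
    return points


def cyclic_design(envs, W, levels, tol=1e-6, max_rounds=100):
    """Cycle through agents, redesigning each quantizer for its mixed physical/social density."""
    start = [(k + 0.5) / levels for k in range(levels)]
    quant = [lloyd_max(env, start) for env in envs]  # physical environment only
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        change = 0.0
        for i, env in enumerate(envs):
            mixed = [(x, W[i][i] * w) for x, w in env]
            for j, other in enumerate(envs):
                if j != i and W[i][j] > 0:
                    # Assumed social source: atoms at agent j's points, weighted by j's cell masses.
                    mass, _ = cell_stats(other, quant[j])
                    mixed += [(p, W[i][j] * m) for p, m in zip(quant[j], mass)]
            new = lloyd_max(mixed, quant[i])  # warm start from the previous design
            change = max(change, max(abs(p - q) for p, q in zip(new, quant[i])))
            quant[i] = new
        if change < tol:
            break
    return quant, rounds


envs = [beta_atoms(2.0, 5.0), beta_atoms(5.0, 2.0)]  # illustrative environments
W = [[0.6, 0.4], [0.4, 0.6]]                          # hypothetical mixing weights
coupled, rounds = cyclic_design(envs, W, levels=4)
```

Comparing `coupled` with the design obtained from an identity mixing matrix (no social influence) gives a quick check that, in this toy setting, communication pulls the two agents' representation points toward each other.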

Fig. 5 shows the quantizers designed in the first four iterations of the algorithm. For subsequent iterations, the designed quantizers are not easily distinguished visually, since convergence happens quite quickly.

As one can see, both the physical and the social environments influence quantizer design, so the agents' quantizers are more similar to one another in equilibrium than without social considerations. To see this quantitatively, for each of the ten pairs of agents we plot the mean-squared distance between the representation points of their quantizers as a function of the Hellinger distance between their physical source distributions. The Hellinger distance $H$ between $\mathrm{Beta}(\alpha_1, \beta_1)$ and $\mathrm{Beta}(\alpha_2, \beta_2)$ is given by

$$H^2 = 1 - \frac{B\left(\frac{\alpha_1 + \alpha_2}{2}, \frac{\beta_1 + \beta_2}{2}\right)}{\sqrt{B(\alpha_1, \beta_1)\, B(\alpha_2, \beta_2)}},$$

where $B(\cdot, \cdot)$ is the beta function.
Fig. 6(a) shows this comparison for the quantizers designed in iteration 1, which use only the physical environment, whereas Fig. 6(b) shows the same comparison at iteration 20, after equilibrium is reached. The similarity of physical environments is highly predictive in the first setting (validating the measure), but much less so in equilibrium. Note also that the vertical scales of the two subplots differ greatly: the quantizers are much more similar in equilibrium. To isolate the influence of communication, consider how agent 1 pulls the quantizer of agent 5 to be finer at larger values in the unit interval, where agent 1's environment has greater probability mass (although agent 1 is itself pulled toward the center by other agents). Notice that agent 1 is the only agent sending messages to agent 5 in this example. Table I shows the resulting right shift in agent 5's representation points.
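The quantities plotted in Fig. 6 are straightforward to compute. The Python sketch below evaluates the closed-form Hellinger distance between two beta densities using log-gamma functions (normalized so that $H^2 = 1 - \int_0^1 \sqrt{f_1 f_2}\,dx$, which lies in $[0, 1]$) and a mean-squared distance between two quantizers. Pairing representation points in sorted order, and the parameter and quantizer values in the example, are assumptions for illustration; they are not the values behind Fig. 6.

```python
import math
from itertools import combinations


def log_beta_fn(a, b):
    """log B(a, b) via log-gamma, which avoids overflow for large parameters."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def hellinger_beta(a1, b1, a2, b2):
    """Hellinger distance between Beta(a1, b1) and Beta(a2, b2), with H^2 = 1 - BC."""
    log_bc = log_beta_fn((a1 + a2) / 2, (b1 + b2) / 2) - 0.5 * (
        log_beta_fn(a1, b1) + log_beta_fn(a2, b2)
    )
    return math.sqrt(max(0.0, 1.0 - math.exp(log_bc)))  # clamp tiny negative round-off


def mean_squared_distance(points1, points2):
    """Mean squared gap between representation points paired in sorted order."""
    p, q = sorted(points1), sorted(points2)
    assert len(p) == len(q), "quantizers must have the same number of levels"
    return sum((x - y) ** 2 for x, y in zip(p, q)) / len(p)


# Hypothetical parameters and quantizers, used only to show the pairwise computation.
params = {1: (2.0, 5.0), 2: (5.0, 2.0), 3: (2.0, 2.0)}
quantizers = {1: [0.1, 0.25, 0.4, 0.6], 2: [0.4, 0.6, 0.75, 0.9], 3: [0.2, 0.4, 0.6, 0.8]}
for i, j in combinations(sorted(params), 2):
    h = hellinger_beta(*params[i], *params[j])
    d = mean_squared_distance(quantizers[i], quantizers[j])
    print(f"agents {i},{j}: Hellinger {h:.3f}, MSD {d:.4f}")
```

With five agents, `combinations(range(5), 2)` yields the ten pairs plotted in each panel.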

TABLE I: Social Influence on Quantizer (right shift of Agent 5's representation points)
Fig. 5: Iterative quantizer design for the example from Fig. 4. The black curves indicate the unchanging physical environment statistics, the black vertical stems indicate the changing social context (scaled by 10), the blue lines indicate quantizer representation points, and the red lines indicate quantizer cell boundaries. Plots (a)–(d) correspond to iterations 1–4, for each of the five agents.
Fig. 6: Mean-square distance between quantizer representation points as a function of the Hellinger distance between physical environment distributions, for all pairs of agents in Fig. 4. (a) Quantizer design considers only physical environments. (b) Quantizer design in equilibrium.

V Conclusion

Symbolic communication requires establishing a vocabulary with a clear association between the external world and its representation. Unlike traditional biological evolution where organisms become adapted to their local niches (and unlike traditional quantization theory where codebooks are adapted to local source distributions), language evolution is necessarily a social phenomenon, since without social interaction there is no need for shared vocabularies. In this work, we have argued that vocabularies evolve to balance individual concerns and social exchange.

To our knowledge, we have put forth a first general game-theoretic formulation of quantization theory. Considering the interaction among several connected agents also suggests several novel quantization problems beyond the one studied here.

Going forward, we are interested in studying how changes in the communication frequencies between peers lead to changes in vocabularies, and how the birth and death of agents in the network affect vocabulary evolution. This will help us understand the evolution and future of human languages in a more connected world and, at a smaller scale, the evolution of vocabularies in collaborative tagging on social media platforms such as del.icio.us [44, 45]. We are also interested in settings where agents are organizations rather than individuals, and in particular in how to ensure that vocabularies evolve to yield clear, concise communication in cross-enterprise collaborations [46, 47]. Finally, applications in language formation for robotic teams and even human-robot collaboration networks [48], as well as related questions in unsupervised learning, are of interest.


  • [1] A. Mani, L. R. Varshney, and A. Pentland, “Quantization games on networks,” in Proc. IEEE Data Compression Conf. (DCC 2013), Mar. 2013, pp. 291–300.
  • [2] S. E. Page, The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton: Princeton University Press, 2007.
  • [3] M. T. Orchard and C. A. Bouman, “Color quantization of images,” IEEE Trans. Signal Process., vol. 39, no. 12, pp. 2677–2690, Dec. 1991.
  • [4] K. Zeger, J. Vaisey, and A. Gersho, “Globally optimal vector quantizer design by stochastic relaxation,” IEEE Trans. Signal Process., vol. 40, no. 2, pp. 310–322, Feb. 1992.
  • [5] N. Shlezinger, Y. C. Eldar, and M. R. D. Rodrigues, “Hardware-limited task-based quantization,” IEEE Trans. Signal Process., vol. 67, no. 20, pp. 5223–5238, Oct. 2019.
  • [6] J. Z. Sun, G. I. Wang, V. K. Goyal, and L. R. Varshney, “A framework for Bayesian optimality of psychophysical laws,” J. Math. Psychol., vol. 56, no. 6, pp. 495–501, Dec. 2012.
  • [7] J. Lohmann, “Do language barriers affect trade?” Econ. Lett., vol. 110, no. 2, pp. 159–162, Feb. 2011.
  • [8] M. Fleming, Q. Zhao, and M. Effros, “Network vector quantization,” IEEE Trans. Inf. Theory, vol. 50, pp. 1584–1604, Aug. 2004.
  • [9] G. Jäger, L. P. Metzger, and F. Riedel, “Voronoi languages: Equilibria in cheap-talk games with high-dimensional types and few signals,” Games Econ. Behav., vol. 73, no. 2, pp. 517–537, Nov. 2011.
  • [10] C. O’Connor, “Evolving perceptual categories,” Philosophy of Science, vol. 81, no. 5, pp. 840–851, Dec. 2014.
  • [11] M. Franke and E. O. Wagner, “Game theory and the evolution of meaning,” Language and Linguistics Compass, vol. 8, no. 9, pp. 359–372, Sep. 2014.
  • [12] M. LiCalzi and N. Maagli, “Bargaining over a common categorisation,” Synthese, vol. 193, no. 3, pp. 705–723, Mar. 2016.
  • [13] J. B. Rhim, L. R. Varshney, and V. K. Goyal, “Conflict in distributed hypothesis testing with quantized prior probabilities,” in Proc. IEEE Data Compression Conf. (DCC 2011), Mar. 2011, pp. 313–322.
  • [14] B. Touri and C. Langbort, “Language evolution in a noisy environment,” in Proc. Am. Contr. Conf. (ACC 2013), Jun. 2013, pp. 1938–1943.
  • [15] P. Hernández and B. von Stengel, “Nash codes for noisy channels,” Oper. Res., vol. 62, no. 6, pp. 1221–1235, Nov.-Dec. 2014.
  • [16] V. P. Crawford and J. Sobel, “Strategic information transmission,” Econometrica, vol. 50, no. 6, pp. 1431–1451, Nov. 1982.
  • [17] M. A. Nowak and D. C. Krakauer, “The evolution of language,” Proc. Natl. Acad. Sci. U.S.A., vol. 96, pp. 8028–8033, Jul. 1999.
  • [18] M. A. Nowak, N. L. Komarova, and P. Niyogi, “Computational and evolutionary aspects of language,” Nature, vol. 417, pp. 611–617, Jun. 2002.
  • [19] P. Niyogi, The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press, 2006.
  • [20] J. Crémer, L. Garicano, and A. Prat, “Language and the theory of the firm,” Quart. J. Econ., vol. 122, pp. 373–407, Feb. 2007.
  • [21] D. Nettle, Linguistic Diversity. Oxford: Oxford University Press, 1999.
  • [22] M. Pagel, Q. D. Atkinson, A. S. Calude, and A. Meade, “Ultraconserved words point to deep language ancestry across Eurasia,” Proc. Natl. Acad. Sci. U.S.A., vol. 110, no. 21, pp. 8471–8476, May 2013.
  • [23] L. Graesser, K. Cho, and D. Kiela, “Emergent linguistic phenomena in multi-agent communication games,” in Proc. 2019 Conf. Empirical Meth. Natural Language Process. (EMNLP), Nov. 2019, pp. 3700–3710.
  • [24] L. Steels, “Modeling the cultural evolution of language,” Phys. Life Rev., vol. 8, no. 4, pp. 339–356, Dec. 2011.
  • [25] L. Steels and T. Belpaeme, “Coordinating perceptually grounded categories through language: A case study for colour,” Behav. Brain Sci., vol. 28, no. 4, pp. 469–489, Aug. 2005.
  • [26] D. K. Roy and A. P. Pentland, “Learning words from sights and sounds: A computational model,” Cogn. Sci., vol. 26, pp. 113–146, Jan.-Feb. 2002.
  • [27] D. Roy, “Grounded spoken language acquisition: Experiments in word learning,” IEEE Trans. Multimedia, vol. 5, pp. 197–209, Jun. 2003.
  • [28] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston: Kluwer Academic Publishers, 1992.
  • [29] J. Max, “Quantizing for minimum distortion,” IRE Trans. Inf. Theory, vol. IT-6, no. 1, pp. 7–12, Mar. 1960.
  • [30] A. Baronchelli, M. Felici, V. Loreto, E. Caglioti, and L. Steels, “Sharp transition towards shared vocabularies in multi-agent systems,” J. Stat. Mech., p. P06014, Jun. 2006.
  • [31] G. Miritello, E. Moro, R. Lara, R. Martínez-López, J. Belchamber, S. G. B. Roberts, and R. I. M. Dunbar, “Time as a limited resource: Communication strategy in mobile phone networks,” Soc. Networks, vol. 35, no. 1, pp. 89–95, Jan. 2013.
  • [32] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences,” J. Mach. Learn. Res., vol. 6, pp. 1705–1749, Oct. 2005.
  • [33] P. Knagenhjelm and E. Agrell, “The Hadamard transform – a tool for index assignment,” IEEE Trans. Inf. Theory, vol. 42, no. 4, pp. 1139–1151, Jul. 1996.
  • [34] A. V. Trushkin, “Sufficient conditions for uniqueness of a locally optimal quantizer for a class of convex error weighting functions,” IEEE Trans. Inf. Theory, vol. IT-28, pp. 187–198, Mar. 1982.
  • [35] M. Chui, J. Manyika, J. Bughin, R. Dobbs, C. Roxburgh, H. Sarrazin, G. Sands, and M. Westergren, “The social economy: Unlocking value and productivity through social technologies,” McKinsey Global Institute, Tech. Rep., Jul. 2012.
  • [36] N. Eagle and A. Pentland, “Reality mining: sensing complex social systems,” Personal and Ubiquitous Computing, vol. 10, no. 4, pp. 255–268, May 2006.
  • [37] C. Berge, Topological Spaces. New York: Macmillan, 1963.
  • [38] L. E. J. Brouwer, “Über Abbildung von Mannigfaltigkeiten,” Mathematische Annalen, vol. 71, pp. 97–115, Mar. 1911.
  • [39] J. K. Wolf and J. Ziv, “Transmission of noisy information to a noisy receiver with minimum distortion,” IEEE Trans. Inf. Theory, vol. IT-16, pp. 406–411, Jul. 1970.
  • [40] A. T. Ihler, J. W. Fisher, III, and A. S. Willsky, “Loopy belief propagation: Convergence and effects of message errors,” J. Mach. Learn. Res., vol. 6, pp. 905–936, May 2005.
  • [41] K. Rose, E. Gurewitz, and G. C. Fox, “Vector quantization by deterministic annealing,” IEEE Trans. Inf. Theory, vol. 38, no. 4, pp. 1249–1257, Jul. 1992.
  • [42] X. Wu, “On initialization of Max’s algorithm for optimum quantization,” IEEE Trans. Commun., vol. 38, no. 10, pp. 1653–1656, Oct. 1990.
  • [43] X. Mu, “Log-concavity of a mixture of beta distributions,” Stat. Probab. Lett., vol. 99, pp. 125–130, Apr. 2015.
  • [44] V. Robu, H. Halpin, and H. Shepherd, “Emergence of consensus and shared vocabularies in collaborative tagging systems,” ACM Trans. Web, vol. 3, p. 14, Sep. 2009.
  • [45] C. Danescu-Niculescu-Mizil, R. West, D. Jurafsky, J. Leskovec, and C. Potts, “No country for old members: User lifecycle and linguistic change in online communities,” in Proc. 22nd Int. Conf. World Wide Web (WWW’13), May 2013, pp. 307–318.
  • [46] R. Gulati, F. Wohlgezogen, and P. Zhelyazkov, “The two facets of collaboration: Cooperation and coordination in strategic alliances,” Acad. Manage. Ann., vol. 6, pp. 531–583, Jun. 2012.
  • [47] L. R. Varshney and D. V. Oppenheim, “On cross-enterprise collaboration,” in Business Process Management, ser. Lecture Notes in Computer Science, S. Rinderle-Ma, F. Toumani, and K. Wolf, Eds. Berlin: Springer, 2011, vol. 6896, pp. 29–37.
  • [48] B. Gleeson, K. MacLean, A. Haddadi, E. Croft, and J. Alcazar, “Gestures for industry: Intuitive human-robot communication from human observation,” in Proc. 8th ACM/IEEE Int. Conf. Human-Robot Interaction, Mar. 2013, pp. 349–356.