A Unified Approach to Dynamic Decision Problems with Asymmetric Information - Part I: Non-Strategic Agents

We study a general class of dynamic multi-agent decision problems with asymmetric information and non-strategic agents, which includes dynamic teams as a special case. When agents are non-strategic, an agent's strategy is known to the other agents. Nevertheless, the agents' strategy choices and beliefs are interdependent over time, a phenomenon known as signaling. We introduce notions of sufficient private information and sufficient common information that effectively compress the agents' information in a mutually consistent manner. Based on these notions, we propose an information state for each agent that is sufficient for decision making purposes. We present instances of dynamic multi-agent decision problems where we can determine an information state with a time-invariant domain for each agent. Furthermore, we present a generalization of the policy-independence property of beliefs in Partially Observed Markov Decision Processes (POMDPs) to dynamic multi-agent decision problems. Within the context of dynamic teams with asymmetric information, the proposed set of information states leads to a sequential decomposition that decouples the interdependence between the agents' strategies and beliefs over time, and enables us to formulate a dynamic program to determine a globally optimal policy via backward induction.


I Introduction

I-A Background and Motivation

Dynamic multi-agent decision problems with asymmetric information have been used to model many situations arising in engineering, economic, and socio-technological applications. In these applications many decision makers/agents interact with each other as well as with a dynamic system. They make private imperfect observations over time, and influence the evolution of the dynamic system through their actions that are determined by their strategies. An agent’s strategy is defined as a decision rule that the agent uses to choose his action at each time based on his realized information at that time.

In this paper, we study a general class of dynamic decision problems with non-strategic agents. We say an agent is non-strategic if his strategy (not his specific action) is known to the other agents. In a companion paper [2] we study dynamic decision problems with strategic agents where an agent’s strategy is his private information and not known to the other agents.

We consider an environment with controlled Markovian dynamics, where, given the agents’ actions at every time, the system state at the next time is a stochastic function of the current system state. The instantaneous utility of each agent depends on the agents’ joint actions as well as the system state. At every time, each agent makes a private noisy observation that depends on the current system state and past actions of all agents in the system. Therefore, agents have asymmetric and imperfect information about the system history. Moreover, each agent’s information depends on other agents’ past actions and strategies; this phenomenon is known as signaling in the control theory literature. In such problems, the agents’ decisions and information are coupled and interdependent over time because (i) an agent’s utility depends on the other agents’ actions, (ii) the evolution of the system state depends, in general, on all the agents’ actions, (iii) each agent has imperfect and asymmetric information about the system history, and (iv) at every time an agent’s information depends, in general, on the agents’ (including himself) past actions and strategies.

There are two main challenges in the study of dynamic multi-agent decision problems with asymmetric information. First, because of the coupling and interdependence among the agents’ decisions and information over time, we need to determine the agents’ strategies simultaneously for all times. Second, as the agents acquire more information over time, the domains of their strategies grow.

In this paper, we propose a general approach for the study of dynamic decision problems with non-strategic agents and address these two challenges. We propose the notion of sufficient information and provide a set of conditions sufficient to characterize a compression of the agents’ private and common information in a mutually consistent manner over time. We show that such a compression results in an information state for each agent’s decision making problem. We show that restriction to the set of strategies based on this information state entails no loss of generality in dynamic decision problems with non-strategic agents.

We identify specific instances of dynamic decision problems where we can discover a set of information states for the agents that have time-invariant domain. Within the context of dynamic teams, we further demonstrate that the notion of sufficient information leads to a sequential decomposition of dynamic teams. This sequential decomposition results in a dynamic program the solution of which determines the agents’ globally optimal strategies.

I-B Related Literature

Partially Observed Markov Decision Processes (POMDPs), i.e., centralized stochastic control problems, are the simplest form of dynamic decision problems, as they involve a single agent [3, 4]. To analyze and identify properties of optimal strategies in POMDPs, the notion of an information state is introduced as the agent's belief about the current system state conditioned on his information history. The information state provides a way to compress the agent's information over time that is sufficient for decision-making purposes. When the agent has perfect recall, this information state is independent of the agent's strategies over time; this result is known as the policy-independence belief property [3].
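For a finite POMDP, the information state described above can be computed by a standard Bayes-filter recursion. The following is a minimal sketch; the transition kernels `P`, observation kernels `O`, and the toy numbers are illustrative, not taken from the paper.

```python
import numpy as np

def belief_update(b, a, y, P, O):
    """One step of the POMDP belief (information-state) update.

    b : belief over states at time t, shape (n,)
    a : index of the action taken at time t
    y : index of the observation received at time t+1
    P : transition kernels, P[a][x, x'] = Pr(x' | x, a)
    O : observation kernels, O[a][x', y] = Pr(y | x', a)
    """
    b_pred = b @ P[a]                 # predict: Pr(x' | b, a)
    unnorm = b_pred * O[a][:, y]      # correct: weight by Pr(y | x', a)
    return unnorm / unnorm.sum()      # normalize to a distribution

# A toy example: two states, one action, two observations.
P = [np.array([[0.9, 0.1], [0.2, 0.8]])]
O = [np.array([[0.8, 0.2], [0.3, 0.7]])]
b = np.array([0.5, 0.5])
b_next = belief_update(b, a=0, y=0, P=P, O=O)
```

Note that the update uses only the previous belief, the action, and the new observation; the agent's strategy never enters, which is the policy-independence property referred to above.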

Dynamic multi-agent decision problems with non-strategic agents are considerably more difficult than their centralized counterparts. This is because, due to signaling, they are (in general) non-convex functional optimization problems (see [5, 6, 7, 8]). The difficulties present in these problems were first illustrated by Witsenhausen [9], who showed that in a simple dynamic team problem with Gaussian primitive random variables and a quadratic cost function where signaling occurs, linear strategies are suboptimal (contrary to the corresponding centralized problem, where linear strategies are optimal). Subsequently, many researchers investigated control problems with various specific information structures, such as: partially nested ([10, 11, 12, 13, 14, 15] and references therein), stochastic nested [16], randomized partially nested [17], delayed sharing ([11, 18, 19, 20] and references therein), information structures possessing the i-partition property or the s-partition property [21], the quadratic invariance property [22], and the substitutability property [23].

Currently, there are three approaches to the analysis of dynamic multi-agent decision problems with non-strategic agents: the agent-by-agent approach [24], the designer's approach [25], and the common information approach [26]. We provide a brief discussion of these approaches here. We discuss them in detail in Section VI-B, where we compare them with the sufficient information approach we present in this paper and show that our approach is distinctly different from them.

The agent-by-agent approach [24] is an iterative method. At each iteration, we pick an agent, fix the strategies of all other agents, determine the best response of the chosen agent, and update his strategy accordingly. We proceed in a round-robin fashion among the agents until a fixed point is reached, that is, until no agent can improve his performance by unilaterally changing his strategy. The designer's approach [25] considers the decision problem from the point of view of a designer who knows the system model and the probability distribution of the primitive random variables, and chooses the control strategies for all agents without having any information about the realization of the primitive random variables. The common information approach [26] assumes that at each time all agents possess private information and share some common information; it uses the common information to coordinate the agents' strategies sequentially over time.
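The agent-by-agent iteration can be sketched on a toy static team with two agents and a shared payoff; the payoff table and starting point below are illustrative, not from the paper.

```python
# A tiny 2-agent team: both agents share the common payoff u(a1, a2).
# (Illustrative numbers, not from the paper.)
u = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 3.0}
actions = [0, 1]

def best_response(i, profile):
    """Best action for agent i, holding the other agents' choices fixed."""
    return max(actions,
               key=lambda a: u[tuple(profile[:i] + [a] + profile[i + 1:])])

# Round-robin best responses until a fixed point is reached, i.e. until
# no agent can improve the team payoff by unilaterally changing his choice.
profile = [0, 0]
while True:
    new_profile = profile.copy()
    for i in range(len(profile)):
        new_profile[i] = best_response(i, new_profile)
    if new_profile == profile:
        break
    profile = new_profile
```

Starting from (0, 0), the iteration terminates immediately at (0, 0), which is person-by-person optimal but not globally optimal (the profile (1, 1) yields 3); this illustrates why the agent-by-agent approach guarantees only person-by-person optimality.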

I-C Contribution

We develop a general methodology for the study and analysis of dynamic decision problems with asymmetric information and non-strategic agents. Our model includes problems with non-classical information structures [19] where signaling is present. We propose an approach that effectively compresses the agents' private and common information in a mutually consistent manner. As a result, we obtain a set of information states for the agents that are sufficient for decision making purposes. We characterize special instances where we can identify an information state with a time-invariant domain. Based on the proposed information state, we provide a sequential decomposition of dynamic teams over time. We show that the methodology developed in this paper generalizes the existing results for dynamic teams with non-classical information structures. Our results in this paper, along with those appearing in the companion paper [2], present a set of information states sufficient for decision making in strategic and non-strategic settings. Therefore, we provide a unified approach to decision making problems that can be used to study dynamic games and dynamic teams as well as dynamic games among teams of agents.

I-D Organization

The rest of the paper is organized as follows. In Section II, we describe the model and present a few examples. In Section III, we discuss the main challenges present in dynamic multi-agent decision problems with non-strategic agents. We present the sufficient information approach in Section IV and the main results of the paper in Section V. We discuss an open problem associated with the sufficient information approach in Section VI-A. In Section VI-B, we compare the sufficient information approach with the existing approaches in the literature. We provide a generalization of the sufficient information approach in Section VII. We present an extension of our results to infinite-horizon dynamic multi-agent decision problems with non-strategic agents in Section VIII. We conclude in Section IX. The proofs of all theorems and lemmas appear in the Appendix.

Notation

Random variables are denoted by upper-case letters and their realizations by the corresponding lower-case letters. In general, subscripts are used as time indices and superscripts index agents. For $t \le t'$, $X_{t:t'}$ (resp. $g_{t:t'}$) is the shorthand notation for the random variables $(X_t, \dots, X_{t'})$ (resp. the functions $(g_t, \dots, g_{t'})$). When we consider a sequence of random variables (resp. functions) for all time, we drop the subscript and use $X$ to denote $X_{1:T}$ (resp. $g$ to denote $g_{1:T}$). For random variables $X_t^i$, $i \in \mathcal{N}$ (resp. functions $g_t^i$, $i \in \mathcal{N}$), we use $X_t := (X_t^i, i \in \mathcal{N})$ (resp. $g_t := (g_t^i, i \in \mathcal{N})$) to denote the vector of the set of random variables (resp. functions) at $t$, and $X_t^{-i} := (X_t^j, j \neq i)$ (resp. $g_t^{-i}$) to denote all random variables (resp. functions) at $t$ except that of the agent indexed by $i$. $\mathbb{P}(\cdot)$ and $\mathbb{E}[\cdot]$ denote the probability and expectation of an event and a random variable, respectively. For a set $\mathcal{X}$, $\Delta(\mathcal{X})$ denotes the set of all beliefs/distributions on $\mathcal{X}$. For random variables $X, Y$ with realizations $x, y$, $\mathbb{P}(x) := \mathbb{P}(X = x)$ and $\mathbb{P}(x \mid y) := \mathbb{P}(X = x \mid Y = y)$. For a strategy $g$ and a belief (probability distribution) $\pi$, we use $\mathbb{P}^{g}_{\pi}(\cdot)$ (resp. $\mathbb{E}^{g}_{\pi}[\cdot]$) to indicate that the probability (resp. expectation) depends on the choice of $g$ and $\pi$. We use $\mathbb{1}\{E\}$ to denote the indicator function for event $E$. For sets $\mathcal{A}$ and $\mathcal{B}$, $\mathcal{A} \setminus \mathcal{B}$ denotes all elements in set $\mathcal{A}$ that are not in set $\mathcal{B}$. For random variables $X$ and $Y$, we write $X \sim Y$ when $X$ and $Y$ have identical probability distributions.

II Model

1) System dynamics: Consider $N$ non-strategic agents, indexed by $\mathcal{N} := \{1, \dots, N\}$, who live in a dynamic Markovian world over a horizon $\mathcal{T} := \{1, 2, \dots, T\}$, $T < \infty$. Let $X_t \in \mathcal{X}_t$ denote the state of the world at $t \in \mathcal{T}$. At time $t$, each agent, indexed by $i \in \mathcal{N}$, chooses an action $A_t^i \in \mathcal{A}_t^i$, where $\mathcal{A}_t^i$ denotes the set of actions available to him at $t$. Given the collective action profile $A_t := (A_t^1, \dots, A_t^N)$, the state of the world evolves according to the following stochastic dynamic equation,

 $X_{t+1} = f_t(X_t, A_t, W_t^x),$ (1)

where $\{W_t^x, t \in \mathcal{T}\}$ is a sequence of independent random variables. The initial state $X_1$ is a random variable that has a probability distribution with full support.

At every time $t \in \mathcal{T}$, before taking an action, agent $i$ receives a noisy private observation $Y_t^i$ of the current state of the world $X_t$ and the action profile $A_{t-1}$, given by

 $Y_t^i = O_t^i(X_t, A_{t-1}, W_t^i),$ (2)

where $\{W_t^i, t \in \mathcal{T}\}$, $i \in \mathcal{N}$, are sequences of independent random variables. Moreover, at every $t \in \mathcal{T}$, all agents receive a common observation $Z_t$ of the current state of the world $X_t$ and the action profile $A_{t-1}$, given by

 $Z_t = O_t^c(X_t, A_{t-1}, W_t^c),$ (3)

where $\{W_t^c, t \in \mathcal{T}\}$ is a sequence of independent random variables. We note that the agents' actions are commonly observable at $t$ if $A_{t-1}$ is part of $Z_t$. We assume that the random variables $X_1$, $\{W_t^x\}$, $\{W_t^i\}$, $i \in \mathcal{N}$, and $\{W_t^c\}$, $t \in \mathcal{T}$, are mutually independent.
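A toy simulation of one step of equations (1)-(3) may help make the information flow concrete. The specific maps and noise distributions below are illustrative stand-ins for $f_t$, $O_t^i$, and $O_t^c$, not part of the model.

```python
import random

def step(x, actions, rng):
    """One step of the model on a toy 3-state, 2-agent system: state
    transition (1), private observations (2), and common observation (3).
    All maps and noises below are illustrative stand-ins."""
    clip = lambda v: max(0, min(2, v))
    # (1) X_{t+1} = f_t(X_t, A_t, W_t^x): drift by the action sum plus noise
    x_next = clip(x + sum(actions) + rng.choice([-1, 0, 1]))
    # (2) Y^i = O^i(X, A, W^i): each agent privately sees a noisy state
    y = [clip(x_next + rng.choice([-1, 0, 1])) for _ in actions]
    # (3) Z = O^c(X, A, W^c): all agents see the last action profile plus
    #     an independent noisy public reading of the state
    z = (tuple(actions), clip(x_next + rng.choice([-1, 0, 1])))
    return x_next, y, z

rng = random.Random(0)
x, y, z = step(1, [0, 1], rng)
```

Here the action profile is included in the common observation, so the agents' actions are commonly observable, while each private observation carries independent noise.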

2) Information structure: Let $H_t$ denote the aggregate information of all agents at time $t$. Assuming that agents have perfect recall, we have $H_t := \{Y_{1:t}^{1:N}, Z_{1:t}, A_{1:t-1}\}$, i.e. $H_t$ denotes the set of all agents' past observations and actions. The set of all possible realizations of the agents' aggregate information at $t$ is denoted by $\mathcal{H}_t$.

At time $t$, the aggregate information $H_t$ is not fully known to all agents; each agent may have asymmetric information about $H_t$. Let $C_t$ denote the agents' common information about $H_t$ and $P_t^i$ denote agent $i$'s private information about $H_t$, where $\mathcal{P}_t^i$ and $\mathcal{C}_t$ denote the sets of all possible realizations of agent $i$'s private information and the common information at $t$, respectively. In this paper, we discuss several instances of information structures that can be captured as special cases of our general model.

3) Strategies and Utilities: Let $H_t^i := \{P_t^i, C_t\}$ denote the information available to agent $i$ at $t$, where $\mathcal{H}_t^i$ denotes the set of all possible realizations of agent $i$'s information at $t$. Agent $i$'s strategy $g^i := (g_1^i, \dots, g_T^i)$ is defined as a sequence of mappings $g_t^i : \mathcal{H}_t^i \to \mathcal{A}_t^i$, $t \in \mathcal{T}$, that determine agent $i$'s action $A_t^i = g_t^i(H_t^i)$ for every realization of his history at $t$.

Agent $i$'s instantaneous utility at $t$ depends on the state of the world $X_t$ and the collective action profile $A_t$, and is given by $u_t^i(X_t, A_t)$. Therefore, agent $i$'s total utility over the horizon $\mathcal{T}$ is given as

 $U^i(X_{1:T}, A_{1:T}) := \sum_{t \in \mathcal{T}} u_t^i(X_t, A_t).$ (4)

We assume that agents are non-strategic. That is, each agent's, say agent $i$'s, $i \in \mathcal{N}$, strategy choice $g^i$ is known to the other agents. We note that these non-strategic agents may have different utilities over time. Therefore, the model includes a team of agents sharing the same utility (see Section V) as well as agents with general non-identical utilities. In [2] we build on the results of this paper to study dynamic decision problems with strategic agents, where an agent may deviate privately from the commonly believed strategy and gain by misleading the other agents.

To avoid measure-theoretic technical difficulties and for clarity and convenience of exposition, we assume that all the random variables take values in finite sets.

Assumption 1.

(Finite game) The sets $\mathcal{X}_t$, $\mathcal{A}_t^i$, $\mathcal{Y}_t^i$, $\mathcal{Z}_t$, $\mathcal{P}_t^i$, $\mathcal{C}_t$, for all $i \in \mathcal{N}$, $t \in \mathcal{T}$, are finite.

Special Cases:

We present several instances of dynamic decision problems with asymmetric information that are special cases of the general model described above.

1) Real-time source coding-decoding [27]: Consider a data source that generates a random sequence $\{X_t, t \in \mathcal{T}\}$ that is $k$-th order Markov, i.e. $\mathbb{P}(x_{t+1} \mid x_{1:t}) = \mathbb{P}(x_{t+1} \mid x_{t-k+1:t})$ for every sequence of realizations $x_{1:t+1}$, for $t \geq k$. There exists an encoder (agent 1) who observes $X_t$ at every time $t$; the encoder has perfect recall. At every time $t$, based on his available data, the encoder transmits a signal $A_t^1 \in \mathcal{M}$ through a noiseless channel to a decoder (agent 2), where $\mathcal{M}$ denotes the transmission alphabet. At the receiving end, at every time $t$, the decoder wants to estimate the value of $X_{t-d}$ (with delay $d \geq 0$) as $A_t^2$ based on his available data; we assume that the decoder has perfect recall. The encoder and decoder choose their joint coding-decoding policy so as to minimize the expected total distortion, where $\rho_t$ denotes the instantaneous distortion function. To capture the above-described model within the context of our model, we need to define an augmented system state that includes the last $k$ state realizations as $\tilde{X}_t := (X_{t-k+1}, \dots, X_t)$. Moreover, the encoder's (agent 1's) observation is given by $Y_t^1 = X_t$ and the decoder's (agent 2's) observation is given by $Y_t^2 = A_t^1$. The encoder's and decoder's instantaneous utility is given by a distortion function $\rho_t(X_{t-d}, A_t^2)$.
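The state augmentation used above — folding the last $k$ realizations of the source into a first-order state — can be sketched as follows; the particular binary source is an illustrative stand-in, not part of the example.

```python
from collections import deque
import random

def augment_kth_order(source_next, x_init, k, T, rng):
    """Turn a k-th order Markov source into a first-order chain by taking
    the augmented state to be the window of the last k realizations,
    tilde_X_t = (X_{t-k+1}, ..., X_t).

    source_next(window, rng) samples X_{t+1} given the last k values
    (an illustrative stand-in for the source's kernel)."""
    window = deque(x_init, maxlen=k)       # augmented state tilde_X_t
    states = [tuple(window)]
    for _ in range(T):
        x_next = source_next(tuple(window), rng)
        window.append(x_next)              # first-order update of tilde_X
        states.append(tuple(window))
    return states

# A 2nd-order binary source: next bit is the XOR of the last two,
# flipped with probability 0.1.
def source_next(window, rng):
    bit = window[-1] ^ window[-2]
    return bit ^ (1 if rng.random() < 0.1 else 0)

rng = random.Random(1)
traj = augment_kth_order(source_next, [0, 1], k=2, T=5, rng=rng)
```

The augmented trajectory is a first-order Markov chain: each new window is a deterministic function of the previous window and the fresh sample.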

2) Delayed sharing information structure [19, 20, 28, 18]: Consider an $N$-agent decision problem where agents observe each others' observations and actions with a $d$-step delay. We note that in our model the agents' common observation at $t$ is only a function of $X_t$, $A_{t-1}$, and $W_t^c$. Therefore, to describe the decision problem with delayed sharing information structure within the context of our model, we need to augment the state space to include the agents' last $d$ observations and actions as part of the augmented state. Define $\tilde{X}_t := (X_t, M_t)$ as the augmented system state, where $M_t := \{Y_{t-d+1:t}^{1:N}, A_{t-d:t-1}\}$; that is, $M_t$ serves as a temporal memory for the agents' observations and actions at $t$. Then, the common observation is given by $Z_t = \{Y_{t-d}^{1:N}, A_{t-d-1}\}$ and agent $i$'s private information is given by $P_t^i = \{Y_{t-d+1:t}^i, A_{t-d:t-1}^i\}$.

3) Real-time multi-terminal communication [29]: Consider a real-time communication system with two encoders (agents 1 and 2) and one receiver (agent 3). The two encoders make distinct observations $Y_t^1$ and $Y_t^2$ of a Markov source. The encoders' observations are conditionally independent Markov chains. That is, there is an unobserved random variable $X_t$ such that, conditioned on $X_{1:t}$, the processes $Y_{1:t}^1$ and $Y_{1:t}^2$ are independent. Each encoder encodes, in real time, its observations into a sequence of discrete symbols and sends it through a memoryless noisy channel characterized by a transition matrix $Q^i$, $i = 1, 2$. The receiver wants to construct, in real time, an estimate $\hat{X}_t$ of the state of the Markov source based on the channels' outputs. All agents have the same instantaneous utility given by a distortion function $\rho_t(X_t, \hat{X}_t)$.

4) Optimal remote and local controller [30, 31]: Consider a decentralized control problem for a Markovian plant with two controllers, a local controller (agent 1) and a remote controller (agent 2). The local controller perfectly observes the state $X_t$ of the Markov chain, and sends his observation through a packet-drop channel to the remote controller. The transmission is successful, i.e. $Z_t = X_t$, with probability $p$, and is not successful, i.e. $Z_t = \emptyset$, with probability $1 - p$. We assume that the local controller receives an acknowledgment every time the transmission is successful. The controllers' joint instantaneous utility is given by $u_t(X_t, A_t^1, A_t^2)$.
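The packet-drop channel with acknowledgments can be sketched as follows; the success probability and the use of `None` as the erasure symbol are illustrative choices, not from the paper.

```python
import random

def packet_drop_channel(x, p, rng):
    """Packet-drop channel from the local to the remote controller.

    With probability p the transmission succeeds and the remote controller
    receives the state x; otherwise it receives an erasure (None). The
    returned flag doubles as the local controller's acknowledgment, so both
    sides share the outcome of the transmission as common information.
    """
    success = rng.random() < p
    z = x if success else None       # remote controller's observation
    ack = success                    # acknowledgment to the local controller
    return z, ack

rng = random.Random(42)
received = [packet_drop_channel(x, p=0.7, rng=rng) for x in range(1000)]
rate = sum(ack for _, ack in received) / len(received)
```

Because the acknowledgment reveals the channel outcome to the local controller, the pair (erasure pattern, received states) is commonly known, which is what makes the common-information-style analysis of this example tractable.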

III Strategies and Beliefs

In a dynamic decision problem with asymmetric information, agents have private information about the evolution of the system, and they do not observe the complete history $H_t$, $t \in \mathcal{T}$. Therefore, at every time $t$, each agent, say agent $i$, needs to form (i) an appraisal about the current state of the system and the other agents' information (an appraisal about the history), and (ii) an appraisal about how the other agents will play in the future (an appraisal about the future), so as to evaluate the performance of his strategy choices.

When agents are non-strategic, the agents' strategies are known to all agents. Therefore, agent $i$ can form these appraisals by using his private information along with the commonly known strategies $g$. Specifically, agent $i$ can utilize his own information $H_t^i$ at $t$, along with (i) the past strategies $g_{1:t-1}$ and (ii) the future strategies $g_{t:T}$, to form these appraisals about the history and the future of the overall system, respectively. As a result, the outcome of decision problems with non-strategic agents can be fully characterized by the agents' strategy profile $g$. (We discuss decision problems with strategic agents in the companion paper [2]. When agents are strategic, each agent may have an incentive to deviate at any time from the strategy the other agents commonly believe he uses, if such a deviation is profitable to him; see [2] for more discussion.)

However, we need to know the entire strategy profile $g$ for all agents and at all times to form these appraisals so as to evaluate the performance of an arbitrary strategy $g_t^i$, at any time $t$ and for any agent $i$. Therefore, we must work with the strategy profile $g$ as a whole, irrespective of the length of the time horizon $T$. Consequently, the computational complexity of determining a strategy profile that satisfies certain conditions (e.g. an optimal strategy profile in teams) grows doubly exponentially in $T$, since the domain $\mathcal{H}_t^i$ of each agent's strategy and the number of temporally interdependent decision problems (one for each time instance) grow with $T$. As a result, the analysis of such decision problems is very challenging in general [32].

An alternative conceptual approach for the analysis of decision problems is to define a belief system $\mu$ along with the strategy profile $g$. For every agent $i$, at every time $t$, define $\mu_t^i$ as agent $i$'s belief about the system history conditioned on the realization of his information $H_t^i$. The belief $\mu_t^i$ provides an intermediate instrument that encapsulates agent $i$'s appraisal about the past. Therefore, agent $i$ can evaluate the performance of any action using only the belief $\mu_t^i$ along with the future strategy profile $g_{t:T}$. However, the belief $\mu_t^i$ depends on $g_{1:t-1}$ in general, since the underlying probability distribution depends on $g_{1:t-1}$. Therefore, the introduction of a belief system offers an equivalent problem formulation that does not necessarily break the inter-temporal dependence between $g_{1:t-1}$ and $g_{t:T}$, and does not simplify the analysis of decision problems.

Nevertheless, the definition of a belief system has been shown to be suitable for the analysis of single-agent decision making problems (POMDPs) for the following reasons. First, in POMDPs, under perfect recall, the probability distribution of the system state conditioned on the agent's history is independent of his strategy; this is known as the policy-independence property of beliefs in stochastic control. Second, the complexity of the belief function does not grow over time, since at every time the agent only needs to form a belief about the current system state $X_t$, which has a time-invariant domain. As a result, we can sequentially decompose the problem over time into a sequence of static decision problems with time-invariant complexity; such a decomposition leads to a dynamic program. At each stage of the dynamic program, we specify $g_t$ by determining an action for each realization of the belief, fixing the future strategies $g_{t+1:T}$. Therefore, the computational complexity of the analysis is reduced from being exponential in $T$ to linear in $T$.
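The sequential decomposition described above can be sketched, for a toy finite POMDP, as backward induction over a discretized belief simplex. The model numbers and grid resolution below are illustrative, and the nearest-neighbor lookup is only a crude approximation of the exact piecewise-linear value function.

```python
import numpy as np

# A toy 2-state, 2-action, 2-observation POMDP (illustrative numbers).
P = [np.array([[0.9, 0.1], [0.1, 0.9]]),      # P[a][x, x'] = Pr(x' | x, a)
     np.array([[0.5, 0.5], [0.5, 0.5]])]
O = [np.array([[0.8, 0.2], [0.2, 0.8]])] * 2  # O[a][x', y] = Pr(y | x', a)
r = np.array([[1.0, 0.0],                     # r[a][x]: instantaneous reward
              [0.0, 1.0]])
T = 3                                         # horizon length

def update(b, a, y):
    """Belief update; also returns Pr(y | b, a)."""
    u = (b @ P[a]) * O[a][:, y]
    s = u.sum()
    return (u / s, s) if s > 0 else (b, 0.0)

# Discretize the belief simplex; the value of each belief at each stage is
# found by a one-step lookahead through the belief update, so the stage
# problems decouple exactly as described in the text.
grid = [np.array([q, 1.0 - q]) for q in np.linspace(0.0, 1.0, 101)]

def lookup(V, b):
    # nearest grid point (crude stand-in for the exact value function)
    return V[int(round(b[0] * 100))]

V = [0.0] * len(grid)                         # terminal value is zero
for t in reversed(range(T)):
    V_new = []
    for b in grid:
        q_best = -np.inf
        for a in range(2):
            q = b @ r[a]                      # expected instantaneous reward
            for y in range(2):
                b2, py = update(b, a, y)      # next belief and Pr(y | b, a)
                q += py * lookup(V, b2)       # continuation value
            q_best = max(q_best, q)
        V_new.append(q_best)
    V = V_new

v0 = lookup(V, np.array([0.5, 0.5]))          # value of the uniform belief
```

Each backward pass fixes the future value function and optimizes one stage at a time, which is the source of the linear-in-$T$ complexity noted above.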

Unfortunately, the above approach for POMDPs does not generalize to decision problems with many agents, for three reasons. First, with many agents, there currently exists no information state in the literature that provides a compression of each agent's information, in a mutually consistent manner among the agents, that is sufficient for decision making purposes. Therefore, an agent's, say agent $i$'s, strategy $g_t^i$ has a growing domain over time. Second, at every time $t$, each agent needs to form a belief about the system state $X_t$ as well as the other agents' private information, which has a growing domain. Therefore, the complexity of the belief functions grows over time. Third, in decision problems with many agents, the policy-independence property of beliefs does not hold in general, and the agents' beliefs at every time $t$ depend on the past strategy profile $g_{1:t-1}$. Therefore, the agents' beliefs are correlated with one another. This correlation depends on $g_{1:t-1}$, and thus, it is not known a priori. Consequently, if we follow an approach similar to that of POMDPs to sequentially decompose the problem, we need to solve the decision problem at every stage for every arbitrary correlation among the agents' belief functions, and such a problem is not tractable. (Alternatively, one can consider arbitrary correlation among the agents' information rather than their beliefs. This is the main idea that underlies the designer's approach proposed by Witsenhausen [25]; see Section VI-B for more discussion.) Hence, the methodology proposed for the study of POMDPs is not directly applicable to decision problems with many agents and non-classical information structures.

In the sequel, we propose notions of sufficient private information and sufficient common information as a mutually consistent compression of the agents' information for decision making purposes. Thereby, we (partially) address the first two problems, concerning the growing domains of the agents' beliefs and strategies. We provide instances of decision problems where we can discover a time-invariant information state for each agent. We then utilize the agents' sufficient common information as a coordination instrument, and thus capture the implicit correlation among the agents' beliefs over time. Accordingly, we present a sequential decomposition of the original decision problem such that at every stage the complexity of the decision problem is similar to that of a static multi-agent decision problem, and the size of the state variable at every stage is proportional to the dimension of the sufficient private information; thus, we (partially) address the third problem discussed above.

IV Sufficient Information

We present the sufficient information approach and characterize an information state that results from compressing the agents' private and common information in a mutually consistent manner. Thereby, we introduce a class of strategy choices that are simpler than general strategies, as they require agents to keep track of only a compressed version of their information over time. We proceed as follows. In Section IV-A we provide conditions sufficient to determine the subset of private information an agent needs to keep track of over time for decision making purposes. In Section IV-B, we introduce the notion of sufficient common information as a compressed version of the agents' common information that, along with sufficient private information, provides an information state for each agent. We then show, in Section V, that this compression of the agents' private and common information provides a sufficient statistic in dynamic decision problems with non-strategic agents. In Section VII, we provide a generalization of the sufficient information approach presented here.

IV-A Sufficient Private Information

The key ideas for compressing an agent's private information appear in Definitions 1 and 2 below. To motivate these definitions, we first consider the decision problem with a single agent, that is, a Partially Observed Markov Decision Process (POMDP), which is the special case of the model described in Section II with $N = 1$ and $Z_t = \emptyset$ for all $t \in \mathcal{T}$.

In a POMDP, the agent's belief about the system state $X_t$ conditioned on his history realization is an information state. We highlight the three main properties that underlie the definition of an information state in POMDPs (see [33, 34]): (1) the information state can be updated recursively, that is, the information state at $t$ can be written as a function of the information state at $t-1$ and the new information that becomes available at $t$; (2) the agent's belief about the information state at the next time, conditioned on the current information state and action, is independent of his information history; and (3) at any time $t$ and for any arbitrary action, the agent's expected instantaneous utility conditioned on the information state is independent of his information history.

We generalize the key properties of the information state for POMDPs, described above, to decision problems with many agents. We propose a set of conditions sufficient to compress the agents' private information in two steps. First, we consider a decision problem with many agents where there is no signaling among them. Motivated by the definition of the information state in POMDPs, we describe conditions sufficient to determine a compression of the agents' private information (Definition 1). Next, we build on Definition 1 as an intermediate conceptual step, and consider the case where agents are aware of possible signaling among them. Accordingly, we present a set of conditions sufficient to determine a compression of the agents' private information in decision problems with many agents (Definition 2).

Therefore, we first characterize subsets of an agent’s private information that are sufficient for the agent’s decision making process when there is no signaling among the agents.

Definition 1 (Private payoff-relevant information).

Let $P_t^{i,pr}$ denote a private signal that agent $i$ forms at $t$ based on his private information $P_t^i$ and common information $C_t$. We say $P_t^{i,pr}$ is private payoff-relevant information for agent $i$ if, for all open-loop strategy profiles $g^{-i}$ and for all $t \in \mathcal{T}$,

1. it can be updated recursively as

 $P_t^{i,pr} = \bar{\phi}_t^i(P_{t-1}^{i,pr}, H_t^i \setminus H_{t-1}^i)$ if $t \neq 1$;

2. for all realizations $h_t^i$ of positive probability it satisfies

 $\mathbb{P}(x_t \mid h_t^i) = \mathbb{P}(x_t \mid p_t^{i,pr}, c_t);$

3. for all realizations $h_t^i$, $a_t^i$ such that $\mathbb{P}(h_t^i, a_t^i) > 0$,

 $\mathbb{E}[u_t^i(X_t, A_t) \mid h_t^i, a_t^i] = \mathbb{E}[u_t^i(X_t, A_t) \mid p_t^{i,pr}, c_t, a_t^i].$

By assuming that all other agents play open-loop strategies, we remove the interdependence between agents $-i$'s strategy choices and agent $i$'s information structure; thus, we eliminate signaling among the agents. Fixing the open-loop strategies of agents $-i$, agent $i$ faces a centralized stochastic control problem. Definition 1 says that $P_t^{i,pr}$, $t \in \mathcal{T}$, is private payoff-relevant information for agent $i$ if (i) it can be recursively updated, (ii) it includes all the information in $H_t^i$ that is relevant to $X_t$, and (iii) agent $i$'s instantaneous conditional expected utility at any $t$ is only a function of $P_t^{i,pr}$, $C_t$, and his action at $t$. These three conditions are similar to properties (1)-(3) of an information state in POMDPs, but they concern only agent $i$'s private information instead of the collection of his private and common information. We note that if we interpret a centralized control problem as the special case of our model with $N = 1$ and $Z_t = \emptyset$ for all $t \in \mathcal{T}$, Definition 1 coincides with the definition of the information state for the single-agent decision problem. We would like to point out that conditions (i)-(iii) can have many solutions, including the trivial solution $P_t^{i,pr} = P_t^i$. An interesting research direction is to determine whether minimal private payoff-relevant information exists and, if so, to characterize it; however, such a direction is beyond the scope of this paper, and we leave this topic for future research.

While the definition of private payoff-relevant information suggests a possible way to compress the information required for an agent's decision making process, it assumes that the other agents play open-loop strategies and do not utilize the information they acquire in real time for decision making purposes (i.e. no signaling). However, open-loop strategies are not, in general, optimal for agents $-i$. As a result, to evaluate the performance of any strategy choice, agent $i$ also needs to form a belief about the information that the other agents utilize to make decisions.

Definition 2 (Sufficient private information).

We say , , , is sufficient private information for the agents if,

1. it can be updated recursively as

 Sit=ϕit(Sit−1,Hit∖Hit−1;g1:t−1) for t∈T∖{1}, (5)
2. for any strategy profile and for all realizations of positive probability,

 (6)

where for ;

3. for every strategy profile of the form and , ;

 (7)

for all realizations of positive probability where for ;

4. given an arbitrary strategy profile of the form , , and ,

 (8)

for all realizations of positive probability where for .
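Condition 1 of Definition 2, the recursion S^i_t = ϕ^i_t(S^i_{t−1}, H^i_t ∖ H^i_{t−1}; g_{1:t−1}) in (5), can be illustrated with a toy sketch. Everything below (the update rule `phi` and the choice of "most recent observation" as the compressed statistic) is a hypothetical Markov-style example, not the paper's construction:

```python
# Minimal sketch of the recursive update in condition 1 of Definition 2:
# S^i_t = phi^i_t(S^i_{t-1}, H^i_t \ H^i_{t-1}; g_{1:t-1}).
# Hypothetical example: the sufficient private information is simply the
# most recent private observation, as in a Markov setting.

def phi(prev_state, new_info):
    """Update rule: keep only the latest observation as sufficient info."""
    return new_info[-1] if new_info else prev_state

def compress_history(observations):
    """Run the recursion over time; H_t \\ H_{t-1} is the single new obs."""
    s = None
    for obs in observations:
        s = phi(s, [obs])
    return s

# The agent never stores the full history, only the compressed state.
print(compress_history([3, 1, 4, 1, 5]))  # -> 5
```

The point of the recursion is that the agent can discard H^i_{t−1} once S^i_{t−1} is computed; only the increment H^i_t ∖ H^i_{t−1} is needed at each step.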

There are four key differences between the definition of sufficient private information and that of private payoff-relevant information. First, we allow the definition and the update rule of sufficient information to depend on the agents' strategies. Second, compared to part (ii) of Definition 1, part (ii) of Definition 2 requires that sufficient information include all information relevant to the realization of the commonly observable variables in addition to the information relevant to the realization of the system state. As we discuss further in Section VI, this is because when signaling occurs in a multi-agent decision problem, agents need to have a consistent view about future commonly observable events. Third, comparing part (iii) of Definition 2 to part (iii) of Definition 1, we note that the probability measures in Definition 2 depend on the (closed-loop) strategy profile instead of the open-loop strategy profile. Fourth, part (iv) of Definition 2 contains an additional condition requiring that agent i's sufficient private information be rich enough so that he can form beliefs about agents −i's sufficient private information; such a condition is absent in Definition 1.

In general, the notion of sufficient private information is more restrictive than that of private payoff-relevant information. This is because S^i_t, i ∈ N, needs to satisfy the additional condition (iv), and furthermore, open-loop strategies are a strict subset of closed-loop strategies. Definition 2 provides (sufficient) conditions under which agents can compress their private information in a "mutually consistent" manner. We would like to point out that conditions (i)-(iv) of Definition 2 can have many solutions, including the trivial solution of no compression. (We do not discuss the possibility of finding a minimal set of sufficient private information here, and leave it for future research, as such an investigation is beyond the scope of this chapter.)

IV-B Sufficient Common Information

Based on the characterization of sufficient private information, we present a statistic (compressed version) of the common information that agents need to keep track of over time for decision making purposes.

Fix a choice of sufficient private information S^i_t, i ∈ N, and let S_t denote the collection of all the agents' sufficient private information at time t. Given the agents' strategy profile g, let γ_t denote a mapping that determines a conditional probability distribution over the system state X_t and all the agents' sufficient private information S_t, conditioned on the common information c_t at time t, as

 γ_t(c_t)(x_t, s_t) = P^{g_{1:t−1}}{X_t = x_t, S_t = s_t | c_t}, (9)

for all realizations c_t of positive probability.

We call the collection of mappings γ = (γ_t, t ∈ T) a sufficient information based belief system (SIB belief system). Note that γ_t is only a function of the common information c_t, and thus, it is computable by all agents. Let Π_t denote the (random) common information based belief that the agents hold under belief system γ at time t. We can interpret Π_t as the common belief that each agent holds about the system state and all the agents' (including his own) sufficient private information at time t. We call the SIB belief Π_t sufficient common information for the agents. In the rest of the paper, we drop the superscript γ whenever such a simplification in notation is clear. Moreover, we use the terms sufficient common information and SIB belief interchangeably.
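The conditioning in (9) can be sketched concretely. The joint probability table below is invented purely for illustration; the point is that the SIB belief over (x_t, s_t) is obtained from the joint distribution by conditioning on the common information via Bayes' rule:

```python
# Hypothetical sketch of the SIB belief gamma_t(c_t) in (9): a conditional
# distribution over (x_t, s_t) given the common information c_t, computed
# from a joint probability table. The table entries are made-up numbers.

from collections import defaultdict

def sib_belief(joint, c):
    """Return P{X=x, S=s | C=c} from a joint table P{X=x, S=s, C=c}."""
    marginal_c = sum(p for (x, s, cc), p in joint.items() if cc == c)
    belief = defaultdict(float)
    for (x, s, cc), p in joint.items():
        if cc == c:
            belief[(x, s)] = p / marginal_c
    return dict(belief)

joint = {  # P{X, S, C}: toy numbers summing to 1
    (0, 'a', 'c1'): 0.10, (0, 'b', 'c1'): 0.30,
    (1, 'a', 'c1'): 0.20, (1, 'b', 'c2'): 0.40,
}
belief = sib_belief(joint, 'c1')
```

Because the input is only the common information c, every agent computes the same belief, which is exactly why the SIB belief can serve as sufficient *common* information.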

IV-C Sufficient Information Based Strategy

The combination of sufficient private information and sufficient common information (the SIB belief) offers a mutually consistent compression of the agents' private and common information. Consider the class of strategies under which each agent i bases his decision at time t on the information (Π_t, S^i_t). We call the mapping σ^i_t a Sufficient Information Based (SIB) strategy for agent i at time t. A SIB strategy determines a probability distribution for agent i's action at time t given his information (Π_t, S^i_t). That is, a SIB strategy is a strategy where agents use only the sufficient common information Π_t (instead of the complete common information c_t) and the sufficient private information S^i_t (instead of the complete private information). A collection of SIB strategies, one for each agent and each time, is called a SIB strategy profile σ. The set of SIB strategies is a subset of the set of general strategies, defined in Section II, as we can define,

 g^{(σ,γ),i}_t(h^i_t) := σ^i_t(π^γ_t, s^i_t) ∀t ∈ T. (10)

We note that, from Definition 2 and (9), the realizations π^γ_t and s^i_t at time t depend only on the history h^i_t. Therefore, the strategy g^{(σ,γ),i}, defined above via (10), needs to be determined iteratively over time: first for t = 1, then for t = 2, and so on, up to t = T. Consequently, the strategy g^{(σ,γ),i} is well-defined for all t ∈ T and all i ∈ N.
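The iterative construction behind (10) can be sketched as follows. The belief is rebuilt step by step from the common-information increments using an update rule, and the SIB strategy is then evaluated at the resulting pair. The update rule `psi`, the strategy `sigma`, and the toy numbers are all illustrative placeholders:

```python
# Sketch of how a SIB strategy profile induces a general strategy via (10):
# the belief pi_t is rebuilt iteratively from the common information, and
# the action is sigma(pi_t, s_t). All functions here are toy placeholders.

def induced_strategy(psi, sigma, z_seq, s_t, pi_1):
    """Compute g_t(h_t) = sigma(pi_t, s_t), rebuilding pi_t from z_{2:t}."""
    pi = pi_1
    for z in z_seq:          # common-information increments z_2, ..., z_t
        pi = psi(pi, z)      # one belief-update step per stage
    return sigma(pi, s_t)

# Toy instantiation: the "belief" is a running count, and the strategy is
# a threshold rule on belief plus private information.
psi = lambda pi, z: pi + z
sigma = lambda pi, s: 'act' if pi + s > 3 else 'wait'
print(induced_strategy(psi, sigma, [1, 1], s_t=2, pi_1=0))  # pi_t = 2 -> 'act'
```

The sketch mirrors why the induced strategy must be built forward in time: the stage-t belief is only available once all earlier increments have been folded in.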

IV-D Sufficient Information Based Update Rule

When the agents play a SIB strategy profile σ, it is possible to determine the SIB belief Π_t recursively over time, based on Π_{t−1} and the new common information Z_t, via Bayes' rule. Let ψ_t describe such an update rule for time t, so that

 Π_t = ψ^{σ_{t−1}}_t(Π_{t−1}, Z_t). (11)

We note that the SIB update rule ψ_t depends on the SIB strategy profile at time t−1. In the rest of the paper, we drop the superscript σ_{t−1} whenever such a simplification in notation is clear.
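A minimal Bayes-rule sketch of the update (11) is given below. The transition kernel and observation likelihoods are hypothetical stand-ins; in the paper they would be induced by the system dynamics and the SIB strategies at t−1:

```python
# Predict-then-update sketch of Pi_t = psi(Pi_{t-1}, Z_t), as in (11).
# The transition and likelihood tables are invented for illustration.

def psi(prior, transition, likelihood, z):
    """One Bayes step: push prior through dynamics, reweight by obs z."""
    # Predict: P{X_t} = sum_{x1} P{X_{t-1}=x1} P{X_t | x1}.
    states = transition[next(iter(prior))].keys()
    pred = {x2: sum(prior[x1] * transition[x1][x2] for x1 in prior)
            for x2 in states}
    # Update: weight by the likelihood of the new common observation z.
    unnorm = {x: pred[x] * likelihood[x][z] for x in pred}
    total = sum(unnorm.values())
    return {x: p / total for x, p in unnorm.items()}

prior = {0: 0.5, 1: 0.5}
transition = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
likelihood = {0: {'z': 0.3}, 1: {'z': 0.7}}
post = psi(prior, transition, likelihood, 'z')
```

Because the likelihood of Z_t depends on how the agents act, the update rule carries the superscript σ_{t−1}: changing the stage-(t−1) SIB strategies changes the reweighting step.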

IV-E Special Cases

We consider the special cases (1)-(3) of the general model presented in Section II and identify the sufficient private information for each; we discuss the application of the sufficient information approach to special case (4) in Section VII.

1) Real-time source coding-decoding: The encoder’s and decoders’ private information are given by and , respectively. The agents’ common information is given by . We can verify that and satisfy the conditions of Definition 2; this is similar to the structural results in [27, Sections III and VI]. Consequently, the common information based belief is .

2) Delayed sharing information structure: We have and . Since we do not assume any specific structure for the system dynamics and the agents’ observations, agent i's complete private information is payoff-relevant for him. Therefore, we set . Consequently, we have . The above sufficient information appears in the first structural result in [18].

3) Real-time multi-terminal communication: We have , , , and . It is easy to verify that , , and ; this sufficient information corresponds to the structural results that appear in [29].

V Main Results

In this section, we present our main results for the analysis of dynamic decision problems with asymmetric information and non-strategic agents using the notion of sufficient information. We first provide a generalization of the policy-independence property of beliefs to decision problems with many agents (Theorem 1). Second, we show that the set of SIB strategies is rich enough that the restriction to SIB strategies is without loss of generality (Theorem 2). That is, given any strategy profile g, there exists a SIB strategy profile σ such that every agent gets the same flow of utility over time under σ as under g. Third, we consider dynamic team problems with asymmetric information. We show that using SIB strategies, we can decompose the problem sequentially over time, formulate a dynamic program, and determine a globally optimal policy via backward induction (Theorem 3).

Theorem 1 (Policy-independence belief property).

(i) Consider a general strategy profile g. If agents −i play according to strategies g^{−i}, then for every strategy g^i that agent i plays,

 P^g{x_t, p^{−i}_t | h^i_t} = P^{g^{−i}}{x_t, p^{−i}_t | h^i_t}. (12)

(ii) Consider a SIB strategy profile σ along with the associated update rule ψ^σ. If agents −i play according to SIB strategies σ^{−i}, then for every general strategy g^i that agent i plays,

 (13)

Theorem 1 provides a generalization of the policy-independence belief property for centralized stochastic control problems [3] to multi-agent decision making problems. Part (i) of Theorem 1 states that, under perfect recall, agent i's belief is independent of his actual strategy g^i. Part (ii) of Theorem 1 refers to the case where agents −i play SIB strategies and update their SIB belief according to the SIB update rule ψ^σ. The update rule ψ^σ is determined via Bayes' rule based on the SIB strategy profile σ, where σ^i denotes the SIB strategy that agents −i assume agent i utilizes. Equation (13) states that even if agent i unilaterally and privately deviates from his SIB strategy, his belief is independent of his actual strategy g^i, and depends only on the other agents' strategies σ^{−i} as well as the other agents' assumption σ^i about his SIB strategy (or, equivalently, the SIB update rule ψ^σ). (The result of Theorem 1 provides a crucial property for the analysis of decision problems with strategic agents, because it ensures that an agent's unilateral deviation does not influence his belief; see the companion paper [2] for more details.)

In POMDPs, it is shown that the restriction to Markov strategies is without loss of optimality. We provide a generalization of this result to decision problems with many agents: we show that the restriction to SIB strategies is without loss of generality in non-strategic settings, given that the agents have access to a public randomization device. We say that the agents have access to a public randomization device if at every time t they observe a public random signal that is completely independent of all events and primitive random variables in the decision problem, is uniformly distributed, and is independent across time. As a result, at every t, all agents can condition their actions on the realization of the public signal as well as on their own information. In other words, a public randomization device enables the agents to play correlated randomized strategies. We denote accordingly agent i's SIB strategy using the public randomization device, for every i and t.
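The role of the public randomization device can be sketched in a few lines: because every agent feeds the same public draw into his randomized strategy, the agents' realized randomizations are correlated. The strategy below is a toy placeholder, not the paper's construction:

```python
# Sketch of correlated randomization via a public randomization device:
# all agents observe the same uniform draw omega_t and use it to randomize,
# so their randomized actions can be perfectly correlated. The threshold
# strategy here is an invented placeholder.

import random

def public_signal(rng):
    """One public draw, uniform on [0, 1), observed by every agent."""
    return rng.random()

def sib_strategy(pi, s, omega, threshold=0.5):
    """A SIB strategy that randomizes through the public signal omega."""
    return 'A' if omega < threshold else 'B'

rng = random.Random(0)
omega = public_signal(rng)
actions = [sib_strategy(pi=None, s=i, omega=omega) for i in range(3)]
# All agents randomize, yet their realized actions coincide.
assert len(set(actions)) == 1
```

Private, independent randomization devices could not reproduce this correlation, which is exactly what Theorem 2 exploits when recreating the correlated "irrelevant" parts of the agents' histories.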

Theorem 2.

Assume that the non-strategic agents have access to a public randomization device. Then, for any strategy profile g there exists an equivalent SIB strategy profile that results in the same expected flow of utility, i.e.

 (14)

for all i ∈ N and t ∈ T.

We provide an intuitive explanation for the result of Theorem 2 below. For every agent i, his complete information history at any time t consists of two components: (i) a component that captures his information about past events that is relevant to the continuation decision problem; and (ii) a component that, given the first component, captures information about past events that is irrelevant to the continuation decision problem. We show that the combination of sufficient private information and sufficient common information contains the first component. Nevertheless, in general, the agents can coordinate their actions by incorporating the second component into their decisions, since their information about past events is correlated. Consider the part of agent i's information that is not captured by the first component. We show that these residual parts are jointly independent of the first components (Lemma 2 in the Appendix). Therefore, at every time t, we can generate a set of signals, one for each agent, using the public randomization device, so that they are identically distributed as the residual parts. Using these signals along with the information state for every agent i, we can thus recreate a (simulated) history that is identically distributed to the true one. This implies that, given a public randomization device, it is sufficient for each agent to keep track of only his information state instead of his complete history, and to play a SIB strategy, to achieve an identical (in distribution) sequence of outcomes per stage as those under the strategy profile g.

The result of Theorem 2 states that the class of SIB strategies characterizes a set of simpler strategies where the agents only keep track of a compressed version of their information rather than their entire information history. Moreover, the restriction to the class of SIB strategies is without loss of generality. Thus, along with the results appearing in the companion paper [2], the result of Theorem 2 suggests that the sufficient information approach proposed in this paper presents a unified methodology for the study of decision problems with many non-strategic or strategic agents and asymmetric information.

We would like to discuss the implication of Theorem 2 for two special instances of our model. First, when there is a single agent, there is no need for a public randomization device, since a single decision maker does not need to correlate the outcome of his randomized strategy with any other agent. Therefore, the result of Theorem 2 implies that the restriction to Markov strategies in POMDPs is without loss of generality. Second, when the agents have identical utilities, i.e., in dynamic teams, utilizing a public randomization device does not improve performance. This is because, in dynamic teams, a randomized strategy profile is optimal if and only if it is optimal for every realization of the randomization. Therefore, the restriction to SIB strategies in dynamic teams is without loss of optimality.

Using the result of Theorem 2, we present below a sequential decomposition of dynamic teams over time. We formulate a dynamic program that enables us to determine a globally optimal strategy profile via backward induction.
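As a rough illustration of the kind of backward induction the sequential decomposition enables, the sketch below solves a tiny finite-horizon problem. The states, actions, rewards, and kernel are all invented; in the team setting of Theorem 3 the "state" would instead be the SIB belief and the maximization would be over stage prescriptions:

```python
# Schematic backward induction: value functions V_t are computed from
# t = T down to t = 1, with V_{T+1} = 0, and the maximizer at each stage
# gives the optimal prescription. All model data below are toy values.

def backward_induction(T, states, actions, reward, kernel):
    """Return V_1 and the optimal prescription at each stage t = 1..T."""
    V = {x: 0.0 for x in states}            # terminal condition V_{T+1} = 0
    policy = {}
    for t in range(T, 0, -1):
        V_new, policy_t = {}, {}
        for x in states:
            # Pick the action maximizing immediate reward plus continuation.
            best = max(actions, key=lambda a: reward(x, a) +
                       sum(kernel(x, a, y) * V[y] for y in states))
            V_new[x] = reward(x, best) + sum(kernel(x, best, y) * V[y]
                                             for y in states)
            policy_t[x] = best
        V, policy[t] = V_new, policy_t
    return V, policy

states, actions = [0, 1], ['stay', 'switch']
reward = lambda x, a: 1.0 if (x == 1 and a == 'stay') else 0.0
kernel = lambda x, a, y: 1.0 if y == (x if a == 'stay' else 1 - x) else 0.0
V, policy = backward_induction(2, states, actions, reward, kernel)
```

The decoupling achieved by the SIB information states is what makes this stage-by-stage maximization legitimate in the team problem: each stage's optimization no longer depends on the full history of strategies.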

Theorem 3.

A SIB strategy profile is a globally optimal solution to a dynamic team problem with asymmetric information if it solves the following dynamic program:

 VT+