I. Introduction
I-A. Background and Motivation
Dynamic multiagent decision problems with asymmetric information have been used to model many situations arising in engineering, economic, and sociotechnological applications. In these applications many decision makers/agents interact with each other as well as with a dynamic system. They make private imperfect observations over time, and influence the evolution of the dynamic system through their actions that are determined by their strategies. An agent’s strategy is defined as a decision rule that the agent uses to choose his action at each time based on his realized information at that time.
In this paper, we study a general class of dynamic decision problems with nonstrategic agents. We say an agent is nonstrategic if his strategy (not his specific action) is known to the other agents. In a companion paper [2] we study dynamic decision problems with strategic agents where an agent’s strategy is his private information and not known to the other agents.
We consider an environment with controlled Markovian dynamics, where, given the agents’ actions at every time, the system state at the next time is a stochastic function of the current system state. The instantaneous utility of each agent depends on the agents’ joint actions as well as the system state. At every time, each agent makes a private noisy observation that depends on the current system state and past actions of all agents in the system. Therefore, agents have asymmetric and imperfect information about the system history. Moreover, each agent’s information depends on other agents’ past actions and strategies; this phenomenon is known as signaling in the control theory literature. In such problems, the agents’ decisions and information are coupled and interdependent over time because (i) an agent’s utility depends on the other agents’ actions, (ii) the evolution of the system state depends, in general, on all the agents’ actions, (iii) each agent has imperfect and asymmetric information about the system history, and (iv) at every time an agent’s information depends, in general, on the agents’ (including himself) past actions and strategies.
There are two main challenges in the study of dynamic multiagent decision problems with asymmetric information. First, because of the coupling and interdependence among the agents’ decisions and information over time, we need to determine the agents’ strategies simultaneously for all times. Second, as the agents acquire more information over time, the domains of their strategies grow.
In this paper, we propose a general approach for the study of dynamic decision problems with nonstrategic agents and address these two challenges. We propose the notion of sufficient information and provide a set of conditions sufficient to characterize a compression of the agents’ private and common information in a mutually consistent manner over time. We show that such a compression results in an information state for each agent’s decision making problem. We show that restriction to the set of strategies based on this information state entails no loss of generality in dynamic decision problems with nonstrategic agents.
We identify specific instances of dynamic decision problems where we can discover a set of information states for the agents that have time-invariant domains. Within the context of dynamic teams, we further demonstrate that the notion of sufficient information leads to a sequential decomposition of dynamic teams. This sequential decomposition results in a dynamic program whose solution determines the agents' globally optimal strategies.
I-B. Related Literature
Partially Observed Markov Decision Processes (POMDPs), i.e. centralized stochastic control problems, present the simplest form of dynamic decision problems, with a single agent [3, 4]. To analyze and identify properties of optimal strategies in POMDPs, the notion of an information state is introduced as the agent's belief about the current system state conditioned on his information history. The information state provides a way to compress the agent's information over time that is sufficient for decision-making purposes. When the agent has perfect recall, this information state is independent of the agent's strategies over time; this result is known as the policy-independence property of beliefs [3].
Dynamic multiagent decision problems with nonstrategic agents are considerably more difficult than their centralized counterparts. This is because, due to signaling, they are (in general) nonconvex functional optimization problems (see [5, 6, 7, 8]). The difficulties present in these problems were first illustrated by Witsenhausen [9], who showed that in a simple dynamic team problem with Gaussian primitive random variables and a quadratic cost function where signaling occurs, linear strategies are suboptimal (contrary to the corresponding centralized problem, where linear strategies are optimal). Subsequently, many researchers investigated control problems with various specific information structures, such as: partially nested (
[10, 11, 12, 13, 14, 15] and references therein), stochastic nested [16], randomized partially nested [17], delayed sharing ([11, 18, 19, 20] and references therein), information structures possessing the i-partition property or the s-partition property [21], the quadratic invariance property [22], and the substitutability property [23]. Currently, there are three approaches to the analysis of dynamic multiagent decision problems with nonstrategic agents: the agent-by-agent approach [24], the designer's approach [25], and the common information approach [26]. We provide a brief discussion of these approaches here. We discuss them in detail in Section VI-B, where we compare them with the sufficient information approach we present in this paper and show that our approach is distinctly different from them.
The agent-by-agent approach [24] is an iterative method. At each iteration, we pick an agent, fix the strategies of all agents except that agent, determine the best response for that agent, and update his strategy accordingly. We proceed in a round-robin fashion among the agents until a fixed point is reached, that is, until no agent can improve his performance by unilaterally changing his strategy. The designer's approach [25] considers the decision problem from the point of view of a designer who knows the system model and the probability distribution of the primitive random variables, and chooses the control strategies for all agents without any information about the realization of the primitive random variables. The common information approach [26] assumes that at each time all agents possess private information and share some common information; it uses the common information to coordinate the agents' strategies sequentially over time.
I-C. Contribution
We develop a general methodology for the study and analysis of dynamic decision problems with asymmetric information and nonstrategic agents. Our model includes problems with nonclassical information structures [19] where signaling is present. We propose an approach that effectively compresses the agents' private and common information in a mutually consistent manner. As a result, we obtain a set of information states for the agents that are sufficient for decision making purposes. We characterize special instances where we can identify an information state with a time-invariant domain. Based on the proposed information state, we provide a sequential decomposition of dynamic teams over time. We show that the methodology developed in this paper generalizes the existing results for dynamic teams with nonclassical information structures. Our results in this paper, along with those appearing in the companion paper [2], present a set of information states sufficient for decision making in strategic and nonstrategic settings. Therefore, we provide a unified approach to decision making problems that can be used to study dynamic games and dynamic teams, as well as dynamic games among teams of agents.
I-D. Organization
The rest of the paper is organized as follows. In Section II, we describe the model and present a few examples. In Section III, we discuss the main challenges that are present in dynamic multiagent decision problems with nonstrategic agents. We present the sufficient information approach in Section IV. We present the main results of the paper in Section V. We discuss an open problem associated with the sufficient information approach in Section VI-A. In Section VI-B, we compare the sufficient information approach with the existing approaches in the literature. We provide a generalization of the sufficient information approach in Section VII. We present an extension of our results to infinite-horizon dynamic multiagent decision problems with nonstrategic agents in Section VIII. We conclude in Section IX. The proofs of all the theorems and lemmas appear in the Appendix.
Notation
Random variables are denoted by upper case letters and their realizations by the corresponding lower case letters. In general, subscripts are used as time indices while superscripts are used to index agents. For $t \le t'$, $X_{t:t'}$ (resp. $g_{t:t'}$) is the short-hand notation for the random variables $(X_t, X_{t+1}, \dots, X_{t'})$ (resp. the functions $(g_t, g_{t+1}, \dots, g_{t'})$). When we consider a sequence of random variables (resp. functions) for all time, we drop the subscript and use $X$ to denote $X_{1:T}$ (resp. $g$ to denote $g_{1:T}$). For random variables $X_t^1, \dots, X_t^N$ (resp. functions $g_t^1, \dots, g_t^N$), we use $X_t := (X_t^1, \dots, X_t^N)$ (resp. $g_t := (g_t^1, \dots, g_t^N)$) to denote the vector of the set of random variables (resp. functions) at $t$, and $X_t^{-i} := (X_t^1, \dots, X_t^{i-1}, X_t^{i+1}, \dots, X_t^N)$ (resp. $g_t^{-i}$) to denote all random variables (resp. functions) at $t$ except that of the agent indexed by $i$. $\mathbb{P}(\cdot)$ and $\mathbb{E}[\cdot]$ denote the probability and expectation of an event and a random variable, respectively. For a set $\mathcal{X}$, $\Delta(\mathcal{X})$ denotes the set of all beliefs/distributions on $\mathcal{X}$. For random variables $X, Y$ with realizations $x, y$, $\mathbb{P}(x) := \mathbb{P}(X = x)$ and $\mathbb{P}(x|y) := \mathbb{P}(X = x | Y = y)$. For a strategy $g$ and a belief (probability distribution) $\pi$, we use $\mathbb{P}^{g,\pi}(\cdot)$ (resp. $\mathbb{E}^{g,\pi}[\cdot]$) to indicate that the probability (resp. expectation) depends on the choice of $g$ and $\pi$. We use $\mathbb{1}\{E\}$ to denote the indicator function for event $E$. For sets $\mathcal{A}$ and $\mathcal{B}$, we use $\mathcal{A} \backslash \mathcal{B}$ to denote all elements in set $\mathcal{A}$ that are not in set $\mathcal{B}$. For random variables $X$ and $Y$, we write $X \sim Y$ when $X$ and $Y$ have an identical probability distribution.
II. Model
1) System dynamics: Consider $N$ nonstrategic agents, $\mathcal{N} := \{1, 2, \dots, N\}$, who live in a dynamic Markovian world over a horizon $\mathcal{T} := \{1, 2, \dots, T\}$, $T < \infty$. Let $X_t \in \mathcal{X}_t$ denote the state of the world at $t \in \mathcal{T}$. At time $t$, each agent, indexed by $i \in \mathcal{N}$, chooses an action $A_t^i \in \mathcal{A}_t^i$, where $\mathcal{A}_t^i$ denotes the set of available actions to him at $t$. Given the collective action profile $A_t := (A_t^1, \dots, A_t^N)$, the state of the world evolves according to the following stochastic dynamic equation,
$X_{t+1} = f_t(X_t, A_t, W_t^x),$   (1)
where $\{W_t^x, t \in \mathcal{T}\}$ is a sequence of independent random variables. The initial state $X_1$ is a random variable that has a probability distribution with full support.
At every time $t$, before taking an action, agent $i$ receives a noisy private observation $Y_t^i \in \mathcal{Y}_t^i$ of the current state of the world $X_t$ and the action profile $A_{t-1}$, given by
$Y_t^i = O_t^i(X_t, A_{t-1}, W_t^i),$   (2)
where $\{W_t^i, t \in \mathcal{T}\}$, $i \in \mathcal{N}$, are sequences of independent random variables. Moreover, at every $t \in \mathcal{T}$, all agents receive a common observation $Z_t \in \mathcal{Z}_t$ of the current state of the world $X_t$ and the action profile $A_{t-1}$, given by
$Z_t = O_t^c(X_t, A_{t-1}, W_t^c),$   (3)
where $\{W_t^c, t \in \mathcal{T}\}$ is a sequence of independent random variables. We note that the agents' actions $A_{t-1}$ are commonly observable at $t$ if they are included in $Z_t$. We assume that the random variables $X_1$, $\{W_t^x\}$, $\{W_t^i\}$, $i \in \mathcal{N}$, and $\{W_t^c\}$ are mutually independent.
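To make the model concrete, the following minimal Python sketch simulates a two-agent instance of the dynamics (1) and the observation equations (2) and (3). All state and action sets, transition rules, and noise levels here are illustrative assumptions, not part of the model above.

```python
import random

random.seed(0)

# Hypothetical finite instance with N = 2 agents: the state evolves as
# X_{t+1} = f(X_t, A_t, W_t), each agent gets a noisy private observation
# of the current state, and the common observation reveals the past joint
# action. All sets and noise levels below are illustrative.
STATES = [0, 1]
ACTIONS = [0, 1]

def f(x, a, w):
    # toy dynamics: the joint action and an independent noise bit flip the state
    return (x + a[0] + a[1] + w) % 2

def private_obs(x, v):
    # the current state, flipped by an independent noise bit
    return (x + v) % 2

x = random.choice(STATES)
a_prev = (0, 0)
trajectory = []
for t in range(5):
    y = tuple(private_obs(x, random.random() < 0.1) for _ in range(2))
    z = a_prev  # common observation: past actions are commonly observable here
    a = (random.choice(ACTIONS), random.choice(ACTIONS))  # open-loop play
    trajectory.append((x, y, z, a))
    x = f(x, a, random.random() < 0.3)
    a_prev = a

assert all(s in STATES for s, _, _, _ in trajectory)
```

Replacing the common observation with a less informative function of $(X_t, A_{t-1})$ changes the information structure while leaving the dynamics untouched.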
2) Information structure: Let $H_t$ denote the aggregate information of all agents at time $t$. Assuming that agents have perfect recall, we have $H_t := \{Y_{1:t}, Z_{1:t}, A_{1:t-1}\}$, i.e. $H_t$ denotes the set of all agents' past observations and actions. The set of all possible realizations of the agents' aggregate information at $t$ is denoted by $\mathcal{H}_t$.
At time $t$, the aggregate information $H_t$ is not fully known to all agents; each agent may have asymmetric information about $H_t$. Let $C_t$ denote the agents' common information about $H_t$ and $P_t^i$ denote agent $i$'s private information about $H_t$, where $\mathcal{P}_t^i$ and $\mathcal{C}_t$ denote the sets of all possible realizations of agent $i$'s private information and the common information at $t$, respectively. In this paper, we discuss several instances of information structures that can be captured as special cases of our general model.
3) Strategies and Utilities: Let $H_t^i := \{P_t^i, C_t\}$ denote the information available to agent $i$ at $t$, where $\mathcal{H}_t^i$ denotes the set of all possible realizations of agent $i$'s information at $t$. Agent $i$'s strategy $g^i := (g_1^i, g_2^i, \dots, g_T^i)$ is defined as a sequence of mappings $g_t^i : \mathcal{H}_t^i \to \Delta(\mathcal{A}_t^i)$, $t \in \mathcal{T}$, that determine agent $i$'s action for every realization $h_t^i$ of his history at $t$.
Agent $i$'s instantaneous utility at $t$ depends on the state of the world $X_t$ and the collective action profile $A_t$, and is given by $u_t^i(X_t, A_t)$. Therefore, agent $i$'s total utility over the horizon $\mathcal{T}$ is given as
$U^i := \sum_{t \in \mathcal{T}} u_t^i(X_t, A_t).$   (4)
We assume that agents are nonstrategic. That is, each agent's, say agent $i$'s, strategy choice $g^i$ is known to the other agents. We note that these nonstrategic agents may have different utilities over time. Therefore, the model includes a team of agents sharing the same utilities (see Section V) as well as agents with general nonidentical utilities. In [2] we build on our results in this paper to study dynamic decision problems with strategic agents, where an agent may deviate privately from the commonly believed strategy and gain by misleading the other agents.
To avoid measure-theoretic technical difficulties, and for clarity and convenience of exposition, we assume that all random variables take values in finite sets.
Assumption 1.
(Finite game) The sets $\mathcal{X}_t$, $\mathcal{A}_t^i$, $\mathcal{Y}_t^i$, $\mathcal{Z}_t$, $\mathcal{P}_t^i$, $\mathcal{C}_t$, for all $i \in \mathcal{N}$ and $t \in \mathcal{T}$, are finite.
Special Cases:
We present several instances of dynamic decision problems with asymmetric information that are special cases of the general model described above.
1) Real-time source coding-decoding [27]: Consider a data source that generates a random sequence $\{X_t, t \in \mathcal{T}\}$ that is $k$th-order Markov, i.e. $\mathbb{P}(x_{t+1} | x_{1:t}) = \mathbb{P}(x_{t+1} | x_{t-k+1:t})$ for every sequence of realizations $x_{1:t+1}$, for $t \geq k$. There exists an encoder (agent 1) who observes $X_t$ at every time $t$; the encoder has perfect recall. At every time $t$, based on his available data, the encoder transmits a signal $S_t \in \mathcal{S}$ through a noiseless channel to a decoder (agent 2), where $\mathcal{S}$ denotes the transmission alphabet. At the receiving end, at every time $t$, the decoder wants to estimate the value of $X_{t-d}$ (with delay $d$) as $\hat{X}_t$ based on his available data; we assume that the decoder has perfect recall. The encoder and decoder choose their joint coding-decoding policy so as to minimize the expected total distortion given by $\mathbb{E}[\sum_{t \in \mathcal{T}} \rho_t(X_{t-d}, \hat{X}_t)]$, where $\rho_t$ denotes the instantaneous distortion function. To capture the above-described model within the context of our model, we need to define an augmented system state that includes the last several state realizations of the source. Moreover, the encoder's (agent 1's) observation is the current source output and the decoder's (agent 2's) observation is the signal transmitted through the channel. The encoder's and decoder's instantaneous utility is given by the (negative) distortion function.

2) Delayed sharing information structure [19, 20, 28, 18]: Consider an $N$-agent decision problem where agents observe each others' observations and actions with an $n$-step delay. We note that in our model the agents' common observation at $t$ is only a function of $X_t$ and $A_{t-1}$. Therefore, to describe the decision problem with the delayed sharing information structure within the context of our model, we need to augment our state space to include the agents' last $n$ observations and actions as part of the augmented state; that is, the augmented state serves as a temporal memory for the agents' observations and actions over the last $n$ steps. The common observation at $t$ then reveals the observations and actions that are $n$ steps old, and each agent's private information consists of his own most recent $n$ observations and actions.
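For the delayed sharing information structure, the state augmentation amounts to carrying a sliding window of the last $n$ observation-action pairs. The following sketch (with hypothetical names and an $n = 2$ window) illustrates the temporal-memory construction; it is an informal illustration, not the paper's formal definition.

```python
from collections import deque

# Sketch of the state augmentation for n-step delayed sharing (all names
# are illustrative): the augmented state carries the last n (observation,
# action) pairs as a temporal memory, so the common observation at t can
# reveal them with an n-step delay.
def make_augmented_chain(n):
    memory = deque(maxlen=n)  # temporal memory of (obs, action) pairs
    def step(x, obs, action):
        memory.append((obs, action))
        # augmented state = (current core state, delayed memory)
        return (x, tuple(memory))
    return step

step = make_augmented_chain(n=2)
s1 = step(1, obs=(0, 1), action=(1, 0))
s2 = step(0, obs=(1, 1), action=(0, 0))
s3 = step(1, obs=(0, 0), action=(1, 1))
# after n steps the memory holds exactly the last n (obs, action) pairs
assert s3 == (1, (((1, 1), (0, 0)), ((0, 0), (1, 1))))
```

The `deque(maxlen=n)` makes the augmented state's memory component have a fixed, time-invariant size, mirroring the fixed delay of the information structure.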
3) Real-time multiterminal communication [29]: Consider a real-time communication system with two encoders (agents 1 and 2) and one receiver (agent 3). The two encoders make distinct observations $Y_t^1$ and $Y_t^2$ of a Markov source. The encoders' observations are conditionally independent Markov chains; that is, there is an unobserved random variable $X_t$ such that, conditioned on it, $Y_t^1$ and $Y_t^2$ evolve as independent Markov chains. Each encoder encodes, in real time, its observations into a sequence of discrete symbols and sends it through a memoryless noisy channel characterized by a transition matrix $Q^i$, $i = 1, 2$. The receiver wants to construct, in real time, an estimate $\hat{X}_t$ of the state of the Markov source based on the channels' outputs. All agents have the same instantaneous utility given by a distortion function $\rho_t(X_t, \hat{X}_t)$.
4) Optimal remote and local controllers [30, 31]: Consider a decentralized control problem for a Markovian plant with two controllers, a local controller (agent 1) and a remote controller (agent 2). The local controller perfectly observes the state of the Markov chain, and sends his observation through a packet-drop channel to the remote controller. The transmission is successful, i.e. the remote controller receives the state, with probability $p$, and is not successful, i.e. the remote controller receives an erasure, with probability $1 - p$. We assume that the local controller receives an acknowledgment every time the transmission is successful. The controllers' joint instantaneous utility is given by a function of the current state and both controllers' actions.
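A minimal sketch of the packet-drop channel in this example; the success probability `p` and the erasure convention (`None`) are illustrative assumptions.

```python
import random

random.seed(1)

# Sketch of the packet-drop channel between the local and the remote
# controller; success probability p and erasure symbol None are
# illustrative choices.
def channel(x, p):
    # returns (remote observation, acknowledgment): the transmitted state
    # on success, an erasure on a drop; the acknowledgment is what the
    # local controller learns about the outcome
    if random.random() < p:
        return x, True
    return None, False

obs = [channel(x=t % 3, p=0.7) for t in range(1000)]
successes = sum(ack for _, ack in obs)
# the local controller always learns whether the transmission succeeded
assert all((y is None) == (not ack) for y, ack in obs)
```

Because of the acknowledgment, the remote controller's observation history is common information between the two controllers, which is what the common-information-based analyses of this model exploit.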
III. Strategies and Beliefs
In a dynamic decision problem with asymmetric information, agents have private information about the evolution of the system, and they do not observe the complete history $H_t$, $t \in \mathcal{T}$. Therefore, at every time $t$, each agent, say agent $i$, needs to form (i) an appraisal about the current state of the system and the other agents' information (an appraisal about the history), and (ii) an appraisal about how the other agents will play in the future (an appraisal about the future), so as to evaluate the performance of his strategy choices.
When agents are nonstrategic, the agents' strategies are known to all agents. Therefore, agent $i$ can form these appraisals by using his own information along with the commonly known strategies. Specifically, agent $i$ can utilize his own information at $t$, along with (i) the past strategies $g_{1:t-1}$ and (ii) the future strategies $g_{t:T}$, to form these appraisals about the history and the future of the overall system, respectively. As a result, the outcome of decision problems with nonstrategic agents can be fully characterized by the agents' strategy profile $g$. (We discuss decision problems with strategic agents in the companion paper [2]. When agents are strategic, each agent may have an incentive to deviate at any time from the strategy the other agents commonly believe he uses if it is profitable to him; see [2] for more discussion.)
However, to form these appraisals, and thus to evaluate the performance of an arbitrary strategy at any time $t$ and for any agent $i$, we need to know the entire strategy profile $g$ for all agents and at all times. Therefore, we must work with the strategy profile as a whole, irrespective of the length of the time horizon $T$. Consequently, the computational complexity of determining a strategy profile that satisfies certain conditions (e.g. an optimal strategy profile in teams) grows doubly exponentially with $T$, since the domain of each agent's strategy grows exponentially with $T$ and the number of temporally interdependent decision problems (one for each time instant) grows with $T$. As a result, the analysis of such decision problems is very challenging in general [32].
An alternative conceptual approach for the analysis of decision problems is to define a belief system along with the strategy profile $g$. For every agent $i$, at every time $t$, define $\mu_t^i$ as agent $i$'s belief about the system state and the other agents' information conditioned on the realization of his own information $h_t^i$. The belief $\mu_t^i$ provides an intermediate instrument that encapsulates agent $i$'s appraisal about the past. Therefore, agent $i$ can evaluate the performance of any action using only the belief $\mu_t^i$ along with the future strategy profile $g_{t:T}$. However, the belief $\mu_t^i$ depends on the past strategies $g_{1:t-1}$ in general, since the underlying probability distribution depends on $g_{1:t-1}$. Therefore, the introduction of a belief system offers an equivalent problem formulation that does not necessarily break the intertemporal dependence between $g_{1:t-1}$ and $g_{t:T}$, and does not by itself simplify the analysis of decision problems.
Nevertheless, the definition of a belief system has been shown to be suitable for the analysis of single-agent decision making problems (POMDPs) for the following reasons. First, in POMDPs, under perfect recall, the agent's belief is independent of his strategy; this is known as the policy-independence property of beliefs in stochastic control. Second, the complexity of the belief function does not grow over time, since at every time the agent only needs to form a belief about the current system state, which has a time-invariant domain. As a result, we can sequentially decompose the problem over time into a sequence of static decision problems with time-invariant complexity; such a decomposition leads to a dynamic program. At each stage of the dynamic program, we specify the agent's decision rule at that stage by determining an action for each realization of the belief, fixing the future strategies. Therefore, the computational complexity of the analysis is reduced from being exponential in $T$ to linear in $T$.
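The POMDP belief recursion described above can be sketched as a standard Bayes filter; the two-state transition and observation matrices below are hypothetical. Note that the update uses only the current belief, the action, and the new observation, which is the policy-independence property at work.

```python
# Transition matrices P[a][x][x'] and observation matrix O[x'][y] for a
# hypothetical 2-state, 2-action, 2-observation POMDP; all numbers are
# illustrative assumptions.
P = {
    0: [[0.9, 0.1], [0.2, 0.8]],
    1: [[0.5, 0.5], [0.5, 0.5]],
}
O = [[0.8, 0.2], [0.3, 0.7]]

def belief_update(b, a, y):
    # Bayes filter: predict through the dynamics, then condition on the
    # observation; the result depends only on (b, a, y), not on the policy
    # that generated the action (policy-independence of beliefs).
    unnorm = [sum(b[x] * P[a][x][xn] for x in range(2)) * O[xn][y]
              for xn in range(2)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

b = belief_update([0.5, 0.5], a=0, y=1)
# the belief always lives on the time-invariant simplex over the two states
assert abs(sum(b) - 1.0) < 1e-12
```

Iterating `belief_update` along a trajectory keeps the agent's information state on a fixed-dimension simplex, which is what makes the dynamic-programming decomposition tractable.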
Unfortunately, the above approach for POMDPs does not generalize to decision problems with many agents, for three reasons. First, with many agents, there currently exists no information state in the literature that provides a compression of each agent's information, in a mutually consistent manner among the agents, that is sufficient for decision making purposes. Therefore, an agent's, say agent $i$'s, strategy has a growing domain over time. Second, at every time $t$, each agent needs to form a belief about the system state as well as the other agents' private information, which has a growing domain. Therefore, the complexity of the belief functions grows over time. Third, in decision problems with many agents, the policy-independence property of beliefs does not hold in general, and the agents' beliefs at every time depend on the past strategy profile $g_{1:t-1}$. Therefore, the agents' beliefs are correlated with one another. This correlation depends on $g_{1:t-1}$, and thus, it is not known a priori. Consequently, if we follow an approach similar to that for POMDPs to sequentially decompose the problem, we need to solve the decision problem at every stage for every arbitrary correlation among the agents' belief functions, and such a problem is not tractable. (Alternatively, one can consider arbitrary correlation among the agents' information rather than their beliefs. This is the main idea that underlies the designer's approach proposed by Witsenhausen [25]; see Section VI-B for more discussion.) Hence, the methodology proposed for the study of POMDPs is not directly applicable to decision problems with many agents and nonclassical information structures.
In the sequel, we propose notions of sufficient private information and sufficient common information as a mutually consistent compression of the agents' information for decision making purposes. Thereby, we (partially) address the first two problems, namely the growing domains of the agents' beliefs and strategies. We provide instances of decision problems where we can discover a time-invariant information state for each agent. We then utilize the agents' sufficient common information as a coordination instrument, and thus capture the implicit correlation among the agents' beliefs over time. Accordingly, we present a sequential decomposition of the original decision problem such that at every stage the complexity of the decision problem is similar to that of a static multiagent decision problem and the size of the state variable at every stage is proportional to the dimension of the sufficient private information; thus, we (partially) address the third problem discussed above.
IV. Sufficient Information
We present the sufficient information approach and characterize an information state that results from compressing the agents' private and common information in a mutually consistent manner. Thereby, we introduce a class of strategy choices that are simpler than general strategies, as they require agents to keep track of only a compressed version of their information over time. We proceed as follows. In Section IV-A, we provide conditions sufficient to determine the subset of private information an agent needs to keep track of over time for decision making purposes. In Section IV-B, we introduce the notion of sufficient common information as a compressed version of the agents' common information that, along with sufficient private information, provides an information state for each agent. We then show, in Section V, that this compression of the agents' private and common information provides a sufficient statistic in dynamic decision problems with nonstrategic agents. In Section VII, we provide a generalization of the sufficient information approach presented here.
IV-A. Sufficient Private Information
The key ideas for compressing an agent's private information appear in Definitions 1 and 2 below. To motivate these definitions, we first consider the decision problem with a single agent, that is, a Partially Observed Markov Decision Process (POMDP), which is a special case of the model described in Section II with $N = 1$.
In a POMDP, the agent's belief about the system state conditioned on his history realization is an information state. We highlight the three main properties that underlie the definition of an information state in a POMDP (see [33, 34]): (1) the information state can be updated recursively, that is, the information state at $t+1$ can be written as a function of the information state at $t$ and the new information that becomes available at $t+1$; (2) the agent's belief about the information state at the next time, conditioned on the current information state and action, is independent of his information history; and (3) at any time and for any arbitrary action, the agent's expected instantaneous utility conditioned on the information state is independent of his information history.
We generalize the key properties of the information state for POMDPs, described above, to decision problems with many agents. We propose a set of conditions sufficient to compress the agents' private information in two steps. First, we consider a decision problem with many agents where there is no signaling among them. Motivated by the definition of the information state in POMDPs, we describe conditions sufficient to determine a compression of the agents' private information (Definition 1). Next, we build on Definition 1 as an intermediate conceptual step, and consider the case where agents are aware of possible signaling among them. Accordingly, we present a set of conditions sufficient to determine a compression of the agents' private information in decision problems with many agents (Definition 2).
Therefore, we first characterize subsets of an agent’s private information that are sufficient for the agent’s decision making process when there is no signaling among the agents.
Definition 1 (Private payoff-relevant information).
Let $\Gamma_t^i := \zeta_t^i(P_t^i, C_t)$ denote a private signal that agent $i$ forms at $t$ based on his private information $P_t^i$ and common information $C_t$. We say $\Gamma_t^i$ is a private payoff-relevant information for agent $i$ if, for all open-loop strategy profiles of agents $-i$ and for all $t \in \mathcal{T}$,
(i) it can be updated recursively as
$\Gamma_{t+1}^i = \phi_t^i(\Gamma_t^i, H_{t+1}^i \backslash H_t^i);$
(ii) for all realizations $h_t^i$ it satisfies
$\mathbb{P}(x_t \,|\, h_t^i) = \mathbb{P}(x_t \,|\, \gamma_t^i, c_t);$
(iii) for all realizations $(h_t^i, a_t^i)$ such that $\mathbb{P}(h_t^i, a_t^i) > 0$,
$\mathbb{E}[u_t^i(X_t, A_t) \,|\, h_t^i, a_t^i] = \mathbb{E}[u_t^i(X_t, A_t) \,|\, \gamma_t^i, c_t, a_t^i].$
By assuming that all other agents play open-loop strategies, we remove the interdependence between agents $-i$'s strategy choices and agent $i$'s information structure; thus, we eliminate signaling among the agents. Fixing the open-loop strategies of agents $-i$, agent $i$ faces a centralized stochastic control problem. Definition 1 says that $\Gamma_t^i$, $t \in \mathcal{T}$, is a private payoff-relevant information for agent $i$ if (i) it can be recursively updated, (ii) along with the common information it includes all the information in agent $i$'s history that is relevant to the current state, and (iii) agent $i$'s instantaneous conditional expected utility at any $t$ is only a function of $\Gamma_t^i$, the common information, and his action at $t$. These three conditions are similar to properties (1)-(3) of an information state in POMDPs, but they concern only agent $i$'s private information instead of the collection of his private and common information. (We note that if we interpret a centralized control problem as a special case of our model with $N = 1$, Definition 1 coincides with the definition of the information state for the single-agent decision problem.) We would like to point out that conditions (i)-(iii) can have many solutions, including the trivial solution $\Gamma_t^i = P_t^i$. (An interesting research direction is to determine whether a minimal private payoff-relevant information exists, and if so, to characterize it; however, such a direction is beyond the scope of this paper, and we leave this topic for future research.)
While the definition of private payoff-relevant information suggests a possible way to compress the information required for an agent's decision making process, it assumes that the other agents play open-loop strategies and do not utilize the information they acquire in real time for decision making purposes (i.e. no signaling). However, open-loop strategies are not in general optimal for agents $-i$. As a result, to evaluate the performance of any strategy choice, agent $i$ also needs to form a belief about the information that the other agents utilize to make decisions.
Definition 2 (Sufficient private information).
We say $S_t^i := \zeta_t^i(P_t^i, C_t)$, $i \in \mathcal{N}$, $t \in \mathcal{T}$, is sufficient private information for the agents if,
(i) it can be updated recursively as
$S_{t+1}^i = \phi_t^i(S_t^i, H_{t+1}^i \backslash H_t^i);$   (5)
(ii) for any strategy profile $g$ and for all realizations of positive probability,
$\mathbb{P}^{g}(x_t, z_{t+1} \,|\, h_t^i, a_t^i) = \mathbb{P}^{g}(x_t, z_{t+1} \,|\, s_t^i, c_t, a_t^i),$   (6)
where $s_t^i = \zeta_t^i(p_t^i, c_t)$ for $i \in \mathcal{N}$;
(iii) for every strategy profile of the form $g_t^j : \mathcal{S}_t^j \times \mathcal{C}_t \to \Delta(\mathcal{A}_t^j)$, $j \in \mathcal{N}$, and $t \in \mathcal{T}$,
$\mathbb{E}^{g}[u_t^i(X_t, A_t) \,|\, h_t^i, a_t^i] = \mathbb{E}^{g}[u_t^i(X_t, A_t) \,|\, s_t^i, c_t, a_t^i]$   (7)
for all realizations of positive probability, where $s_t^i = \zeta_t^i(p_t^i, c_t)$ for $i \in \mathcal{N}$;
(iv) given an arbitrary strategy profile of the form $g_t^j : \mathcal{S}_t^j \times \mathcal{C}_t \to \Delta(\mathcal{A}_t^j)$, $j \in \mathcal{N}$, $t \in \mathcal{T}$,
$\mathbb{P}^{g}(s_t^{-i} \,|\, h_t^i) = \mathbb{P}^{g}(s_t^{-i} \,|\, s_t^i, c_t)$   (8)
for all realizations of positive probability, where $s_t^j = \zeta_t^j(p_t^j, c_t)$ for $j \in \mathcal{N}$.
There are four key differences between the definition of sufficient private information and that of private payoff-relevant information. First, we allow the definition and the update rule of the sufficient information to depend on the agents' strategies. Second, compared to part (ii) of Definition 1, part (ii) of Definition 2 requires that the sufficient information include all information relevant to the realization of the next common observation in addition to the information relevant to the realization of the current state. As we discuss further in Section VI, this is because when signaling occurs in a multiagent decision problem, agents need to have a consistent view about future commonly observable events. Third, comparing part (iii) of Definition 2 to part (iii) of Definition 1, we note that the probability measures in Definition 2 depend on the (closed-loop) strategy profile instead of an open-loop strategy profile. Fourth, part (iv) of Definition 2 contains an additional condition requiring that agent $i$'s sufficient private information be rich enough so that he can form beliefs about agents $-i$'s sufficient private information; such a condition is absent in Definition 1.
In general, the notion of sufficient private information is more restrictive than that of private payoff-relevant information. This is because the sufficient private information needs to satisfy the additional condition (iv), and furthermore, open-loop strategies are a strict subset of closed-loop strategies. Definition 2 provides (sufficient) conditions under which agents can compress their private information in a “mutually consistent” manner. We would like to point out that conditions (i)-(iv) of Definition 2 can have many solutions, including the trivial solution $S_t^i = P_t^i$. (We do not discuss the possibility of finding a minimal set of sufficient private information in this paper, and leave it for future research, as such an investigation is beyond the scope of this paper.)
IV-B. Sufficient Common Information
Based on the characterization of sufficient private information, we present a statistic (compressed version) of the common information that agents need to keep track of over time for decision making purposes.
Fix a choice of sufficient private information $S_t^i$, $i \in \mathcal{N}$, $t \in \mathcal{T}$. Define $\mathcal{S}_t^i$ to be the set of all possible realizations of $S_t^i$, and $\mathcal{S}_t := \prod_{i \in \mathcal{N}} \mathcal{S}_t^i$. Given the agents' strategy profile $g$, let
$\mu_t : \mathcal{C}_t \to \Delta(\mathcal{X}_t \times \mathcal{S}_t)$
denote a mapping that determines a conditional probability distribution over the system state $X_t$ and all the agents' sufficient private information $S_t := (S_t^1, \dots, S_t^N)$ conditioned on the common information at time $t$ as
$\mu_t(c_t)(x_t, s_t) := \mathbb{P}^{g}(x_t, s_t \,|\, c_t)$   (9)
for all $c_t \in \mathcal{C}_t$.
We call the collection of mappings $\mu := \{\mu_t, t \in \mathcal{T}\}$ a sufficient information based belief system (SIB belief system). Note that $\mu_t$ is only a function of the common information, and thus, it is computable by all agents. Let $\Pi_t^{\mu} := \mu_t(C_t)$ denote the (random) common information based belief that agents hold under belief system $\mu$ at $t$. We can interpret $\Pi_t^{\mu}$ as the common belief that each agent holds about the system state and all the agents' (including his own) sufficient private information at time $t$. We call the SIB belief $\Pi_t^{\mu}$ a sufficient common information for the agents. In the rest of the paper, we write $\Pi_t$ and drop the superscript $\mu$ whenever such a simplification in notation is clear. Moreover, we use the terms sufficient common information and SIB belief interchangeably.
IV-C Sufficient Information Based Strategy
The combination of sufficient private information and sufficient common information (the SIB belief) offers a mutually consistent compression of the agents' private and common information. Consider a class of strategies in which each agent's action at each time is based only on his sufficient private information and the SIB belief. We call such a mapping a Sufficient Information Based (SIB) strategy for that agent at that time. A SIB strategy determines a probability distribution for the agent's action given this compressed information. That is, a SIB strategy is a strategy where agents use only the sufficient common information (instead of the complete common information) and the sufficient private information (instead of the complete private information). A collection of SIB strategies, one for each agent and each time, is called a SIB strategy profile. The set of SIB strategies is a subset of the set of general strategies defined in Section II, as we can define,
(10) 
We note that, from Definition 2 and (9), the realizations of the sufficient private information and the SIB belief at any time depend only on the strategies at previous times. Therefore, the strategy defined above via (10) needs to be determined iteratively over time: the SIB strategy at the first time is determined directly, and the SIB strategy at each subsequent time is determined given the SIB strategies at all previous times. Consequently, the strategy is well-defined for all times and all agents.
IV-D Sufficient Information Based Update Rule
When the agents play a SIB strategy profile, it is possible to determine the SIB belief recursively over time, based on the current SIB belief and the new common information, via Bayes' rule. Let the corresponding mapping describe such an update rule for each time so that
(11) 
We note that the SIB update rule depends on the SIB strategy profile at each time. In the rest of the paper, we drop the superscript whenever such a simplification in notation is clear.
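The recursive structure of (11) can be illustrated with a minimal predict-correct sketch. The kernels, the assumed SIB strategy, and all numbers below are hypothetical placeholders, not the paper's model:

```python
import numpy as np

def sib_update(pi, strat, T, O, z_new):
    """One step of a SIB-style update (a sketch, not the paper's exact
    rule): `pi` is the current belief over joint states j, strat[j] is
    the assumed SIB strategy (distribution over actions given j), T[a]
    is the j -> j' transition matrix under action a, and O[j'][z] is
    the likelihood of the new common observation z. Returns the
    Bayes-rule posterior over j'.
    """
    pi = np.asarray(pi, float)
    pred = np.zeros_like(pi)
    for a in range(len(T)):
        # probability of action a in each joint state, then propagate
        pa = np.array([strat[j][a] for j in range(len(pi))])
        pred += (pi * pa) @ np.asarray(T[a])
    post = pred * np.asarray(O)[:, z_new]  # correct with new common info
    if post.sum() == 0.0:
        raise ValueError("zero-probability common observation")
    return post / post.sum()

# Hypothetical toy instance: two joint states, two actions, two signals.
T = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # transitions under action 0
     np.array([[0.5, 0.5], [0.5, 0.5]])]   # transitions under action 1
O = np.array([[0.8, 0.2],                  # P(z | j' = 0)
              [0.3, 0.7]])                 # P(z | j' = 1)
strat = [[1.0, 0.0], [0.5, 0.5]]           # assumed SIB strategy
pi_next = sib_update([0.6, 0.4], strat, T, O, z_new=1)
```

Note that the update takes the *assumed* SIB strategy as an input, which matches the remark above that the update rule depends on the SIB strategy profile.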
IV-E Special Cases
We consider the special cases (1)-(3) of the general model presented in Section II, and identify the sufficient private information for each of them; we discuss the application of the sufficient information approach to special case (4) in Section VII.
1) Real-time source coding-decoding: The encoder's and the decoders' private information, as well as the agents' common information, are as described for this special case in Section II. We can verify that the corresponding choices of sufficient private information satisfy the conditions of Definition 2; this is similar to the structural results in [27, Sections III and VI]. The common information based belief then follows from (9).
2) Delayed sharing information structure: Since we do not assume any specific structure for the system dynamics and the agents' observations, each agent's complete private information is payoff-relevant for him. Therefore, we set the sufficient private information equal to the complete private information, and the sufficient common information follows accordingly. The above sufficient information appears in the first structural result in [18].
3) Real-time multi-terminal communication: The agents' private and common information are as described for this special case in Section II. It is easy to verify that the corresponding choices of sufficient private information satisfy the conditions of Definition 2; this sufficient information corresponds to the structural results that appear in [29].
V Main Results
In this section, we present our main results for the analysis of dynamic decision problems with asymmetric information and nonstrategic agents using the notion of sufficient information. First, we provide a generalization of the policy-independence property of beliefs to decision problems with many agents (Theorem 1). Second, we show that the set of SIB strategies is rich enough that restriction to it is without loss of generality (Theorem 2). That is, given any strategy profile, there exists a SIB strategy profile under which every agent gets the same flow of utility over time. Third, we consider dynamic team problems with asymmetric information. We show that using SIB strategies we can decompose the problem sequentially over time, formulate a dynamic program, and determine a globally optimal policy via backward induction (Theorem 3).
Theorem 1 (Policy-independence belief property).
(i) Consider a general strategy profile. If all agents other than agent i play according to the strategies in this profile, then for every strategy that agent i plays,
(12) 
(ii) Consider a SIB strategy profile along with the associated update rule. If all agents other than agent i play according to their SIB strategies, then for every general strategy that agent i plays,
(13) 
Theorem 1 provides a generalization of the policy-independence belief property for centralized stochastic control problems [3] to multiagent decision making problems. Part (i) of Theorem 1 states that, under perfect recall, an agent's belief is independent of his actual strategy. Part (ii) of Theorem 1 refers to the case where agents play SIB strategies and update their SIB belief according to the SIB update rule, which is determined via Bayes' rule from the SIB strategy that the agents assume each agent utilizes. Equation (13) states that even if an agent unilaterally and privately deviates from his SIB strategy, his belief is independent of his actual strategy, and depends only on the other agents' strategies as well as the other agents' assumption about his SIB strategy (or, equivalently, the SIB update rule).^{7}
^{7}The result of Theorem 1 provides a crucial property for the analysis of decision problems with strategic agents, because it ensures that an agent's unilateral deviation does not influence his belief (see the companion paper [2] for more details).
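The policy-independence property in part (i) can be seen in code: a Bayes filter conditions only on the *realized* actions and observations, so the decision rule that generated those actions never appears in the computation. The finite model below is an illustrative assumption, not taken from the paper:

```python
import numpy as np

def belief_from_history(prior, T, O, history):
    """Bayes filter for P(state | realized action-observation history).
    Note the function has no policy argument: two different strategies
    that happen to produce the same realized `history` induce exactly
    the same belief, which is the policy-independence property.
    """
    b = np.asarray(prior, float)
    for a, z in history:
        b = b @ np.asarray(T[a])       # predict with the realized action
        b = b * np.asarray(O)[:, z]    # correct with the observation
        b = b / b.sum()
    return b

# Hypothetical two-state model: T[a] are transition matrices, O[x][z]
# are observation likelihoods.
T = [np.array([[0.7, 0.3], [0.3, 0.7]]),
     np.array([[0.5, 0.5], [0.5, 0.5]])]
O = np.array([[0.9, 0.1], [0.2, 0.8]])
history = [(0, 1), (1, 0)]  # realized (action, observation) pairs
b = belief_from_history([0.5, 0.5], T, O, history)
```

Whether the actions in `history` came from a deterministic rule, a randomized one, or a unilateral deviation, `b` is unchanged.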
In POMDPs it is shown that restriction to Markov strategies is without loss of optimality. We provide a generalization of this result to decision problems with many agents: we show that restriction to SIB strategies is without loss of generality in nonstrategic settings, given that the agents have access to a public randomization device. We say that the agents have access to a public randomization device if at every time they observe a public random signal that is completely independent of all events and primitive random variables in the decision problem, is uniformly distributed on the unit interval, and is independent across time. As a result, at every time, all agents can condition their actions on the realization of the public signal as well as their own information. In other words, a public randomization device enables the agents to play correlated randomized strategies. We denote each agent's SIB strategy using the public randomization device accordingly, for every agent and time.

Theorem 2.
Assume that the nonstrategic agents have access to a public randomization device. Then, for any strategy profile there exists an equivalent SIB strategy profile that results in the same expected flow of utility, i.e.
(14) 
for all and .
We provide an intuitive explanation for the result of Theorem 2 below. For every agent, his complete information history at any time consists of two components: (i) a component that captures his information about past events that is relevant to the continuation decision problem; and (ii) a component that, given the first component, captures information about past events that is irrelevant to the continuation decision problem. We show that the combination of sufficient private information and sufficient common information contains the first component. Nevertheless, in general, the agents can coordinate their actions by incorporating the second component into their decisions, since their information about past events is correlated. Consider the part of each agent's information that is not captured by his sufficient information. We show that these residual parts are jointly independent of the sufficient information (Lemma 2 in the Appendix). Therefore, at every time, we can generate a set of signals, one for each agent, using the public randomization device, so that they are identically distributed to the residual parts. Using these signals along with each agent's information state, we can thus recreate a (simulated) history that is identically distributed to the complete history. This implies that, given a public randomization device, it is sufficient for each agent to keep track of only his sufficient information instead of his complete history, and to play a SIB strategy, in order to achieve an identical (in distribution) sequence of outcomes per stage as those under the original strategy profile.
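The role of the public randomization device in this argument can be sketched as follows: a single public uniform draw, passed through a partition of the unit interval that all agents know, reproduces any target joint distribution of residual signals, including perfectly correlated ones that independent private coins cannot achieve. The target distribution below is a hypothetical example:

```python
import random

def correlated_sample(joint, u):
    """Inverse-CDF sampling of a *pair* of signals from one public
    uniform draw u in [0, 1). Since both agents apply the same
    partition of [0, 1), each can compute its own coordinate locally,
    and the pair has the target joint distribution without any private
    communication. `joint` maps (signal_1, signal_2) to probability.
    """
    acc = 0.0
    for pair, p in joint.items():
        acc += p
        if u < acc:
            return pair
    return pair  # guard against floating-point round-off

# A target correlation that independent private coin flips cannot
# reproduce: the two residual signals must always agree.
joint = {(0, 0): 0.5, (1, 1): 0.5}
rng = random.Random(0)
draws = [correlated_sample(joint, rng.random()) for _ in range(10000)]
agree = sum(s1 == s2 for s1, s2 in draws) / len(draws)
```

Here `agree` is exactly 1: the public draw coordinates the two agents' randomization, which is what allows the simulated histories in the proof sketch to be jointly, not just marginally, distributed like the originals.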
The result of Theorem 2 states that the class of SIB strategies characterizes a set of simpler strategies where the agents keep track of only a compressed version of their information rather than their entire information history. Moreover, the restriction to the class of SIB strategies is without loss of generality. Thus, along with the results appearing in the companion paper [2], the result of Theorem 2 suggests that the sufficient information approach proposed in this paper presents a unified methodology for the study of decision problems with many nonstrategic or strategic agents and asymmetric information.
We would like to discuss the implications of Theorem 2 for two special instances of our model. First, when there is only one agent, there is no need for a public randomization device since the single decision maker does not need to correlate the outcome of his randomized strategy with any other agent. Therefore, the result of Theorem 2 states that the restriction to Markov strategies in POMDPs is without loss of generality. Second, when the agents have identical utilities, i.e., in dynamic teams, utilizing a public randomization device does not improve performance. This is because, in dynamic teams, a randomized strategy profile is optimal if and only if it is optimal for every realization of the randomization. Therefore, the restriction to SIB strategies in dynamic teams is without loss of optimality.
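The single-agent specialization mentioned above can be sketched as a finite-horizon dynamic program over the belief state of a tiny hypothetical POMDP: the value at each time depends on the history only through the current belief, which is the sense in which Markov (belief-based) strategies suffice. All kernels and rewards below are illustrative assumptions:

```python
import numpy as np

def pomdp_value(b, t, horizon, T, O, R):
    """Finite-horizon dynamic program over the belief state of a POMDP.
    T[a] is the transition matrix under action a, O[x][z] is the
    observation likelihood, and R[x][a] is the stage reward. The value
    is a function of the belief b alone, not of the full history.
    """
    if t == horizon:
        return 0.0
    b = np.asarray(b, float)
    best = -np.inf
    for a in range(len(T)):
        val = float(b @ R[:, a])              # expected stage reward
        pred = b @ T[a]                       # predicted next-state dist.
        for z in range(O.shape[1]):
            pz = float(pred @ O[:, z])        # P(z | b, a)
            if pz > 0.0:
                b_next = pred * O[:, z] / pz  # Bayes belief update
                val += pz * pomdp_value(b_next, t + 1, horizon, T, O, R)
        best = max(best, val)
    return best

# Hypothetical two-state, two-action, two-observation instance.
T = [np.array([[0.9, 0.1], [0.1, 0.9]]),
     np.array([[0.5, 0.5], [0.5, 0.5]])]
O = np.array([[0.8, 0.2], [0.2, 0.8]])
R = np.array([[1.0, 0.0],   # reward 1 for matching the action
              [0.0, 1.0]])  # to the hidden state
v = pomdp_value([0.5, 0.5], 0, 2, T, O, R)
```

Backward induction here runs over beliefs rather than histories; Theorem 3 below extends this idea to teams, with the SIB belief playing the role of the single-agent belief state.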
Using the result of Theorem 2, we present below a sequential decomposition of dynamic teams over time. We formulate a dynamic program that enables us to determine a globally optimal strategy profile via backward induction.
Theorem 3.
A SIB strategy profile is a globally optimal solution to a dynamic team problem with asymmetric information if it solves the following dynamic program: