# A Topology for Team Policies and Existence of Optimal Team Policies in Stochastic Team Theory

In this paper, we establish the existence of team-optimal policies for static teams and a class of sequential dynamic teams. We first consider static team problems and show the existence of optimal policies under certain regularity conditions on the observation channels by introducing a topology on the set of policies. Then we consider sequential dynamic teams and establish the existence of an optimal policy via the static reduction method of Witsenhausen. We apply our findings to the well-known counterexample of Witsenhausen and the Gaussian relay channel problem.


## I Introduction

Team decision theory was introduced by Marschak [1] to study the behaviour of a group of agents who act collectively in a decentralized fashion in order to optimize a common cost function. Radner [2] established fundamental results for static teams and, in particular, demonstrated connections between person-by-person optimality and team-optimality. Witsenhausen's seminal papers [3, 4, 5, 6, 7, 8] on dynamic teams and on the characterization and classification of information structures have been crucial to the progress of our understanding of dynamic teams. In particular, the well-known counterexample of Witsenhausen [8] demonstrated the challenges that arise due to a decentralized information structure in such models. We refer the reader to [9] for a more comprehensive overview of team decision theory and a detailed literature review.

The key difference between team decision problems and classical (centralized) decision problems is the decentralized nature of the information structure; that is, agents cannot share all the information they have with other agents. This decentralized information structure prevents one from using classical tools of centralized decision theory such as dynamic programming, convex analytic methods, and linear programming. For this reason, establishing the existence and structure of optimal policies is quite a challenging problem in team decision theory.

In this paper, our aim is to study the existence of an optimal policy for team decision problems. In particular, we are interested in sequential team models.

In the literature, relatively few results are available on the existence of team-optimal solutions. Indeed, the existence of optimal policies for static teams and for a class of sequential dynamic teams has been studied only recently, in [10, 11]. In these papers, the existence of team-optimal policies is established via the strategic measure approach, where strategic measures are the probability measures induced by policies on the product of the state space, observation spaces, and action spaces. In this approach, one first identifies a topology on the set of strategic measures and then proves the relative compactness of this set along with the lower semi-continuity of the cost function. If the set of strategic measures is closed, then one can show the existence of an optimal policy via the Weierstrass Extreme Value Theorem. However, to establish the closedness of the set of strategic measures, one needs somewhat strong assumptions on the observation channels. For instance, the conditions imposed in [10, Assumption 3.1] to establish the closedness of the set of strategic measures with respect to the weak topology imply that the observation channels and their reverse channels are uniformly continuous with respect to the total variation distance. The reason for imposing such a strong condition on the observation channels is that convergence with respect to the topology defined on the set of strategic measures does not in general preserve the information structure of the problem (see, e.g., [11, Theorem 2.7]).

In this paper, we prove the existence of team-optimal policies under the assumption that the observation channels are continuous with respect to the total variation distance, and we do not impose any restriction on the reverse channels. Unlike the strategic measure approach, we introduce a topology on the set of policies, inspired by the topology introduced in [12, Section 2.4], instead of on the set of strategic measures. In this way, the information structure of the problem is preserved under convergence in this topology. We first establish the result for static teams. Then, using the static reduction of Witsenhausen, we consider sequential dynamic teams and prove the existence of team-optimal solutions using the result for the static case. We then apply our findings to the counterexample of Witsenhausen and the Gaussian relay channel problem.

The rest of the paper is organized as follows. In Section II we review the definition of Witsenhausen’s intrinsic model for sequential team problems. In Section III we prove the existence of team optimal solutions for static team problems. In Section IV we consider the existence of an optimal policy for dynamic team problems via the static reduction method. In Sections V and VI we apply the results derived in Section IV to study the existence of optimal policies for Witsenhausen’s counterexample and the Gaussian relay channel. Section VII concludes the paper.

### I-A Notation and Conventions

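For a metric space $\mathsf{E}$, the Borel $\sigma$-algebra (the smallest $\sigma$-algebra that contains the open sets of $\mathsf{E}$) is denoted by $\mathcal{B}(\mathsf{E})$. We let $C_0(\mathsf{E})$ and $C_c(\mathsf{E})$ denote the set of all continuous real functions on $\mathsf{E}$ vanishing at infinity and the set of all continuous real functions on $\mathsf{E}$ with compact support, respectively. For any $g\in C_c(\mathsf{E})$, let $\operatorname{supp}(g)$ denote its support. Let $\mathcal{M}(\mathsf{E})$ and $\mathcal{P}(\mathsf{E})$ denote the set of all finite signed measures and the set of all probability measures on $\mathsf{E}$, respectively. A sequence $\{\mu_n\}$ of finite signed measures on $\mathsf{E}$ is said to converge with respect to the total variation distance (see [13]) to a finite signed measure $\mu$ if $\lim_{n\to\infty}\|\mu_n-\mu\|_{TV}=0$. A sequence $\{\mu_n\}$ of finite signed measures on $\mathsf{E}$ is said to converge weakly (see [13]) to a finite signed measure $\mu$ if $\int g\,d\mu_n\to\int g\,d\mu$ for every bounded and continuous real function $g$ on $\mathsf{E}$. Let $\mathsf{E}_1$ and $\mathsf{E}_2$ be two metric spaces. For any $\mu\in\mathcal{P}(\mathsf{E}_1\times\mathsf{E}_2)$, we denote by $\mathrm{Proj}_{\mathsf{E}_1}(\mu)$ the marginal of $\mu$ on $\mathsf{E}_1$. Let $\mathsf{E}=\prod_{i=1}^N\mathsf{E}_i$ be a finite product space. For each $j,k$ with $1\le j\le k\le N$, we denote $\mathsf{E}_{[j:k]}\coloneqq\prod_{i=j}^k\mathsf{E}_i$ and $\mathsf{E}_{-j}\coloneqq\prod_{i\neq j}\mathsf{E}_i$. A similar convention also applies to elements of these sets, which will be denoted by bold lower case letters. For any set $D$, let $D^c$ denote its complement. Unless otherwise specified, the term 'measurable' will refer to Borel measurability in the rest of the paper.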
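As a small illustration of the difference between the two modes of convergence defined above, the sketch below works with finitely supported measures on $\mathbb{R}$ (an assumption made only so that integrals become finite sums). The Dirac measures $\delta_{1/n}$ converge weakly to $\delta_0$, while their total variation distance to $\delta_0$ stays equal to $2$. The helper names and the test function `g` are hypothetical.

```python
import math

# Finitely supported signed measures on R, stored as {atom: mass} (illustrative assumption).
def dirac(a):
    return {a: 1.0}

def difference(mu, nu):
    """The signed measure mu - nu."""
    out = dict(mu)
    for a, m in nu.items():
        out[a] = out.get(a, 0.0) - m
    return out

def tv_norm(mu):
    """Total variation norm of a finitely supported signed measure: sum of |masses|."""
    return sum(abs(m) for m in mu.values())

def integrate(g, mu):
    """Integral of g with respect to a finitely supported signed measure."""
    return sum(g(a) * m for a, m in mu.items())

def g(u):
    # a bounded continuous test function on R
    return math.cos(u) / (1.0 + u * u)

for n in (1, 10, 100, 1000):
    mu_n = dirac(1.0 / n)
    weak_gap = abs(integrate(g, mu_n) - integrate(g, dirac(0.0)))
    tv_gap = tv_norm(difference(mu_n, dirac(0.0)))
    print(n, weak_gap, tv_gap)   # weak_gap shrinks, tv_gap stays 2.0
```

Weak convergence only tests bounded continuous functions, which cannot separate nearby atoms in the limit, whereas the total variation distance compares the masses set by set.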

## II Intrinsic Model for Sequential Teams

Witsenhausen’s intrinsic model [4] for sequential team problems has the following components:

$$\{(\mathsf{X},\mathcal{X}),\,P,\,(\mathsf{U}_i,\mathcal{U}_i),\,(\mathsf{Y}_i,\mathcal{Y}_i),\ i=1,\ldots,N\}$$

where the Borel spaces (i.e., Borel subsets of complete and separable metric spaces) $\mathsf{X}$, $\mathsf{U}_i$, and $\mathsf{Y}_i$ ($i=1,\ldots,N$), endowed with Borel $\sigma$-algebras, denote the state space and the action and observation spaces of Agent $i$, respectively. Here $N$ is the number of actions taken, and each of these actions is supposed to be taken by an individual agent (hence, an agent with perfect recall can also be regarded as a separate decision maker every time it acts). For each $i$, the observation and action of Agent $i$ are denoted by $y_i$ and $u_i$, respectively. The $\mathsf{Y}_i$-valued observation variable for Agent $i$ is given by $y_i\sim W_i(\cdot\mid x,\mathbf{u}_{[1:i-1]})$, where $W_i$ is a stochastic kernel on $\mathsf{Y}_i$ given $\mathsf{X}\times\mathsf{U}_{[1:i-1]}$ [14, Definition C.1]. A probability measure $P$ on $\mathsf{X}$ describes the uncertainty on the state variable $x$.

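A control strategy $\underline{\gamma}=\{\gamma_i\}_{i=1}^N$, also called a policy, is an $N$-tuple of measurable functions such that $u_i=\gamma_i(y_i)$, where $\gamma_i$ is a measurable function from $\mathsf{Y}_i$ to $\mathsf{U}_i$. Let $\Gamma_i$ denote the set of all admissible policies for Agent $i$; that is, the set of all measurable functions from $\mathsf{Y}_i$ to $\mathsf{U}_i$, and let $\Gamma\coloneqq\prod_{k=1}^N\Gamma_k$.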

We note that the intrinsic model of Witsenhausen uses a set-theoretic characterization; however, for Borel spaces, the model above is equivalent to the intrinsic model for sequential team problems.

Under this intrinsic model, a sequential team problem is dynamic if the information available to at least one agent $i$ is affected by the action of at least one other agent $k\neq i$. A decentralized problem is static if the information available to every decision maker is affected only by the state of nature; that is, no decision maker can affect the information available to any other decision maker.

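For any $\underline{\gamma}\in\Gamma$, we let the (expected) cost of the team problem be defined by

$$J(\underline{\gamma})\coloneqq E[c(x,\mathbf{y},\mathbf{u})],$$

for some measurable cost function $c:\mathsf{X}\times\mathsf{Y}\times\mathsf{U}\to[0,\infty)$, where $\mathsf{Y}\coloneqq\prod_{i=1}^N\mathsf{Y}_i$ and $\mathsf{U}\coloneqq\prod_{i=1}^N\mathsf{U}_i$.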

###### Definition 1.

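For a given stochastic team problem, a policy (strategy) $\underline{\gamma}^*\in\Gamma$ is an optimal team decision rule if

$$J(\underline{\gamma}^*)=\inf_{\underline{\gamma}\in\Gamma}J(\underline{\gamma})\eqqcolon J^*.$$

The cost level $J^*$ achieved by this strategy is the optimal team cost.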

In what follows, the terms policy, measurement, and agent are used synonymously with strategy, observation, and decision maker, respectively.

### II-A Auxiliary Results

To make the paper as self-contained as possible, in this section we review some results in probability theory and functional analysis that will be used in the paper.

The first result is Prokhorov's theorem, which gives a sufficient condition for relative compactness in the weak topology.

###### Theorem 1.

([14, Theorem E.6]) A set $F$ of probability measures on a Borel space $\mathsf{E}$ is relatively compact with respect to the weak topology if it is tight; that is, for any $\varepsilon>0$ there exists a compact subset $K$ of $\mathsf{E}$ such that for all $\mu\in F$, we have $\mu(K)\ge1-\varepsilon$, or equivalently, $\mu(K^c)\le\varepsilon$.
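To make the tightness condition concrete, the following sketch considers the hypothetical family $\{N(m,1): |m|\le1\}$, evaluated on a grid of means, and searches for an integer radius $R$ such that the compact set $K=[-R,R]$ satisfies $\mu(K^c)\le\varepsilon$ for every member. The family, the grid, and the function names are illustrative assumptions; the Gaussian tail probabilities are computed with `math.erfc`.

```python
import math

def mass_outside(R, m, s=1.0):
    """N(m, s^2)([-R, R]^c) = P(Y > R) + P(Y < -R) for Y ~ N(m, s^2)."""
    upper = 0.5 * math.erfc((R - m) / (s * math.sqrt(2.0)))
    lower = 0.5 * math.erfc((R + m) / (s * math.sqrt(2.0)))
    return upper + lower

def tight_radius(means, eps, s=1.0):
    """Smallest integer R such that K = [-R, R] has mass >= 1 - eps under every N(m, s^2)."""
    R = 0
    while max(mass_outside(R, m, s) for m in means) > eps:
        R += 1
    return R

means = [k / 10.0 for k in range(-10, 11)]   # grid over the family |m| <= 1
R = tight_radius(means, eps=1e-3)
print(R)
```

For $\varepsilon=10^{-3}$ the search stops at $R=5$: the worst case on the grid is $|m|=1$, where $N(1,1)([-4,4]^c)\approx\Phi(-3)\approx1.35\times10^{-3}$ is still too large.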

###### Proposition 1.

([15, Theorem 3.2]) Let $\mu$ be a probability measure on a Borel space $\mathsf{E}$. Then $\mu$ is tight.

###### Proposition 2.

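([16, Lemma 4.4]) Let $\mathsf{E}_1$ and $\mathsf{E}_2$ be two Borel spaces. Let $F_1$ and $F_2$ be tight subsets of $\mathcal{P}(\mathsf{E}_1)$ and $\mathcal{P}(\mathsf{E}_2)$, respectively. Then the set

$$F\coloneqq\{\nu\in\mathcal{P}(\mathsf{E}_1\times\mathsf{E}_2):\mathrm{Proj}_{\mathsf{E}_1}(\nu)\in F_1\ \text{and}\ \mathrm{Proj}_{\mathsf{E}_2}(\nu)\in F_2\}$$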

is also tight.

Before the next theorem, we give the following definition.

###### Definition 2.

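([10, Definition 4.4]) Let $\mathsf{E}_1$, $\mathsf{E}_2$, and $\mathsf{E}_3$ be Borel spaces. A non-negative measurable function $\varphi$ on $\mathsf{E}_1\times\mathsf{E}_2\times\mathsf{E}_3$ is in class $\mathcal{IC}(\mathsf{E}_1,\mathsf{E}_2)$ if for every $M>0$ and for every compact set $K\subset\mathsf{E}_1$, there exists a compact set $L\subset\mathsf{E}_2$ such that

$$\inf_{K\times L^c\times\mathsf{E}_3}\varphi(e_1,e_2,e_3)\ge M.$$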

Using this definition, we now state the following result.

###### Theorem 2.

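([10, Lemma 4.5]) Suppose $\varphi$ is in class $\mathcal{IC}(\mathsf{E}_1,\mathsf{E}_2)$. Let $m\ge0$ and let $F_1\subset\mathcal{P}(\mathsf{E}_1)$ be a tight set of measures. Define

$$F\coloneqq\Big\{\nu\in\mathcal{P}(\mathsf{E}_1\times\mathsf{E}_2\times\mathsf{E}_3):\mathrm{Proj}_{\mathsf{E}_1}(\nu)\in F_1\ \text{and}\ \int\varphi\,d\nu\le m\Big\}.$$

Then $\mathrm{Proj}_{\mathsf{E}_1\times\mathsf{E}_2}(F)$ is a tight set of measures.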

The last result concerns the convergence of the bilinear form constituting the duality between a Banach space and its topological dual when both arguments of the bilinear form converge in a suitable sense.

###### Proposition 3.

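Let $\mathsf{E}$ be a Banach space with topological dual $\mathsf{E}^*$, where the bilinear form that constitutes the duality is denoted by $\langle\cdot,\cdot\rangle$, and let $\{e_n\},e\subset\mathsf{E}$ and $\{e_n^*\},e^*\subset\mathsf{E}^*$. Suppose $e_n\to e$ in norm and $e_n^*\to e^*$ with respect to the $w^*$-topology; that is, $\langle e_n^*,a\rangle\to\langle e^*,a\rangle$ for all $a\in\mathsf{E}$. Then we have $\langle e_n^*,e_n\rangle\to\langle e^*,e\rangle$ as $n\to\infty$.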

###### Proof.

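Suppose $e_n\to e$ in norm and $e_n^*\to e^*$ with respect to the $w^*$-topology. Then we have

$$\big|\langle e_n^*,e_n\rangle-\langle e^*,e\rangle\big|\le\|e_n^*\|\,\|e_n-e\|+\big|\langle e_n^*,e\rangle-\langle e^*,e\rangle\big|.$$

The second term in the last expression converges to zero as $n\to\infty$ since $e_n^*\to e^*$ with respect to the $w^*$-topology. Note that $\sup_n\|e_n^*\|<\infty$ by the Uniform Boundedness Principle [17, Theorem 5.13]. Hence the first term in the last expression also converges to zero as $n\to\infty$. ∎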

## III Existence of the Optimal Strategy for Static Team Problems

In this section, we show the existence of an optimal strategy for static teams. Recall that $(\mathsf{X},\mathcal{X},P)$ is a probability space representing the state space, where $\mathsf{X}$ is a Borel space and $\mathcal{X}$ is its Borel $\sigma$-algebra. We consider an $N$-agent static team problem in which Agent $i$ ($i=1,\ldots,N$) observes a random variable $y_i$ and takes an action $u_i$, where $y_i$ takes values in a Borel space $\mathsf{Y}_i$ and $u_i$ takes values in a Borel space $\mathsf{U}_i$. Given any state realization $x$, the random variable $y_i$ has distribution $W_i(\cdot\mid x)$; that is, $W_i$ is a stochastic kernel on $\mathsf{Y}_i$ given $\mathsf{X}$.

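The team cost function $c$ is a non-negative function of the state, observations, and actions; that is, $c:\mathsf{X}\times\mathsf{Y}\times\mathsf{U}\to[0,\infty)$, where $\mathsf{Y}\coloneqq\prod_{i=1}^N\mathsf{Y}_i$ and $\mathsf{U}\coloneqq\prod_{i=1}^N\mathsf{U}_i$. To prove the existence of team-optimal policies, we enlarge the space of strategies so that each agent can also apply randomized strategies; that is, for Agent $i$, the set of strategies is defined as

$$\Gamma_i\coloneqq\{\gamma_i:\gamma_i(\cdot\mid y_i)\ \text{is a stochastic kernel on}\ \mathsf{U}_i\ \text{given}\ \mathsf{Y}_i\}.$$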

We first prove the existence of an optimal randomized strategy. Then, using Blackwell's irrelevant information theorem [18], we deduce that the optimal strategy can be chosen to be deterministic, which therefore solves the problem for the original setup.

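Recall that $\Gamma=\prod_{i=1}^N\Gamma_i$. Then the cost of the team is given by

$$J(\underline{\gamma})=\int_{\mathsf{X}\times\mathsf{Y}\times\mathsf{U}}c(x,\mathbf{y},\mathbf{u})\,\underline{\gamma}(d\mathbf{u}\mid\mathbf{y})\,P(dx,d\mathbf{y}),$$

where $\underline{\gamma}(d\mathbf{u}\mid\mathbf{y})\coloneqq\prod_{i=1}^N\gamma_i(du_i\mid y_i)$. Here, with an abuse of notation, $P(dx,d\mathbf{y})\coloneqq P(dx)\prod_{i=1}^N W_i(dy_i\mid x)$ denotes the joint distribution of the state and observations. Therefore, we have

$$J^*=\inf_{\underline{\gamma}\in\Gamma}J(\underline{\gamma}).$$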

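For any strategy $\underline{\gamma}\in\Gamma$, we let $\lambda_{\underline{\gamma}}$ denote the probability measure induced on $\mathsf{X}\times\mathsf{Y}\times\mathsf{U}$. In the literature, $\lambda_{\underline{\gamma}}$ is called a strategic measure.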
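The following sketch evaluates the strategic measure $\lambda_{\underline{\gamma}}$ and the cost $J(\underline{\gamma})$ by enumeration for a hypothetical two-agent static team on finite spaces: a uniform binary state, binary symmetric observation channels with crossover probabilities $0.1$ and $0.3$, and a cost counting the agents whose action differs from the state. All of these modelling choices are assumptions made for illustration only; with finite spaces, a randomized policy $\gamma_i(\cdot\mid y_i)$ is simply a table of probabilities.

```python
import itertools

# Hypothetical two-agent static team on finite spaces (illustration only).
N = 2
X, Y, U = [0, 1], [0, 1], [0, 1]
P = {0: 0.5, 1: 0.5}          # prior on the state x
FLIP = [0.1, 0.3]             # crossover probability of agent i's channel

def W(i, y, x):
    """Observation kernel W_i(y | x): a binary symmetric channel."""
    return 1.0 - FLIP[i] if y == x else FLIP[i]

def cost(x, ys, us):
    """Number of agents whose action differs from the state (ys is not used here)."""
    return float(sum(1 for u in us if u != x))

def strategic_measure(gammas):
    """lambda_gamma(x, y, u) = P(x) * prod_i W_i(y_i | x) * gamma_i(u_i | y_i)."""
    lam = {}
    for x in X:
        for ys in itertools.product(Y, repeat=N):
            for us in itertools.product(U, repeat=N):
                p = P[x]
                for i in range(N):
                    p *= W(i, ys[i], x) * gammas[i][ys[i]][us[i]]
                lam[(x, ys, us)] = p
    return lam

def team_cost(gammas):
    """J(gamma): the integral of c with respect to the strategic measure."""
    lam = strategic_measure(gammas)
    return sum(cost(x, ys, us) * p for (x, ys, us), p in lam.items())

follow = {y: {u: 1.0 if u == y else 0.0 for u in U} for y in Y}   # u_i = y_i
coin = {y: {u: 0.5 for u in U} for y in Y}                        # ignores y_i

print(team_cost([follow, follow]))
print(team_cost([coin, follow]))
```

Following one's own observation costs $0.1+0.3=0.4$, while replacing the first agent's policy by a fair coin raises the cost to $0.5+0.3=0.8$.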

In this section, we impose the following assumptions.

###### Assumption 1.
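• (a) The cost function $c$ is lower semi-continuous.

• (b) $\mathsf{X}$, $\mathsf{Y}_i$, and $\mathsf{U}_i$ ($i=1,\ldots,N$) are locally compact.

• (c) For all $i$, $W_i(\cdot\mid x)$ is continuous in $x$ with respect to the total variation distance.

• (d) For all $i$, $W_i(dy_i\mid x)=q_i(y_i,x)\,\mu_i(dy_i)$ for some probability measure $\mu_i$ on $\mathsf{Y}_i$.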

###### Remark 1.

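Note that, for all $i$, if $q_i(y_i,x)$ is continuous in $x$ and $q_i(y_i,x)\le h_i(y_i)$ for all $x$, for some $\mu_i$-integrable function $h_i$, then Assumption 1-(c) holds. Indeed, let $x_n\to x$ in $\mathsf{X}$. Then we have

$$\big\|W_i(\cdot\mid x_n)-W_i(\cdot\mid x)\big\|_{TV}=\int_{\mathsf{Y}_i}\big|q_i(y_i,x_n)-q_i(y_i,x)\big|\,\mu_i(dy_i).$$

The last expression goes to $0$ as $n\to\infty$ by the dominated convergence theorem, since the integrand converges pointwise to $0$ and is bounded by $2h_i$.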
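The sketch below checks Remark 1 numerically for a hypothetical Gaussian observation channel $y_i=x+v$ with $v\sim N(0,1)$. For simplicity it uses densities with respect to Lebesgue measure; the total variation norm does not depend on which dominating measure is used, so the same values would be obtained with a probability reference measure $\mu_i$ as in Assumption 1-(d). It compares a midpoint-rule approximation of $\int|q_i(y,x_n)-q_i(y,x)|\,dy$ with the closed form $2\,\mathrm{erf}(|x_n-x|/(2\sqrt2))$ for two unit-variance Gaussians; the function names are illustrative.

```python
import math

def q(y, x):
    """Density of N(x, 1) at y: a hypothetical Gaussian channel y = x + v."""
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)

def tv_numeric(a, b, lo=-12.0, hi=12.0, steps=40000):
    """Midpoint-rule approximation of the integral of |q(y, a) - q(y, b)| dy."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += abs(q(y, a) - q(y, b))
    return total * h

def tv_closed_form(a, b):
    """||N(a,1) - N(b,1)||_TV = 2 * (2 * Phi(|a - b| / 2) - 1) = 2 * erf(|a - b| / (2 * sqrt(2)))."""
    return 2.0 * math.erf(abs(a - b) / (2.0 * math.sqrt(2.0)))

x = 0.0
for n in (1, 2, 4, 8, 16):
    x_n = x + 1.0 / n
    print(n, tv_numeric(x_n, x), tv_closed_form(x_n, x))
```

Both columns shrink as $x_n\to x$, which is the continuity in total variation required by Assumption 1-(c).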

###### Remark 2.

One common approach used in the literature [10, 11] to show the existence of team-optimal policies is the strategic measure approach. In this approach, one first identifies a topology on the set of strategic measures (in general, the weak topology) and then proves the relative compactness of this set along with the lower semi-continuity of the cost function with respect to this topology. Then, if the set of strategic measures is closed with respect to this topology, one can deduce the existence of an optimal policy via the Weierstrass Extreme Value Theorem. The main difficulty in this approach is proving the closedness of the set of strategic measures, because convergence with respect to this topology does not in general preserve the conditional independence of the actions given the observations; that is, in the limiting strategic measure, the action of Agent $i$ may depend on the observation of Agent $j\neq i$, which is prohibited in the original problem (see, e.g., [11, Theorem 2.7]). To overcome this obstacle, in this paper we introduce a topology directly on the set of policies $\Gamma$ instead of on the set of strategic measures. In this way, the limiting measure preserves the conditional independence of the actions given the observations.

### III-A Topology on the Set of Policies Γ

In this section we introduce a topology on the set of policies $\Gamma$, which will be used to establish the existence of team-optimal policies. To this end, we first identify a topology on $\Gamma_i$ for each $i$. Fix any $i\in\{1,\ldots,N\}$.

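Recall that we denote by $C_0(\mathsf{U}_i)$, $\mathcal{M}(\mathsf{U}_i)$, and $\mathcal{P}(\mathsf{U}_i)$ the set of real continuous functions vanishing at infinity on $\mathsf{U}_i$, the set of finite signed measures on $\mathsf{U}_i$, and the set of probability measures on $\mathsf{U}_i$, respectively. For any $g\in C_0(\mathsf{U}_i)$, let $\|g\|_\infty\coloneqq\sup_{u\in\mathsf{U}_i}|g(u)|$, which turns $C_0(\mathsf{U}_i)$ into a Banach space. Let $\|\cdot\|_{TV}$ denote the total variation norm on $\mathcal{M}(\mathsf{U}_i)$, which turns $\mathcal{M}(\mathsf{U}_i)$ into a Banach space.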

###### Theorem 3.

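[17, Theorem 7.17] For any $\nu\in\mathcal{M}(\mathsf{U}_i)$ and $g\in C_0(\mathsf{U}_i)$, let $T_\nu(g)\coloneqq\langle g,\nu\rangle$, where

$$\langle g,\nu\rangle\coloneqq\int_{\mathsf{U}_i}g\,d\nu.$$

Then the map $\nu\mapsto T_\nu$ is an isometric isomorphism from $\mathcal{M}(\mathsf{U}_i)$ to $C_0(\mathsf{U}_i)^*$. Hence, we can identify $C_0(\mathsf{U}_i)^*$ with $\mathcal{M}(\mathsf{U}_i)$.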

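A function $\gamma:\mathsf{Y}_i\to\mathcal{M}(\mathsf{U}_i)$ is called $w^*$-measurable [19, p. 18] if the mapping $y\mapsto\langle g,\gamma(y)\rangle$ is measurable for all $g\in C_0(\mathsf{U}_i)$. Let $L(\mu_i,\mathcal{M}(\mathsf{U}_i))$ denote the set of all such functions. Then, we define the following set

$$L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))\coloneqq\Big\{\gamma\in L(\mu_i,\mathcal{M}(\mathsf{U}_i)):\|\gamma\|_\infty\coloneqq\operatorname*{ess\,sup}_{y\in\mathsf{Y}_i}\|\gamma(y)\|_{TV}<\infty\Big\},$$

where the essential supremum is taken with respect to the measure $\mu_i$. Recall that $\mu_i$ is the reference probability measure in Assumption 1-(d) for the observation channel $W_i$.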

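A function $f:\mathsf{Y}_i\to C_0(\mathsf{U}_i)$ is said to be simple if there exist $g_1,\ldots,g_n\in C_0(\mathsf{U}_i)$ and $E_1,\ldots,E_n\in\mathcal{B}(\mathsf{Y}_i)$ such that $f=\sum_{k=1}^n g_k\,1_{E_k}$. Define the Bochner integral of $f$ with respect to $\mu_i$ as

$$\int_{\mathsf{Y}_i}f(y)\,\mu_i(dy)\coloneqq\sum_{k=1}^n g_k\,\mu_i(E_k).$$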

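A function $f:\mathsf{Y}_i\to C_0(\mathsf{U}_i)$ is said to be strongly measurable if there exists a sequence $\{f_n\}$ of simple functions with $\lim_{n\to\infty}\|f_n(y)-f(y)\|_\infty=0$ for $\mu_i$-almost every $y$. A strongly measurable function $f$ is Bochner-integrable [20] if $\int_{\mathsf{Y}_i}\|f(y)\|_\infty\,\mu_i(dy)<\infty$. In this case, the integral is given by

$$\int_{\mathsf{Y}_i}f(y)\,\mu_i(dy)=\lim_{n\to\infty}\int_{\mathsf{Y}_i}f_n(y)\,\mu_i(dy),$$

where $\{f_n\}$ is a sequence of simple functions satisfying $\lim_{n\to\infty}\int_{\mathsf{Y}_i}\|f_n(y)-f(y)\|_\infty\,\mu_i(dy)=0$. Let $L_1(\mu_i,C_0(\mathsf{U}_i))$ denote the set of all Bochner-integrable functions from $\mathsf{Y}_i$ to $C_0(\mathsf{U}_i)$, endowed with the norm

$$\|f\|_1\coloneqq\int_{\mathsf{Y}_i}\|f(y)\|_\infty\,\mu_i(dy).$$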

Then, we have the following theorem.

###### Theorem 4.

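[19, Theorem 1.5.5, p. 27] For any $\gamma\in L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))$ and $f\in L_1(\mu_i,C_0(\mathsf{U}_i))$, let

$$T_\gamma(f)\coloneqq\int_{\mathsf{Y}_i}\langle f(y),\gamma(y)\rangle\,\mu_i(dy).$$

Then the map $\gamma\mapsto T_\gamma$ is an isometric isomorphism from $L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))$ to $L_1(\mu_i,C_0(\mathsf{U}_i))^*$. Hence, we can identify $L_1(\mu_i,C_0(\mathsf{U}_i))^*$ with $L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))$.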
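For intuition about the spaces in Theorem 4, the following sketch uses a hypothetical discretization in which $\mathsf{U}_i$ is a set of three points (so $C_0(\mathsf{U}_i)$ is $\mathbb{R}^3$ with the sup norm and $\mathcal{M}(\mathsf{U}_i)$ is $\mathbb{R}^3$ with the $\ell_1$ norm) and $\mathsf{Y}_i$ is split into three disjoint cells $E_1,E_2,E_3$ on which both the simple function $f=\sum_k g_k1_{E_k}$ and the policy $\gamma$ are constant. It computes the Bochner integral of $f$, the norms $\|f\|_1$ and $\|\gamma\|_\infty$, and the pairing $T_\gamma(f)$, and checks the duality bound $|T_\gamma(f)|\le\|\gamma\|_\infty\|f\|_1$. All numbers and names are made up for illustration.

```python
# Hypothetical discretization: U_i has three points, so C_0(U_i) is R^3 with the sup norm
# and M(U_i) is R^3 with the l1 (total variation) norm. Y_i is split into disjoint cells E_k.
U = [0.0, 0.5, 1.0]
mu_E = [0.2, 0.3, 0.5]          # mu_i(E_k), k = 1, 2, 3 (made-up values)
g = [[1.0, -2.0, 0.5],          # g_k in C_0(U_i): the value of f on cell E_k
     [0.0, 3.0, -1.0],
     [2.0, 2.0, 2.0]]
gamma = [[1.0, 0.0, 0.0],       # gamma(y) in P(U_i), constant on each cell E_k
         [0.2, 0.5, 0.3],
         [0.0, 0.0, 1.0]]

def bochner_integral(g, mu_E):
    """Integral of the simple function f = sum_k g_k 1_{E_k}, i.e. sum_k g_k mu(E_k)."""
    return [sum(g[k][j] * mu_E[k] for k in range(len(g))) for j in range(len(U))]

def l1_norm(g, mu_E):
    """||f||_1 = integral of ||f(y)||_inf mu(dy); a finite sum because the E_k are disjoint."""
    return sum(max(abs(v) for v in g[k]) * mu_E[k] for k in range(len(g)))

def linf_norm(gamma):
    """ess sup of ||gamma(y)||_TV (every cell is assumed to have positive mass)."""
    return max(sum(abs(p) for p in row) for row in gamma)

def pairing(gamma, g, mu_E):
    """T_gamma(f) = integral of <f(y), gamma(y)> mu(dy) for piecewise-constant f and gamma."""
    return sum(mu_E[k] * sum(g[k][j] * gamma[k][j] for j in range(len(U)))
               for k in range(len(g)))

print(bochner_integral(g, mu_E))
print(pairing(gamma, g, mu_E), linf_norm(gamma) * l1_norm(g, mu_E))
```

With these numbers the pairing is $1.56$, below the bound $\|\gamma\|_\infty\|f\|_1=1\times2.3$; because each $\gamma(y)$ is a probability vector, $\|\gamma\|_\infty=1$.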

By Theorem 4, we equip $L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))$ with the $w^*$-topology induced by $L_1(\mu_i,C_0(\mathsf{U}_i))$; that is, the smallest topology on $L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))$ for which the mapping

$$L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))\ni\gamma\mapsto T_\gamma(f)\in\mathbb{R}$$

is continuous for all $f\in L_1(\mu_i,C_0(\mathsf{U}_i))$ [17]. We write $\gamma_n\rightharpoonup\gamma$ if $\gamma_n$ converges to $\gamma$ in $L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))$ with respect to the $w^*$-topology. This topology is in part inspired by the topology introduced in [12, Section 2.4], where a similar topology is defined for randomized Markov policies to study continuous-time stochastic control problems under the average cost optimality criterion (see [21] for another construction of a topology on Markov policies).

###### Lemma 1.

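Suppose $\gamma\in L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))$ is such that $\gamma(y)\in\mathcal{P}(\mathsf{U}_i)$ for $\mu_i$-a.e. $y$. Then, for all $D\in\mathcal{B}(\mathsf{U}_i)$, the mapping $y\mapsto\gamma(y)(D)$ is measurable. Hence, $\gamma$ is a stochastic kernel.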

###### Proof.

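Note first that the mapping $y\mapsto\langle g,\gamma(y)\rangle$ is measurable for every real, continuous, and bounded function $g$ on $\mathsf{U}_i$, because any such $g$ can be approximated pointwise by functions $g_n\in C_0(\mathsf{U}_i)$ satisfying $\|g_n\|_\infty\le\|g\|_\infty$ for all $n$, so that $\langle g_n,\gamma(y)\rangle\to\langle g,\gamma(y)\rangle$ by the dominated convergence theorem. Moreover, for any closed set $F\subset\mathsf{U}_i$, one can approximate the indicator function $1_F$ pointwise by the continuous and bounded functions $g_n(u)\coloneqq\max\{1-n\,d(u,F),0\}$, where $d$ is the metric on $\mathsf{U}_i$ and $d(u,F)\coloneqq\inf_{v\in F}d(u,v)$. This implies that the mapping $y\mapsto\gamma(y)(F)$ is measurable for every closed set $F$ in $\mathsf{U}_i$. Then the result follows by [22, Proposition 7.25]. ∎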
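A standard choice for continuous approximations of the indicator of a closed set $F$ is $g_n(u)=\max\{1-n\,d(u,F),0\}$. The sketch below assumes $\mathsf{U}_i=\mathbb{R}$ with the usual metric and $F=[0,1]$ (both hypothetical choices) and shows that these functions are bounded by $1$, non-increasing in $n$, and converge pointwise to $1_F$.

```python
def dist_to_interval(u, a, b):
    """d(u, F) for the closed set F = [a, b] in R with the usual metric."""
    if u < a:
        return a - u
    if u > b:
        return u - b
    return 0.0

def g_n(u, n, a=0.0, b=1.0):
    # continuous, bounded by 1, equal to 1 on F and to 0 outside the 1/n-neighbourhood of F
    return max(1.0 - n * dist_to_interval(u, a, b), 0.0)

def indicator(u, a=0.0, b=1.0):
    return 1.0 if a <= u <= b else 0.0

for u in (-0.5, -0.01, 0.0, 0.7, 1.001, 2.0):
    values = [g_n(u, n) for n in (1, 10, 100, 10000)]
    print(u, values, indicator(u))
```

Points of $F$ keep the value $1$ for every $n$, while any point outside $F$ has positive distance to $F$ and is eventually mapped to $0$; closedness of $F$ is what makes $d(u,F)>0$ off $F$.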

By Lemma 1, we have

$$\Gamma_i=\{\gamma\in L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i)):\gamma(y)\in\mathcal{P}(\mathsf{U}_i)\ \mu_i\text{-a.e.}\}.$$

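Since $\Gamma_i$ is bounded in $L_\infty(\mu_i,\mathcal{M}(\mathsf{U}_i))$, by the Banach-Alaoglu Theorem [17, Theorem 5.18], $\Gamma_i$ is relatively compact with respect to the $w^*$-topology. Since $L_1(\mu_i,C_0(\mathsf{U}_i))$ is separable, by [13, Lemma 1.3.2], $\Gamma_i$ is also relatively sequentially compact.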

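Note that $\Gamma_i$ is not closed with respect to the $w^*$-topology. Indeed, let $\mathsf{U}_i=\mathbb{R}$. Define $\gamma_n(y)\coloneqq\delta_n$ for all $y\in\mathsf{Y}_i$ and $n\ge1$, where $\delta_n$ denotes the degenerate (Dirac) measure at $n$; that is, $\delta_n(D)=1_D(n)$ for all $D\in\mathcal{B}(\mathbb{R})$. Let $g\in L_1(\mu_i,C_0(\mathsf{U}_i))$. Then we have

$$
\begin{aligned}
\lim_{n\to\infty}\int_{\mathsf{Y}_i}\langle g(y),\gamma_n(y)\rangle\,\mu_i(dy)
&=\lim_{n\to\infty}\int_{\mathsf{Y}_i}g(y)(n)\,\mu_i(dy)\\
&=\int_{\mathsf{Y}_i}\lim_{n\to\infty}g(y)(n)\,\mu_i(dy)\quad(\text{as }\|g(y)\|_\infty\text{ is }\mu_i\text{-integrable})\\
&=0\quad(\text{as }g(y)\in C_0(\mathsf{U}_i)).
\end{aligned}
$$

Hence, $\gamma_n\rightharpoonup0$. But $0\notin\Gamma_i$, and so $\Gamma_i$ is not closed.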
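The escape of mass to infinity in this counterexample can be seen numerically. The sketch assumes a hypothetical finite $\mathsf{Y}_i=\{0,1,2\}$ with made-up weights $\mu_i$, takes $\mathsf{U}_i=\mathbb{R}$, and uses the test element $g(y)(u)=h(y)/(1+u^2)$ of $L_1(\mu_i,C_0(\mathbb{R}))$, where $h$ is also made up. Pairing $g$ with $\gamma_n(y)=\delta_n$ gives $\sum_y\mu_i(y)h(y)/(1+n^2)$, which tends to $0$ even though every $\gamma_n(y)$ is a probability measure. A single test element only illustrates the mechanism; the argument above covers every $g$ via dominated convergence.

```python
# Hypothetical instance: Y_i = {0, 1, 2} with made-up reference weights mu_i, U_i = R.
mu = {0: 0.2, 1: 0.5, 2: 0.3}
h = {0: 1.0, 1: -4.0, 2: 2.5}

def g(y, u):
    """A test element of L_1(mu_i, C_0(R)): g(y)(u) = h(y) / (1 + u^2)."""
    return h[y] / (1.0 + u * u)

def pair_with_dirac(n):
    """T_{gamma_n}(g) for gamma_n(y) = delta_n: the integral of g(y)(n) mu_i(dy)."""
    return sum(mu[y] * g(y, n) for y in mu)

for n in (1, 10, 100, 1000):
    print(n, pair_with_dirac(n))
```

Here $\sum_y\mu_i(y)h(y)=-1.05$, so the pairing equals $-1.05/(1+n^2)$: all the probability mass moves to points where every function in $C_0(\mathbb{R})$ is nearly zero.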

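In the remainder of this section, $\Gamma_i$ is equipped with this topology. In addition, $\Gamma$ has the product topology induced by these $w^*$-topologies; that is, $\underline{\gamma}^{(n)}$ converges to $\underline{\gamma}$ in $\Gamma$ with respect to the product topology if and only if $\gamma_i^{(n)}\rightharpoonup\gamma_i$ for all $i$. In this case we write $\underline{\gamma}^{(n)}\rightharpoonup\underline{\gamma}$. Note that $\Gamma$ is relatively sequentially compact under this topology.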

### III-B Existence of Team-Optimal Policies

In this section, using the topology introduced in Section III-A, we prove the existence of an optimal policy under Assumption 1 and the assumption below. For any $L>0$, we define

$$\Gamma_L\coloneqq\{\underline{\gamma}\in\Gamma:J(\underline{\gamma})<J^*+L\}.$$

For each , we define .

###### Assumption 2.

For some , is tight for .

Before continuing with the proof, we give several conditions that imply Assumption 2.

###### Theorem 5.

Suppose either of the following conditions holds:

• (i) $\mathsf{U}_i$ is compact for all $i$.

• (ii) In the non-compact case, we assume the following:

• The cost function is in class , for all .

• For all $i$, $q_i>0$ and $q_i$ is lower semi-continuous.

Then, Assumption 2 holds.

###### Proof.

(i): Note that the marginal on of any measure in is . Since is tight by Proposition 1 and is tight by the compactness of , is also tight by Proposition 2.
(ii): We define . Since, for all , is lower semi-continuous and strictly greater than , for any compact set , we have . This implies that is also in class for . Then, by Theorem 2, one can inductively prove that is tight. Indeed, let . Then is in and

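$$S_L\subset\Big\{\lambda\in\mathcal{P}(\mathsf{X}\times\mathsf{Y}\times\mathsf{U}):\mathrm{Proj}_{\mathsf{X}\times\mathsf{Y}}(\lambda)(dx,d\mathbf{y})=P(dx)\prod_{i=1}^N\mu_i(dy_i)\ \text{and}\ \int\tilde c\,d\lambda\le J^*+L\Big\}.$$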

But since is tight, by Theorem 2, is also tight. Suppose the assertion is true for and consider . Note that is in and

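$$S_L\subset\Big\{\lambda\in\mathcal{P}(\mathsf{X}\times\mathsf{Y}\times\mathsf{U}):\mathrm{Proj}_{\mathsf{X}\times\mathsf{Y}\times\mathsf{U}_{[1:j]}}(\lambda)\in\mathrm{Proj}_{\mathsf{X}\times\mathsf{Y}\times\mathsf{U}_{[1:j]}}(S_L)\ \text{and}\ \int\tilde c\,d\lambda\le J^*+L\Big\}.$$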

Since is tight by the induction hypothesis, is also tight by Theorem 2. This completes the proof of assertion. But this result implies that is also tight for all . ∎

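Recall that $C_c(\mathsf{X}\times\mathsf{Y}\times\mathsf{U})$ denotes the set of real continuous functions on $\mathsf{X}\times\mathsf{Y}\times\mathsf{U}$ with compact support. For any $g\in C_c(\mathsf{X}\times\mathsf{Y}\times\mathsf{U})$, we define

$$J_g(\underline{\gamma})=\int_{\mathsf{X}\times\mathsf{Y}\times\mathsf{U}}g(x,\mathbf{y},\mathbf{u})\,\underline{\gamma}(d\mathbf{u}\mid\mathbf{y})\,P(dx,d\mathbf{y}).$$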

We first prove the following result.

###### Theorem 6.

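Suppose that $\underline{\gamma}^{(n)}\rightharpoonup\underline{\gamma}$ as $n\to\infty$ and $g\in C_c(\mathsf{X}\times\mathsf{Y}\times\mathsf{U})$. Then we have

$$\lim_{n\to\infty}\big|J_g(\underline{\gamma}^{(n)})-J_g(\underline{\gamma})\big|=0.$$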
###### Proof.

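Fix any $g\in C_c(\mathsf{X}\times\mathsf{Y}\times\mathsf{U})$. Then by the Stone-Weierstrass Theorem [23, Lemma 6.1], $g$ can be uniformly approximated by functions of the form

$$\sum_{j=1}^k r_j\prod_{i=1}^N f_{j,i}\,g_{j,i},$$

where $r_j\in C_c(\mathsf{X})$, $f_{j,i}\in C_c(\mathsf{Y}_i)$, and $g_{j,i}\in C_c(\mathsf{U}_i)$ for each $j$ and $i$. Since $J_g$ is linear in $g$ and $|J_g(\underline{\gamma})-J_{\tilde g}(\underline{\gamma})|\le\|g-\tilde g\|_\infty$ for every $\underline{\gamma}\in\Gamma$ (the measure $\underline{\gamma}(d\mathbf{u}\mid\mathbf{y})P(dx,d\mathbf{y})$ is a probability measure), it is sufficient to prove the result for functions of the form $r\prod_{i=1}^N f_ig_i$, where $r\in C_c(\mathsf{X})$, $f_i\in C_c(\mathsf{Y}_i)$, and $g_i\in C_c(\mathsf{U}_i)$ for $i=1,\ldots,N$. Therefore, in the sequel, we assume that $g=r\prod_{i=1}^N f_ig_i$.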

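Let $K\coloneqq\operatorname{supp}(r)$, which is a compact subset of $\mathsf{X}$ as $r\in C_c(\mathsf{X})$. Then we have

$$
\begin{aligned}
\big|J_g(\underline{\gamma}^{(n)})-J_g(\underline{\gamma})\big|
&\le\big|J_g(\gamma_1^{(n)},\ldots,\gamma_N^{(n)})-J_g(\gamma_1^{(n)},\ldots,\gamma_{N-1}^{(n)},\gamma_N)\big|\\
&\quad+\big|J_g(\gamma_1^{(n)},\ldots,\gamma_{N-1}^{(n)},\gamma_N)-J_g(\gamma_1^{(n)},\ldots,\gamma_{N-2}^{(n)},\gamma_{N-1},\gamma_N)\big|\\
&\quad+\cdots\\
&\quad+\big|J_g(\gamma_1^{(n)},\gamma_2,\ldots,\gamma_N)-J_g(\gamma_1,\ldots,\gamma_N)\big|\eqqcolon\sum_{j=1}^N l_j^{(n)}.
\end{aligned}
$$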
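The telescoping decomposition above changes one agent's policy at a time. For intuition, the sketch below uses a small hypothetical two-agent static team with finite spaces (uniform binary state, binary symmetric channels with crossover probabilities $0.1$ and $0.3$, cost counting mismatched actions), in which $w^*$-convergence amounts to entrywise convergence of the policy tables. It computes the hybrid-policy terms $l_j^{(n)}$ and checks that their sum dominates $|J(\underline{\gamma}^{(n)})-J(\underline{\gamma})|$ and vanishes as $n$ grows. The model, the perturbed policies `mixed(eps)`, and all function names are assumptions made for illustration.

```python
import itertools

# Hypothetical finite two-agent static team (illustration only).
N = 2
X, Y, U = [0, 1], [0, 1], [0, 1]
P = {0: 0.5, 1: 0.5}
FLIP = [0.1, 0.3]

def W(i, y, x):
    return 1.0 - FLIP[i] if y == x else FLIP[i]

def cost(x, ys, us):
    return float(sum(1 for u in us if u != x))

def J(gammas):
    """Team cost of the randomized policies gammas[i][y][u], computed by enumeration."""
    total = 0.0
    for x in X:
        for ys in itertools.product(Y, repeat=N):
            for us in itertools.product(U, repeat=N):
                p = P[x]
                for i in range(N):
                    p *= W(i, ys[i], x) * gammas[i][ys[i]][us[i]]
                total += cost(x, ys, us) * p
    return total

def mixed(eps):
    """Randomized policy: play the observed value with probability 1 - eps."""
    return {y: {u: (1.0 - eps if u == y else eps) for u in U} for y in Y}

def telescoping_terms(gam_n, gam):
    # l_j: agents before j use gam_n, agent j switches from gam_n[j] to gam[j], agents after j use gam
    terms = []
    for j in range(N):
        left = gam_n[: j + 1] + gam[j + 1:]
        right = gam_n[:j] + gam[j:]
        terms.append(abs(J(left) - J(right)))
    return terms

gam = [mixed(0.0), mixed(0.0)]
for n in (1, 10, 100):
    gam_n = [mixed(1.0 / (n + 1)), mixed(1.0 / (2 * n + 1))]
    gap = abs(J(gam_n) - J(gam))
    print(n, gap, telescoping_terms(gam_n, gam))
```

In this example each term $l_j^{(n)}$ is proportional to the perturbation of agent $j$'s policy, so the bound is tight; in the proof, each term is instead controlled through the pairing of $\gamma_j^{(n)}$ with an element of $L_1(\mu_j,C_0(\mathsf{U}_j))$.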

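Let us consider the term $l_j^{(n)}$ in the above expression. Define the probability measure $T_{-j}$ on $\mathsf{X}\times\mathsf{Y}_{-j}\times\mathsf{U}_{-j}$ and the real function $g_{-j}$ on $\mathsf{X}\times\mathsf{Y}_{-j}\times\mathsf{U}_{-j}$ as follows:

$$T_{-j}\coloneqq\Big(\prod_{i=j+1}^N\gamma_i(du_i\mid y_i)\,q_i(y_i,x)\,\mu_i(dy_i)\Big)\times\Big(\prod_{i=1}^{j-1}\gamma_i^{(n)}(du_i\mid y_i)\,q_i(y_i,x)\,\mu_i(dy_i)\Big)P(dx)\quad\text{and}\quad g_{-j}\coloneqq r\prod_{i\neq j}f_ig_i.$$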

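Then the term $l_j^{(n)}$ can be written as

$$l_j^{(n)}=\Big|\int g_{-j}\Big(\int f_jg_jq_j\,d(\gamma_j^{(n)}\otimes\mu_j)\Big)dT_{-j}-\int g_{-j}\Big(\int f_jg_jq_j\,d(\gamma_j\otimes\mu_j)\Big)dT_{-j}\Big|.$$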

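Define, for each $x\in\mathsf{X}$, the function

$$b_x(y_j,u_j)\coloneqq f_j(y_j)\,g_j(u_j)\,q_j(y_j,x).$$

One can prove that any $b_x$ is in $L_1(\mu_j,C_0(\mathsf{U}_j))$; that is, $b_x$ can be approximated $\mu_j$-almost everywhere by simple functions and $\int_{\mathsf{Y}_j}\|b_x(y)\|_\infty\,\mu_j(dy)<\infty$. We will prove that the set $\{b_x:x\in K\}$ is totally bounded in $L_1(\mu_j,C_0(\mathsf{U}_j))$. Indeed, let $x,\tilde x\in K$. Then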

 ∥bx−b~x∥1\coloneqq∫Yjsupuj∈Uj∣∣fj(yj)gj(uj)qj(yj,x) xxxxxxxxxxxxx