Decentralized sequential active hypothesis testing and the MAC feedback capacity

01/11/2020 · Achilleas Anastasopoulos, et al. · University of Michigan

We consider the problem of decentralized sequential active hypothesis testing (DSAHT), where two transmitting agents, each possessing a private message, are actively helping a third agent (and each other) to learn the message pair over a discrete memoryless multiple access channel (DM-MAC). The third agent (receiver) observes the noisy channel output, which is also available to the transmitting agents via noiseless feedback. We formulate this problem as a decentralized dynamic team, show that optimal transmission policies have a time-invariant domain, and characterize the solution through a dynamic program. Several alternative formulations are discussed involving time-homogeneous cost functions and/or variable-length codes, resulting in solutions described through fixed-point, Bellman-type equations. Subsequently, we make connections with the problem of simplifying the multi-letter capacity expressions for the noiseless feedback capacity of the DM-MAC. We show that restricting attention to distributions induced by optimal transmission schemes for the DSAHT problem, without loss of optimality, transforms the capacity expression, so that it can be thought of as the average reward received by an appropriately defined stochastic dynamical system with time-invariant state space.


I Introduction

Active hypothesis testing refers to the problem where an agent adaptively selects the most informative sensing action, from a set of available ones, in order to obtain information about an underlying phenomenon of interest (hypothesis). The term "active" emphasizes the fact that the agent can exert some control over the sensing action. This problem was originally introduced by Blackwell [1] in its single-shot version. The "sequential" aspect refers to the setting where sensing decisions are made at each time instance based on the available information and the state of knowledge of the decision agent, i.e., in a closed-loop fashion. This problem generalizes classical sequential hypothesis testing [2] and was originally studied in [3].

Decentralized sequential active hypothesis testing (DSAHT) refers to a setting where multiple agents, each with some partial information about the underlying phenomenon of interest, are actively collaborating in order to obtain information about the said phenomenon. Transmission of information over a multiple access channel (MAC) with feedback can be thought of as an instance of a DSAHT problem. Indeed in this setting, two agents (transmitters), each possessing a private message, are actively helping a third agent (receiver) to learn the message pair by transmitting symbols to the common medium modeled as a MAC. The third agent (receiver) observes the noisy channel output, which is also available to the transmitting agents via noiseless feedback, giving rise to a sequential process. The decentralized Wald (non-active) problem has been studied in [4], and more recently, a more general setting was considered in [5]. A real-time communication system with two encoders communicating with a single receiver over separate noisy channels without feedback was considered in [6].

In the first part of this paper, we formulate the DSAHT over the MAC as a decentralized dynamic team problem. We show that optimal encoders are not required to depend on the entire feedback history, but have a time-invariant domain. Specifically, they only depend on their private message and an appropriately defined posterior belief on the message pair from the viewpoint of the receiver. This result is both intuitive and satisfying, as it generalizes the optimal encoding schemes for point-to-point channels [7, 8]. Furthermore, we show that the optimal encoders are characterized through a dynamic program. Several alternative formulations are discussed involving time-homogeneous cost functions and/or variable-length codes, resulting in solutions described through fixed-point, Bellman-type equations.

In the second part of this paper we discuss how the above results can shed light on the problem of characterizing the MAC feedback capacity. A multi-letter capacity expression for the DM-MAC with noiseless feedback has been established in [9] and restated in [10]. Other than the case of Gaussian channels [11], there is currently no known single-letter capacity expression for general discrete memoryless MACs (DM-MACs) with feedback. Leveraging the structural results for the optimal encoders for the DSAHT problem, we show that the capacity expression can be thought of as the average per-unit-time reward of an appropriately defined controlled Markov process. In order to achieve this structural result, we introduce some new quantities (other than the posterior belief on the message pair from the viewpoint of the receiver that was introduced for the DSAHT problem) that summarize the private belief of each transmitter about its own message conditioned on the corresponding channel input and output.

In the following, we denote random variables with capital letters (e.g., $X$), their realizations with small letters ($x$), and alphabets with calligraphic letters ($\mathcal{X}$). A sequence $(X_1, \ldots, X_t)$ is denoted by $X^t$. The space of probability distributions (or equivalently, probability mass functions) on a finite alphabet $\mathcal{X}$ is denoted by $\Delta(\mathcal{X})$.

II Channel Model

We consider a two-user DM-MAC. The input symbols $X_{1t}$, $X_{2t}$ and the output symbol $Y_t$ take values in the finite alphabets $\mathcal{X}_1$, $\mathcal{X}_2$, and $\mathcal{Y}$, respectively. The channel is memoryless in the sense that, conditioned on the current channel inputs, the current channel output is independent of all past channel inputs and outputs, i.e.,

$$P(y_t \mid x_1^t, x_2^t, y^{t-1}) = Q(y_t \mid x_{1t}, x_{2t}). \tag{1}$$

Our model considers noiseless feedback, that is, the presence of the channel output to both encoders with unit delay.

Consider the problem of transmission of messages $W_1 \in \mathcal{W}_1$, $W_2 \in \mathcal{W}_2$ over the MAC with noiseless feedback using fixed-length codes of length $N$. Each encoder generates its channel inputs based on its private message and the past channel outputs. Thus

$$X_{it} = e_{it}(W_i, Y^{t-1}), \qquad i = 1, 2. \tag{2}$$

The decoder estimates the messages $W_1$ and $W_2$ based on the channel outputs, as

$$(\hat{W}_1, \hat{W}_2) = d(Y^N). \tag{3}$$

A fixed-length transmission scheme for the channel is the pair $(e, d)$, consisting of the encoding functions $e = (e_{it})_{i \in \{1,2\},\, t \leq N}$ and the decoding function $d$. The error probability associated with the transmission scheme is defined as

$$P_e = P\big((\hat{W}_1, \hat{W}_2) \neq (W_1, W_2)\big). \tag{4}$$

A further generalization of these schemes considers randomized encoding functions, i.e.,

$$X_{it} = e_{it}(W_i, V_i, Y^{t-1}), \tag{5}$$

where $V_i$ is the private randomness of encoder $i$, or even randomized encoding functions with a common randomness (common between the transmitters and the receiver), i.e.,

$$X_{it} = e_{it}(W_i, V_0, Y^{t-1}), \tag{6}$$

where $V_0 \sim \mathrm{Unif}(\mathcal{V}_0)$, with $\mathrm{Unif}(\mathcal{V}_0)$ the uniform distribution over the finite set $\mathcal{V}_0$. In this case, the decoder is of the form $(\hat{W}_1, \hat{W}_2) = d(Y^N, V_0)$.

For simplicity of exposition we only consider fixed-length schemes, although the model can be generalized to variable-length schemes and the subsequent structural results are valid in that case as well.
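To make the model concrete, here is a minimal simulation sketch of fixed-length transmission with unit-delay noiseless feedback. The binary alphabets, the noisy-XOR channel, and the trivial repeat-the-bit encoders are illustrative assumptions, not part of the model above.

```python
import random

EPS = 0.1  # crossover probability of the toy channel

def sample_channel(x1, x2, rng):
    """Toy memoryless DM-MAC: Y is X1 XOR X2, flipped with probability EPS."""
    clean = x1 ^ x2
    return clean if rng.random() < 1 - EPS else 1 - clean

def encoder(w, feedback):
    """e_{it}(w_i, y^{t-1}): a placeholder policy that just resends the bit."""
    return w

def transmit(w1, w2, n, rng):
    """Run n channel uses; both encoders see the fed-back outputs y^{t-1}."""
    feedback = []
    for _ in range(n):
        x1 = encoder(w1, feedback)
        x2 = encoder(w2, feedback)
        feedback.append(sample_channel(x1, x2, rng))
    return feedback

outputs = transmit(1, 0, n=5, rng=random.Random(0))
assert len(outputs) == 5 and set(outputs) <= {0, 1}
```

The point of the sketch is only the information pattern: each input at time $t$ depends on one private message and the common feedback sequence.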

III Decentralized sequential active hypothesis testing on the MAC

One may pose the following optimization problem. Given the alphabets $\mathcal{X}_1$, $\mathcal{X}_2$, $\mathcal{Y}$, the channel $Q$, the pair of message alphabets $(\mathcal{W}_1, \mathcal{W}_2)$, and for a fixed length $N$, design the optimal transmission scheme that minimizes the error probability $P_e$:

$$\min_{(e,d)} P_e. \tag{P1}$$

In the following we reformulate problem (P1) into an equivalent optimization problem. Using the "common agent" methodology for decentralized dynamic team problems [12], we decompose the encoding process into an equivalent two-stage process. In the first stage, based on the common information $y^{t-1}$, the mappings (or "partial encoding functions") $e_{it} : \mathcal{W}_i \to \mathcal{X}_i$ are generated as $e_{it} = \phi_{it}[y^{t-1}]$ (or collectively, $e_t = \phi_t[y^{t-1}]$). (We use square brackets to denote functions whose range is a function set, i.e., we write $\phi_{it}[y^{t-1}]$ because $\phi_{it}[y^{t-1}]$ is itself a function.) In the second stage, each of these mappings is evaluated at the private information of the corresponding agent. In other words, in the first stage the common information $y^{t-1}$ is transformed through the mappings $\phi_{it}$ into a pair of encoding functions $(e_{1t}, e_{2t})$, and in the second stage these functions are evaluated at the private messages, producing $X_{it} = e_{it}(W_i)$.

Furthermore, it should be clear that for any pair of encoding functions, the optimal decoder is the ML decoder (assuming equally likely hypotheses), denoted by $d^{\mathrm{ML}}$. Thus we have reformulated problem (P1) as

$$\min_{\phi} P_e(\phi) \tag{P2}$$

where, with a slight abuse of notation, we have defined $P_e(\phi) = P_e(e, d^{\mathrm{ML}})$ based on the above equivalence between encoding functions $e$ and mappings $\phi$, as well as the use of ML decoding.

In the following we will show that this problem can be further reformulated as a Markov decision process (MDP). We define the posterior belief on the message pair at time $t$ as

$$\pi_t(w_1, w_2) = P(W_1 = w_1, W_2 = w_2 \mid y^{t-1}), \tag{7a}$$
$$\Pi_t(w_1, w_2) = P(W_1 = w_1, W_2 = w_2 \mid Y^{t-1}). \tag{7b}$$

(Note that the posterior belief is used both as a conditional distribution, $\pi_t$, and as a random variable, $\Pi_t$.)

The ML decoder can now be expressed based on $\pi_{N+1}$ as

$$(\hat{w}_1, \hat{w}_2) = d^{\mathrm{ML}}(y^N) = \arg\max_{(w_1, w_2)} \pi_{N+1}(w_1, w_2) \tag{8}$$

and the resulting error probability is

$$P_e = \mathbb{E}\big[C(\Pi_{N+1})\big], \tag{9}$$

where we defined the terminal cost function as

$$C(\pi) = 1 - \max_{(w_1, w_2)} \pi(w_1, w_2), \tag{10}$$

and the expectation is wrt the random variable $\Pi_{N+1}$.

It is now a simple exercise to show that $\pi_t$ can be updated using Bayes rule in a policy-independent way as

$$\pi_{t+1} = F(\pi_t, e_t, y_t), \tag{11}$$

where the mapping $F$ is defined through

$$\pi_{t+1}(w_1, w_2) = P(w_1, w_2 \mid y^t) \tag{12a}$$
$$= \frac{P(y_t \mid w_1, w_2, y^{t-1})\, \pi_t(w_1, w_2)}{P(y_t \mid y^{t-1})} \tag{12b}$$
$$= \frac{Q(y_t \mid e_{1t}(w_1), e_{2t}(w_2))\, \pi_t(w_1, w_2)}{\sum_{\tilde{w}_1, \tilde{w}_2} Q(y_t \mid e_{1t}(\tilde{w}_1), e_{2t}(\tilde{w}_2))\, \pi_t(\tilde{w}_1, \tilde{w}_2)} \tag{12c}$$
$$=: F(\pi_t, e_t, y_t)(w_1, w_2). \tag{12d}$$

We summarize the above result into the following lemma.

Lemma 1

The posterior belief on the message pair can be updated in a policy-independent (i.e., $\phi$-independent) way as $\Pi_{t+1} = F(\Pi_t, e_t, Y_t)$.

Proof:

The proof is essentially given in (12).
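The policy-independent update of Lemma 1 can be sketched in code. The binary message alphabets, the noisy-XOR kernel $Q$, and the identity partial encoding functions below are illustrative assumptions, not the paper's model.

```python
# Posterior over message pairs: dict (w1, w2) -> probability.

EPS = 0.1
def Q(y, x1, x2):
    """Illustrative noisy-XOR kernel Q(y | x1, x2)."""
    return 1 - EPS if y == (x1 ^ x2) else EPS

def bayes_update(pi, e1, e2, y):
    """F(pi, e, y): one Bayes-rule step, independent of the policy phi.

    e1, e2 are the partial encoding functions w_i -> x_i selected from the
    common information; y is the newly observed channel output.
    """
    unnorm = {w: Q(y, e1(w[0]), e2(w[1])) * p for w, p in pi.items()}
    z = sum(unnorm.values())          # P(y | y^{t-1})
    return {w: p / z for w, p in unnorm.items()}

pi0 = {(w1, w2): 0.25 for w1 in (0, 1) for w2 in (0, 1)}
ident = lambda w: w                   # send the message bit itself
pi1 = bayes_update(pi0, ident, ident, 0)
# Pairs consistent with x1 XOR x2 == 0 gain mass; the belief stays normalized.
assert abs(sum(pi1.values()) - 1) < 1e-12
assert pi1[(0, 0)] > pi1[(0, 1)]
```

Note that only the current belief, the pair of partial encoding functions, and the new output enter the update, which is exactly why the common agent can track $\Pi_t$ without knowing the private messages.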

The final step in the "common agent" methodology is to show that a fictitious common agent who observes only the common information faces an MDP with state $\Pi_t$ at time $t$; action $e_t = (e_{1t}, e_{2t})$; zero instantaneous costs for $t \leq N$; and terminal cost $C(\Pi_{N+1})$. Indeed, $(\Pi_t)_{t}$ is a controlled Markov chain, since

$$P(\pi_{t+1} \mid \pi^t, e^t) = \sum_{y_t} P(\pi_{t+1}, y_t \mid \pi^t, e^t) \tag{13a}$$
$$= \sum_{y_t} \mathbb{1}\{\pi_{t+1} = F(\pi_t, e_t, y_t)\}\, P(y_t \mid \pi_t, e_t) \tag{13b}$$
$$= P(\pi_{t+1} \mid \pi_t, e_t). \tag{13c}$$

At this point we have transformed problem (P2) into the following MDP:

$$\min_{\phi} \mathbb{E}\big[C(\Pi_{N+1})\big]. \tag{P3}$$

As a result, the optimal policy is deterministic Markovian, i.e., of the form $e_t = \phi_t(\pi_t)$ (or explicitly, $e_{it} = \phi_{it}(\pi_t)$), resulting in an encoding policy of the form $X_{it} = \phi_{it}(\pi_t)(W_i)$.

Furthermore, the characterization of the optimal Markov policy is through the backward dynamic program

$$V_{N+1}(\pi) = C(\pi) \tag{14a}$$
$$V_t(\pi) = \min_{e} \sum_{y} P(y \mid \pi, e)\, V_{t+1}(F(\pi, e, y)) \tag{14b}$$
$$\phi_t^*(\pi) = \arg\min_{e} \sum_{y} P(y \mid \pi, e)\, V_{t+1}(F(\pi, e, y)). \tag{14c}$$

All the above results are summarized in the following theorem.

Theorem 1

The optimization problem (P1) can be restated as an MDP with state $\Pi_t$ at time $t$; action $e_t = (e_{1t}, e_{2t})$; zero instantaneous costs for $t \leq N$; and terminal cost $C(\Pi_{N+1})$ given in (10). Consequently, the optimal encoders are of the form $X_{it} = \phi_{it}(\Pi_t)(W_i)$. Finally, the optimal mapping $\phi^*$ can be found through backward dynamic programming as in (14).

Proof:

The proof is given in the previous discussion.
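The backward dynamic program of Theorem 1 can be sketched on a toy instance. Binary message and input alphabets and a noisy-XOR kernel are assumptions made for illustration; since the belief simplex is continuous, the sketch evaluates the value function recursively along the output tree from a given initial belief instead of tabulating the whole state space.

```python
from itertools import product

EPS = 0.1
def Q(y, x1, x2):
    """Illustrative noisy-XOR kernel Q(y | x1, x2)."""
    return 1 - EPS if y == (x1 ^ x2) else EPS

def update(pi, e1, e2, y):
    """Return (P(y | pi, e), F(pi, e, y)) for a belief dict pi."""
    unnorm = {w: Q(y, e1[w[0]], e2[w[1]]) * p for w, p in pi.items()}
    z = sum(unnorm.values())
    return z, {w: p / z for w, p in unnorm.items()}

# A partial encoding function w -> x is a lookup table (value at w=0, w=1).
FUNCS = list(product((0, 1), repeat=2))

def value(pi, horizon):
    """V_t(pi): minimal expected terminal cost C(pi) = 1 - max_w pi(w)."""
    if horizon == 0:
        return 1 - max(pi.values())
    best = float("inf")
    for e1, e2 in product(FUNCS, FUNCS):   # all pairs of partial encoders
        v = sum(z * value(nxt, horizon - 1)
                for z, nxt in (update(pi, e1, e2, y) for y in (0, 1)))
        best = min(best, v)
    return best

pi0 = {(w1, w2): 0.25 for w1 in (0, 1) for w2 in (0, 1)}
# Extra transmissions can only reduce the optimal expected error probability.
assert value(pi0, 2) <= value(pi0, 1) <= value(pi0, 0)
```

The action space here is the set of pairs of functions, exactly the enlarged action set of the fictitious common agent discussed above.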

We conclude this section by pointing out that the main idea behind the characterization of the optimal solution of the decentralized sequential active hypothesis testing (DSAHT) problem was to transform the decentralized problem (three agents with common and private information) into a centralized problem for a single, "fictitious" agent, who observes the common information of all three agents and takes actions which are then evaluated on the private information to generate the inputs $X_{it}$. The price to pay for this reduction is that the action set of the fictitious common agent is now a pair of functions (instead of the transmitted symbols). The gain from this characterization is that the solution can be obtained by backward dynamic programming, and the resulting optimal encoding functions do not have a time-varying domain, but can be summarized through a sufficient statistic, $\Pi_t$.


III-A Alternative objectives and formulations


The same structural results can be derived for similar problems where the terminal cost is not the one defined above but an arbitrary function of $\Pi_{N+1}$. We mention here three such interesting cases.

  1. The first one relates to the entropy $H(W_1, W_2 \mid Y^N)$ or, equivalently, the negative of the mutual information $I(W_1, W_2; Y^N)$:

    $$C(\pi) = H(\pi) = -\sum_{w_1, w_2} \pi(w_1, w_2) \log \pi(w_1, w_2) \tag{15a}$$
    $$\mathbb{E}[C(\Pi_{N+1})] = H(W_1, W_2 \mid Y^N) \tag{15b}$$
    $$= H(W_1, W_2) - I(W_1, W_2; Y^N). \tag{15c}$$
  2. The second one relates to the conditional entropy $H(W_2 \mid W_1, Y^N)$ or, equivalently, the negative of the mutual information $I(W_2; Y^N \mid W_1)$:

    $$C(\pi) = -\sum_{w_1, w_2} \pi(w_1, w_2) \log \frac{\pi(w_1, w_2)}{\sum_{\tilde{w}_2} \pi(w_1, \tilde{w}_2)} \tag{16a}$$
    $$\mathbb{E}[C(\Pi_{N+1})] = H(W_2 \mid W_1, Y^N) \tag{16b}$$
    $$= H(W_2 \mid W_1) - I(W_2; Y^N \mid W_1). \tag{16c}$$
  3. The last one relates to the log-likelihood ratio of the true message pair,

    $$C(\pi) = -\log \frac{\pi(W_1, W_2)}{1 - \pi(W_1, W_2)}. \tag{17}$$

Interestingly, in the above cases the problem can be reformulated so that the terminal cost is distributed into time-invariant instantaneous costs throughout the transmission, with these instantaneous costs having an intuitive explanation. Indeed, we can define time-invariant instantaneous cost functions $c(\pi_t, e_t, y_t)$ for $t \leq N$ and eliminate the terminal cost as follows

  1. (18a)
    (18b)
    where
    (18c)
    (18d)
    (18e)
    (18f)
    (18g)
    (18h)
    with
    (18i)

    As a result, minimizing the final entropy $H(W_1, W_2 \mid Y^N)$ (or, equivalently, maximizing the final mutual information $I(W_1, W_2; Y^N)$) is equivalent to minimizing the cumulative conditional entropy, which is also equivalent to maximizing (on average) the cumulative drift of the log-likelihood of the true message pair.

  2. (19a)
    (19b)
    where
    (19c)
    (19d)
    (19e)
    (19f)
    (19g)
    (19h)
    with
    (19i)
  3. (20a)
    (20b)
    where we identify the terms inside the summation as
    (20c)
    (20d)
    (20e)
    (20f)
    with
    (20g)
    (20h)
    (20i)

    where EJS denotes the extrinsic Jensen-Shannon divergence [NaJaWi15].
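A quick numeric check of the decomposition above, under illustrative assumptions (binary alphabets, a noisy-XOR kernel, identity partial encoders): the expected one-step drop in the posterior entropy equals the mutual information between the message pair and the current output, which is why the terminal entropy cost can be redistributed into per-step instantaneous costs.

```python
import math

EPS = 0.1
def Q(y, x1, x2):
    """Illustrative noisy-XOR kernel Q(y | x1, x2)."""
    return 1 - EPS if y == (x1 ^ x2) else EPS

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def step_quantities(pi, e1, e2):
    """Expected next-step posterior entropy E[H(Pi_{t+1})] and I(W1,W2;Y)."""
    # Output distribution P(y | pi, e).
    py = {y: sum(Q(y, e1(w[0]), e2(w[1])) * p for w, p in pi.items())
          for y in (0, 1)}
    # E[H(posterior)] = H(W1, W2 | Y).
    exp_h = 0.0
    for y in (0, 1):
        post = [Q(y, e1(w[0]), e2(w[1])) * p / py[y] for w, p in pi.items()]
        exp_h += py[y] * entropy(post)
    # I(W;Y) computed the other way round: H(Y) - H(Y | W1, W2).
    h_y_given_w = sum(p * entropy([Q(y, e1(w[0]), e2(w[1])) for y in (0, 1)])
                      for w, p in pi.items())
    return exp_h, entropy(py.values()) - h_y_given_w

pi = {(w1, w2): 0.25 for w1 in (0, 1) for w2 in (0, 1)}
ident = lambda w: w
exp_h, mi = step_quantities(pi, ident, ident)
# One-step entropy drop of the belief equals the mutual information earned
# in that step, so the terminal entropy cost telescopes over t = 1..N.
assert abs((entropy(pi.values()) - exp_h) - mi) < 1e-9
```

Summing this identity over the horizon is exactly the telescoping argument behind the time-invariant instantaneous costs.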

Clearly one may consider other cost functions, e.g., a linear combination of 1) and 2), or even a linear combination of 1), 2), and the symmetric quantity $H(W_1 \mid W_2, Y^N)$. Similarly, one can consider linear combinations of log-likelihood ratios, such as the one appearing in 3) with the conditional beliefs in place of the joint belief, resulting in time-invariant instantaneous costs with appropriate EJS-related quantities.

Since the reformulated problem involves time-invariant costs and a time-homogeneous controlled Markov process, we can extend these results to infinite-horizon formulations with either discounted reward or average reward per unit time. The optimal policy is also time-invariant in this case, and it is characterized through the solution of the following fixed-point equations. For instance, for the average reward per unit time we have

$$\rho^* + v(\pi) = \min_{e} \sum_{y} P(y \mid \pi, e)\, \big[ c(\pi, e, y) + v(F(\pi, e, y)) \big] \tag{21a}$$
$$\phi^*(\pi) = \arg\min_{e} \sum_{y} P(y \mid \pi, e)\, \big[ c(\pi, e, y) + v(F(\pi, e, y)) \big]. \tag{21b}$$

We remark at this point that a similar formulation with infinite horizon and variable-length coding, where we minimize a linear combination of the error probability and the length of transmission, results in exactly the same structural results, i.e., summarizing the common history into the belief $\Pi_t$, and in addition has time-invariant optimal solutions. This formulation is the decentralized equivalent of the point-to-point active sequential hypothesis testing discussed in [8].

IV Connection between DSAHT and the MAC channel capacity

IV-A Multi-letter capacity expressions

A multi-letter capacity expression for DM-MAC with noiseless feedback has been established in [9] and can be stated as follows.

Fact 1 (Theorem 5.1 in [9], [10])

The capacity region of the DM-MAC with feedback is where , the directed information -th inner bound region, is defined as , where denotes the convex hull of a set , and

(22)

where