I Introduction
Active hypothesis testing refers to the problem in which an agent adaptively selects the most informative sensing action, from a set of available ones, in order to obtain information about an underlying phenomenon of interest (the hypothesis). The term “active” emphasizes the fact that the agent can exert some control over the sensing action. This problem was originally introduced by Blackwell [1] in its single-shot version. The “sequential” aspect of this problem refers to the setting where sensing decisions are made at each time instance based on the available information and state of knowledge of the decision agent, i.e., in a closed-loop fashion. This problem generalizes classical sequential hypothesis testing [2] and was originally studied in [3].
Decentralized sequential active hypothesis testing (DSAHT) refers to a setting where multiple agents, each with some partial information about the underlying phenomenon of interest, actively collaborate in order to obtain information about that phenomenon. Transmission of information over a multiple access channel (MAC) with feedback can be thought of as an instance of a DSAHT problem. Indeed, in this setting, two agents (transmitters), each possessing a private message, actively help a third agent (receiver) learn the message pair by transmitting symbols to the common medium modeled as a MAC. The receiver observes the noisy channel output, which is also available to the transmitting agents via noiseless feedback, giving rise to a sequential process. The decentralized Wald (non-active) problem has been studied in [4], and more recently, a more general setting was considered in [5]. A real-time communication system with two encoders communicating with a single receiver over separate noisy channels without feedback was considered in [6].
In the first part of this paper, we formulate the DSAHT over the MAC as a decentralized dynamic team problem. We show that optimal encoders are not required to depend on the entire feedback history, but have a time-invariant domain. Specifically, they only depend on their private message and an appropriately defined posterior belief on the message pair from the viewpoint of the receiver. This result is both intuitive and satisfying, as it generalizes the optimal encoding schemes for point-to-point channels [7, 8]. Furthermore, we show that the optimal encoders are characterized through a dynamic program. Several alternative formulations are discussed involving time-homogeneous cost functions and/or variable-length codes, resulting in solutions described through fixed-point, Bellman-type equations.
In the second part of this paper we discuss how the above results can shed light on the problem of characterizing the MAC feedback capacity. A multi-letter capacity expression for the discrete memoryless MAC (DM-MAC) with noiseless feedback has been established in [9] and restated in [10]. Other than the case of Gaussian channels [11], there is currently no known single-letter capacity expression for general DM-MACs with feedback. Leveraging the structural results for the optimal encoders for the DSAHT problem, we show that the capacity expression can be thought of as the average per-unit-time reward of an appropriately defined Markov controlled process. In order to achieve this structural result, we introduce some new quantities (other than the posterior belief on the message pair from the viewpoint of the receiver that was introduced for the DSAHT problem) that summarize the private beliefs of each transmitter for their own messages conditioned on the corresponding channel input and output.
In the following, we denote random variables with capital letters $X$, their realizations with small letters $x$, and alphabets with calligraphic letters $\mathcal{X}$. A sequence $(X_1, \ldots, X_t)$ is denoted with $X_{1:t}$. We use the notation to denote. The space of probability distributions (or equivalently probability mass functions) on the finite alphabet $\mathcal{X}$ is denoted by $\mathcal{P}(\mathcal{X})$.
II Channel Model
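We consider a two-user DM-MAC. The input symbols $X^1_t$, $X^2_t$ and the output symbol $Y_t$ take values in the finite alphabets $\mathcal{X}^1$, $\mathcal{X}^2$ and $\mathcal{Y}$, respectively. The channel is memoryless in the sense that, given the current channel inputs, the current channel output is conditionally independent of all the past channel inputs and channel outputs, i.e.,
$$P(Y_t \mid X^1_{1:t}, X^2_{1:t}, Y_{1:t-1}) = Q(Y_t \mid X^1_t, X^2_t). \tag{1}$$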
Our model considers noiseless feedback, that is, the channel output is available to both encoders with unit delay.
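Consider the problem of transmission of messages $W^i \in \mathcal{W}^i = \{1, \ldots, M_i\}$, $i = 1, 2$, over the MAC with noiseless feedback using fixed-length codes of length $T$. Encoders generate their channel inputs based on their private messages and past outputs. Thus
$$X^i_t = f^i_t(W^i, Y_{1:t-1}), \qquad i = 1, 2, \quad t = 1, \ldots, T. \tag{2}$$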
The decoder estimates the messages $W^1$ and $W^2$ based on the $T$ channel outputs, as
$$(\hat{W}^1, \hat{W}^2) = g(Y_{1:T}). \tag{3}$$
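A fixed-length transmission scheme for the channel $Q$ is the pair $(f, g)$, consisting of the encoding functions $f = (f^1, f^2)$ with $f^i = (f^i_t)_{t=1}^{T}$ and the decoding function $g$. The error probability associated with the transmission scheme $(f, g)$ is defined as
$$P_e(f, g) = P^{f,g}\big( (\hat{W}^1, \hat{W}^2) \neq (W^1, W^2) \big). \tag{4}$$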
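The error probability of a fixed-length scheme can be evaluated exactly for very small examples by enumerating all output sequences. The following is a minimal sketch, not the authors' code: the toy channel `noisy_adder_mac`, the function names, the 0-indexed messages, and the encoder signature `f(t, w, past_outputs)` are all assumptions made for illustration, and uniform messages with ML decoding are assumed.

```python
import itertools
import numpy as np

def noisy_adder_mac(eps=0.1):
    """Hypothetical toy DM-MAC: Y = X1 + X2 w.p. 1 - eps, else one of the other two values."""
    Q = np.full((2, 2, 3), eps / 2)          # Q[x1, x2, y]
    for x1, x2 in itertools.product(range(2), repeat=2):
        Q[x1, x2, x1 + x2] = 1 - eps
    return Q

def error_probability(Q, f1, f2, M1, M2, T):
    """Exact error probability of encoders (f1, f2) with ML decoding, uniform messages."""
    ny = Q.shape[2]
    p_correct = 0.0
    for ys in itertools.product(range(ny), repeat=T):
        # joint[w1, w2] = P(w1, w2, y_1, ..., y_T) under the given encoders
        joint = np.full((M1, M2), 1.0 / (M1 * M2))
        for w1 in range(M1):
            for w2 in range(M2):
                for t in range(T):
                    x1 = f1(t, w1, ys[:t])   # encoder 1 sees its message and past outputs
                    x2 = f2(t, w2, ys[:t])   # encoder 2 likewise
                    joint[w1, w2] *= Q[x1, x2, ys[t]]
        p_correct += joint.max()             # ML decoder picks the most likely pair
    return 1.0 - p_correct
```

For instance, with binary messages, a single channel use, and both encoders sending their message bit, the pairs (0,1) and (1,0) produce the same noiseless sum, which is why the error probability stays bounded away from zero on this toy channel.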
A further generalization of these schemes considers randomized encoding functions, i.e.,
(5) 
where or even randomized encoding functions with a common randomness (common between the transmitters and the receiver), i.e.,
(6) 
where , with the uniform distribution over . In this case, the decoder is of the form .
For simplicity of exposition we only consider fixed-length schemes, although the model can be generalized to variable-length schemes and the subsequent structural results are valid in that case as well.
III Decentralized sequential active hypothesis testing on the MAC
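One may pose the following optimization problem. Given the alphabets $\mathcal{X}^1$, $\mathcal{X}^2$, $\mathcal{Y}$, the channel $Q$, the pair $(M_1, M_2)$, and a fixed length $T$, design the optimal transmission scheme $(f, g)$ that minimizes the error probability $P_e(f, g)$:
$$\min_{f, g} \; P_e(f, g). \tag{P1}$$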
In the following we reformulate the problem (P1) into an equivalent optimization problem. Using the “common agent” methodology for decentralized dynamic team problems [12], we decompose the encoding process into an equivalent two-stage process. In the first stage, based on the common information $Y_{1:t-1}$, the mappings (or “partial encoding functions”) $E^i_t$, $i = 1, 2$, are generated as $E^i_t = \psi^i_t[Y_{1:t-1}]$ (or collectively, $E_t = \psi_t[Y_{1:t-1}]$), where $E^i_t : \mathcal{W}^i \to \mathcal{X}^i$. (Footnote 1: We use square brackets to denote functions with range being function sets, i.e., we use the notation $\psi^i_t[y_{1:t-1}](w^i)$ because $\psi^i_t[y_{1:t-1}]$ is itself a function.) In the second stage, each of these mappings is evaluated at the private information of each agent, producing $X^i_t = E^i_t(W^i)$. In other words, for $i = 1, 2$, let $\mathcal{E}^i$ be the collection of all encoding functions $e^i : \mathcal{W}^i \to \mathcal{X}^i$. In the first stage, the common information given by $Y_{1:t-1}$ is transformed using mappings $\psi^i_t : \mathcal{Y}^{t-1} \to \mathcal{E}^i$ to produce a pair of encoding functions $(E^1_t, E^2_t)$. In the second stage these functions are evaluated at the private messages, producing $X^i_t = E^i_t(W^i)$.
Furthermore, for any pair of encoding functions, the optimal decoder is the ML decoder (assuming equally likely hypotheses), denoted by $g_{ML}$. Thus we have reformulated problem (P1) as
$$\min_{\psi} \; P_e(\psi), \tag{P2}$$
where we have defined $P_e(\psi)$ with a slight abuse of notation based on the above equivalence between encoding functions $f$ and mappings $\psi$, as well as the use of ML decoding.
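The two-stage view of encoding can be made concrete in a few lines. In this sketch (function names, 0-indexed messages, and the encoder signature `f(t, w, past_outputs)` are assumptions for illustration), a partial encoding function is stored as a tuple listing the input symbol chosen for each private message, and any feedback encoder is converted into a mapping from the common information to such a tuple.

```python
import itertools

def all_partial_encoders(M, nx):
    """All maps e: {0..M-1} -> {0..nx-1}, each stored as the tuple (e(0), ..., e(M-1))."""
    return list(itertools.product(range(nx), repeat=M))

def to_prescription(f, M):
    """Stage one: turn an encoder f(t, w, past) into psi(t, past) -> partial encoder."""
    def psi(t, past):
        # The common information (t, past) fixes the whole map w -> x.
        return tuple(f(t, w, past) for w in range(M))
    return psi

def transmit(psi, t, past, w):
    """Stage two: evaluate the partial encoder at the private message w."""
    return psi(t, past)[w]
```

By construction, `transmit(to_prescription(f, M), t, past, w)` returns the same symbol as `f(t, w, past)`, which is the equivalence the reformulation relies on. The number of partial encoders grows as `nx ** M`, so this explicit enumeration is only practical for very small message sets.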
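In the following we will show that this problem can be further reformulated as a Markov decision process (MDP). We define the posterior belief $\Pi_t \in \mathcal{P}(\mathcal{W}^1 \times \mathcal{W}^2)$ on the message pair at time $t$ as (Footnote 2: Note that the posterior belief is used both as a conditional distribution and as a random variable.)
\begin{align}
\Pi_t(w^1, w^2) &= P^{\psi}(W^1 = w^1, W^2 = w^2 \mid Y_{1:t}) \tag{7a} \\
&= P^{\psi}(W^1 = w^1, W^2 = w^2 \mid Y_{1:t}, E_{1:t}), \tag{7b}
\end{align}
with $\Pi_0$ the uniform prior on the message pair.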
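The ML decoder can now be expressed based on $\Pi_T$ as
$$(\hat{W}^1, \hat{W}^2) = \arg\max_{w^1, w^2} \Pi_T(w^1, w^2), \tag{8}$$
and the resulting error probability is
$$P_e(\psi) = E^{\psi}\Big[ 1 - \max_{w^1, w^2} \Pi_T(w^1, w^2) \Big] = E^{\psi}\big[ c(\Pi_T) \big], \tag{9}$$
where we defined the terminal cost function as
$$c(\pi) = 1 - \max_{w^1, w^2} \pi(w^1, w^2), \tag{10}$$
and the expectation is with respect to the random variable $\Pi_T$.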
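The posterior belief $\Pi_t$ can be updated using Bayes' rule in a policy-independent way as
$$\Pi_t = \phi(\Pi_{t-1}, E_t, Y_t), \tag{11}$$
where the mapping $\phi$ is defined through
\begin{align}
\pi_t(w^1, w^2) &= P(w^1, w^2 \mid y_{1:t}, e_{1:t}) \tag{12a} \\
&= \frac{P(y_t, w^1, w^2 \mid y_{1:t-1}, e_{1:t})}{P(y_t \mid y_{1:t-1}, e_{1:t})} \tag{12b} \\
&= \frac{Q(y_t \mid e^1_t(w^1), e^2_t(w^2)) \, \pi_{t-1}(w^1, w^2)}{\sum_{\tilde{w}^1, \tilde{w}^2} Q(y_t \mid e^1_t(\tilde{w}^1), e^2_t(\tilde{w}^2)) \, \pi_{t-1}(\tilde{w}^1, \tilde{w}^2)} \tag{12c} \\
&= \phi(\pi_{t-1}, e_t, y_t)(w^1, w^2). \tag{12d}
\end{align}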
We summarize the above result into the following lemma.
Lemma 1
The posterior belief $\Pi_t$ on the message pair can be updated in a policy-independent (i.e., $\psi$-independent) way as $\Pi_t = \phi(\Pi_{t-1}, E_t, Y_t)$.
Proof:
The proof is essentially given in (12).
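The Bayes update in Lemma 1 is short enough to write out directly. The sketch below is illustrative only: the array layout `Q[x1, x2, y]`, the tuple representation of the partial encoding functions, the toy channel, and all function names are assumptions, and messages are 0-indexed. Note that the update uses only the previous belief, the current pair of partial encoding functions, and the new output, which is the sense in which it is policy-independent.

```python
import itertools
import numpy as np

def noisy_adder_mac(eps=0.1):
    """Hypothetical toy DM-MAC: Y = X1 + X2 w.p. 1 - eps, else one of the other two values."""
    Q = np.full((2, 2, 3), eps / 2)
    for x1, x2 in itertools.product(range(2), repeat=2):
        Q[x1, x2, x1 + x2] = 1 - eps
    return Q

def belief_update(pi, e1, e2, y, Q):
    """Posterior over (w1, w2) after observing y: proportional to Q[e1[w1], e2[w2], y] * pi."""
    e1, e2 = np.asarray(e1), np.asarray(e2)
    likelihood = Q[e1[:, None], e2[None, :], y]   # shape (M1, M2)
    unnormalized = likelihood * pi
    total = unnormalized.sum()                    # probability of y given (pi, e1, e2)
    if total == 0:
        raise ValueError("output y has zero probability under (pi, e1, e2)")
    return unnormalized / total

def terminal_cost(pi):
    """Error probability of the ML decision given the belief pi."""
    return 1.0 - pi.max()
```

Starting from the uniform belief on two binary messages with both encoders sending their bit, observing the output 1 concentrates the belief on the pairs (0,1) and (1,0), so the terminal cost remains close to one half.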
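The final step in the “common agent” methodology is to show that a fictitious common agent who observes only the common information $Y_{1:t-1}$ faces an MDP with state $\Pi_{t-1}$ at time $t$, $t = 1, \ldots, T$; action $E_t$; zero instantaneous costs for $t = 1, \ldots, T$; and terminal cost $c(\Pi_T)$. Indeed, $(\Pi_t)_t$ is a controlled Markov chain, since
\begin{align}
P(\Pi_t \mid \Pi_{0:t-1}, E_{1:t}) &= \sum_{y_t} P(\Pi_t, y_t \mid \Pi_{0:t-1}, E_{1:t}) \tag{13a} \\
&= \sum_{y_t} 1\{\Pi_t = \phi(\Pi_{t-1}, E_t, y_t)\} \sum_{w^1, w^2} Q(y_t \mid E^1_t(w^1), E^2_t(w^2)) \, \Pi_{t-1}(w^1, w^2) \tag{13b} \\
&= P(\Pi_t \mid \Pi_{t-1}, E_t). \tag{13c}
\end{align}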
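At this point we have transformed problem (P2) into the following MDP
$$\min_{\psi} \; E^{\psi}\big[ c(\Pi_T) \big]. \tag{P3}$$
As a result, the optimal policy is deterministic Markovian, i.e., of the form $E_t = \theta_t[\Pi_{t-1}]$ (or explicitly, $E^i_t = \theta^i_t[\Pi_{t-1}]$), resulting in an encoding policy of the form $X^i_t = \theta^i_t[\Pi_{t-1}](W^i)$.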
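Furthermore, the optimal Markov policy is characterized by the backward dynamic program
\begin{align}
\theta_t[\pi_{t-1}] &= \arg\min_{e_t} E\big[ V_{t+1}(\phi(\pi_{t-1}, e_t, Y_t)) \mid \pi_{t-1}, e_t \big], \tag{14a} \\
V_t(\pi_{t-1}) &= \min_{e_t} E\big[ V_{t+1}(\phi(\pi_{t-1}, e_t, Y_t)) \mid \pi_{t-1}, e_t \big], \tag{14b} \\
V_{T+1}(\pi_T) &= c(\pi_T). \tag{14c}
\end{align}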
All the above results can be summarized in the following theorem
Theorem 1
Proof:
The proof is given in the previous discussion.
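For very small instances, the backward dynamic program can be solved by brute-force recursion over the beliefs reachable from the uniform prior. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the toy channel, the array layout `Q[x1, x2, y]`, the 0-indexed messages, and the function names are hypothetical, and the recursion enumerates every pair of partial encoding functions at every step, so it scales exponentially in the horizon.

```python
import itertools
import numpy as np

def noisy_adder_mac(eps):
    """Hypothetical toy DM-MAC: Y = X1 + X2 w.p. 1 - eps, else one of the other two values."""
    Q = np.full((2, 2, 3), eps / 2)
    for x1, x2 in itertools.product(range(2), repeat=2):
        Q[x1, x2, x1 + x2] = 1 - eps
    return Q

def solve_dp(Q, M1, M2, T):
    """Return (minimal error probability, optimal first action) for horizon T."""
    nx1, nx2, ny = Q.shape
    # Each action is a pair of partial encoding functions, stored as index arrays.
    actions = [(np.array(e1), np.array(e2))
               for e1 in itertools.product(range(nx1), repeat=M1)
               for e2 in itertools.product(range(nx2), repeat=M2)]

    def value(pi, steps_left):
        if steps_left == 0:
            return 1.0 - pi.max(), None              # terminal cost of the ML decision
        best_cost, best_action = np.inf, None
        for e1, e2 in actions:
            # joint[w1, w2, y] = pi(w1, w2) * Q(y | e1(w1), e2(w2))
            joint = Q[e1[:, None], e2[None, :], :] * pi[:, :, None]
            p_y = joint.sum(axis=(0, 1))
            cost = sum(p_y[y] * value(joint[:, :, y] / p_y[y], steps_left - 1)[0]
                       for y in range(ny) if p_y[y] > 0)
            if cost < best_cost:
                best_cost, best_action = cost, (tuple(e1.tolist()), tuple(e2.tolist()))
        return best_cost, best_action

    prior = np.full((M1, M2), 1.0 / (M1 * M2))
    return value(prior, T)
```

On the noiseless version of this toy channel with binary messages, one channel use cannot separate all four message pairs with a ternary output, while two uses can, which gives a quick sanity check of the recursion.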
We conclude this section by pointing out that the main idea behind the characterization of the optimal solution of the DSAHT problem was to transform the decentralized problem (three agents with common and private information) into a centralized problem with a single, “fictitious” agent who observes the common information of all three agents and takes actions, which are then evaluated on the private information to generate the channel inputs. The price to pay for this reduction is that the action of the fictitious common agent is now a pair of functions (instead of the transmitted symbols). The gain from this characterization is that the solution can be obtained by backward dynamic programming, and the resulting optimal encoding functions do not have a time-varying domain: the feedback history is summarized by a sufficient statistic, the posterior belief on the message pair.
III-A Alternative Objectives and formulations
The same structural results can be derived for similar problems where the terminal cost is not the one defined above but an arbitrary function of the posterior belief. We mention here three such interesting cases:
1) The first one relates to the entropy or equivalently the negative of the mutual information .
(15a) (15b) (15c)
2) The second one relates to the conditional entropy or equivalently the negative of the mutual information .
(16a) (16b) (16c)
3) The last one relates to the log-likelihood ratio of the true message pair
(17)
Interestingly, in the above cases the problem can be reformulated so that the terminal cost is distributed into time-invariant instantaneous costs throughout the transmission, with these instantaneous costs having an intuitive explanation. Indeed, we can define time-invariant instantaneous cost functions for and eliminate the terminal cost as follows:
1) (18a) (18b) where (18c) (18d) (18e) (18f) (18g) (18h) with (18i) As a result, minimizing the final entropy (or equivalently maximizing the final mutual information ) is equivalent to minimizing the cumulative conditional entropy , which is also equivalent to maximizing (on average) the cumulative drift of the log-likelihood of the true message pair .
2) (19a) (19b) where (19c) (19d) (19e) (19f) (19g) (19h) with (19i)
3) (20a) (20b) where we identify the terms inside the summation as (20c) (20d) (20e) (20f) with (20g) (20h) (20i) where denotes the extrinsic Jensen-Shannon (EJS) divergence [NaJaWi15].
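The step from a terminal entropy cost to cumulative per-step costs rests on the chain rule for mutual information. As a sketch under assumed notation (here $W = (W^1, W^2)$ denotes the message pair and $Y_{1:t}$ the first $t$ channel outputs; these symbols are chosen for illustration and are not recovered from the numbered equations above), the telescoping reads:

```latex
\begin{align}
H(W \mid Y_{1:T})
  &= H(W) - I(W; Y_{1:T}) \\
  &= H(W) - \sum_{t=1}^{T} I(W; Y_t \mid Y_{1:t-1}) \\
  &= H(W) - \sum_{t=1}^{T} \Big[ H(Y_t \mid Y_{1:t-1}) - H(Y_t \mid W, Y_{1:t-1}) \Big].
\end{align}
```

Since $H(W)$ is fixed by the message sets, minimizing the final entropy amounts to maximizing the sum of per-step information gains. Conditioned on a realization of the past outputs, each summand depends only on the current posterior belief on the message pair and on the current pair of partial encoding functions, which is what allows the per-step cost to be written as a time-invariant function of state and action.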
Clearly one may consider other cost functions, e.g., a linear combination of 1) and 2) or even a linear combination of 1), 2), and the symmetric quantity . Similarly, one can consider linear combinations of log-likelihood ratios such as the one appearing in 3) with conditional beliefs , or in place of the joint belief , resulting in time-invariant instantaneous costs with appropriate EJS-related quantities.
Since the reformulated problem involves time-invariant costs and a time-homogeneous controlled Markov process, we can extend these results to infinite-horizon formulations with either discounted reward or average reward per unit time. The optimal policy is also time-invariant in this case, and it is characterized through the solution of fixed-point equations. For instance, for the average reward per unit time we have
(21a)  
(21b) 
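For orientation, the generic form of an average-reward optimality equation on the belief state is sketched below. This is the standard textbook form and not necessarily the exact statement of (21): the symbols $\rho$ (optimal average reward), $V$ (relative value function), $r$ (per-step reward), and $\phi$ (Bayes update of the belief) are assumed for illustration, and existence of a solution requires additional conditions that are not discussed here.

```latex
\begin{align}
\rho + V(\pi) &= \max_{e = (e^1, e^2)} \Big[ r(\pi, e)
   + \sum_{y \in \mathcal{Y}} P(y \mid \pi, e)\, V\big(\phi(\pi, e, y)\big) \Big], \\
P(y \mid \pi, e) &= \sum_{w^1, w^2} Q\big(y \mid e^1(w^1), e^2(w^2)\big)\, \pi(w^1, w^2).
\end{align}
```

A maximizer of the right-hand side, viewed as a function of $\pi$, gives a time-invariant policy: the pair of partial encoding functions depends on the history only through the current belief.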
We remark at this point that a similar formulation with infinite horizon and variable-length coding, where we minimize a linear combination of the error probability and the length of transmission, results in exactly the same structural results, i.e., the common history is summarized into the posterior belief, and in addition it has time-invariant optimal solutions. This formulation is the decentralized equivalent of the point-to-point active sequential hypothesis testing discussed in [8].
IV Connection between DSAHT and the MAC channel capacity
IV-A Multi-letter capacity expressions
A multi-letter capacity expression for the DM-MAC with noiseless feedback has been established in [9] and can be stated as follows.