The noisy 20 questions problem (cf. [12, 2, 18, 10, 6, 4, 9]) arises when one aims to accurately estimate an arbitrarily distributed random variable by successively querying an oracle and using its noisy responses to form an estimate. A central goal in this problem is to find optimal query strategies that yield a good estimate for the unknown target random variable.
Depending on the query design strategy adopted, the 20 questions problem can either be adaptive or non-adaptive. In adaptive query procedures, the design of a subsequent query depends on all previous queries and the oracle's noisy responses to these queries. In non-adaptive query procedures, all the queries are designed independently in advance. For example, the bisection policy [6, Section 4.1] is an adaptive query procedure and the dyadic policy [6, Section 4.2] is a non-adaptive query procedure. Compared with adaptive query procedures, non-adaptive query procedures have the advantages of lower computation cost, parallelizability and no need for feedback.
Depending on whether or not the noisy responses depend on the queries, the noisy 20 questions problem is classified into two categories: querying with measurement-independent noise (e.g., [6, 4]) and querying with measurement-dependent noise (e.g., [7, 9]). As argued in prior work, measurement-dependent noise better models practical applications. For example, for target localization in a sensor network, the noisy response to each query can depend on the size of the query region due to the possible presence of clutter. Another example is in human query systems, where personal biases about the state may affect the response.
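To make the non-adaptive case concrete, the following minimal sketch shows a dyadic-style query design in a noiseless setting: the i-th query asks whether the i-th binary digit of the target equals 1, so all queries are fixed before any answer is observed. The function names and the noiseless simplification are assumptions made for illustration only and are not taken from [6].

```python
def dyadic_answer(s: float, i: int) -> int:
    """Noiseless answer to the i-th dyadic query (i = 1, 2, ...).

    The query asks: is the i-th binary digit of s equal to 1?
    The query set depends only on i, never on earlier answers.
    """
    return int(s * 2**i) % 2


def dyadic_estimate(answers: list[int]) -> float:
    """Midpoint of the dyadic cell of width 2^-n singled out by n answers."""
    n = len(answers)
    left_endpoint = sum(bit / 2 ** (i + 1) for i, bit in enumerate(answers))
    return left_endpoint + 2 ** -(n + 1)


if __name__ == "__main__":
    target = 0.3
    answers = [dyadic_answer(target, i) for i in range(1, 6)]
    print(answers, dyadic_estimate(answers))
```

In this idealized setting, n answers localize the target to within half the width of a dyadic cell; an adaptive rule such as bisection would instead choose each query using the answers received so far.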
In earlier works on the noisy 20 questions problem, e.g., [6, 16, 17], the queries were designed to minimize the entropy of the posterior distribution of the target variable. As pointed out in later works, e.g., [4, 3, 7, 9], other accuracy measures, such as the estimation resolution and the quadratic loss, are often better criteria for localization, where the resolution is defined as the absolute difference between the target variable and its estimate, and the quadratic loss is the square of this difference.
Motivated by the scenario of limited resources, computation and response time, we obtain new results on the non-asymptotic tradeoff among the number of queries, the achievable resolution and the excess-resolution probability of optimal adaptive and non-adaptive query procedures for noisy 20 questions estimation of an arbitrarily distributed random variable taking values in the unit interval.
Our contributions for non-adaptive querying are as follows. First, we derive non-asymptotic bounds for optimal non-adaptive query procedures, valid for any number of queries and any excess-resolution probability. Secondly, applying the Berry–Esseen theorem, under mild conditions on the measurement-dependent noise, we obtain a second-order asymptotic approximation to the achievable resolution of optimal non-adaptive query procedures with a finite number of queries. As a corollary of our result, we establish a phase transition in the excess-resolution probability as a function of the resolution decay rate for optimal non-adaptive query procedures. Finally, we specialize our second-order analyses to measurement-dependent versions of the binary symmetric channel.
We now clarify the differences between our work and [7]. First, our results hold for arbitrary discrete channels under mild conditions, while the results in [7] focused only on a measurement-dependent binary symmetric channel. Furthermore, our proof techniques are significantly different from those of [7]. The authors of [7] used large deviations analysis to prove the achievability part and Fano's inequality for the converse part. In contrast, our proofs use recent advances in finite blocklength information theory [11, 15]. It is important to note that our non-asymptotic bounds for non-adaptive query schemes are novel and that there are no comparable results in the previous literature, including [7]. Finally, our second-order asymptotic result in Theorem 3 refines [7, Theorem 1]. In particular, Theorem 3 provides an approximation to the performance of optimal query procedures employing a finite number of queries, while [7, Theorem 1] only characterizes the asymptotic performance when the number of queries tends to infinity.
II Problem Formulation
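Random variables and their realizations are denoted by upper case variables (e.g., $X$) and lower case variables (e.g., $x$), respectively. All sets are denoted in calligraphic font (e.g., $\mathcal{X}$). Let $X^n := (X_1, \ldots, X_n)$ be a random vector of length $n$. We use $\Phi^{-1}(\cdot)$ to denote the inverse of the cumulative distribution function (cdf) of the standard Gaussian. We use $\mathbb{R}$, $\mathbb{R}_+$ and $\mathbb{N}$ to denote the sets of real numbers, positive real numbers and integers, respectively. Given any two integers $(m, n) \in \mathbb{N}^2$, we use $[m:n]$ to denote the set of integers $\{m, m+1, \ldots, n\}$ and use $[m]$ to denote $[1:m]$. Given any $(m, n) \in \mathbb{N}^2$, for any $m$ by $n$ matrix $\mathbf{a} = \{a_{i,j}\}_{i \in [m], j \in [n]}$, the infinity norm is defined as $\|\mathbf{a}\|_\infty := \max_{i \in [m], j \in [n]} |a_{i,j}|$. The set of all probability distributions on a finite set $\mathcal{X}$ is denoted as $\mathcal{P}(\mathcal{X})$ and the set of all conditional probability distributions from $\mathcal{X}$ to $\mathcal{Y}$ is denoted as $\mathcal{P}(\mathcal{Y}|\mathcal{X})$. Furthermore, we use $\mathcal{F}(\mathcal{S})$ to denote the set of all probability density functions on a set $\mathcal{S}$. All logarithms are base $e$ unless otherwise noted. Finally, we use $\mathbb{1}(\cdot)$ to denote the indicator function.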
II-A Noisy 20 Questions Problem
Let the target be a continuous random variable defined on the unit interval $[0,1]$ with an arbitrary probability density function (pdf). In the noisy 20 questions problem, a player aims to accurately estimate the value of the target random variable by posing a sequence of queries to an oracle that knows the target. After receiving the queries, the oracle finds binary answers and passes these answers through a measurement-dependent channel, described by a transition matrix, yielding noisy responses. Given the noisy responses, the player uses a decoding function to obtain an estimate of the target variable. Throughout the paper, we assume that the alphabet for the noisy response is finite.
A query procedure for the noisy 20 questions problem consists of the queries and the decoder. In general, these procedures can be classified into two categories: non-adaptive and adaptive querying. In a non-adaptive query procedure, the player needs to first determine the number of queries and then design all the queries simultaneously. In contrast, in an adaptive query procedure, the design of queries is done sequentially and the number of queries is a variable. In particular, when designing each new query, the player can use the previous queries and the noisy responses from the oracle to these queries to formulate the next query. Furthermore, the player needs to choose a stopping criterion, which may be random, determining the number of queries to make.
In subsequent sections, we clarify the notion of the measurement-dependent channel including concrete examples and we present specific definitions of non-adaptive and adaptive query procedures.
II-B The Measurement-Dependent Channel
In this subsection, we describe succinctly the measurement-dependent channel scenario , also known as a channel with state [5, Chapter 7]. Given a sequence of queries , the channel from the oracle to the player is a memoryless channel whose transition probabilities are functions of the queries. Specifically, for any ,
where denotes the transition probability of the channel which depends on the -th query . Given any Lebesgue measurable query , define the size of as its Lebesgue measure, i.e., . Throughout the paper, we consider only Lebesgue measurable queries and assume that the measurement-dependent channel depends on the query only through its size, i.e., is equivalent to a channel with state where the state .
For any , any and any subsets , and of with sizes , and , we assume the measurement-dependent channel is continuous in the sense that there exists a constant depending on only such that
Given any , a channel is said to be a measurement-dependent Binary Symmetric Channel (BSC) with parameter if and for any ,
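As an illustration of a channel whose noise level grows with the query size, the following minimal sketch simulates a binary symmetric channel whose crossover probability is the channel parameter multiplied by the query size. This scaling is an assumption adopted here for illustration, and the names `md_bsc_response`, `nu` and `query_size` are hypothetical.

```python
import random


def md_bsc_response(x: int, query_size: float, nu: float, rng: random.Random) -> int:
    """Pass the oracle's binary answer x through a size-dependent BSC.

    Assumption: the answer is flipped with probability nu * query_size,
    where query_size is the Lebesgue measure of the query set.
    """
    flip_prob = nu * query_size
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("nu * query_size must lie in [0, 1]")
    return x ^ 1 if rng.random() < flip_prob else x


if __name__ == "__main__":
    rng = random.Random(0)
    # Larger queries are answered less reliably than smaller ones.
    for size in (0.1, 0.5, 0.9):
        flips = sum(md_bsc_response(0, size, 0.4, rng) for _ in range(10_000))
        print(size, flips / 10_000)
```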
II-C Non-Adaptive Query Procedures
A non-adaptive query procedure with resolution and excess-resolution constraint is defined as follows.
Given any , and , an -non-adaptive query procedure for the noisy 20 questions problem consists of
and a decoder
such that the excess-resolution probability satisfies
We remark that the definition of the excess-resolution probability with respect to is inspired by rate-distortion theory [1, 8]. Our formulation differs from that of [7], where the authors constrained the -dependent maximum excess-resolution probability, where is the target variable.
Motivated by practical applications where the number of queries is limited (e.g., due to the high cost of queries and low-delay requirements), we are interested in the following non-asymptotic fundamental limit on achievable resolution :
Note that denotes the minimal resolution one can achieve with probability at least using a non-adaptive query procedure with queries. In other words, is the achievable resolution of optimal non-adaptive query procedures tolerating an excess-resolution probability of . Dual to (5) is the sample complexity, determined by the minimal number of queries required to achieve a resolution with probability at least , i.e.,
One can easily verify that for any ,
Thus, it suffices to focus on the fundamental limit .
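The duality between the two limits can be sketched as follows: if the minimal achievable resolution is available as a function of the number of queries, the sample complexity is the smallest number of queries at which that function drops to the target resolution. The helper name `sample_complexity` and the toy resolution curve are hypothetical placeholders, not quantities derived in this paper.

```python
from typing import Callable, Optional


def sample_complexity(
    delta_star: Callable[[int], float],
    target_resolution: float,
    n_max: int = 10_000,
) -> Optional[int]:
    """Smallest n <= n_max with delta_star(n) <= target_resolution, else None."""
    for n in range(1, n_max + 1):
        if delta_star(n) <= target_resolution:
            return n
    return None


if __name__ == "__main__":
    # Toy placeholder curve: the resolution halves with every query.
    toy_delta_star = lambda n: 2.0 ** -n
    print(sample_complexity(toy_delta_star, 1e-3))
```

Any characterization of the achievable resolution therefore yields a characterization of the sample complexity by inversion.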
III Main Results
The proofs of all results are omitted due to space limitations. Details are available in our extended version [19].
III-A Non-Asymptotic Bounds
We first present an upper bound on the error probability of optimal non-adaptive query procedures. Given any , let be the marginal distribution on induced by the Bernoulli distribution and the measurement-dependent channel . Furthermore, define the following information density
Correspondingly, for any , we define
as the mutual information density between and .
Given any , for any and any , there exists an -non-adaptive query procedure such that
where the tuple of random variables is distributed as with defined as the Bernoulli distribution with parameter (i.e., ).
Consider the measurement-independent channel where for all . It is straightforward to verify that for any , there exists an -non-adaptive query procedure such that
where the tuple of random variables is distributed as , the information density is defined as
and is induced by and . Comparing the measurement-independent case (11) with the measurement-dependent case (10), the non-asymptotic upper bound (10) in Theorem 1 differs from (11) in two aspects: an additional additive term and an additional multiplicative term in (10). As is made clear in the proof of Theorem 1, the additive term results from the atypicality of the measurement-dependent channel and the multiplicative term appears due to the change-of-measure we use to replace the measurement-dependent channel with the measurement-independent channel .
We next provide a non-asymptotic converse bound to complement Theorem 1. For simplicity, for any query and any , we use to denote .
Set . Any -non-adaptive query procedure satisfies the following. For any and any ,
The proof of Theorem 2 is decomposed into two steps: i) we use the result in [7], which states that the excess-resolution probability of any non-adaptive query procedure can be lower bounded by the error probability associated with channel coding over the measurement-dependent channel with uniform message distribution, minus a certain term depending on ; and ii) we apply the non-asymptotic converse bound for channel coding [15, Proposition 4.4] by exploiting the fact that, given a sequence of queries, the measurement-dependent channel is simply a time-varying channel with deterministic states at each time point.
The non-asymptotic bounds in Theorems 1 and 2 lead to a second-order asymptotic result in Theorem 3, which provides an approximation to the finite blocklength fundamental limit . Furthermore, the exact calculation of the upper bound in Theorem 2 is challenging. However, for sufficiently large, as shown in the proof of Theorem 3, the supremum in (13) can be achieved by queries where each query has the same size.
III-B Second-Order Asymptotic Approximation
In this subsection, we present the second-order asymptotic approximation to the achievable resolution of optimal non-adaptive query procedures after queries subject to a worst case excess-resolution probability of .
Given measurement-dependent channels , the channel “capacity” is defined as
Let the capacity-achieving set be the set of optimizers achieving (14). Then, for any , define the following “dispersion” of the measurement-dependent channel
The case of will be the focus of the remainder of this paper. We assume that for any , the third absolute moment of the information density is finite. Under this assumption, we obtain the second-order asymptotic result.
For any , the achievable resolution of optimal non-adaptive query procedures satisfies
where the remainder satisfies that .
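A second-order approximation of this type can be evaluated numerically once the capacity and the dispersion are known. The sketch below assumes the standard normal-approximation form, in which the negative logarithm of the achievable resolution equals the number of queries times the capacity, plus the square root of the number of queries times the dispersion, multiplied by the inverse Gaussian cdf evaluated at the excess-resolution probability; the remainder term is dropped. The function name and this exact form are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def neg_log_resolution_approx(n: int, capacity: float, dispersion: float, eps: float) -> float:
    """Normal approximation to -log(resolution) after n queries (remainder ignored).

    Assumed form: n * C + sqrt(n * V) * Phi^{-1}(eps), with C and V in the
    same logarithmic units (e.g., nats and nats squared).
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return n * capacity + sqrt(n * dispersion) * NormalDist().inv_cdf(eps)


if __name__ == "__main__":
    # Hypothetical values of C and V, chosen only to show the backoff from n * C.
    for n in (100, 1_000, 10_000):
        print(n, neg_log_resolution_approx(n, 0.2, 0.5, 0.1) / n)
```

For an excess-resolution probability below one half, the inverse cdf is negative, so the approximation sits below the first-order term and approaches it at a rate proportional to the inverse square root of the number of queries.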
We make the following remarks.
Firstly, Theorem 3 implies a phase transition in a machine learning sense [14, 13], which we interpret in Figure 1. We remark that this phase transition only appears in the second-order asymptotic analysis and is not revealed by the first-order asymptotic analysis, e.g., that developed in [7, Theorem 1].
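The phase transition can be visualized by inverting a normal approximation of the achievable resolution. The sketch below assumes that, for a target resolution decay rate, the excess-resolution probability is approximately the Gaussian cdf evaluated at the gap between the rate and the capacity, scaled by the square root of the number of queries over the dispersion. The function name, the parameter values and this approximation (with the remainder ignored) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def excess_resolution_prob_approx(n: int, rate: float, capacity: float, dispersion: float) -> float:
    """Approximate excess-resolution probability at decay rate `rate`.

    Assumed form: Phi((rate - C) * sqrt(n / V)), obtained by inverting the
    normal approximation -log(resolution) ~ n*C + sqrt(n*V)*Phi^{-1}(eps).
    """
    return NormalDist().cdf((rate - capacity) * sqrt(n / dispersion))


if __name__ == "__main__":
    # Hypothetical capacity and dispersion values.
    capacity, dispersion = 0.2, 0.5
    for n in (100, 10_000, 1_000_000):
        row = [round(excess_resolution_prob_approx(n, r, capacity, dispersion), 4)
               for r in (0.18, 0.20, 0.22)]
        print(n, row)
```

As the number of queries grows, the curve sharpens into a step at the capacity: rates below it give a vanishing probability and rates above it give a probability tending to one.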
Secondly, Theorem 3 refines [7, Theorem 1] in several directions. First, Theorem 3 is a second-order asymptotic result that provides a good approximation for the finite blocklength performance, while [7, Theorem 1] only characterizes the asymptotic resolution decay rate with vanishing worst-case excess-resolution probability, i.e., . Second, our results hold for any measurement-dependent channel satisfying (2), while [7, Theorem 1] only considers the measurement-dependent BSC.
Thirdly, the dominant event which leads to an excess-resolution in noisy 20 questions estimation is the atypicality of the information density (cf. (9)). To characterize the probability of this event, we make use of the Berry–Esseen theorem and show that the mean and the variance of the information density play critical roles.
Finally, we remark that any real number has the binary expansion . We can thus interpret the result in Theorem 3 as follows: using optimal non-adaptive query procedures, after queries, with probability of at least , one can extract the first bits of the binary expansion of the target variable .
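The bit-extraction interpretation can be made concrete with a short sketch. It assumes that a resolution of at most 2^-k corresponds to roughly the first k bits of the binary expansion, and it checks that truncating a number to its first k bits incurs an error below 2^-k. Both helper names are hypothetical.

```python
from math import floor, log2


def bits_for_resolution(delta: float) -> int:
    """Number of leading binary digits matched by a resolution delta in (0, 1]."""
    return floor(-log2(delta))


def truncate_to_bits(s: float, k: int) -> float:
    """Value of s in [0, 1) after keeping only its first k binary digits."""
    return floor(s * 2**k) / 2**k


if __name__ == "__main__":
    s, delta = 0.7, 1e-3
    k = bits_for_resolution(delta)
    print(k, truncate_to_bits(s, k), s - truncate_to_bits(s, k))
```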
III-C Case of Measurement-Dependent BSC
In the following, we specialize Theorem 3 to a measurement-dependent BSC. Given any and any , let . For any , the information density of a measurement-dependent BSC with parameter is
It can be verified that the capacity of the measurement-dependent BSC with parameter is given by
where is the binary entropy function.
Depending on the value of , the set of capacity-achieving parameters may or may not be a singleton. In particular, for any , the capacity-achieving parameter is unique. When , there are two capacity-achieving parameters and where . It can be verified easily that . As a result, for any capacity-achieving parameter of the measurement-dependent BSC with parameter , the dispersion of the channel is
Set any . If the channel from the oracle to the player is a measurement-dependent BSC with parameter , then Theorem 3 holds with and for any .
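The capacity and dispersion of such a channel can be checked numerically. The sketch below assumes a Bernoulli input whose parameter equals the query size and a BSC whose crossover probability is the channel parameter times the query size; it computes the mean and variance of the information density exactly over the four input-output pairs and locates the maximizer on a grid. The function names, the crossover model and the grid resolution are assumptions for illustration.

```python
from math import log


def info_density_moments(q: float, nu: float) -> tuple[float, float]:
    """Mean and variance (in nats) of the information density.

    Assumptions: input X ~ Bernoulli(q); Y is X flipped with probability nu * q.
    """
    p = nu * q
    px = {0: 1.0 - q, 1: q}

    def w(y: int, x: int) -> float:
        return p if y != x else 1.0 - p

    py = {y: sum(px[x] * w(y, x) for x in (0, 1)) for y in (0, 1)}
    mean = second = 0.0
    for x in (0, 1):
        for y in (0, 1):
            joint = px[x] * w(y, x)
            if joint > 0.0:  # zero-probability pairs contribute nothing
                density = log(w(y, x) / py[y])
                mean += joint * density
                second += joint * density**2
    return mean, second - mean**2


def capacity_grid(nu: float, grid: int = 1000) -> tuple[float, float]:
    """Grid-search maximizer q and the maximal mean information density."""
    q_best = max((k / grid for k in range(1, grid)),
                 key=lambda q: info_density_moments(q, nu)[0])
    return q_best, info_density_moments(q_best, nu)[0]


if __name__ == "__main__":
    q_star, cap = capacity_grid(0.4)
    print(q_star, cap, info_density_moments(q_star, 0.4)[1])
```

With the channel parameter equal to one, the moments at a query size and at one minus that size coincide under this model, which is consistent with the existence of two capacity-achieving parameters sharing the same dispersion.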
We make the following observations.
First, if we let , then for any ,
This strengthens [7, Theorem 1] with a strong converse.
Second, when one considers the measurement-independent BSC with parameter , then it can be shown that the achievable resolution of optimal non-adaptive query procedures satisfies
To compare the performances of optimal non-adaptive query procedures under measurement-dependent and measurement-independent channels, we plot in Figure 2 the second-order approximation to the average number of bits (in the binary expansion of the target random variable ) extracted per query after queries, i.e., and for and different values of (the remainder is ignored). We observe an interesting phenomenon. When , optimal query procedures under a measurement-dependent channel achieve a higher resolution than their counterparts in the measurement-independent case. Intuitively, this is because the probability of receiving wrong answers in the measurement-dependent channel is smaller than in the measurement-independent channel with the same parameter. However, when , we find that the relative performances can be reversed. The reasons for this phenomenon are twofold: i) the BSC is a symmetric channel, thus under the measurement-independent setting, having a BSC with crossover probability is equivalent to having a BSC with parameter since one can easily flip all bits; ii) under the measurement-dependent setting, since the probability of receiving wrong answers depends on the size of the query, this symmetric nature of the BSC is lost.
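The reversal described above can already be seen in the first-order terms. The sketch below compares, in bits per query, the capacity of a measurement-independent BSC with that of a BSC whose crossover probability is the channel parameter times the query size. This crossover model, the grid search and the function names are assumptions for illustration, and the second-order terms plotted in Figure 2 are not reproduced.

```python
from math import log2


def hb(p: float) -> float:
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def capacity_independent(nu: float) -> float:
    """Capacity of a measurement-independent BSC with crossover nu."""
    return 1.0 - hb(nu)


def capacity_dependent(nu: float, grid: int = 2000) -> float:
    """Grid-search capacity when the crossover probability is nu * q.

    Assumption: the input is Bernoulli(q), where q is the query size.
    """
    best = 0.0
    for k in range(grid + 1):
        q = k / grid
        p = nu * q
        output_one = q * (1.0 - p) + (1.0 - q) * p  # probability that Y = 1
        best = max(best, hb(output_one) - hb(p))
    return best


if __name__ == "__main__":
    for nu in (0.2, 0.4, 0.7, 0.9):
        print(nu, round(capacity_dependent(nu), 4), round(capacity_independent(nu), 4))
```

Under these assumptions, the measurement-dependent capacity exceeds the measurement-independent one for a channel parameter of 0.4, while the order is reversed for a channel parameter of 0.9.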
IV Numerical Illustration
In this section, we numerically illustrate the minimal achievable resolution of non-adaptive procedures over a measurement-dependent BSC with parameter . We consider the case where the target random variable is uniformly distributed over the alphabet and set the target excess-resolution probability . The simulation results for this case are provided in Figure 3, which demonstrates strong agreement with the theoretical result in Corollary 4.
V Conclusion
We derived the minimal achievable resolution of non-adaptive query procedures for the noisy 20 questions problem where the channel from the oracle to the player is a measurement-dependent discrete channel. In our extended version [19], we generalize our results to estimate a multidimensional target over the unit cube and to simultaneously estimate multiple targets. Furthermore, we establish a lower bound on the resolution gain associated with adaptive querying for estimating a target over the unit interval.
In this paper, we were interested in fundamental limits of optimal query procedures. It would be interesting to explore low-complexity practical query procedures and compare the performances of proposed query procedures to our derived benchmarks.
- [1] (1971) Rate-distortion theory. Wiley Online Library.
- [2] (1974) An interval estimation problem for controlled observations. Problemy Peredachi Informatsii 10 (3), pp. 51–61.
- [3] (2016) Sequential measurement-dependent noisy search. In 2016 IEEE Information Theory Workshop (ITW), pp. 221–225.
- [4] (2018) Unequal error protection querying policies for the noisy 20 questions problem. IEEE Trans. Inf. Theory 64 (2), pp. 1105–1131.
- [5] (2011) Network information theory. Cambridge University Press.
- [6] (2012) Twenty questions with noise: Bayes optimal policies for entropy loss. Journal of Applied Probability 49 (1), pp. 114–136.
- [7] (2018) Searching with measurement dependent noise. IEEE Trans. Inf. Theory 64 (4), pp. 2690–2705.
- [8] (2013) Lossy data compression: non-asymptotic fundamental limits. Ph.D. Thesis, Department of Electrical Engineering, Princeton University.
- [9] (2018) Improved target acquisition rates with feedback codes. IEEE Journal of Selected Topics in Signal Processing 12 (5), pp. 871–885.
- [10] (2002) Searching games with errors—fifty years of coping with liars. Theoretical Computer Science 270 (1-2), pp. 71–109.
- [11] (2010) Channel coding: non-asymptotic fundamental limits. Ph.D. Thesis, Department of Electrical Engineering, Princeton University.
- [12] (1961) On a problem of information theory. MTA Mat. Kut. Int. Kozl. B 6, pp. 505–516.
- [13] (2016) Phase transitions in group testing. In Proceedings of the Twenty-seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, Philadelphia, PA, USA, pp. 40–53.
- [14] (2017) Phase transitions in the pooled data problem. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 377–385.
- [15] (2014) Asymptotic estimates in information theory with non-vanishing error probabilities. Foundations and Trends® in Communications and Information Theory 11 (1–2), pp. 1–184.
- [16] (2014) Collaborative 20 questions for target localization. IEEE Trans. Inf. Theory 60 (4), pp. 2233–2252.
- [17] (2015) On decentralized estimation with active queries. IEEE Transactions on Signal Processing 63 (10), pp. 2610–2622.
- [18] (1991) Adventures of a mathematician. Univ of California Press.
- [19] (2019) Resolution limits of noisy 20 questions estimation. arXiv:1909.12954.