The traditional multi-armed bandit problem aims to find the arm with the highest payoff. This is often motivated by practical applications such as identifying the ad with the highest payoff when shown to users, or identifying a strategy with maximum payoff. In this work, we consider a setting where the objective is to identify the arm/node that best captures the entire information of a system, i.e., the arm that can best estimate all the other arms. In contrast to the traditional multi-armed bandit problem, this objective involves an estimation of the correlation structure among the various arms. This is motivated by several practical applications. For instance, in the internet of things, sensors are used to take measurements from multiple locations with the objective of estimating an underlying parameter, e.g., temperature, over a region. Resource constraints mean that it might not be possible to place sensors at the desired level of granularity. However, an estimate of the underlying distribution enables one to form an estimate of the parameter at points not measured. This estimate of the statistics of the underlying randomness is often formed using limited measurements from multiple points, before choosing the final location of the sensors. Instances include sensors used for measuring temperature in a region (Guestrin et al., 2005), thermal sensors on microprocessors (Long et al., 2008), optimizing queries over a sensornet (Deshpande et al., 2004) and placing sensors to detect contaminants in a water distribution network (Krause et al., 2008). Another application of interest is in identifying members who can best approximate a social network. In all these applications, the underlying correlation structure plays an important role. Problems of similar interest have also been studied in the realm of information theory in Boda and Narayan (2017), Boda (2018).
In this paper, we formulate a variant of the stochastic $K$-armed bandit problem, where the objective is to identify the arm that best estimates all the other correlated arms. We measure how well an arm can estimate the other arms using the mean-squared error (MSE) criterion, defined as follows:
We assume that the arms $X_1, \ldots, X_K$ are correlated sub-Gaussian random variables (r.v.s).
Paul et al. (2014) consider a cellular network application, where the goal is to monitor large communication networks with huge traffic. Since observing every node is computationally intensive, companies such as AT&T use measurements from various nodes to identify a subset which best captures the average behavior of the network. The requirement is for an algorithm that reduces the data acquisition cost by identifying the most-correlated subset of nodes, while using a minimum number of sample measurements. Paul et al. (2014) show that a model approximating the underlying nodes as Gaussian r.v.s is useful and reliable.
Closely related problems in other application contexts include (i) selecting a few blogs that capture the information cascade (Leskovec et al., 2007); and (ii) finding a subset of people that best captures the average behavior of a community. To put it differently, the notions of centrality in the context of document/news summarization (Erkan and Radev, 2004) and prestige in social networks (Heidemann et al., 2010) are closely related to the MSE objective in (1). In each of these applications, there is a cost associated with acquiring data and the challenge is to find the most correlated subset of blogs/people/etc. using minimal observations about the community.
We study the basic problem of identifying the arm which has the best MSE in estimating the remaining arms in a multi-armed bandit framework.
We consider the best arm identification setting (Audibert et al., 2010; Kaufmann et al., 2015), where a bandit algorithm is given a fixed sampling budget, and is evaluated based on the probability of incorrect identification.
Challenges encountered for such a setup include:
(i) Any estimate for the MSE requires estimation of the underlying correlations, without assuming knowledge of the variances.
(ii) An estimate of the MSE of an arm $i$ involves estimating the correlation of arm $i$ with the remaining arms. This requires samples from all pairs of arms associated with $i$. In particular, sampling arm $i$ alone would be insufficient for estimating arm $i$’s MSE; and hence
(iii) A bandit algorithm needs to optimize sampling across all pairs of arms and not just among arms. This requires intricate decisions over a larger set, in contrast to the classical mean-value optimizing algorithms in a best arm identification framework.
We summarize our contributions below.
First, we introduce a new formulation to study the identification of the arm that best estimates all arms. We design an estimate of the mean-squared error formed from available samples and develop a concentration bound for it. Our estimator builds on the difference estimator introduced in (Liu and Bubeck, 2014), but estimation is technically more challenging in our setting, as the underlying variances are not known and, unlike in Liu and Bubeck (2014), not assumed to be one.
Second, we analyze a nonadaptive uniform sampling strategy (i.e., a strategy that pulls each pair of arms an equal number of times) and propose an algorithm inspired by the popular successive rejects (SR) algorithm (Audibert et al., 2010) for best-arm identification, but more intricate due to the nonlinearity of the MSE objective function (1). A naive SR strategy that operates over phases, discarding all arm pairs associated with the arm having the highest empirical MSE, is suboptimal. Instead, our SR algorithm maintains active sets for arms as well as pairs and discards a pair only if both constituent arms are out of the active arms set. We provide an upper bound on the probability of error in identifying the best arm for our SR algorithm; this bound involves a hardness measure that factors in the gaps in MSEs as well as the correlations, which are specific to the correlated bandit problem. As in the classic bandit setup, the upper bound shows that the SR algorithm requires fewer samples to find the best arm than a uniform sampling strategy, especially when $K$ is large and the underlying gaps (differences between the MSEs of the optimal and suboptimal arms) are uneven.
Third, we prove a lower bound over all bandit problems with a certain hardness measure and to the best of our knowledge, this is the first lower bound for the correlated bandit problem that involves adaptive sampling strategies. The lower bound involves constructing problem transformations, where the optimal arm is “swapped” with one of the sub-optimal ones, resulting in problem instances. Unlike in the classic setup, any local change in the distribution of an arm impacts the MSE of all the other arms. Moreover, pulling arm pairs instead of individual arms makes the lower bound technically more challenging.
In (Liu and Bubeck, 2014), which is the closest related work, the authors consider a bandit problem, where the objective is to identify a subset of arms most correlated among themselves, i.e., to identify the local correlation structure within a subset of arms themselves. On the other hand, our problem is about forming global inference from samples of subsets of arms to identify the arm that is most correlated to the remaining arms. In Liu and Bubeck (2014), the authors consider a setting with positively correlated arms with unit variance, making the estimation task and hence, the overall best arm identification slightly easier. As we show later in Section 3, their estimation scheme does not extend to the more general non-unit variance setup that we consider. Finally, we also prove fundamental limits on the performance of any correlated bandit algorithm, through information-theoretic lower bounds, and to the best of our knowledge, no lower bounds exist for a correlated bandit problem.
The rest of the paper is organized as follows: In Section 2, we formalize the correlated bandit problem. In Section 3, we present the MSE estimation scheme and derive a concentration bound for our estimator. In Section 4, we examine the uniform sampling strategy, while in Section 5, we present a successive-rejects type algorithm. In Section 6, we present a lower bound for the correlated bandit problem. We provide the convergence proofs in Section 7. While not the thrust of this work, we provide a few illustrative examples in Section 8 showing the performance of our successive-rejects type algorithm. Finally, in Section 9, we provide our concluding remarks.
We consider a set of $K$ correlated arms $X_1, \ldots, X_K$, whose samples are i.i.d. in time. For each arm $i$, let $\mathcal{E}_i$ denote the minimum mean-squared error (MMSE) of estimating all the remaining arms, i.e.,
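Consider the special case of jointly Gaussian r.v.s $X_1, \ldots, X_K$, whose joint probability distribution is characterized by the mean (taken to be zero for the sake of expository simplicity), and covariance matrix $\Sigma$:
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1K}\sigma_1\sigma_K \\ \rho_{21}\sigma_2\sigma_1 & \sigma_2^2 & \cdots & \rho_{2K}\sigma_2\sigma_K \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{K1}\sigma_K\sigma_1 & \rho_{K2}\sigma_K\sigma_2 & \cdots & \sigma_K^2 \end{bmatrix}. \tag{3}$$
In the above, $\sigma_i^2$ is the variance of arm $i$ and $\rho_{ij}$ the correlation coefficient between arms $i$ and $j$.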
The corresponding MMSE for arm $i$ is
$$\mathcal{E}_i = \sum_{j \neq i} \sigma_j^2 \left(1 - \rho_{ij}^2\right).$$
Note that there is no error in arm $i$ estimating itself, and the error in estimating the $j$th arm is characterized by the correlation between $i$ and $j$ and the relevant variances. Further, the MMSE estimate for the case of Gaussian r.v.s is linear. In the more general case of non-Gaussian r.v.s, the MMSE estimate is typically nonlinear and its online computation is typically intensive. In such cases, we restrict ourselves to employing an optimal linear estimator which is still defined as the right side of (4). Thus, the right side of (5) holds for all optimal linear estimators, with it being optimal for Gaussian r.v.s.
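As a concrete illustration of the Gaussian case, the following minimal sketch computes the MSE of every arm from the variances and correlation coefficients. It relies on the standard fact that, for zero-mean jointly Gaussian arms, the error in estimating arm j from arm i is the variance of arm j times one minus the squared correlation between i and j. The function name `gaussian_mse` and the numerical values are hypothetical and chosen only for illustration.

```python
def gaussian_mse(variances, corr):
    """Return one MSE per arm for zero-mean jointly Gaussian arms.

    variances[j] is the variance of arm j and corr[i][j] the correlation
    coefficient between arms i and j. Arm i estimates arm j with error
    variances[j] * (1 - corr[i][j] ** 2); an arm estimates itself exactly.
    """
    k = len(variances)
    return [
        sum(variances[j] * (1.0 - corr[i][j] ** 2) for j in range(k) if j != i)
        for i in range(k)
    ]


# Hypothetical 3-arm instance (values are assumptions, not from the paper).
variances = [1.0, 2.0, 0.5]
corr = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
mses = gaussian_mse(variances, corr)
# The optimal arm is the one with the smallest MSE.
best_arm = min(range(len(mses)), key=mses.__getitem__)
```

In this instance the second arm (index 1) is optimal: it is strongly correlated with the first arm and does not have to estimate its own, comparatively large, variance.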
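We consider a setting where the arms are sub-Gaussian, and focus on linear estimators. We recall the definition of sub-Gaussianity below. A r.v. $X$ is said to be $\sigma$-sub-Gaussian if $\mathbb{E}\left[\exp\left(\lambda (X - \mathbb{E}[X])\right)\right] \le \exp\left(\lambda^2 \sigma^2 / 2\right)$ for all $\lambda \in \mathbb{R}$. For equivalent characterizations of sub-Gaussianity, the reader is referred to Theorem 2.1 of Wainwright (2015).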
We consider a fixed budget best-arm identification framework, and the interaction of our (bandit) algorithm with the environment is given below.
Notice that, in each round, the algorithm above pulls a pair of arms, and this is necessary to learn the underlying correlation structure.
In our setting, the performance metric associated with each arm $i$ is its MSE $\mathcal{E}_i$, and the optimal arm, say $i^*$, has the lowest MSE, i.e.,
The objective is to minimize the probability of error in identifying the best arm, i.e.,
where is the estimate of the best arm based on samples.
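For $i \neq i^*$, the suboptimality of arm $i$ is quantified by the gap of its MSE with respect to the optimal arm, i.e., $\Delta_i = \mathcal{E}_i - \mathcal{E}_{i^*}$. The notation $(i)$ is used to refer to the $i$th best arm (with ties broken arbitrarily), i.e., the $\Delta_{(i)}$s are the ordered gaps of the arms: $\Delta_{(1)} \le \Delta_{(2)} \le \cdots \le \Delta_{(K)}$.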
Note that the problem with $K = 2$ reduces to identifying the arm with higher variance and has no dependence on the correlation between the arms. The analysis of this case would be similar (estimate variance instead of mean) to the classical bandit problems and differs considerably from the setting with $K > 2$ arms, which is the setting assumed hereafter.
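The two-arm remark and the notion of ordered gaps can be checked numerically with a small sketch. It assumes zero-mean Gaussian arms, so that each arm's MSE is the other arm's variance scaled by one minus the squared correlation. The helper names `ordered_gaps` and `two_arm_mses` and all numbers are hypothetical.

```python
def ordered_gaps(mses):
    """Return the gaps (MSE minus the smallest MSE), sorted in increasing order."""
    best = min(mses)
    return sorted(m - best for m in mses)


def two_arm_mses(var_1, var_2, rho):
    """MSEs of the two arms in a two-armed instance (Gaussian arms assumed)."""
    residual = 1.0 - rho ** 2
    # Arm 1 estimates arm 2, so its MSE scales with var_2, and vice versa.
    return [var_2 * residual, var_1 * residual]


# Hypothetical variances: arm 1 has the larger variance (4.0 versus 1.0).
# Whatever the correlation, arm 1 never has the larger MSE.
higher_variance_arm_wins = all(
    two_arm_mses(4.0, 1.0, rho)[0] <= two_arm_mses(4.0, 1.0, rho)[1]
    for rho in (-0.9, 0.0, 0.5, 0.99)
)

# Ordered gaps for a hypothetical 3-arm instance with MSEs 1.175, 0.735, 2.41.
example_gaps = ordered_gaps([1.175, 0.735, 2.41])
```

The common factor involving the correlation cancels when the two MSEs are compared, which is why the two-arm problem carries no dependence on the correlation.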
3 MSE Estimation
Let denote the set of
i.i.d. samples obtained from the bivariate Gaussian distribution corresponding to the pair of arms. To identify the optimal arm, we form an estimate of to which end we form estimates for the variances and the correlation coefficient . We employ the following estimators for the aforementioned quantities: For any ,
The estimate for $\rho_{ij}$ in (7) is akin to that proposed in Liu and Bubeck (2014), which considers a simpler setting where all the arms are known to have unit variance, i.e., $\sigma_i^2 = 1$ for all $i$. For the unit variance setup, Liu and Bubeck (2014) establish via a likelihood ratio test that the difference based estimator for $\rho_{ij}$ is advantageous over the natural estimator for $\rho_{ij}$:
This superiority depends explicitly on the a priori knowledge of the variances being one, which is not applicable to the general setting considered here, i.e., a setting where the variances are not necessarily one. However, to exploit the optimality of the likelihood ratio test, we express the estimator above in the spirit of (8), which depends on the estimates of the variances to scale the samples to obtain
Unlike the unit variance setup of Liu and Bubeck (2014), it is not possible to obtain a difference based estimator in our setting. Nevertheless, concentrates faster as approaches and this can be argued as follows: On the high probability event , we have
For any arm , the corresponding MSE is estimated using the quantities defined in (7) as follows:
The main result concerning the exponential concentration of the estimate around the true MSE is presented below.
Proposition 3 (MSE Concentration). Assume . Let be the MSE estimate given in (9), for . Then, for any , and for any , we have
where is a universal constant, and . In the above, it suffices to look at , since is less than , owing to the assumption that .
Proof. See Section 7.2. ∎
The claim in Proposition 3 holds for the more general case of sub-Gaussian r.v.s . However, in this case, the MSE is best in the class of linear estimators, and is not necessarily the minimum MSE estimator.
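The role of the unit-variance assumption in the comparison between the difference based and the natural estimators discussed in this section can be illustrated with a short sketch. The specific form of the difference based estimator used here, one minus half the average squared difference, is an assumption derived from the identity that the mean squared difference of two unit-variance arms equals two minus twice their correlation; it is not claimed to be the exact estimator of Liu and Bubeck (2014) or the one in (7). All function names are hypothetical.

```python
import math


def natural_estimate(samples):
    """Average of products x * y; targets rho when both variances equal one."""
    return sum(x * y for x, y in samples) / len(samples)


def difference_estimate(samples):
    """One minus half the mean squared difference; also targets rho under unit variances."""
    return 1.0 - sum((x - y) ** 2 for x, y in samples) / (2.0 * len(samples))


def difference_limit(var_x, var_y, rho):
    """Population value that difference_estimate converges to for zero-mean arms."""
    mean_sq_diff = var_x + var_y - 2.0 * rho * math.sqrt(var_x * var_y)
    return 1.0 - mean_sq_diff / 2.0


# With unit variances the limit coincides with the correlation coefficient.
unit_case = difference_limit(1.0, 1.0, 0.5)
# With variances 4 and 1 the same statistic no longer recovers the correlation.
non_unit_case = difference_limit(4.0, 1.0, 0.5)
```

In the second case the limiting value has the wrong sign, which is consistent with the need to rescale the samples by variance estimates when the variances are unknown.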
4 Uniform Sampling
A simple approach towards identifying the best arm is to select each pair an equal number of times, estimate the MSEs, and recommend the arm with the lowest MSE estimate as optimal, i.e., the samples used for estimation are
For uniform sampling, the probability of error in identifying the optimal arm is
where is a universal constant.
If the correlations between all pairs of arms and the variances of all the arms are similar, then in the absence of this prior knowledge, the optimal strategy would involve sampling all pairs of arms an equal number of times. However, when this is not the case, uniform sampling might be a strictly inferior strategy, because it fails to allocate more samples to the pairs needed for a better estimation of the MSEs of arms whose MSE is close to that of the optimal arm. We present below a strategy which sequentially homes in on a reduced set of possible candidates for the optimal arm and then samples the pairs of arms involved in the MSE estimation of these arms an approximately equal number of times, to obtain a lower probability of error in identifying the best arm.
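The uniform sampling strategy of this section can be sketched as a small simulation. This is a minimal sketch under stated assumptions: the arms are zero-mean Gaussian, each pull of a pair returns one sample from the corresponding bivariate distribution, and the MSE is estimated by plugging in natural sample moments, which is not necessarily the estimator in (9). The function names, the instance, and the budget are hypothetical.

```python
import math
import random
from itertools import combinations


def draw_pair(rng, var_i, var_j, rho):
    """One zero-mean bivariate Gaussian sample with the given variances and correlation."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = math.sqrt(var_i) * z1
    y = math.sqrt(var_j) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def plug_in_mse(arm, k, samples):
    """Plug-in MSE of `arm` from per-pair samples (natural moment estimates, assumed)."""
    total = 0.0
    for j in range(k):
        if j == arm:
            continue
        pair = (min(arm, j), max(arm, j))
        # Orient each sample as (value of `arm`, value of arm j).
        oriented = [(a, b) if arm < j else (b, a) for a, b in samples[pair]]
        n = len(oriented)
        var_arm = sum(x * x for x, _ in oriented) / n
        var_j = sum(y * y for _, y in oriented) / n
        cov = sum(x * y for x, y in oriented) / n
        # Equals var_j * (1 - rho_hat ** 2) with rho_hat the sample correlation.
        total += var_j - cov ** 2 / var_arm
    return total


def uniform_sampling(rng, variances, corr, budget):
    """Pull every pair equally often and recommend the arm with the lowest estimate."""
    k = len(variances)
    pairs = list(combinations(range(k), 2))
    pulls = budget // len(pairs)
    samples = {
        (i, j): [draw_pair(rng, variances[i], variances[j], corr[i][j]) for _ in range(pulls)]
        for i, j in pairs
    }
    estimates = [plug_in_mse(i, k, samples) for i in range(k)]
    return min(range(k), key=estimates.__getitem__), estimates


# Hypothetical 3-arm instance; the true MSEs are 1.175, 0.735 and 2.41.
variances = [1.0, 2.0, 0.5]
corr = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
recommended, estimates = uniform_sampling(random.Random(0), variances, corr, budget=30000)
```

Note that the budget is split over pairs rather than arms, so the number of pulls per pair shrinks quadratically as the number of arms grows.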
5 Successive Rejects
The successive rejects (SR) algorithm, which pulls pairs of arms to identify the arm which minimizes MSE, operates over phases as described in Figure 1. (Footnote 1: With abuse of notation, is used to denote the (unordered) pair of arms .) The idea is to maintain a set of active arms and pairs of arms (for phase , these are denoted by and ) and eliminate arms (and some of their corresponding pairs) that have high MSE. The elimination scheme employed in Figure 1 departs significantly from the approach adopted in the classic SR algorithm for finding the arm with highest mean. To illustrate this, consider a setting with arms. If arms are out of contention after phase , . In the second phase, all the pairs in are pulled number of times. Now, if arm is out of contention at the end of this phase, the pairs and will be removed from and no longer be pulled in the later phases.
Notice that a strategy that finds the worst arm according to empirical MSE estimates and discards all pairs associated with that arm is clearly suboptimal, because samples from some of the discarded pairs of arms are essential to form estimates of the MSEs of arms which remain in contention. For example, in a -armed bandit setting, suppose that we discard all pairs associated with arm in some round. This would impact the quality of the MSE estimate of arm , since the pair would be useful in training a better estimate of via .
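The difference between the two elimination rules can be made concrete with a sketch of the bookkeeping alone. It implements only the rule stated in the text, namely that a pair is discarded once both of its arms have left the active set, and contrasts it with the naive rule that drops every pair touching an eliminated arm. Phase lengths and the sampling itself (Figure 1) are deliberately left out; the elimination order and function names are hypothetical.

```python
from itertools import combinations


def active_pairs(k, active_arms):
    """Pairs kept while at least one of their two arms is still active."""
    return {
        p for p in combinations(range(k), 2)
        if p[0] in active_arms or p[1] in active_arms
    }


def naive_pairs(k, active_arms):
    """Naive rule: a pair survives only while both of its arms are active."""
    return {
        p for p in combinations(range(k), 2)
        if p[0] in active_arms and p[1] in active_arms
    }


k = 4
after_first = {0, 1, 2}   # arm 3 eliminated (hypothetical elimination order)
after_second = {0, 1}     # arm 2 eliminated next

kept_1 = active_pairs(k, after_first)    # nothing is dropped yet
kept_2 = active_pairs(k, after_second)   # only the pair (2, 3) is dropped
naive_2 = naive_pairs(k, after_second)   # the naive rule keeps only (0, 1)
```

Under the naive rule, the surviving arms 0 and 1 would lose the pairs (0, 2), (0, 3), (1, 2) and (1, 3), all of which enter their own MSE estimates.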
Before presenting the main result that bounds the probability of error in identifying the best arm of the algorithm in Figure 1, we present the following problem complexities that capture the hardness of the learning task at hand (i.e., the order of number of samples required to find the best arm with reasonable probability):
The quantities and , have a connotation similar to that in the classical bandit setup and satisfy the following relation (Footnote 2: The proof follows in a similar fashion as in the classic bandit setup (Audibert et al., 2010) and is given in Section 7.1 for the sake of completeness.):
where is as defined in Figure 1.
Observe that the problem complexities depend both on the variances of the arms and the correlation between the arms through the gaps. The probability of error in identifying the best arm of SR satisfies
where is a universal constant.
Proof. See Section 7.4. ∎
From Theorem 4, it is apparent that a uniform sampling strategy would require samples to achieve a certain accuracy, while our SR variant for correlated bandits would require number of samples. SR scores over uniform sampling w.r.t. dependence on the number of arms because, in our SR algorithm, an increasing number of pairs of arms are removed from contention in successive phases. More importantly, SR has better dependence on the underlying gaps when compared to uniform sampling. In problem instances where the gaps are uneven, SR finds the best arm much faster than uniform sampling.
6 Lower Bound
To obtain the lower bound, we consider a -armed Gaussian bandit problem with the underlying joint probability distribution governed by the following covariance matrix:
Observe that is a valid covariance matrix and is positive semi-definite. The MSEs corresponding to arms are and more generally
Hence, we have the following order on the MSEs:
An approach in recent papers, cf. (Audibert et al., 2010; Kaufmann et al., 2015), for establishing a lower bound for best-arm identification is to transform the bandit problem so that one of the sub-optimal arms is turned into an optimal one, while not affecting the rest of the arms. However, our setting involves correlated arms, with the correlation factors appearing in the mean-squared error objective and hence, one cannot make a sub-optimal arm optimal in a standalone fashion. We swap pairs of arms to interchange the MSE of a sub-optimal arm with that of the optimal arm, and this introduces major deviations in the proof as compared to the classic $K$-armed case, as we shall soon see. We describe our problem transformations next.
We form transformations of the bandit problem formulated at the beginning of this section. For “problem ,” arm is the best and for achieving this, we swap the first and th rows in . Let be the pdf associated with the given problem as in (11), and represent the pdf of the transformed bandit problem, where represents the th transformation. Since we consider arms whose samples are i.i.d. in time, the joint distribution of samples is a product distribution of the underlying random variables and for the transformed problem by . For compactness, we use , and , .
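The effect of such a swap on the MSEs can be illustrated with a short sketch. It assumes that the transformation relabels two arms, i.e., swaps both the corresponding rows and columns so that the matrix stays symmetric, and it uses a hypothetical 3-arm covariance matrix rather than the matrix in (11). The MSE of each arm is computed from the covariance entries under the Gaussian (or optimal linear estimator) assumption.

```python
def mses_from_covariance(cov):
    """MSE of each arm: sum over the other arms of var_j - cov_ij ** 2 / var_i."""
    k = len(cov)
    return [
        sum(cov[j][j] - cov[i][j] ** 2 / cov[i][i] for j in range(k) if j != i)
        for i in range(k)
    ]


def swap_arms(cov, a, b):
    """Relabel arms a and b by swapping the matching rows and columns of cov."""
    order = list(range(len(cov)))
    order[a], order[b] = order[b], order[a]
    return [[cov[r][c] for c in order] for r in order]


# Hypothetical covariance matrix (an assumption, not the matrix in (11)).
cov = [
    [2.0, 0.8, 0.3],
    [0.8, 1.0, 0.2],
    [0.3, 0.2, 0.5],
]
original = mses_from_covariance(cov)                 # arm 0 is optimal here
swapped = mses_from_covariance(swap_arms(cov, 0, 2))  # arm 2 becomes optimal
```

After the swap, the first and third arms exchange their MSEs while the second arm's MSE is unchanged, so the previously sub-optimal third arm is now the best one.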
For any problem with , we define and and the min-max probability of error in identifying the optimal arm is given by the theorem below. For any bandit strategy that returns the arm after rounds, there exists a transformation of the covariance matrix such that the probability of error on the transformed problem satisfies
is the problem complexity term,
, and .
Proof. See Section 7.5. ∎
The problem complexity term in the upper bound involves the square of the gaps, whereas the lower bound involves just the gaps. We believe the upper bound for the SR algorithm is optimal in terms of gap dependence, and it would be interesting future work to establish a lower bound that involves squares of the gaps. In the lower bound proof, the Kullback-Leibler divergence terms for the transformed problems were bounded above by the gaps (see the KL-divergence bound in the lower bound proof), leading to an overall lower bound with complexity . Nevertheless, the current proof is challenging owing to (i) pairs of arms being pulled in each round; (ii) the covariance matrix in (11) being non-trivial and its problem transformations being novel; and finally, (iii) the non-trivial algebraic effort required to arrive at the bound for the aforementioned KL-divergence terms.
7 Convergence Proofs
7.1 Problem complexities
We begin by showing the relation between the different problem complexities defined in Section 5.
7.2 Proof of Proposition 3
For establishing the main claim in Proposition 3, we shall use two well-known sub-exponential concentration bounds, which are given below. (Concentration of sample variance) Let , be independent sub-Gaussian r.v.s with common parameter . Let . Then, we have the following bound for any :
By definition, it follows that the square of a sub-Gaussian r.v. is sub-exponential. The main claim now follows from the concentration bound for sub-exponential r.v.s in Proposition 2.2 of (Wainwright, 2015). ∎
a vector of i.i.d. standard Gaussian r.v.s and a-Lipschitz function . Then, using Gaussian concentration for Lipschitz functions (cf. Section 2.3 of Wainwright (2015))
For with i.i.d. Gaussian r.v.s , consider and . Observing that is -Lipschitz, changing the variable from to and using (12), we obtain
The other inequality bounding the left tail follows by an argument similar to above.
For the case of sub-Gaussian r.v.s, the main claim can be inferred from Theorem 3.1.1 in Vershynin (2016) and we provide the proof details below for the sake of completeness. Observe that
The first inequality above holds because implies , for any , while the final inequality follows from Lemma 7.2 after observing that is sub-exponential since is sub-Gaussian. The main claim follows by changing the variable to from . As before, the other inequality bounding the left tail follows by a completely parallel argument. ∎
Next, we state and prove a result that establishes exponential concentration of the sample correlation coefficient. Notice that the MSE estimate in (9) is comprised of sample variances and sample correlation coefficients. To prove that the MSE estimate concentrates, we shall use Lemma 7.2 for terms involving sample variances, and the lemma below for terms involving sample correlation coefficients.
(Concentration of sample correlation coefficient) For independent Gaussian r.v.s , with mean zero and covariance matrix as defined in (3) and with , formed from samples using (7), for any , and for any , we have
where is a positive constant satisfying , .
We bound for and . The analysis below holds in general.
Consider the following event:
where the penultimate inequality relies on the assumption that .
Let and . Then, on the event , we have
The second term in (14) is bounded as follows:
where the penultimate inequality follows from an application of Lemma 7.2 and the last inequality uses the fact that . The third term in (14) can be bounded in a similar fashion. The last term in (14) is bounded as follows: