1 Introduction
1.1 Motivation
Big data systems store billions of events generated by end users. Machine learning algorithms are then used, for instance, to build intelligent mobile phone applications, to recommend products and services, to optimize the choice of ads, to choose the best human-machine interface, or to enable the self-care of set-top boxes. In this context of massive storage and massive usage of models inferred from personal data, privacy is an issue. Even if individual data are anonymized, the pattern of data associated with an individual is itself uniquely identifying. The k-anonymity approach Sweeney (2002) guarantees resistance to direct linkage between stored data and individuals. However, this approach can be vulnerable to composition attacks: an adversary could use side information that, combined with the anonymized data, makes it possible to retrieve a unique identifier Ganta et al (2008). Differential privacy Sarwate and Chaudhuri (2013) provides an alternative approach: the sensitive data are hidden, and the guarantee is provided by the algorithms used to extract information from the data. An algorithm is differentially private if the participation of any record in the database does not alter the probability of any outcome by very much. The flaw of this approach is that, sooner or later, the sensitive data may be hacked by an adversary. Here, we propose a radical approach to ensure privacy, that is, a narrow interpretation of
privacy by design. Firstly, the useful information is inferred from the stream without storing data. As in the case of differential privacy, this privacy by design approach needs specific algorithms to infer useful information from the data stream. Many algorithms have been developed for stream mining, and most business needs can be handled without storing data: basic queries and statistics can be computed on the data stream Babcock et al (2002), as well as queries on the join of data streams Chaudhuri et al (1999); Féraud et al (2009), online classification Domingos and Hulten (2002), online clustering Beringer and Hüllermeier (2002), and the more challenging task of decision making using contextual bandits Chu et al (2011); Féraud et al (2016). However, even if the data are not stored, the guarantee is not complete: an adversary could intercept the data, then store and decipher them.
Secondly, to make the interception of data as expensive as possible for an adversary, we propose to process the personal data locally, benefiting from a new network architecture. To increase the responsiveness of mobile phone services and applications, network equipment vendors and mobile operators have specified a new network architecture: Mobile Edge Computing (MEC) provides IT and cloud computing capabilities within the Radio Access Network, in close proximity to devices MEC white paper (2014). In addition to facilitating the distribution of interactive services and applications, the distribution of machine learning algorithms on MEC makes the interception task more difficult. As the data are locally processed, the adversary has to locally deploy and maintain technical devices or software to intercept and decipher the radio communication between devices and MEC servers.
1.2 Related works
Most applications require taking and optimizing decisions under partial feedback. That is why this paper focuses on a basic building block called multi-armed bandits (MAB). In its most basic formulation, the problem can be stated as follows: there are K arms, each having an unknown distribution of bounded rewards. At each step, the player has to choose an arm and receives a reward. The player needs to explore to find profitable arms, but on the other hand the player would like to exploit the best arms as soon as possible: this is the so-called exploration-exploitation dilemma. The performance of a MAB algorithm is assessed in terms of regret (or opportunity loss) with regard to the unknown optimal arm. Optimal solutions have been proposed to solve this problem using a stochastic formulation in Auer et al (2002); Cappé et al (2013), a Bayesian formulation in Kaufmann et al (2012), or an adversarial formulation in Auer et al (2002).
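As a concrete illustration of the stochastic formulation, the following sketch implements UCB1 (the index policy behind UCB) on Bernoulli arms; the function names, the horizon, and the Bernoulli reward model are illustrative assumptions, not part of the original setting:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Minimal UCB1 sketch: pull each arm once, then always pull the arm
    maximizing empirical mean plus an exploration bonus."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # cumulated pseudo-regret
    return regret
```

With a clear gap between the arms, the cumulated pseudo-regret grows only logarithmically with the horizon, which is the fully sequential behavior discussed below.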
The best arm identification task consists in finding the best arm with high probability while minimizing the number of times suboptimal arms are sampled, which corresponds to minimizing the regret of the exploitation phase while minimizing the cost of the exploration phase. While the regret minimization task has its roots in medical trials, where it is not acceptable to give a wrong treatment to a sick patient for exploration purposes, the best arm identification task has its roots in pharmaceutical trials, where in a test phase the side effects of different drugs are explored, and then in an exploitation phase the best drug is produced and sold. The same distinction exists for digital applications, where for instance the regret minimization task is used for ad serving, and the best arm identification task is used to choose the best human-machine interface. Corresponding to these two related tasks, the fully sequential algorithms, such as UCB Auer et al (2002), explore and exploit at the same time, while the explore-then-commit algorithms, such as Successive Elimination Even-Dar et al (2002), consist in first exploring to sequentially eliminate the suboptimal arms thanks to a statistical test, and then in exploiting the best arm (see Perchet et al (2015) for a formal description of explore-then-commit algorithms). The analysis of explore-then-commit algorithms is based on the PAC setting Valiant (1984), and focuses on the sample complexity (i.e. the number of time steps) needed to find an ε-approximation of the best arm with a failure probability δ. This formulation has been studied for the best arm identification problem in Even-Dar et al (2002); Bubeck et al (2009); Audibert et al (2010); Gabillon et al (2013), for the dueling bandit problem in Urvoy et al (2013), for the linear bandit problem in Soare et al (2014), for the contextual bandit problem in Féraud et al (2016), and for the non-stationary bandit problem in Allesiardo et al (2017).
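The explore-then-commit principle described above can be illustrated by a minimal Successive Elimination sketch on Bernoulli arms; the confidence-radius schedule and the parameter names are simplified assumptions, not the exact test of Even-Dar et al (2002):

```python
import math
import random

def successive_elimination(means, epsilon=0.1, delta=0.05, seed=0):
    """Sketch of an explore-then-commit strategy: sample all remaining arms
    uniformly, and eliminate any arm whose empirical mean falls below the
    best empirical mean by more than a confidence radius."""
    rng = random.Random(seed)
    remaining = list(range(len(means)))
    counts = {i: 0 for i in remaining}
    sums = {i: 0.0 for i in remaining}
    t = 0
    while len(remaining) > 1:
        t += 1
        for i in remaining:  # one uniform sampling round
            sums[i] += 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
        # Hoeffding-style radius; the union-bound schedule is illustrative
        radius = math.sqrt(math.log(4 * len(means) * t * t / delta) / (2 * t))
        if 2 * radius < epsilon:  # remaining arms are epsilon-indistinguishable
            break
        best_mean = max(sums[i] / counts[i] for i in remaining)
        remaining = [i for i in remaining
                     if sums[i] / counts[i] >= best_mean - 2 * radius]
    return max(remaining, key=lambda i: sums[i] / counts[i])
```

After the exploration phase, the returned arm would be played forever (the commit phase), which is why no communication is needed once a distributed version has stopped.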
Recent years have seen an increasing interest in the study of the collaborative distribution scheme, in which players collaborate to solve a multi-armed bandit problem. The distribution of non-stochastic experts has been studied in Kanade et al (2012). The distribution of stochastic multi-armed bandits has been studied for peer-to-peer networks in Szörényi et al (2013). In Hillel et al (2013), the analysis of the distributed exploration is based on the sample complexity needed to find the best arm up to an approximation factor ε. When only one communication round is allowed, an algorithm with a speedup factor of the order of the square root of the number of players has been proposed. The algorithmic approach has been extended to the case where multiple communication rounds are allowed. In this case, a speedup factor linear in the number of players is obtained, while the number of communication rounds grows with the logarithm of the inverse of the approximation factor. The authors focused on the tradeoff between the number of communication rounds and the number of pulls per player. This analysis is natural when one would like to distribute the best arm identification task on a centralized processing architecture. In this case, the best arm identification tasks are synchronized and the number of communication rounds is the true cost.
The distribution of bandit algorithms on MEC, which we would like to address, is more challenging. When bandit algorithms are deployed close to the users' devices, the event "a given player is active" is modeled by an indicator random variable. Indeed, a player can choose an action only when an uncontrolled event occurs, such as: the device of a user is switched on, a user launches a mobile phone application, or a user connects to a web page. Unlike in
Hillel et al (2013), where the draw of players is controlled by the algorithm, here we consider that the players are drawn from a distribution. As a consequence, synchronized communication rounds can no longer be used to control the communication cost. Here, the cost of communications is modeled by the number of transmitted bits.

1.3 Our contribution
Between the two main formulations of bandit algorithms, the regret minimization and the best arm identification tasks, we have chosen to distribute the best arm identification task, for two reasons. Firstly, even if it has been shown that the explore-then-commit algorithms are suboptimal by a constant factor for the regret minimization task with two arms Garivier et al (2016), they can be rate optimal for the regret minimization task, while the fully sequential algorithms cannot handle the best arm identification task. By distributing an explore-then-commit algorithm, one can provide a reasonably good solution for the two tasks. Secondly, for distributing bandit algorithms, explore-then-commit algorithms have a valuable property: communications between players are needed only during the exploration phase. For each distributed best arm identification task, one can bound the communication cost and the time interval where communications are needed. This property facilitates the sharing of the bandwidth between several distributed tasks.
In the next section, we propose a new problem setting handling the distribution of the best arm identification task between collaborative players. A lower bound states the minimum number of transmitted bits needed to reach the optimal speedup factor. Then, we propose a first algorithm, Distributed Median Elimination, which is optimal in terms of the number of transmitted bits, and which benefits from a near optimal speedup factor with respect to a rate optimal algorithm such as Median Elimination Even-Dar et al (2002) run on a single player. This first algorithm is designed to obtain an optimal communication cost. In practice, it cannot handle the tradeoff between the communication cost and the exploration cost, and it requires some knowledge of the distribution of players. Extended Distributed Median Elimination overcomes these limitations by playing in parallel different instances of Distributed Median Elimination and selecting the best one. In the last section, experiments illustrate the analysis of the proposed algorithms.
2 Problem setting
The distribution of the best arm identification task on the edge of the Radio Access Network is a collaborative game, where the players are the Mobile Edge Computing application servers, which cluster end users. The players attempt to find the best arm as quickly as possible while minimizing the number of sent messages. There are two kinds of messages: the upward messages are sent from the MEC servers to the synchronization server, and the downward messages are sent from the synchronization server to all MEC servers (see Figure 1). This architecture handles the case where a context is observed before the action is chosen. The context can contain aggregated information at the MEC level, or personal data stored beforehand in the device of the end user. In the following, we focus on the case where no context is observed; we discuss the extension of the proposed algorithm to the contextual bandit in the section on future works. This architecture guarantees privacy, since no events are stored. The part of the context containing personal data, which can be declarative data provided with an opt-in, is under the control of the end user: if a context is stored in the user's device, it can be deleted by the end user. Furthermore, the context can be built so as to ensure anonymity or differential privacy. The only things that can be observed by an adversary through the core network (between the MEC servers and the synchronization server) are the upward messages, which correspond to aggregated information across the users of one MEC server, and the downward messages, which correspond to aggregated information over all MEC servers. As personal data are locally processed, the adversary has to locally deploy and maintain technical devices or software to intercept and decipher the radio communication between devices and MEC servers. For the adversary, this makes the data collection task expensive.
Let {1, ..., N} be the set of players, and N be the number of players. Let i be a random variable denoting the active player (i.e. the player for which an event occurs), and P be the probability distribution of i. Let M* ⊆ {1, ..., N} be the set of indices of the most active players, and M be the number of most active players. Let {1, ..., K} be the set of actions, and A_j be the set of actions of the player j. Let (y_1, ..., y_K) be a vector of bounded random variables, y_k be the random variable denoting the reward of the action k, and μ_k be the mean reward of the action k. Let y_{k,j} be the random variable denoting the reward of the action k chosen by the player j, and μ_{k,j} be its mean reward. Let D be the joint distribution of rewards and active players.

Definition 1:
an ε-approximation of the best arm is an arm k whose mean reward is within ε of the best mean reward: μ_k ≥ μ* − ε, where μ* denotes the highest mean reward.
Definition 2:
the sample complexity of an algorithm is defined as the number of samples needed by the algorithm to obtain an ε-approximation of the best arm with a probability at least 1 − δ.
Definition 3:
the sample complexity of a distributed algorithm on N players is defined as the number of samples per player needed to obtain an ε-approximation of the best arm with a probability at least 1 − δ.
In the following, the sample complexity of a rate optimal algorithm for the best arm identification problem is denoted S*, and the sample complexity of a rate optimal distributed algorithm is denoted S*_N.
Definition 4:
for the best arm identification task, the speedup factor of an algorithm distributed on N players, with respect to an optimal algorithm run independently on each player, is defined by the ratio T_i / T_M, where T_i is the number of draws of players needed to obtain on average S* draws of the player i, and T_M is the number of draws of players needed to obtain on average at least S*_N draws of each player included in the set M* of the most active players.
Proposition 1:
for the best arm identification task, the speedup factor is greater than or equal to the ratio between the sample complexity of an optimal algorithm run independently on each player and that of the distributed algorithm:
Proof
The number of times a player i is drawn at time horizon T follows a binomial distribution with parameters (T, p_i), where p_i is the probability that the player i is active. At time step T, the mean number of draws of the player i is therefore p_i T. Plugging these mean numbers of draws into the definition of the speedup factor yields the claimed inequality. ∎

Assumption 1 (best arm identification task):
the mean reward of an action does not depend on the player: for all players i and j, and for all actions k, μ_{k,i} = μ_{k,j} = μ_k.
Assumption 1 is used to restrict the studied problem to the distribution of the best arm identification task. We discuss the extension of this distribution scheme to the contextual bandit problem in the section on future works.
Assumption 2 (binary code):
each message transmitted through the communication network is coded using a binary code¹. For instance, when the synchronization server notifies all players that an action is eliminated, it sends to all players the binary code of the index of that action.

¹ A prefix code such as a truncated binary code or a Huffman code (see Cover and Thomas (2006)) would be more efficient. To simplify the exposition of ideas, we have restricted the analysis to binary codes.
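A minimal sketch of the coding convention of Assumption 2, assuming a fixed-length binary code of ⌈log2 K⌉ bits per action index (the helper names are illustrative):

```python
import math

def index_code(action, num_actions):
    """Fixed-length binary code for an action index, as in Assumption 2:
    every message is the index of an action, which costs ceil(log2(K))
    bits when there are K actions."""
    width = max(1, math.ceil(math.log2(num_actions)))
    return format(action, '0{}b'.format(width))

def decode_index(code):
    """Recover the action index from its binary code."""
    return int(code, 2)
```

For example, with K = 10 actions each upward or downward message costs 4 bits, whatever the time horizon.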
In order to ease reading, in the following we omit the algorithm in the notations: S*_N denotes the sample complexity of the distributed algorithm on N players, and T_M denotes the number of draws of players needed to obtain on average at least S*_N draws for each player included in the set M* of the most active players. When Assumptions 1 and 2 hold, Theorem 1 states a lower bound for this new problem.
Theorem 1:
there exists a distribution of players such that any distributed algorithm on N players needs to transmit at least of the order of one action index (that is, ⌈log2 K⌉ bits) per action and per collaborating player to find with high probability an ε-approximation of the best arm with an optimal speedup factor.
Proof
Theorem 1 in Mannor and Tsitsiklis (2004) states that there exists a distribution of rewards such that any algorithm needs at least Ω((K/ε²) log(1/δ)) samples to find with high probability an ε-approximation of the best arm. As a consequence, the total number of draws of players needed by a distributed algorithm cannot be less than this lower bound. Thus, there exists a distribution such that any distributed algorithm on N players needs to sample each player at least a fraction of this number of times to find with high probability an ε-approximation of the best arm. When the distribution of players is uniform, each player has probability 1/N, and hence a rate optimal distributed algorithm samples each player of the order of N times less than a single-player algorithm.
Median Elimination Even-Dar et al (2002) is a rate optimal algorithm for finding an ε-approximation of the best arm.
Thus, when the distribution of players is uniform, the speedup factor of any distributed algorithm cannot be higher than N.
Let us assume that there exists a distributed algorithm that finds an ε-approximation of the best arm with the optimal speedup factor, and that transmits fewer bits than the stated lower bound. There are only three possibilities to achieve this goal:
(1) a player does not transmit information about an action to the server,
(2) or the server does not transmit information about an action to a player,
(3) or this algorithm transmits fewer than ⌈log2 K⌉ bits for each action.
If a player does not transmit information about an action to the server (condition 1), then for this action the number of collaborating players is reduced. Thus, the optimal speedup factor cannot be reached.
If the server does not transmit information about an action to a player (condition 2), then this player does not receive information about this action from the other players. As a consequence, this player cannot use information from other players to eliminate or to select this action, and in the worst case the speedup factor becomes 1.
Thus, the number of sent messages cannot be less than the upward messages from the collaborating players plus the downward messages from the server. The minimum information that can be transmitted about an action is its index. Using a binary code (see Assumption 2), the number of bits needed to transmit the index of an action cannot be less than ⌈log2 K⌉ (condition 3). ∎
3 Distributed Median Elimination
3.1 Algorithm description
Now, we can derive and analyze a simple and efficient algorithm to distribute the best arm identification task. Distributed Median Elimination deals with three sets of actions:
- the set of all actions,
- the set of remaining actions of each player,
- the set of actions that a player would not like to eliminate at the current local step.
Distributed Median Elimination uses the M most active players to eliminate the suboptimal arms from the local sets of actions of all players. When the algorithm stops, the players choose sequentially among their remaining actions. The sketch of the proposed algorithm (see Algorithm 3) is the following:
- Median Elimination with a (high) probability of failure is run on each player, without the right of local elimination.
- When a player would like to eliminate an action, the index of that action is sent to the synchronization server.
- When half of the M most active players would like to eliminate an action, the synchronization server eliminates the action with a (low) probability of failure, by sending the index of the eliminated action to each player.
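The synchronization logic of the sketch above can be illustrated as follows; this toy server only shows the vote counting and the elimination broadcast, and leaves out the Median Elimination inner loop (class and method names are illustrative):

```python
class SyncServer:
    """Illustrative synchronization server: an action is eliminated for
    everyone once half of the m most active players have voted for it."""
    def __init__(self, num_actions, m):
        self.m = m
        self.votes = {a: set() for a in range(num_actions)}
        self.eliminated = set()

    def receive_vote(self, player, action):
        """Upward message: a player proposes eliminating an action.
        Returns True when the elimination is broadcast downward."""
        if action in self.eliminated:
            return False
        self.votes[action].add(player)
        if len(self.votes[action]) >= self.m / 2:
            self.eliminated.add(action)
            return True  # downward message: index of the eliminated action
        return False
```

Counting each distinct player at most once per action is what keeps the number of upward messages bounded, independently of the time horizon.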
Remark 1:
the Distributed Median Elimination algorithm stops when all the most active players would like to eliminate all actions except their estimated best one. This implies that each player can output several actions, and that the remaining actions are not necessarily the same for each player.

The analysis is divided into four parts. The first part ensures that the Distributed Median Elimination algorithm finds an ε-approximation of the optimal arm with high probability. The second part states the communication cost in bits. The third part provides an upper bound on the number of pulls per player before stopping. The last part provides an upper bound on the number of draws of players before stopping.
3.2 Analysis of the algorithm output
Lemma 1:
with a probability at least 1 − δ, Distributed Median Elimination finds an ε-approximation of the optimal arm.
Proof
The proof uses arguments similar to those of Lemma 1 in Even-Dar et al (2002). The main difference is that here, to ensure that an ε-approximation of the best arm remains when the algorithm stops, we need to show that such near optimal arms cannot be eliminated with high probability until all suboptimal arms have been eliminated. Consider the event that the empirical mean of each arm is close to its true mean. According to Algorithm 2, each arm is sampled sufficiently often for this to hold for a single arm with high probability, and using the union bound, the event holds for all arms with high probability. In the case where the event does not hold, we bound the probability that a suboptimal arm is empirically better than an ε-approximation of the best arm. Let Z be the number of suboptimal arms that are empirically better than an ε-approximation of the best arm. Using Markov's inequality, we bound the probability that Z is large, where the expectation is taken with respect to the random rewards. As a consequence, while suboptimal arms remain, an ε-approximation of the best arm is not eliminated with high probability.
When the number of remaining suboptimal arms is small enough, the elimination steps of Algorithm 2 ensure that, with high probability, all the suboptimal arms are eliminated, the set of remaining arms is not empty, and it contains only ε-approximations of the best arm. Then, using the union bound, the overall probability of failure is bounded. By construction, the approximation error is reduced at each elimination step, so that the cumulated error does not exceed ε. As a consequence, when Distributed Median Elimination stops, the set of remaining arms of each player contains an ε-approximation of the best arm with a failure probability at most δ.
Distributed Median Elimination fails when it stops while the set of remaining arms of some player contains no ε-approximation of the best arm. This event could occur when players would like to eliminate all ε-approximations of the best arm, which happens with a small probability.
∎
3.3 Analysis of the number of transmitted bits
Lemma 2:
Distributed Median Elimination stops after transmitting at most one upward message per suboptimal action and per player, plus one downward message per suboptimal action, each message being the binary code of an action index.
Proof
Each action is sent to the server no more than once per player (see the algorithm 2). When the algorithm stops, the players have not sent the code of their estimated best action (see the stopping condition of the algorithm 3). Thus, the number of upward messages is at most the number of players times the number of suboptimal actions. Then, the fact that the synchronization server sends each suboptimal action only once ensures that the number of downward messages is at most the number of suboptimal actions. The optimal length of a binary code needed to code an alphabet of size K is ⌈log2 K⌉. Thus, the total number of transmitted bits is bounded by the number of messages times ⌈log2 K⌉, independently of the time horizon. ∎
∎
3.4 Analysis of the number of pulls per player
Lemma 3:
Distributed Median Elimination stops once each of the most active players has been drawn a bounded number of times, of the order of the sample complexity of Median Elimination divided by the number of most active players.
Proof
The first steps of the proof are the same as those provided for Median Elimination (see Lemma 2 in Even-Dar et al (2002)). For the completeness of the analysis, we recall them here. From line 4 of Algorithm 2, any player stops after:
(1) 
Replacing the parameters by their values in inequality (1) provides the upper bound on the number of pulls per player. ∎
Theorem 2 states the speedup factor of Distributed Median Elimination with respect to an optimal algorithm such as Median Elimination Even-Dar et al (2002) run on each player, and bounds its communication cost. Theorem 1 and Theorem 2 show that Distributed Median Elimination is optimal in terms of the number of transmitted bits and near optimal in terms of speedup factor.
Theorem 2:
when the number of most active players is suitably bounded, with a probability at least 1 − δ, Distributed Median Elimination finds an ε-approximation of the best arm, transmitting a number of bits independent of the time horizon, and obtains a near optimal speedup factor.
Proof
Using Lemmas 1, 2, and 3, we state that Distributed Median Elimination finds with high probability an ε-approximation of the best arm, transmitting through the communication network the number of bits given by Lemma 2, and using no more than the number of pulls per player given by Lemma 3. Median Elimination is a rate optimal algorithm for finding an ε-approximation of the best arm: its sample complexity reaches the lower bound of Mannor and Tsitsiklis (2004). Thus, using Proposition 1, the claimed speedup factor of Distributed Median Elimination follows.
∎
3.5 Analysis of the number of draws of players
The analysis of the number of pulls per player allows us to state a near optimal speedup factor. Now, we focus on the number of draws of players (i.e. the number of time steps) needed to ensure with high probability that all players find an ε-approximation of the best arm. First, we consider the case where the true value of M, the number of most active players, is known. This requires some knowledge of the distribution of players, which is realistic in many applications. For instance, in the case of a Radio Access Network, the load of each cell or server is known, and hence the probability of each player is known. Theorem 3 provides an upper bound on the number of draws of players needed to find an ε-approximation of the best arm with high probability, when M is known. In the next section, we consider the case where M is unknown.
Theorem 3:
with a probability at least 1 − δ, Distributed Median Elimination finds an ε-approximation of the best arm, transmitting through the communication network the number of bits given by Lemma 2, and using at most a number of draws of players of the order of the number of pulls per player (Lemma 3) divided by the probability of the least likely of the most active players.
Proof
Consider a player included in the set of the most active players. Let F be the number of time steps where this player has not been drawn. F follows a negative binomial distribution, so its expectation is known, where the expectation is taken with respect to the random variable F. Using Hoeffding's inequality, we obtain a high probability bound on F. The total number of draws of players is the sum of the number of draws of this player and of the draws which do not involve this player. Hence, setting the parameters appropriately, using Lemma 1 with an adjusted failure probability, and then using the union bound, the claimed inequality holds with high probability.
Using Lemma 3, we bound the number of pulls per player. Then, using Lemma 2, we conclude the proof. ∎
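The waiting-time argument above can be checked numerically with a small Monte-Carlo sketch, which counts the global time steps until every tracked player has reached a given quota of draws (function and parameter names are illustrative):

```python
import random

def draws_until_quota(probs, active, quota, seed=0):
    """Monte-Carlo check: draw players i.i.d. from `probs` and count global
    time steps until every player in `active` has been drawn at least
    `quota` times."""
    rng = random.Random(seed)
    counts = {i: 0 for i in active}
    t = 0
    while any(counts[i] < quota for i in active):
        t += 1
        u, acc = rng.random(), 0.0
        for i, p in enumerate(probs):  # inverse-CDF sampling of a player
            acc += p
            if u < acc:
                if i in counts:
                    counts[i] += 1
                break
    return t
```

With a uniform distribution over 10 players and 2 tracked players, the waiting time for a quota n concentrates around 10 n, in line with the negative binomial argument.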
4 Extended Distributed Median Elimination
4.1 Algorithm description
Notice that if M, the presumed number of most active players, is not carefully set, the stopping time of Distributed Median Elimination is not controlled. The algorithm might never stop if the set of most active players contains a player with zero probability, or could stop only after many time steps if it contains an unlikely player. Moreover, Distributed Median Elimination is designed to transmit an optimal number of bits. This first algorithm cannot handle the tradeoff between the number of time steps after which all players have selected an ε-approximation of the best arm and the number of transmitted bits. To overcome these limitations, we propose a straightforward extension of the proposed algorithm, which consists in playing in parallel several instances of Distributed Median Elimination with equally spread values of M (see Algorithm 4).
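One possible way to spread the candidate values of M over the parallel instances, assuming equally spaced values as described above (this helper is hypothetical, not the exact schedule of Algorithm 4):

```python
def spread_instances(num_players, num_instances):
    """Hypothetical helper: equally spread candidate values of m (the
    presumed number of most active players) over the parallel instances
    of Extended Distributed Median Elimination."""
    step = max(1, num_players // num_instances)
    return [min(num_players, (i + 1) * step) for i in range(num_instances)]
```

Each instance runs Distributed Median Elimination with its own candidate value; the first instance whose players all stop yields the selected arm.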
4.2 Analysis
Theorem 4:
with a probability at least 1 − δ, Extended Distributed Median Elimination finds an ε-approximation of the best arm, transmitting through the communication network at most the sum of the communication costs of the parallel instances, and using at most the minimum over the instances of the numbers of draws of players given by Theorem 3.
Proof
The proof of Theorem 4 straightforwardly follows from Theorem 3. Theorem 3 holds for each instance with an adjusted failure probability. Then, using the union bound, Theorem 3 holds for all instances simultaneously with a probability at least 1 − δ. The communication cost is the sum of the communications of each instance, and the number of time steps needed to find a near optimal arm is the minimum over all instances. ∎
Extended Distributed Median Elimination handles the tradeoff between the time step at which all players have chosen a near optimal arm and the number of transmitted bits (see Theorem 4). The communication cost increases linearly with the number of instances, while the needed number of draws decreases with it. Moreover, one can ensure that the algorithm stops by including an instance whose set of most active players surely has non-zero probabilities, and one can ensure a speedup factor in the worst case (i.e. when the distribution of players is uniform) by including another suitable instance. The cost of this good behavior is a multiplicative factor, equal to the number of instances, in the communication cost.
5 Experiments
5.1 Experimental setting
In this section, we provide and discuss some experiments done in a simulated environment. To illustrate and complete the analysis of the proposed algorithm, we compare Distributed Median Elimination on regret minimization problems against three baselines: Median Elimination Even-Dar et al (2002) played independently on each player and Median Elimination with an unlimited communication cost illustrate the interest and the limits of the distribution approach, while UCB Auer et al (2002) with an unlimited communication cost is used as a benchmark for the two regret minimization problems. In order to finely capture the difference in performances between distributed and non-distributed algorithms, we plot the estimated pseudo-regret over time, where at each time step the instantaneous regret is the gap between the best mean reward and the estimated mean reward of the action chosen by the drawn player, averaged over the trials.
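The estimated pseudo-regret curve can be computed as follows, assuming the mean reward of each chosen action has already been estimated over the trials (names are illustrative):

```python
def estimated_pseudo_regret(chosen_means, best_mean):
    """Cumulative pseudo-regret estimate used in the plots: at each step
    the instantaneous regret is the gap between the best mean reward and
    the mean reward of the action actually chosen by the drawn player."""
    regret, curve = 0.0, []
    for mu in chosen_means:
        regret += best_mean - mu
        curve.append(regret)
    return curve
```

A flat curve indicates that the best arm is being played; a linearly growing curve indicates that suboptimal arms are still being explored.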
Problem 1.
There are several arms. The optimal arm has the highest mean reward, the second and the third ones have lower mean rewards, and the others have a low mean reward. Problem 1, where a small number of arms have high mean rewards and the others have low mean rewards, is easy for an explore-then-commit strategy such as Distributed Median Elimination and for a fully-sequential approach such as UCB.
Problem 2.
There are several arms. The mean reward of the optimal arm is close to that of the second one, and the other arms have a lower mean reward. With regard to this more difficult problem, where the gap between arms is tighter, explore-then-commit and fully-sequential algorithms need more steps before frequently playing the best arm. In contrast, Median Elimination is a fixed-design approach: whatever the problem, it spends the same number of steps in exploration.
Uniform distribution.
Each player has a probability equal to 1/N. In this case, the knowledge of the distribution of players does not provide any particular benefit to Distributed Median Elimination: M is set to the total number of players, which is known by all blind algorithms. This case corresponds to the worst case for Distributed Median Elimination.
A small fraction of players generates most of the events.
The players are split into two groups. When a player is drawn, a uniform random variable is drawn: depending on its value, the player comes from the first group (the most active players) or from the second one. In this case, the knowledge of the distribution provides useful information for setting M. This knowledge, which corresponds to the number of most active players, is available for the cells of a Radio Access Network.
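The two-group player distribution of this experimental setting can be sketched as follows, with hypothetical parameters for the fraction of most active players and the fraction of events they generate:

```python
import random

def sample_player(num_players, heavy_frac=0.1, heavy_mass=0.9, rng=random):
    """Two-group player distribution: a fraction `heavy_frac` of the
    players generates a fraction `heavy_mass` of the events
    (parameter names and default values are illustrative)."""
    n_heavy = max(1, int(heavy_frac * num_players))
    if rng.random() < heavy_mass:
        return rng.randrange(n_heavy)  # drawn from the most active group
    return n_heavy + rng.randrange(num_players - n_heavy)  # infrequent group
```

Under such a skewed distribution, the most active players accumulate pulls quickly, which is exactly the situation where sharing their eliminations helps the numerous infrequent players.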
For all the experiments, the approximation parameter ε and the failure probability δ are fixed, and the time horizon is one million time steps. All the regret curves are averaged over the trials.
5.2 Discussion
We notice that the number of transmitted bits is zero for Median Elimination played independently on each player and for Distributed Median Elimination run on a single player. In comparison to the algorithms with an unlimited communication cost, Distributed Median Elimination needs far fewer bits to process a million decisions: the communication cost of Distributed Median Elimination does not depend on the time horizon (see Lemma 2).
(Figure: Problem 1, uniform distribution of players.)
The number of players versus the regret at the time horizon is plotted for the two problems when the distribution of players is uniform (see Figures 2a and 3a). Firstly, we observe that, whatever the number of players, Distributed Median Elimination is outperformed by Median Elimination with an unlimited communication cost. For large numbers of players, the expected number of events per player is lower than one thousand: Distributed Median Elimination and Median Elimination performed on each player do not complete the first elimination epoch. Secondly, below this range of numbers of players, Distributed Median Elimination clearly outperforms Median Elimination with a zero communication cost.
The regret versus the time step is plotted for the two problems using players that are uniformly distributed (see Figures 2b and 3b).
For the first problem (see Figure 2b), where the gap is large, UCB with an unlimited communication cost benefits from its fully-sequential approach: it clearly outperforms Median Elimination. The second problem (see Figure 3b) is more difficult since the gap is tighter. As a consequence, the difference in performances between UCB and Median Elimination is small.
Distributed Median
Elimination significantly outperforms Median Elimination with zero communication cost on both problems.
When the distribution of players is not uniform, we observe that the gap in performances between Median Elimination with an unlimited communication cost and Distributed Median Elimination is reduced (see Figures 4a and 5a). In comparison to Median Elimination played on each player, Distributed Median Elimination exhibits a good behavior: when the most active players have found an ε-approximation of the best arm, the sharing of information makes it possible to eliminate the suboptimal arms for the infrequent players, which are numerous (see Figures 4b and 5b). As a consequence, the gap in performances between Distributed Median Elimination and Median Elimination played on a single player is increased.
To illustrate the interest of Extended Distributed Median Elimination when the knowledge of the distribution of players is not available, the value of the parameter versus the regret at the time horizon is plotted (see Figure 6) for the two problems with players. When of players generate of events, Distributed Median Elimination outperforms Median Elimination run on each player for a wide range of values of the parameter. However, when the parameter is overestimated, the speedup factor with respect to Median Elimination run on each player can be less than one. Without knowledge of the true value of the parameter, Extended Distributed Median Elimination obtains, by selecting the best instance, the result of the best instance of Distributed Median Elimination, and significantly outperforms Median Elimination run on each player. The communication cost becomes bits instead of bits when instances run in parallel.
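The instance-selection idea behind Extended Distributed Median Elimination can be sketched generically: several instances, each run with a different guess of the unknown player-distribution parameter, recommend a candidate arm, and the candidate with the highest empirical mean over a short validation phase is kept. This is only an illustrative sketch, not the paper's exact selection rule; the function name, the validation budget `n_validation`, and the arm means are all hypothetical.

```python
import random

def select_best_instance(candidates, pull, n_validation):
    """Keep the best of the arms recommended by parallel instances.
    `candidates` are the arms returned by the instances; each distinct
    candidate gets `n_validation` extra pulls, and the one with the
    highest empirical mean wins."""
    distinct = sorted(set(candidates))
    means = {a: sum(pull(a) for _ in range(n_validation)) / n_validation
             for a in distinct}
    return max(distinct, key=lambda a: means[a])

rng = random.Random(2)
true_means = [0.3, 0.9, 0.5]  # hypothetical Bernoulli arms
pull = lambda a: 1.0 if rng.random() < true_means[a] else 0.0
# Suppose three parallel instances recommended arms 1, 1 and 2.
best = select_best_instance([1, 1, 2], pull, n_validation=500)
```

Because only the recommended arms (at most one per instance) need to be compared, the extra communication grows with the number of instances run in parallel, not with the time horizon.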
6 Conclusion and future work
In order to distribute the best arm identification task as close as possible to the users' devices, we have proposed a new problem setting, where the players are drawn from a distribution. This architecture guarantees privacy to the users since no data are stored and the only thing that can be observed by an adversary through the core network is aggregated information over users. When the distribution of players is known, we provided and analyzed a first algorithm for this problem: Distributed Median Elimination. We have shown that its communication cost is optimal, while its speedup factor is near optimal. Then, we have proposed Extended Distributed Median Elimination, which handles the trade-off between the communication cost and the speedup factor. In four illustrative experiments, we have compared the proposed algorithm with three baselines: Median Elimination with zero and unlimited communication costs, and UCB with an unlimited communication cost. In accordance with the theoretical analysis, Distributed Median Elimination clearly outperforms Median Elimination with a zero communication cost. Finally, this distribution approach provides a speedup factor linear in the number of Mobile Edge Computing application servers, facilitates privacy by processing data close to the end user, and its communication cost, which does not depend on the time horizon, makes it possible to control the load of the telecommunication network while deploying many decision-making applications on the edge of the Radio Access Network.
These results are obtained when Assumption holds: the mean reward of actions does not depend on the player. Future work will extend this distributed approach to the case where Assumption does not hold, and in particular to the contextual bandit problem. Indeed, Distributed Median Elimination is a basic building block, which can be extended to the selection of variables to build a distributed decision stump, and then a distributed version of Bandit Forest Féraud et al (2016).
References

Allesiardo et al (2017) Allesiardo, R., Féraud, R., Maillard, O.-A.: The Non-Stationary Stochastic Multi-Armed Bandit Problem, International Journal of Data Science and Analytics, 2017.
Audibert et al (2010) Audibert, J.-Y., Bubeck, S., Munos, R.: Best Arm Identification in Multi-Armed Bandits, COLT, 2010.
Auer et al (2002) Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, 47, 235-256, 2002.
Auer et al (2002) Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R. E.: The Nonstochastic Multiarmed Bandit Problem, SIAM J. Comput., 32(1), 48-77, 2002.
Babcock et al (2002) Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and Issues in Data Stream Systems, ACM SIGMOD, 2002.
Beringer et Hüllermeier (2002) Beringer, J., Hüllermeier, E.: Online Clustering of Parallel Data Streams, Data & Knowledge Engineering, 58(2), 180-204, 2006.
Bubeck et al (2009) Bubeck, S., Wang, T., Stoltz, G.: Pure Exploration in Multi-armed Bandits Problems, COLT, 2009.
Cappé et al (2013) Cappé, O., Garivier, A., Maillard, O.-A., Munos, R., Stoltz, G.: Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, Annals of Statistics, 41(3), 1516-1541, 2013.
Chaudhuri et al (1999) Chaudhuri, S., Motwani, R., Narasayya, V.: On Random Sampling over Joins, ACM SIGMOD, 1999.
Chu et al (2011) Chu, W., Li, L., Reyzin, L., Schapire, R. E.: Contextual Bandits with Linear Payoff Functions, AISTATS, 2011.
Cover and Thomas (2006) Cover, T. M., Thomas, J. A.: Elements of Information Theory, Wiley-Interscience, 2006.
Domingos and Hulten (2002) Domingos, P., Hulten, G.: Mining High-Speed Data Streams, KDD, 71-80, 2000.
Even-Dar et al (2002) Even-Dar, E., Mannor, S., Mansour, Y.: PAC Bounds for Multi-armed Bandit and Markov Decision Processes, COLT, 255-270, 2002.
Féraud et al (2009) Féraud, R., Clérot, F., Gouzien, P.: Sampling the Join of Stream, 11th IFCS Conference, 2009.
Féraud et al (2016) Féraud, R., Allesiardo, R., Urvoy, T., Clérot, F.: Random Forest for the Contextual Bandit Problem, AISTATS, 2016.
Gabillon et al (2013) Gabillon, V., Ghavamzadeh, M., Lazaric, A.: Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, NIPS, 2013.
Garivier et al (2016) Garivier, A., Kaufmann, E., Lattimore, T.: On Explore-Then-Commit Strategies, NIPS, 2016.
Ganta et al (2008) Ganta, S. R., Kasiviswanathan, S. P., Smith, A.: Composition Attacks and Auxiliary Information in Data Privacy, KDD, 2008.
Hillel et al (2013) Hillel, E., Karnin, Z., Koren, T., Lempel, R., Somekh, O.: Distributed Exploration in Multi-Armed Bandits, NIPS, 2013.
Kanade et al (2012) Kanade, V., Liu, Z., Radunović, B.: Distributed Non-Stochastic Experts, NIPS, 2012.
Kaufmann et al (2012) Kaufmann, E., Korda, N., Munos, R.: Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, COLT, 2012.
Mannor and Tsitsiklis (2004) Mannor, S., Tsitsiklis, J. N.: The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, JMLR, 2004.
MEC white paper (2014) Mobile Edge Computing Introductory Technical White Paper, etsi.org, 2014-09-01.
Perchet et al (2015) Perchet, V., Rigollet, P., Chassang, S., Snowberg, E.: Batched Bandit Problems, The Annals of Statistics, 44(2), 2016.
Sarwate and Chaudhuri (2013) Sarwate, A. D., Chaudhuri, K.: Signal Processing and Machine Learning with Differential Privacy, IEEE Signal Processing Magazine, 2013.
Sweeney (2002) Sweeney, L.: k-anonymity: A Model for Protecting Privacy, International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557-570, 2002.
Soare et al (2014) Soare, M., Lazaric, A., Munos, R.: Best-Arm Identification in Linear Bandits, NIPS, 2014.
Szörényi et al (2013) Szörényi, B., Busa-Fekete, R., Hegedűs, I., Ormándi, R., Jelasity, M., Kégl, B.: Gossip-based Distributed Stochastic Bandit Algorithms, ICML, 2013.
Urvoy et al (2013) Urvoy, T., Clérot, F., Féraud, R., Naamane, S.: Generic Exploration and K-armed Voting Bandits, ICML, 2013.
Valiant (1984) Valiant, L.: A Theory of the Learnable, Communications of the ACM, 27, 1984.