Malicious Experts versus the multiplicative weights algorithm in online prediction

We consider a prediction problem with two experts and a forecaster. One of the experts is honest: at each round, he makes a correct prediction with probability μ. The other is malicious: he knows the true outcome at each round and makes his predictions so as to maximize the loss of the forecaster. Assuming the forecaster adopts the classical multiplicative weights algorithm, we find upper and lower bounds for the value function of the malicious expert. Our results imply that the multiplicative weights algorithm cannot resist the corruption of malicious experts. We also show that an adaptive multiplicative weights algorithm is asymptotically optimal for the forecaster, and hence more resistant to the corruption of malicious experts.





1. Introduction

Prediction with expert advice is a classical and fundamental problem in the field of online learning, and we refer the reader to [6] for a nice survey. In this problem, a forecaster makes predictions based on the advice of experts so as to minimize his loss, i.e., the cumulative difference between his predictions and the true outcomes. A standard performance criterion is the regret: the difference between the loss of the forecaster and the minimum among the losses of all experts. The prediction problem is often studied in the so-called adversarial setting and the stochastic setting. In the adversarial setting, the advice of the experts is chosen by an adversary so as to maximize the regret of the forecaster, and therefore the problem can be viewed as a zero-sum game between the forecaster and the adversary (see e.g. [12, 9, 8, 5, 4]). In the stochastic setting, the losses of each expert are drawn independently and identically distributed over time from a fixed but unknown distribution, and smaller regrets can be achieved than in the adversarial setting (see e.g. [7, 10, 13]).

In this paper, we consider the model in [14], which mixes the adversarial and stochastic settings. It is a learning system with two experts and a forecaster. One of the experts is honest: at each round he makes a correct prediction with probability μ. The other is malicious: he knows the true outcome at each round and makes his predictions so as to maximize the loss of the forecaster. Here we assume that the forecaster adopts the classical multiplicative weights algorithm, and we study its resistance to the corruption of the malicious expert. The expected cumulative loss of the forecaster depends on the strategy chosen by the malicious expert, the fixed time horizon, and the initial weight of the malicious expert. Instead of the regret, we analyze the maximal expected loss in the asymptotic regime where the horizon grows.

It was proved in [14] that if the malicious expert is only allowed to adopt offline policies, i.e., to decide at the beginning of the game whether to tell the true outcome at each round, then the asymptotic maximal loss admits an explicit value. This implies that the extra power of the malicious expert cannot incur extra losses to the forecaster.

Here we allow the malicious expert to adopt online policies, i.e., at each round the malicious expert chooses whether to tell the truth based on the entire prior history. To find an upper bound on the asymptotic loss, we rescale the dynamic programming equations of the problem and obtain a partial differential equation (PDE). We then prove that the unique solution of this PDE provides an upper bound

For the lower bound, we design a simple strategy for the malicious expert and prove that

which implies that the malicious expert can incur extra losses to the forecaster when online policies are admissible. To make the forecaster more resistant to the malicious expert, we consider an adaptive multiplicative weights algorithm and prove that it is asymptotically optimal for the forecaster.

The rest of the paper is organized as follows. In Section 2, we mathematically formulate this problem and develop its dynamic programming equations. In Section 3, we show the upper bound of asymptotic losses, and in Section 4 we find the lower bound. In Section 5, we consider the malicious expert versus the adaptive multiplicative weights algorithm. In Section 6, we summarize our results and their implications.

2. Problem Formulation

In this section, we introduce the mathematical model as in [14]. Consider a learning system with two experts and a forecaster. For each round , denote the prediction of expert by , and the true outcome by .

Suppose that the forecaster adopts the multiplicative weights algorithm. For each round , denote by the weight of expert , . Then the prediction of the forecaster is

Given , the weights evolve as follows

Denote the entire history up to round by

Assume one of the experts is honest: at each round he makes a correct prediction with probability μ, independently of the history, i.e.,

The other expert is malicious: he knows both the accuracy μ of the honest expert and the true outcome at each round. At each stage, based on the information available so far, the malicious expert can choose either to lie, i.e., predict the opposite of the true outcome, or to tell the truth, i.e., predict the true outcome. His strategy space is thus the set of functions from histories to the two actions, truth and lie.

At each round, the loss of the forecaster is the difference between his prediction and the true outcome, which is also the gain of the malicious expert. It can be easily verified that


The evolution of the weights is as follows:


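As a concrete illustration of the rules above, here is a minimal Python sketch of one round, assuming the standard exponential-weights update with learning rate eta; the paper's displayed equations fix the exact variant it analyzes.

```python
import math

# Minimal sketch of one round of the learning system, assuming the
# standard exponential-weights update with learning rate eta (an
# illustrative assumption, not the paper's exact equations).

def forecast(w_honest, w_malicious, advice_honest, advice_malicious):
    """The forecaster predicts the weighted average of the experts' advice."""
    total = w_honest + w_malicious
    return (w_honest * advice_honest + w_malicious * advice_malicious) / total

def mw_update(weight, advice, outcome, eta=0.5):
    """Multiplicative weights: shrink the weight exponentially in the expert's loss."""
    return weight * math.exp(-eta * abs(advice - outcome))

# One round with true outcome 1: the honest expert is correct, the malicious one lies.
pred = forecast(1.0, 1.0, 1.0, 0.0)   # equal weights average the two pieces of advice
w_hon = mw_update(1.0, 1.0, 1.0)      # zero loss: weight unchanged
w_mal = mw_update(1.0, 0.0, 1.0)      # unit loss: weight shrinks by exp(-eta)
```

With equal weights the forecaster's prediction is 0.5, so this single lie costs the forecaster a loss of 0.5 while the liar's weight shrinks by the factor exp(-eta).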

For a fixed time horizon, the goal of the malicious expert is to maximize the cumulative loss of the forecaster by choosing a sequence of strategies, i.e., by solving the optimization problem

According to (2.1), we obtain the expected current loss


Combining this with (2.2), we obtain the dynamic programming equations


together with the initial conditions.
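The dynamic programming equations can be solved numerically by backward induction. The sketch below is only an illustration under assumptions: absolute loss, the exponential update with rate ETA, and the state reduced to the log weight ratio; the exact recursion is (2.4).

```python
import math
from functools import lru_cache

# Backward-induction sketch of the malicious expert's dynamic program.
# Assumptions (not the paper's exact equations): absolute loss, the
# exponential update with rate ETA, and the state reduced to the
# log weight ratio r = log(w_malicious / w_honest).
ETA, MU, N = 0.5, 0.7, 20   # learning rate, honest accuracy, horizon

def step(r, honest_correct, malicious_truth):
    """One round with true outcome 1 (WLOG): forecaster loss and next log-ratio."""
    p_hon = 1.0 if honest_correct else 0.0
    p_mal = 1.0 if malicious_truth else 0.0
    w_mal, w_hon = math.exp(r), 1.0
    pred = (w_mal * p_mal + w_hon * p_hon) / (w_mal + w_hon)
    r_next = r - ETA * abs(p_mal - 1.0) + ETA * abs(p_hon - 1.0)
    return abs(pred - 1.0), r_next

@lru_cache(maxsize=None)
def value(n, r):
    """Maximal expected remaining loss the malicious expert can force."""
    if n == N:
        return 0.0
    best = 0.0
    for truth in (True, False):                       # tell the truth or lie
        expected = 0.0
        for correct, prob in ((True, MU), (False, 1.0 - MU)):
            loss, r2 = step(r, correct, truth)
            expected += prob * (loss + value(n + 1, round(r2, 6)))
        best = max(best, expected)
    return best

avg_loss = value(0, 0.0) / N   # average per-round loss under optimal malicious play
```

Increasing N approximates the asymptotic average loss whose upper and lower bounds Sections 3 and 4 establish.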

3. Upper bound on the Value function

In this section, we suitably rescale (2.4) and obtain a PDE, referred to as (HJB). We solve this equation explicitly and show that its solution (3.5) provides an upper bound on the value function.

3.1. Limiting PDE

To appropriately rescale (2.4) and follow the formulation of [2], we change the variable

and define

Then (2.4) becomes


Define the scaled value functions accordingly. Substituting into (3.1), we obtain that


Passing to the limit in (3.2), we obtain a first-order PDE


where , and

Define the Hamiltonians

Then (3.3) becomes


Following Ishii’s definition of viscosity solutions for discontinuous Hamiltonians, we complement (3.4) with

where and should be understood in the sense of viscosity solutions.

Solving (3.4) by the method of characteristics and assuming that the value function is differentiable with respect to on , we conjecture the solution

Proposition 3.1.

A viscosity solution of


is given by (3.5).


The initial condition is trivially satisfied. We show that is a subsolution. Suppose is differentiable, and achieves a local maximum at . Since is differentiable in the domain , we have if . Then it can be easily verified that at , where if , and if .

Suppose is on the line . Note that

Since is a local maximum of , we must have

Take . As a result of

we obtain that

Since we can choose to be either positive or negative, it can be easily deduced that

Substituting into , we obtain that

If is on the line , we have sub/super differentials of

Therefore cannot achieve a local maximum on the line . Hence we have proved that is a subsolution of (HJB), and similarly, we can show that is a supersolution. ∎

3.2. Control problem

In this subsection, we show that there is a unique viscosity solution of (HJB) by applying results from [1] and [2]. First, we interpret (HJB) as a control problem.

In the domain , we take as the space of controls, and

as the controlled dynamics. For , define the space of controls , and the dynamics

The running cost in the domain is given by , in the domain by , and in by

where .

In order to allow trajectories to stay on the boundary for a while, for , we denote

We say a control is regular if , and denote

Define . We say that a Lipschitz function is an admissible trajectory if there exists some control process such that for a.e.


According to [2, Theorem 2.1], we have for a.e. . Denote by the set of admissible controlled trajectories starting from , i.e.,

Let us also introduce the set of regular trajectories,

For each , we define two value functions


where the cost function is given by

Note that in , the associated Hamiltonian of (3.7) and (3.8)

coincides with the Hamiltonian of the last subsection. Then, according to [2, Theorem 3.3], both value functions are viscosity solutions of (HJB). We will show that they are in fact equal, and that (HJB) admits only one viscosity solution.

Proposition 3.2.

is the unique viscosity solution of (HJB), and is the minimal supersolution of (HJB).


The argument is an application of results from [2]. Define the Hamiltonians on via

Let us compute . Suppose . Then it can be easily verified that maximizing over is equivalent to maximizing


subject to constraints,


We first fix and suppose . Due to the equality

and the fact that the coefficient before is negative, maximizing (3.9) is equivalent to minimizing under the constraints. It can be easily seen that the minimum is attained if and only if . Therefore equation (3.10) becomes , and hence (3.9) is equal to . Now fix . In order to obtain the maximum of , we have to take . In that case and .

If , we have . Since is a regular control, we conclude that

We say a continuous function is a viscosity solution of


if it satisfies (HJB) and

According to [2, Theorem 3.3], is a viscosity subsolution of , and hence also a viscosity subsolution of (3.11), since in our case. As a result of [2, Theorems 4.2 and 4.4], is the viscosity solution of (3.11), and the comparison result holds for (3.11). Therefore we conclude that . Then, according to their definitions (3.7) and (3.8), they must be equal.

Finally according to [2, Theorem 4.4], is the minimal supersolution of (HJB) and is the maximal subsolution of (HJB). Then if is a viscosity solution of (HJB), we must have and hence . ∎

3.3. Upper bound (3.12)

In this subsection, we show that

is a viscosity supersolution of (HJB). Then according to Proposition 3.2, we obtain that , and hence

In particular, if we take , then the above inequality becomes

Proposition 3.3.

v is a viscosity supersolution of (HJB).


The proof is almost the same as that of [3, Theorem 2.1], and we record it here for completeness. Fixing an arbitrary , we show that is a viscosity supersolution over . Assume that is a strict local minimum of for some . As a result of (3.2), it can be easily seen that . Without loss of generality, we assume that , and that there exists some such that

  1. outside the ball ,

  2. in the ball .

Then there exists a sequence of such that and is a global minimum of . Due to the definition of , we have that and for any .

According to (3.2), we obtain that


We prove the case , and the proof for is the same. Since the sequence is bounded, we can take a convergent subsequence. For simplicity, we still denote it by , and assume it converges to some . Letting in (3.3), we obtain that

Note that if

then we have

and hence

Similarly if


and hence

Therefore, we have shown that

4. Lower Bound on the Value function

It was proved in [14] that, for any offline strategy of the malicious expert, the asymptotic average value admits an explicit expression when starting with the given weight. Here we provide a lower bound on the value function for the corresponding online problem


which shows that the malicious expert has an advantage when he adopts online policies.

This lower bound can be achieved if the malicious expert chooses to lie at one state and to tell the truth at the other. For , define the corresponding strategies by


and . We denote the value function associated with by

Proposition 4.1.

Under strategy ,

is a Markov chain with two states

starting with , and its transition probability is given by
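A Monte Carlo sketch of a threshold strategy in this spirit: the malicious expert lies while his weight is high and tells the truth otherwise, so that the state process bounces between two values like the two-state Markov chain described above. The exponential update, the absolute loss, and the threshold at zero are illustrative assumptions, not the paper's exact strategy.

```python
import math
import random

def simulate(mu=0.7, eta=0.5, n_rounds=100_000, seed=0):
    """Average forecaster loss under a threshold policy for the malicious expert."""
    rng = random.Random(seed)
    r = 0.0                      # log(w_malicious / w_honest)
    total_loss = 0.0
    for _ in range(n_rounds):
        honest_correct = rng.random() < mu
        lie = r > 0.0            # lie while the malicious weight dominates
        p_hon = 1.0 if honest_correct else 0.0
        p_mal = 0.0 if lie else 1.0        # WLOG the true outcome is 1
        w_mal, w_hon = math.exp(r), 1.0
        total_loss += abs((w_mal * p_mal + w_hon * p_hon) / (w_mal + w_hon) - 1.0)
        # multiplicative update in log space
        r += -eta * abs(p_mal - 1.0) + eta * abs(p_hon - 1.0)
    return total_loss / n_rounds

avg = simulate()
```

Under this policy the log weight ratio stays in a bounded band around the threshold, so the state process is indeed a two-state Markov chain, and the long-run average loss can be estimated by the simulation.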