1. Introduction
Prediction with expert advice is a classical and fundamental problem in online learning, and we refer the reader to [6] for a nice survey. In this problem, a forecaster makes predictions based on the advice of experts so as to minimize his loss, i.e., the cumulative difference between his predictions and the true outcomes. A standard performance criterion is the regret: the difference between the loss of the forecaster and the minimum among the losses of all experts. The prediction problem is often studied in the so-called adversarial setting and the stochastic setting. In the adversarial setting, the advice of the experts is chosen by an adversary so as to maximize the regret of the forecaster, and therefore the problem can be viewed as a zero-sum game between the forecaster and the adversary (see e.g. [12, 9, 8, 5, 4]). In the stochastic setting, the losses of each expert are drawn independently and identically distributed over time from a fixed but unknown distribution, and smaller regret can be achieved than in the adversarial setting (see e.g. [7, 10, 13]).
In this paper, we consider the model of [14], which mixes the adversarial and stochastic settings. It is a learning system with two experts and a forecaster. One of the experts is honest: at each round he makes a correct prediction with probability . The other is malicious: he knows the true outcome at each round and makes his predictions so as to maximize the loss of the forecaster. We assume that the forecaster adopts the classical multiplicative weights algorithm, and study its resistance to corruption by the malicious expert. Denote by the expected cumulative loss of the forecaster, where is the strategy chosen by the malicious expert, is the fixed time horizon, and is the initial weight of the malicious expert. Instead of the regret, we analyze the asymptotic maximal loss .
It was proved in [14] that if the malicious expert is only allowed to adopt offline policies, i.e., to decide at the beginning of the game whether to tell the true outcome at each round, then we have . This implies that the extra power of the malicious expert cannot incur extra losses to the forecaster.
Here we allow the malicious expert to adopt online policies, i.e., at each round the malicious expert chooses whether to tell the truth based on the entire prior history. To find an upper bound on the asymptotic loss, we rescale the dynamic programming equations of the problem and obtain a partial differential equation (PDE). We then prove that the unique solution of this PDE provides an upper bound
For the lower bound, we design a simple strategy for the malicious expert and prove that
which implies that the malicious expert can incur extra losses to the forecaster when online policies are admissible. To make the forecaster more resistant to the malicious expert, we consider an adaptive multiplicative weights algorithm and prove that it is asymptotically optimal for the forecaster.
The rest of the paper is organized as follows. In Section 2, we mathematically formulate this problem and develop its dynamic programming equations. In Section 3, we show the upper bound of asymptotic losses, and in Section 4 we find the lower bound. In Section 5, we consider the malicious expert versus the adaptive multiplicative weights algorithm. In Section 6, we summarize our results and their implications.
2. Problem Formulation
In this section, we introduce the mathematical model as in [14]. Consider a learning system with two experts and a forecaster. For each round , denote the prediction of expert by , and the true outcome by .
Suppose that the forecaster adopts the multiplicative weights algorithm. For each round , denote by the weight of expert , . Then the prediction of the forecaster is
Given , the weights evolve as follows
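The multiplicative weights rule above can be sketched in code. This is a minimal illustration only: the exponential penalty form and the learning rate `eps` are assumptions, since the paper's exact update is given by the displayed formulas.

```python
import math

def mw_predict(weights, advice):
    # Forecaster's prediction: weighted average of the experts' advice.
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, advice)) / total

def mw_update(weights, advice, outcome, eps=0.1):
    # Multiplicative update (assumed exponential penalty):
    # each expert is penalized proportionally to its absolute loss.
    return [w * math.exp(-eps * abs(a - outcome))
            for w, a in zip(weights, advice)]

weights = [1.0, 1.0]   # two experts with equal initial weights
advice = [1, 0]        # expert 0 predicts 1, expert 1 predicts 0
outcome = 1
p = mw_predict(weights, advice)               # forecaster predicts 0.5
weights = mw_update(weights, advice, outcome) # correct expert gains weight
```

After one round, the weight of the correct expert exceeds that of the incorrect one, which is the mechanism the malicious expert tries to exploit below.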
Denote the entire history up to round by
Assume expert is honest, and at each round makes a correct prediction with probability , independently of , i.e.,
Expert is malicious and knows the accuracy of expert and the outcome at each round. At each stage , based on the information , the malicious expert can choose to lie, i.e., make , or to tell the truth, i.e., make . Denote by the space of functions from to , where (truth) and (lie) represent and respectively.
At each round , the loss of the forecaster is , which is also the gain of the malicious expert. It can be easily verified that
(2.1) 
And the evolution of is as follows:
(2.2) 
where
For a fixed time horizon , the goal of the malicious expert is to maximize the cumulative loss of the forecaster by choosing a sequence of strategies , i.e., solving the optimization problem
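To make the setup concrete, the following simulation plays the game for T rounds and records the forecaster's average per-round loss. It is a sketch under stated assumptions: the always-lie strategy for the malicious expert, the honest expert's accuracy `mu = 0.8`, and the learning rate `eps` are all illustrative choices, not the optimal policy studied in this paper.

```python
import math
import random

def play(T, mu, eps=0.1, seed=0):
    # Simulate T rounds: expert 0 is honest (correct w.p. mu),
    # expert 1 is malicious (here: a naive always-lie strategy).
    rng = random.Random(seed)
    w = [1.0, 1.0]
    loss = 0.0
    for _ in range(T):
        outcome = rng.randint(0, 1)
        honest = outcome if rng.random() < mu else 1 - outcome
        malicious = 1 - outcome  # always lie (illustrative only)
        advice = [honest, malicious]
        pred = sum(wi * ai for wi, ai in zip(w, advice)) / sum(w)
        loss += abs(pred - outcome)
        # Assumed exponential multiplicative weights update.
        w = [wi * math.exp(-eps * abs(ai - outcome))
             for wi, ai in zip(w, advice)]
    return loss / T

avg = play(10_000, mu=0.8)  # average per-round loss of the forecaster
```

Under the always-lie strategy the malicious expert's weight decays quickly, which is why the more careful online strategies analyzed below can incur strictly larger losses.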
3. Upper Bound on the Value Function
In this section, we suitably rescale the dynamic programming equation (2.4) and obtain a PDE (HJB). We solve this equation explicitly, and show that its solution (3.5) provides an upper bound on
3.1. Limiting PDE
To appropriately rescale (2.4) and follow the formulation of [2], we change the variable
and define
Then (2.4) becomes
(3.1)  
Define scaled value functions via the equation . Substituting in (3.1), we obtain that
(3.2)  
Taking the limit in (3.2), we obtain a first-order PDE
(3.3)  
where , and
Define , and Hamiltonians
Then (3.3) becomes
(3.4) 
Following Ishii’s definition of viscosity solutions for discontinuous Hamiltonians, we complement (3.4) with
where and should be understood in the sense of viscosity solutions.
Solving (3.4) by the method of characteristics and assuming that the value function is differentiable with respect to on , we conjecture the solution
(3.5) 
Proposition 3.1.
Proof.
The initial condition is trivially satisfied. We show that is a subsolution. Suppose is differentiable, and achieves a local maximum at . Since is differentiable in the domain , we have if . Then it can be easily verified that at , where if , and if .
Suppose is on the line . Note that
Since is a local maximum of , we must have
Take . As a result of
we obtain that
Since we can choose to be either positive or negative, it can be easily deduced that
Substituting into , we obtain that
If is on the line , we have sub/super differentials of
Therefore cannot achieve a local maximum on the line . Hence we have proved that is a subsolution of (HJB); similarly, one can show that is a supersolution. ∎
3.2. Control problem
In this subsection, we show that there is a unique viscosity solution of (HJB) by applying results from [1] and [2]. First, we interpret (HJB) as a control problem.
In the domain , we take as the space of controls, and
as the controlled dynamics. For , define the space of controls , and the dynamics
The running cost in the domain is given by , in the domain by , and in by
where .
In order to let trajectories stay on the boundary for a while, for , we denote
We say a control is regular if , and denote
Define . We call a Lipschitz function an admissible trajectory if there exists some control process such that, for a.e. ,
(3.6)  
According to [2, Theorem 2.1], we have for a.e. . Denote by the set of admissible controlled trajectories starting from , i.e.,
Let us also introduce the set of regular trajectories,
For each , we define two value functions
(3.7)  
(3.8) 
where the cost function is given by
Note that in , the associated Hamiltonian of (3.7) and (3.8)
coincides with the one in the previous subsection. Then, according to [2, Theorem 3.3], both and are viscosity solutions of (HJB). We will show that they are in fact equal, and that (HJB) admits only one viscosity solution.
Proposition 3.2.
Proof.
The argument is an application of results from [2]. Define the Hamiltonians on via
Let us compute . Suppose . Then it can be easily verified that maximizing over is equivalent to maximizing
(3.9) 
subject to constraints,
(3.10)  
We first fix and suppose . Due to the equality
and the fact that the coefficient in front of is negative, maximizing (3.9) is equivalent to minimizing under the constraints. The minimum is attained if and only if . Therefore equation (3.10) becomes , and hence (3.9) equals . Now fix . In order to attain the maximum of , we must take . In that case and .
If , we have . Since is a regular control, we conclude that
We say a continuous function is viscosity solution of
(3.11)  
if it satisfies (HJB) and
According to [2, Theorem 3.3], is a viscosity subsolution of , and hence also a viscosity subsolution of (3.11), since in our case. As a result of [2, Theorems 4.2 and 4.4], is the viscosity solution of (3.11), and the comparison principle holds for (3.11). Therefore we conclude that . Then, according to their definitions (3.7) and (3.8), they must be equal. ∎
3.3. Upper bound (3.12)
In this subsection, we show that
is a viscosity supersolution of (HJB). Then according to Proposition 3.2, we obtain that , and hence
In particular, if we take , then the above inequality becomes
(3.12) 
Proposition 3.3.
v is a viscosity supersolution of (HJB).
Proof.
The proof is almost the same as that of [3, Theorem 2.1], and we record it here for completeness. Fixing an arbitrary , we show that is a viscosity supersolution over . Assume that is a strict local minimum of for some . As a result of (3.2), it can be easily seen that . Without loss of generality, we assume that , and that there exists some such that
outside the ball , and
in the ball .
Then there exists a sequence of such that and is a global minimum of . Due to the definition of , we have that and for any .
According to (3.2), we obtain that
(3.13) 
We prove the case ; the proof for is the same. Since , we can take a convergent subsequence. For simplicity, we still denote it by , and assume it converges to some . Letting in (3.3), we obtain that
Note that if
then we have
and hence
Similarly if
then
and hence
Therefore, we have shown that
∎
4. Lower Bound on the Value Function
It was proved in [14] that the asymptotic average value is for any offline strategy of the malicious expert when starting with weight . Here we provide a lower bound on the value function of the corresponding online problem
(4.1) 
which shows that the malicious expert gains an advantage by adopting online policies.
This lower bound is achieved if the malicious expert lies at state and tells the truth at state . For , define the corresponding strategy by
(4.2) 
and . We denote the value function associated with by
Proposition 4.1.
Proof.
Under strategy ,
is a Markov chain with two states
starting with , and its transition probability is given by
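The states and transition probabilities of this chain are given by the displayed formulas above. As a generic illustration only (the parameters `p` and `q` below are hypothetical, not the paper's specific values), the stationary distribution of a two-state Markov chain can be computed in closed form:

```python
def stationary_two_state(p, q):
    # Two-state Markov chain with transition probabilities
    # P(0 -> 1) = p and P(1 -> 0) = q, with p + q > 0.
    # The stationary distribution solves pi0 * p = pi1 * q, pi0 + pi1 = 1,
    # giving pi0 = q / (p + q).
    pi0 = q / (p + q)
    return pi0, 1.0 - pi0

pi = stationary_two_state(0.3, 0.7)  # -> (0.7, 0.3)
```

The long-run average gain of the malicious expert under such a strategy is then the stationary expectation of the per-round gain over the two states.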