1 Introduction
Over the past years, online learning has become a very active research field. This is due to the widespread presence of applications with evolving or adversarial environments, e.g. routing schemes in networks [3], online marketplaces [5], spam filtering [11], etc. An online learning algorithm has to choose an action over a (possibly infinite) set of feasible decisions. A loss/reward is associated with each decision, and it may be adversarially chosen. The losses/rewards are unknown to the algorithm beforehand. The goal is to minimize the regret, i.e. the difference between the total loss/reward of the online algorithm and that of the best single action in hindsight. A "good" online learning algorithm is an algorithm whose regret is sublinear as a function of the length of the time horizon, since then, on average, the algorithm performs as well as the best single action in hindsight. Such an online algorithm is called an online learning algorithm with vanishing regret. For problems whose offline version is hard, the notions of regret and vanishing regret have been extended to the notions of $\alpha$-regret and vanishing $\alpha$-regret, in order to take into account the existence of an approximation algorithm instead of an exact algorithm for solving the offline optimization problem.
While many online learning problems can be modeled as the so-called "experts problem" by associating a feasible solution to each expert, there is clearly an efficiency challenge: there is potentially an exponential number of solutions, which makes such an approach problematic in practice. Other methods have been used, such as online gradient descent [24], the follow the leader algorithm and its extension follow the perturbed leader [15] for linear objective functions and its generalization to submodular objective functions [12], or the generalized follow the perturbed leader algorithm [7]. Hazan and Koren [13] proved that a no-regret algorithm with running time polynomial in the size of the problem does not exist in general settings without any assumption on the structure of the problem.
Our work takes into account the computational efficiency of the online learning algorithm, in the same vein as the works in [1, 15, 12, 22, 6, 7, 14, 9]. We study various discrete nonlinear combinatorial optimization problems in an online learning framework, focusing in particular on the family of min-max discrete optimization problems.
Our goal is to address the two following central questions:

(Q1) are there negative results showing that obtaining vanishing regret (or even vanishing $\alpha$-regret) is computationally hard?

(Q2) are there some notable differences in the efficiency of follow the leader and gradient descent strategies for discrete problems?
Formally, an online learning problem consists of a decision space $\mathcal{X}$, a state space $\mathcal{S}$ and an objective function $f : \mathcal{X} \times \mathcal{S} \to \mathbb{R}$ that can be either a cost or a reward function. Any problem of this class can be viewed as an iterative adversarial game with $T$ rounds where the following procedure is repeated for $t = 1, \dots, T$: (a) decide an action $x_t \in \mathcal{X}$, (b) observe a state $s_t \in \mathcal{S}$, (c) suffer loss or gain reward $f(x_t, s_t)$.
We use $f_t(x) = f(x, s_t)$ as another way to refer to the objective function $f$ after observing the state $s_t$, i.e. the objective function at round $t$.
The objective of the player is to minimize/maximize the cumulative cost/reward of the decided actions, which is given by the aggregation $\sum_{t=1}^{T} f_t(x_t)$. An online learning algorithm is any algorithm that decides the action $x_t$ at every round $t$ before observing $s_t$. We compare the decisions of the algorithm with those of the best static action in hindsight, defined as $x^* = \arg\min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x)$, or $x^* = \arg\max_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x)$, for minimization or maximization problems, respectively. This is the action that a (hypothetical) offline oracle would compute if it had access to the entire sequence $s_1, \dots, s_T$. The typical measure of the efficiency of an online learning algorithm is the regret, defined (for minimization) as: $R(T) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*)$.
A learning algorithm typically uses some kind of randomness, and the regret then denotes the expectation of the above quantity. We are interested in online learning algorithms that have the "vanishing regret" property. This means that as the "game" progresses ($T \to \infty$), the difference between the algorithm's average cost/payoff and the average cost/payoff of the optimum action in hindsight tends to zero. Typically, a vanishing regret algorithm is an algorithm with regret $R(T)$ such that $\lim_{T \to \infty} R(T)/T = 0$. However, as we are interested in polynomial time algorithms, we consider only regret bounds of the form $R(T) = O(p(n) \cdot T^{1-c})$ for some constant $c > 0$ (which guarantees convergence in polynomial time). Throughout the paper, whenever we mention vanishing regret, we mean regret of this form.
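As an illustration of the regret definition above, the following minimal Python sketch computes the regret of a simple (deterministic) follow-the-leader player against the best fixed action in hindsight on a toy loss table; the data and the tie-breaking rule are ours, chosen only for illustration.

```python
# Toy regret computation: losses[t][a] = loss of action a at round t.
losses = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [1.0, 0.0],
]

def follow_the_leader(losses):
    """Pick the action with the smallest cumulative past loss (ties -> action 0)."""
    n_actions = len(losses[0])
    cum = [0.0] * n_actions
    total = 0.0
    for round_losses in losses:
        action = min(range(n_actions), key=lambda a: cum[a])
        total += round_losses[action]          # loss is revealed after deciding
        for a in range(n_actions):
            cum[a] += round_losses[a]
    return total

alg_loss = follow_the_leader(losses)
best_fixed = min(sum(l[a] for l in losses) for a in range(len(losses[0])))
regret = alg_loss - best_fixed
```

On this adversarial toy sequence the deterministic leader is "tricked" at every round, which is exactly the phenomenon that motivates randomized strategies later in the paper.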
For many online learning problems, even the offline versions are hard. Thus, it is not feasible to produce a vanishing regret sequence with an efficient algorithm. For such cases, the notion of $\alpha$-regret has been defined as: $R_\alpha(T) = \sum_{t=1}^{T} f_t(x_t) - \alpha \cdot \sum_{t=1}^{T} f_t(x^*)$.
Hence, we are interested in vanishing $\alpha$-regret sequences for some $\alpha$ for which we know how to approximate the offline problem. The notion of vanishing $\alpha$-regret is defined in the same way as that of vanishing regret. In this article we focus on computational issues. Efficiency for an online learning algorithm needs to capture both the computation of $x_t$ and the convergence speed. This is formalized in the following definition (where $n$ denotes the size of the instance).
Definition 1.
A polynomial time vanishing $\alpha$-regret algorithm is an online learning algorithm for which (1) the computation of $x_t$ at every round is polynomial in $n$ and $t$, and (2) the expected $\alpha$-regret is bounded by $p(n) \cdot T^{1-c}$ for some polynomial $p$ and some constant $c > 0$.
Note that in the case $\alpha = 1$, we simply use the term polynomial time vanishing regret algorithm.
1.1 Our contribution
In Section 2, we provide a general reduction showing that many (min-max) polynomial time solvable problems not only do not admit a vanishing regret algorithm, but also no vanishing $\alpha$-regret algorithm for some $\alpha > 1$ (unless NP $\subseteq$ BPP). Then, we focus on a particular min-max problem, the min-max version of the vertex cover problem, which is solvable in polynomial time in the offline case. The previous reduction proves that there is no vanishing $(2-\varepsilon)$-regret online algorithm, unless the Unique Games problem is in BPP; we prove a matching upper bound by providing an online 2-regret algorithm based on the online gradient descent method.
In Section 3, we turn our attention to online learning algorithms that are based on an offline optimization oracle that, given a set of instances of the problem, is able to compute the optimum static solution. We show that for several nonlinear discrete optimization problems, this offline optimization oracle is strongly NP-hard to implement, even for problems that can be solved in polynomial time in the static case (e.g. min-max vertex cover, min-max perfect matching, etc.). We also prove that the offline optimization oracle is strongly NP-hard for the problem of scheduling a set of jobs on $m$ identical machines, where $m$ is a fixed constant. To the best of our knowledge, up to now, algorithms based on the follow the leader method for nonlinear objective functions require an exact oracle or an FPTAS oracle in order to obtain vanishing regret. Thus, strong NP-hardness for the multi-instance version of the offline problem indicates that follow-the-leader-type strategies cannot be used for the online problem, at least with our current knowledge. On the positive side, we present an online algorithm with vanishing regret that is based on the follow the perturbed leader algorithm, for a generalization of the knapsack problem [2].
1.2 Further related works
Online Learning, or Online Convex Optimization, is an active research domain. In this section, we only summarize works which are directly related to ours. We refer the reader to comprehensive books [21, 11] and references therein for a more complete overview. The first no-regret algorithm was given by Hannan [10]. Subsequently, Littlestone and Warmuth [18] and Freund and Schapire [8] gave improved algorithms with regret $O(\sqrt{T \log N})$, where $N$ is the size of the action space. However, these algorithms have running time exponential in the size of the input for many applications, in particular for combinatorial optimization problems. An intriguing question is whether there exists a no-regret online algorithm with running time polynomial in the input size. Hazan and Koren [13] proved that no such algorithm exists in general settings without any assumption on the structure. Designing online polynomial-time algorithms with approximation and vanishing regret guarantees for combinatorial optimization problems is a major research agenda.
In their breakthrough paper, Kalai and Vempala [15] presented the first efficient online algorithm, called Follow-the-Perturbed-Leader (FTPL), for linear objective functions. The strategy consists of adding a perturbation to the cumulative gain (payoff) of each action and then selecting the action with the highest perturbed gain. This strategy has been generalized and successfully applied to several settings [12, 22, 6, 7]. Specifically, FTPL and its generalized versions have been used to design efficient online no-regret algorithms with oracles beyond linear settings: in submodular settings [12] and non-convex settings [1]. However, all these approaches require best-response oracles, and as we show in this paper, for several problems such best-response oracles require exponential time computation.
Another direction is to design online learning algorithms using (offline polynomial-time) approximation algorithms as oracles. Kakade et al. [14] provided an algorithm which is inspired by Zinkevich's algorithm [24] (gradient descent): at every step, the algorithm updates the current solution in the direction of the gradient and projects back to the feasible set using an approximation algorithm. They showed that, given an $\alpha$-approximation algorithm for a linear optimization problem, after $T$ prediction rounds (time steps) the online algorithm achieves a vanishing $\alpha$-regret, using a number of calls to the approximation algorithm per round that grows with $T$ on average. Later on, Garber [9] gave an algorithm with a comparable $\alpha$-regret bound using significantly fewer calls to the approximation algorithm per round on average. These algorithms rely crucially on the linearity of the objective functions, and it remains an interesting open question to design such algorithms for online nonlinear optimization problems.
2 Hardness of online learning for min-max problems
2.1 General reduction
As mentioned in the introduction, in this section we give some answers to question (Q1) on ruling out the existence of vanishing regret algorithms for a broad family of online min-max problems, even for ones that are polynomial-time solvable in the offline case. In fact, we provide a general reduction (see Theorem 1) showing that many min-max problems do not admit a vanishing $\alpha$-regret algorithm for some $\alpha > 1$, unless NP $\subseteq$ BPP.
More precisely, we focus on a class of cardinality minimization problems where, given a ground set $E$ of elements, a set of constraints on the subsets of $E$ (defining feasible solutions) and an integer $k$, the goal is to determine whether there exists a feasible solution of size at most $k$. This is a general class of problems, including for instance graph problems such as Vertex Cover, Dominating Set, Feedback Vertex Set, etc.
Given such a cardinality problem $\Pi$, let min-max $\Pi$ be the optimization problem where, given nonnegative weights $w(e)$ for all the elements of $E$, one has to compute a feasible solution (under the same set of constraints as in problem $\Pi$) such that the maximum weight of its elements is minimized. The online min-max $\Pi$ problem is the online learning variant of min-max $\Pi$, where the weights on the elements of $E$ change over time.
Interestingly, the min-max versions of all the problems mentioned above are polynomially solvable. This is actually true as soon as, for problem $\Pi$, every superset of a feasible solution is feasible. Then one just has to check, for each possible weight $\lambda$, whether the set of all elements of weight at most $\lambda$ satisfies the constraints. For example, one can decide if there exists a vertex cover with maximum weight at most $\lambda$ as follows: remove all vertices of weight strictly larger than $\lambda$, and check if the remaining vertices form a vertex cover.
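The thresholding argument above can be sketched in a few lines of Python for the min-max vertex cover case; the instance below is a hypothetical example of ours.

```python
# Offline min-max vertex cover by thresholding: try each candidate weight
# lam in increasing order; the candidate set {v : w(v) <= lam} only grows,
# so the first feasible threshold is the optimal min-max value.
edges = [("a", "b"), ("b", "c")]          # toy graph: path a - b - c
weights = {"a": 5, "b": 2, "c": 7}

def is_vertex_cover(cover, edges):
    return all(u in cover or v in cover for u, v in edges)

def minmax_vertex_cover_value(edges, weights):
    for lam in sorted(set(weights.values())):
        candidate = {v for v, w in weights.items() if w <= lam}
        if is_vertex_cover(candidate, edges):
            return lam
    return None  # no feasible solution exists

opt = minmax_vertex_cover_value(edges, weights)
```

Here the single vertex `b` (weight 2) covers both edges, so the min-max value is 2; the running time is polynomial since only $|V|$ distinct thresholds need to be tested.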
We will show that, in contrast, if $\Pi$ is NP-complete then its online learning min-max version has no vanishing regret algorithm (unless NP $\subseteq$ BPP), and that if $\Pi$ has an inapproximability gap $[A, B]$, then there is no vanishing $\alpha$-regret for $\alpha < B/A$ for its online learning min-max version. Let us first recall the notion of approximation gap, where $\mathrm{OPT}(\Pi)$ denotes the minimum size of a feasible solution to the cardinality problem $\Pi$.
Definition 2.
Given two numbers $A \le B$, let $[A,B]$-Gap-$\Pi$ be the decision problem where, given an instance of $\Pi$ such that $\mathrm{OPT}(\Pi) \le A$ or $\mathrm{OPT}(\Pi) \ge B$, we need to decide whether $\mathrm{OPT}(\Pi) \le A$.
Now we can state the main result of the section.
Theorem 1.
Let $\Pi$ be a cardinality minimization problem and $A \le B$ be real numbers. Assume that the problem $[A,B]$-Gap-$\Pi$ is NP-complete. Then, for every $\alpha \le B/A - \varepsilon$, where $\varepsilon > 0$ is an arbitrarily small constant, there is no polynomial time vanishing $\alpha$-regret algorithm for online min-max $\Pi$ unless NP $\subseteq$ BPP.
Proof.
We prove this theorem by deriving a randomized polynomial time algorithm for $[A,B]$-Gap-$\Pi$ that, under the assumption of a vanishing $\alpha$-regret algorithm for online min-max $\Pi$, gives the correct answer with probability of error at most $1/3$. This would imply that the $[A,B]$-Gap-$\Pi$ problem is in BPP and thus NP $\subseteq$ BPP. Let $\mathcal{A}$ be a vanishing $\alpha$-regret algorithm for online min-max $\Pi$ for some $\alpha \le B/A - \varepsilon$, with $\alpha$-regret bounded by $p(n) \cdot T^{1-c}$ where $c > 0$ is a constant and $p$ a polynomial. Let $T$ be a time horizon which will be fixed later. We construct the following (offline) algorithm for $[A,B]$-Gap-$\Pi$ using $\mathcal{A}$ as an oracle (subroutine). At every step $t$, use the oracle to compute a solution $S_t$. Then, choose one element of $S_t$ uniformly at random and assign weight 1 to that element; assign weight 0 to all other elements. Consequently, the cost incurred by $\mathcal{A}$ is 1 at every step. These weight assignments over the $T$ steps, albeit simple, are crucial. Intuitively, the assignments will be used to learn about the optimal solution of the $[A,B]$-Gap-$\Pi$ problem (given the performance of the learning algorithm $\mathcal{A}$). The formal description is given in Algorithm 1.
We now analyze Algorithm 1. If the algorithm outputs "$\mathrm{OPT}(\Pi) \le A$", this means that at some step the oracle has produced a feasible solution $S_t$ with $|S_t| < B$. Since $\mathrm{OPT}(\Pi)$ (the minimum cardinality of a feasible solution) is known to be either at most $A$ or at least $B$, the output is always correct.
If the algorithm outputs "$\mathrm{OPT}(\Pi) \ge B$", then this means that every solution $S_t$ had cardinality at least $B$. We bound the probability that Algorithm 1 returns a wrong answer in this case. Let $R$ be the $\alpha$-regret achieved by the oracle (online learning algorithm) on the set of instances produced in Algorithm 1, and let $W$ denote the event that the algorithm returns a wrong answer. By Adam's Law (the law of total expectation), we have: $\mathbb{E}[R] \ge \mathbb{E}[R \mid W] \cdot \Pr[W]$.
From Algorithm 1 it should be clear that at every step the oracle suffers loss exactly 1. By the definition of $\alpha$-regret, this means that: $R = T - \alpha \cdot \min_{S} \sum_{t=1}^{T} f_t(S)$.
Now, we consider a minimum cardinality feasible solution $S^*$ (for the initial instance of the cardinality minimization problem $\Pi$). We have $|S^*| = \mathrm{OPT}(\Pi)$. As Algorithm 1 returns a wrong answer, $\mathrm{OPT}(\Pi) \le A$, while at every time $t$, $S_t$ has at least $B$ elements. Furthermore, by the construction of the weights, there is only one element with weight 1 at each step, chosen uniformly at random in $S_t$. Thus, $f_t(S^*) = 1$ with probability at most $A/B$ (and $f_t(S^*) = 0$ otherwise), since $|S^*|/|S_t| \le A/B$. Thus, we get: $\mathbb{E}\big[\min_S \sum_{t=1}^{T} f_t(S) \mid W\big] \le \mathbb{E}\big[\sum_{t=1}^{T} f_t(S^*) \mid W\big] \le T \cdot \frac{A}{B}$. Hence, $\mathbb{E}[R \mid W] \ge T - \alpha T \frac{A}{B} \ge T \varepsilon \frac{A}{B}$, using $\alpha \le B/A - \varepsilon$.
As $\mathcal{A}$ has vanishing $\alpha$-regret, there exists a constant $c > 0$ such that $\mathbb{E}[R] \le p(n) \cdot T^{1-c}$, where $p$ is a polynomial in the problem parameters. Therefore, $\Pr[W] \le \frac{\mathbb{E}[R]}{\mathbb{E}[R \mid W]} \le \frac{p(n) B}{\varepsilon A} \cdot T^{-c}$. Choosing the parameter $T = \lceil (3 p(n) B / (\varepsilon A))^{1/c} \rceil$, we get that $\Pr[W] \le 1/3$. Besides, the running time of Algorithm 1 is polynomial, since it consists of $T$ (polynomial in the size of the problem) iterations and the running time of each iteration is polynomial (as $\mathcal{A}$ is a polynomial time algorithm).
In conclusion, if there exists a polynomial time vanishing $\alpha$-regret algorithm for online min-max $\Pi$, then the NP-complete problem $[A,B]$-Gap-$\Pi$ is in BPP, implying NP $\subseteq$ BPP. ∎
The inapproximability (gap) results for the aforementioned problems give lower bounds on the approximation ratio of any vanishing $\alpha$-regret algorithm for their online min-max versions. For instance, the online min-max dominating set problem has no vanishing constant-regret algorithm, based on the approximation hardness in [19]. We state the lower bound explicitly for the online min-max vertex cover problem in the following corollary, as we refer to it later when showing a matching upper bound. The bounds are based on the hardness results for vertex cover in [17] and [16] (NP-hardness and UGC-hardness, respectively).
Corollary 2.
The online min-max vertex cover problem does not admit a polynomial time vanishing $\alpha$-regret algorithm for some constant $\alpha > 1$ unless NP $\subseteq$ BPP. It does not admit a polynomial time vanishing $(2-\varepsilon)$-regret algorithm unless the Unique Games problem is in BPP.
Now, consider NP-complete cardinality problems which have no known inapproximability gap (for instance Vertex Cover in planar graphs, which admits a PTAS). Then we can show the following impossibility result.
Corollary 3.
If a cardinality problem $\Pi$ is NP-complete, then there is no polynomial time vanishing regret algorithm for online min-max $\Pi$ unless NP $\subseteq$ BPP.
Proof.
We note that the proof of Theorem 1 does not require $A$, $B$ and $\alpha$ to be constant: they can be functions of the instance, and the result holds as soon as $B/A$ is polynomially bounded (so that $T$ remains polynomially bounded in $n$). Then, for a cardinality problem $\Pi$ and an integer $k$, deciding whether $\mathrm{OPT}(\Pi) \le k$ is the same as deciding whether $\mathrm{OPT}(\Pi) \le k$ or $\mathrm{OPT}(\Pi) \ge k+1$. By setting $A = k$, $B = k+1$ and $\alpha = 1$ in the proof of Theorem 1 we get the result. ∎
2.2 Min-max Vertex Cover: matching upper bound with Gradient Descent
In this section we present an online 2-regret algorithm for the min-max vertex cover problem, based on the classic Online Gradient Descent (OGD) algorithm. In the latter, at every step the solution is obtained by updating the previous one in the direction of the (sub)gradient of the objective and projecting back onto a feasible convex set. The particular nature of the min-max vertex cover problem is that the objective function is a maximum (an $\ell_\infty$-type objective) and the set of feasible solutions is discrete (nonconvex). In our algorithm, we consider the following standard relaxation of the problem: minimize $\max_{v} w_t(v)\, x_v$ over the polytope $P = \{x \in [0,1]^V : x_u + x_v \ge 1 \ \forall (u,v) \in E\}$.
At time step $t$, we update the solution by a subgradient $g_t$ of $x \mapsto \max_v w_t(v)\, x_v$, which has value $w_t(v^*)$ in the coordinate $v^*$ achieving the maximum and 0 in all other coordinates. Moreover, after projecting the solution onto the polytope $P$, we round it by a simple procedure: if $x_v \ge 1/2$ then $\bar{x}_v = 1$, and otherwise $\bar{x}_v = 0$. The formal algorithm is given in Algorithm 2.
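The update-and-round step can be sketched as follows. Note that this is a simplified illustration of ours: the exact Euclidean projection onto $P$ used by Algorithm 2 is replaced here by a crude clip-and-repair heuristic (which only ever increases coordinates, so it preserves already-satisfied edge constraints), and the graph and step size are hypothetical.

```python
# One OGD-style step for (relaxed) min-max vertex cover, followed by rounding.
def ogd_step(x, weights, edges, eta=0.1):
    # Subgradient of max_v w(v)*x_v: weight in the argmax coordinate only.
    vmax = max(x, key=lambda v: weights[v] * x[v])
    g = {v: (weights[v] if v == vmax else 0.0) for v in x}
    # Gradient step, then clip to [0, 1].
    y = {v: min(1.0, max(0.0, x[v] - eta * g[v])) for v in x}
    # Heuristic repair toward P = {x in [0,1]^V : x_u + x_v >= 1 for (u,v) in E}:
    # split any deficit between the two endpoints (coordinates only increase).
    for u, v in edges:
        slack = 1.0 - (y[u] + y[v])
        if slack > 0:
            y[u] = min(1.0, y[u] + slack / 2)
            y[v] = min(1.0, y[v] + slack / 2)
    return y

def round_solution(y):
    # Threshold rounding: x_u + x_v >= 1 forces max(x_u, x_v) >= 1/2,
    # so the output is a vertex cover; the objective at most doubles.
    return {v for v, val in y.items() if val >= 0.5}
```

The rounding step is where the factor 2 of Theorem 4 comes from: every fractional value that is kept is at least $1/2$, so raising it to 1 at most doubles the max-weight objective.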
The following theorem, coupled with Corollary 2, shows that the bound of 2 on the approximation ratio of polynomial-time online algorithms for min-max vertex cover is tight (assuming the UGC).
Theorem 4.
Assume that the weights are bounded, i.e., $w_t(v) \in [0,1]$ for all $t$ and $v$. Then, after $T$ time steps, Algorithm 2 achieves $\sum_{t=1}^{T} f_t(\bar{x}_t) \le 2 \min_{x} \sum_{t=1}^{T} f_t(x) + O(\sqrt{nT})$, i.e., a vanishing 2-regret.
3 Computational issues for Follow the Leader based methods
The most natural approach in online learning is for the player to always pick the leading action, i.e. the action that is optimal on the observed history $f_1, \dots, f_{t-1}$. However, it can be proven ([15]) that any deterministic algorithm that always decides on the leading action can be "tricked" by the adversary into making decisions that are worse than the optimal action in hindsight, thus incurring large regret. For this reason, a regularization term containing randomness is added to the optimization oracle, in order to make the algorithm less predictable and more stable. Thus, the Follow the Regularized Leader strategy in a minimization problem consists of deciding on an action $x_t$ such that: $x_t \in \arg\min_{x \in \mathcal{X}} \big( \sum_{\tau=1}^{t-1} f_\tau(x) + R(x) \big)$, where $R(x)$ is the regularization term.
There are many variations of the Follow the Leader (FTL) algorithm that differ in the objective functions they handle and in the type of regularization term. For linear objectives, Kalai and Vempala [15] suggested the Follow the Perturbed Leader (FTPL) algorithm, where the regularization term is simply the cost/payoff of each action on a randomly generated instance of the problem. Dudik et al. [7] generalized the FTPL algorithm of Kalai and Vempala [15] to nonlinear objectives, by introducing the concept of shared randomness and a much more complex perturbation mechanism.
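For concreteness, here is a minimal FTPL sketch in the finite-action (experts) setting, in the spirit of Kalai–Vempala: each action's cumulative loss is perturbed by a random offset drawn once up front, and the perturbed leader is followed. The loss data, seed, and the one-shot perturbation variant are our illustrative choices.

```python
import random

# Follow the Perturbed Leader for a linear (additive) loss problem.
def ftpl_decisions(losses, epsilon=1.0, seed=42):
    rng = random.Random(seed)
    n = len(losses[0])
    # Random perturbation per action, drawn once (a standard FTPL variant).
    perturb = [rng.uniform(0, 1.0 / epsilon) for _ in range(n)]
    cum = [0.0] * n
    decisions = []
    for round_losses in losses:
        # Follow the leader of "perturbed cumulative loss so far".
        decisions.append(min(range(n), key=lambda a: cum[a] + perturb[a]))
        for a in range(n):
            cum[a] += round_losses[a]
    return decisions
```

On a sequence where one action dominates, the perturbation quickly stops mattering and the true leader is followed, while the initial randomness prevents the adversary from exploiting a deterministic tie-breaking rule.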
A common element of every Follow the Leader based method is the need for an optimization oracle over the observed history of the problem. This is a minimum requirement, since the regularization term can make determining the leader even harder; however, most algorithms are able to map the perturbations to the value of the objective function on a set of instances of the problem and thus eliminate this extra complexity. To the best of our knowledge, up to now, FTL algorithms for nonlinear objective functions require an exact or an FPTAS oracle in order to obtain vanishing regret. Thus, strong NP-hardness for the multi-instance version of the offline problem indicates that the FTL strategy cannot be used for the online problem, at least with our current knowledge.
3.1 Computational hardness results
As we mentioned, algorithms that use the "Follow the Leader" strategy heavily rely on the existence of an optimization oracle for the multi-instance version of the offline problem. For linear objectives, it is easy to see ([15]) that optimization over a set of instances is equivalent to optimization over a single instance, and thus any algorithm for the offline problem can be transformed into an online learning algorithm. However, for nonlinear problems this assumption is not always justified, since even when the offline problem is polynomial-time solvable, the corresponding variation with multiple instances can be strongly NP-hard.
In this section we present some problems for which we can prove that the optimum solution over a set of instances is hard to approximate. More precisely, in the multi-instance version of a given problem, we are given an integer $T$, a set of feasible solutions $\mathcal{X}$, and $T$ objective functions $f_1, \dots, f_T$ over $\mathcal{X}$. The goal is to minimize $\sum_{t=1}^{T} f_t(x)$ over $x \in \mathcal{X}$.
We will show computational hardness results for the multiinstance versions of:

min-max vertex cover (already defined).

min-max perfect matching, where we are given an undirected graph and a weight function on the edges, and we need to determine a perfect matching such that the weight of the heaviest edge in the matching is minimized.

min-max path, where we are given an undirected graph, two vertices $s$ and $t$, and a weight function on the edges, and we need to determine an $s$–$t$ path such that the weight of the heaviest edge in the path is minimized.

$P_m||C_{\max}$, where we are given $m$ identical parallel machines, a set of jobs with processing times, and we need to determine a schedule of the jobs on the machines (without preemption) such that the makespan, i.e. the time that elapses until the last job is completed, is minimized.
Hence, in the multi-instance versions of these problems, we are given $T$ weight functions over vertices (min-max vertex cover) or edges (min-max perfect matching, min-max path), or $T$ processing time vectors ($P_m||C_{\max}$).
Theorem 5.
The multi-instance versions of min-max vertex cover, min-max perfect matching, min-max path and $P_3||C_{\max}$ are strongly NP-hard.
Proof.
Here we present the proofs for the multi-instance versions of the min-max perfect matching and min-max path problems, which use similar reductions from the Max-3DNF problem. The proofs for multi-instance min-max vertex cover and multi-instance $P_3||C_{\max}$ can be found in Appendices A.1 and A.2, respectively.
In the Max-3DNF problem, we are given a set of $n$ boolean variables and $m$ clauses that are conjunctions of three variables or their negations, and we need to determine a truth assignment such that the number of satisfied clauses is maximized.
We start with the multi-instance min-max perfect matching problem. For every instance of the Max-3DNF problem we construct a graph $G$ and $m$ weight functions $w_1, \dots, w_m$ defined as follows:

To each variable $x_i$ is associated a 4-cycle. This 4-cycle has exactly two perfect matchings: one pair of opposite edges, corresponding to setting the variable $x_i$ to true, and the other pair, corresponding to setting $x_i$ to false. This specifies a one-to-one correspondence between the solutions of the two problems.

Each weight function $w_j$ corresponds to one conjunction: $w_j(e) = 1$ if the edge $e$ encodes an assignment that falsifies a literal of clause $j$, and $w_j(e) = 0$ otherwise.
The above construction can obviously be done in time polynomial in the size of the input. It remains to show the relation between the objective values of corresponding solutions. If a clause is satisfied by a truth assignment then (since it is a conjunction) every literal in the clause must be satisfied. From the construction of the instance of multi-instance min-max matching, the corresponding matching has maximum weight 0 for the weight function $w_j$. If a clause is not satisfied by a truth assignment, then the corresponding matching has maximum weight 1 for the weight function $w_j$. Thus, from the reduction we get
$\mathrm{val}(M_\sigma) = m - \mathrm{val}(\sigma)$, where $\mathrm{val}(M_\sigma)$ stands for the value of the matching corresponding to the truth assignment $\sigma$, and $\mathrm{val}(\sigma)$ for the number of clauses satisfied by $\sigma$. This equation already proves the NP-hardness result of Theorem 5. It actually also shows APX-hardness. Indeed, the optimal value OPT of Max-3DNF verifies $\mathrm{OPT} \ge m/8$ (a uniformly random assignment satisfies each conjunction with probability $1/8$). Assuming the existence of a $(1+\varepsilon)$-approximation algorithm for the multi-instance min-max perfect matching problem, we would get a $(1 - O(\varepsilon))$-approximation algorithm for Max-3DNF. Since Max-3DNF is APX-hard, multi-instance min-max perfect matching is also APX-hard.
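The value identity above can be checked mechanically by abstracting away the gadget: per clause, the induced matching pays 1 exactly when the assignment falsifies some literal of the conjunction. The clause set below is a hypothetical instance of ours.

```python
from itertools import product

# A clause is a conjunction of literals: (variable index, is_positive).
clauses = [[(0, True), (1, True), (2, False)],
           [(0, False), (1, True), (2, True)]]

def satisfied(clause, sigma):
    """A conjunction is satisfied iff every literal agrees with sigma."""
    return all(sigma[i] == pos for i, pos in clause)

def matching_cost(clause, sigma):
    # In the reduction, w_j puts weight 1 on the matching edges encoding an
    # assignment that falsifies a literal of clause j, so the max edge weight
    # of the induced matching is 0 iff the clause is satisfied.
    return 0 if satisfied(clause, sigma) else 1

# Verify val(M_sigma) = m - val(sigma) over all truth assignments.
for sigma in product([True, False], repeat=3):
    total_cost = sum(matching_cost(c, sigma) for c in clauses)
    n_sat = sum(satisfied(c, sigma) for c in clauses)
    assert total_cost == len(clauses) - n_sat
```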
A similar reduction leads to the same result for the min-max path problem: starting from an instance of Max-3DNF, build a graph on vertices $v_0, v_1, \dots, v_n$, where two parallel edges $e_i^T$ and $e_i^F$ connect $v_{i-1}$ and $v_i$; vertex segment $i$ corresponds to variable $x_i$. We are looking for $v_0$–$v_n$ paths. Taking edge $e_i^T$ (resp. $e_i^F$) corresponds to setting $x_i$ to true (resp. false). As previously, this gives a one-to-one correspondence between solutions. Each clause corresponds to one weight function: if clause $j$ contains the literal $x_i$ then $w_j(e_i^F) = 1$, and if it contains $\neg x_i$ then $w_j(e_i^T) = 1$. All other weights are 0. Then for a path $P$, $\max_{e \in P} w_j(e) = 0$ if and only if clause $j$ is satisfied by the corresponding truth assignment. The remainder of the proof is exactly the same as for min-max perfect matching. ∎
Theorem 5 gives insight into the hardness of nonlinear multi-instance problems compared to their single-instance counterparts. As we proved, the multi-instance $P_3||C_{\max}$ is strongly NP-hard, while $P_m||C_{\max}$ is known to admit an FPTAS [20, 23]. Also, the multi-instance versions of min-max perfect matching, min-max path and min-max vertex cover are proved to be APX-hard, while their single-instance versions can be solved in polynomial time. We also note that these hardness results hold for the very specific case where weights/processing times are in $\{0, 1\}$, for which the single-instance $P_3||C_{\max}$, as well as the other problems, become trivial.
We also note that the inapproximability bound we obtained for the multi-instance min-max vertex cover under the UGC is tight, since we can formulate the problem as a linear program, solve its continuous relaxation and then use a rounding algorithm to get a vertex cover of cost at most twice the optimum.
The results on the min-max vertex cover problem also provide some answer to question (Q2) raised in the introduction. As we proved in Section 2.2, the online gradient descent method (paired with a rounding algorithm) suffices to give a vanishing 2-regret algorithm for online min-max vertex cover. However, since the multi-instance version of the problem is APX-hard, there is no indication that the follow the leader approach can be used to get the same result and match the lower bound of Corollary 2 for the problem.
3.2 Online generalized knapsack problem
In this section we present a vanishing regret algorithm for the online learning version of the following generalized knapsack problem. In the traditional knapsack problem, one has to select a set of items with total weight not exceeding a fixed "knapsack" capacity, maximizing the total profit of the set. Instead, we assume that the knapsack can be customized to fit more items. Specifically, there is a capacity $C$ and, if the total weight of the items exceeds this capacity, we have to pay $\lambda$ times the extra weight. Formally:
Definition 3 (Generalized Knapsack Problem (GKP)).
Given a set of $n$ items with nonnegative weights $w_i$ and nonnegative profits $p_i$, a knapsack capacity $C$ and a constant $\lambda > 0$, determine a set of items $S$ that maximizes the total profit: $f(S) = \sum_{i \in S} p_i - \lambda \cdot \max\big(0, \sum_{i \in S} w_i - C\big)$.
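The GKP objective can be sketched directly from the definition; the toy instance below (weights, profits, $C$, $\lambda$) is a hypothetical example of ours, with a brute-force search standing in for the optimization oracle.

```python
from itertools import combinations

# Toy GKP instance.
weights = [3, 4, 5]
profits = [4, 5, 6]
C, lam = 7, 2.0

def gkp_profit(items):
    """Total profit minus lam times the weight in excess of the capacity C."""
    total_w = sum(weights[i] for i in items)
    total_p = sum(profits[i] for i in items)
    return total_p - lam * max(0, total_w - C)

# Brute-force oracle over all item subsets (exponential; illustration only).
best = max((set(s) for r in range(len(weights) + 1)
            for s in combinations(range(len(weights)), r)),
           key=lambda s: gkp_profit(s))
```

Here the optimum is the set $\{0, 1\}$ with total weight exactly $C = 7$ and profit 9; adding item 2 would gain 6 in profit but cost $2 \cdot 5 = 10$ in penalty.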
This problem, as well as generalizations with other penalty costs for the excess weight, has been studied for instance in [4, 2] (see these references for practical motivations). In the online learning setting, we assume that we have $n$ items with static weights and a static constant $\lambda$. At each time step, we select a subset of the items and then learn the capacity $C_t$ of the knapsack and the profit $p_{t,i}$ of every item, gaining some profit or even suffering a loss based on our decision.
As we showed in Section 3.1, many nonlinear problems do not have an efficient (polynomial) offline oracle and, as a direct consequence, the follow the leader strategy cannot directly be applied to get vanishing regret. While GKP is clearly not linear, due to the maximum in the profit function, we will show that there exists an FPTAS for its multi-instance variant. We use this result to obtain a vanishing regret algorithm for the online version of GKP (Theorem 6).
Since the problem is not linear, we use the generalized FTPL (GFTPL) framework of Dudik et al. [7], which does not rely on the assumption that the objective function is linear. While in the linear case it was sufficient to consider an "extra" random observation (FTPL), a much more complex perturbation mechanism is needed for the analysis to work when the objective function is not linear. The key idea of the GFTPL algorithm is to use common randomness for every feasible action but to apply it in a different way for each. This concept was referred to by the authors of [7] as shared randomness, using the notion of a translation matrix. The method is presented in Appendix B.1.
Theorem 6.
There is a polynomial time vanishing regret algorithm for GKP.
Proof.
(sketch) The proof is based on the following three steps:

First we note that GFTPL works (gives vanishing regret) even if the oracle is only an FPTAS. This is necessary since our problem is clearly NP-hard.

Second, we provide an ad hoc translation matrix for GKP. This shows that the GFTPL method can be applied to our problem. Moreover, this matrix is built in such a way that the oracle needed for GFTPL is precisely a multi-instance oracle.

Third, we show that there exists an FPTAS for the multi-instance oracle.
The first two points are given in Appendices B.1 and B.2, respectively. We only sketch the last point here. To do this, we show that we can map a set of instances of the generalized knapsack problem to a single instance of the more general convex generalized knapsack problem. Suppose that we have a set of $T$ instances of GKP. Then, the total profit of every item set $S$ is: $F(S) = \sum_{i \in S} P_i - \lambda \sum_{t=1}^{T} \max\big(0, W(S) - C_t\big)$, where $P_i = \sum_{t=1}^{T} p_{t,i}$ and $W(S) = \sum_{i \in S} w_i$.
where and . Let the total weight of the item set and a nondecreasing ordering of the knapsack capacities. Then:
Note that the above function is convex and nondecreasing in $W(S)$. This means that at every time step we need an FPTAS for the maximization problem $\max_S \sum_{i \in S} P_i - \lambda\, g(W(S))$, where $g$ is a convex function. We know that such an FPTAS exists ([2]). The authors of [2] suggest an FPTAS whose stated time complexity assumes that the convex function can be evaluated in constant time. In our case the convex function is part of the input; with binary search over the sorted capacities we can evaluate it in logarithmic time. ∎
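The logarithmic-time evaluation of the aggregated penalty can be sketched with sorted capacities and prefix sums; the capacity values below are an illustrative example of ours.

```python
import bisect

# Evaluate g(W) = sum_t max(0, W - C_t) in O(log T) time after
# O(T log T) preprocessing (sort the capacities, precompute prefix sums).
capacities = [3, 9, 1, 5]

sorted_caps = sorted(capacities)          # C_(1) <= ... <= C_(T)
prefix = [0]                              # prefix[k] = C_(1) + ... + C_(k)
for c in sorted_caps:
    prefix.append(prefix[-1] + c)

def penalty(W):
    k = bisect.bisect_left(sorted_caps, W)  # number of capacities strictly below W
    return k * W - prefix[k]                # sum of (W - C_(t)) over those t
```

Only the capacities strictly below $W$ contribute to the sum, so a single binary search locates the breakpoint of the piecewise-linear convex function $g$.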
4 Conclusion
In this paper, we have presented a general framework showing the hardness of online learning for min-max problems. We have also shown a sharp separation between two widely studied online learning methods, online gradient descent and follow the leader, from the approximation and computational complexity viewpoints. The paper gives rise to several interesting directions. A first one is to extend the reduction framework to objectives other than min-max. A second direction is to design online vanishing $\alpha$-regret algorithms with approximation ratio matching the lower bound guarantee. Finally, the proof of Theorem 1 needs a non-oblivious adversary. An interesting direction would be to obtain the same lower bounds with an oblivious adversary, if possible.
Appendix
Appendix A Hardness of multi-instance problems (Theorem 5)
A.1 Hardness of multi-instance min-max vertex cover
We make a straightforward reduction from the vertex cover problem. Consider any instance $G = (V, E)$ of the vertex cover problem, with $V = \{v_1, \dots, v_n\}$. We construct $n$ weight functions $w_1, \dots, w_n$ such that in $w_i$ vertex $v_i$ has weight 1 and all other vertices have weight 0. If we consider the instance of the multi-instance min-max vertex cover with graph $G$ and weight functions $w_1, \dots, w_n$, it is clear that any vertex cover $S$ has total cost equal to its size: for any vertex $v_i \in S$ there is exactly one weight function, namely $w_i$, with $\max_{v \in S} w_i(v) = 1$, and this maximum is 0 for every other weight function.
Since vertex cover is strongly NP-hard, NP-hard to approximate within ratio $\sqrt{2}-\epsilon$ and UGC-hard to approximate within ratio $2-\epsilon$, the same negative results hold for the multi-instance min-max vertex cover problem.
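The counting argument above can be checked mechanically. The sketch below assumes the multi-instance min-max objective sums, over the weight functions, the maximum weight among the chosen vertices; all names are illustrative.

```python
def multi_instance_cost(cover, weights):
    # per-instance cost of a cover is the maximum weight of a chosen
    # vertex; the multi-instance cost sums this over all instances
    return sum(max(w[v] for v in cover) for w in weights)

# reduction from the text: one weight function per vertex, where only
# that vertex has weight 1
n = 4
weights = [{v: (1 if v == i else 0) for v in range(n)} for i in range(n)]

# any cover's multi-instance cost equals its size
cover = {0, 2, 3}
assert multi_instance_cost(cover, weights) == len(cover)
```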
A.2 Hardness of multi-instance $P3||C_{\max}$
We prove that the multi-instance problem is strongly NP-hard even when the processing times are in $\{0,1\}$, using a reduction from the 3-coloring problem. In the 3-coloring (3C) problem, we are given a graph and we need to decide whether its vertices can be colored with 3 colors such that no two vertices connected by an edge have the same color.
For every instance $G=(V,E)$ of the 3C problem with $|V|=n$ and $|E|=m$, we construct (in polynomial time) an instance of the multi-instance problem with $n$ jobs and $m$ processing time vectors. Every edge $(u,v)$ corresponds to a processing time vector in which jobs $u$ and $v$ have processing time 1 and every other job has processing time 0. It is easy to see that at each time step the makespan is either 1 or 2 and thus the total makespan is at least $m$ and at most $2m$.
If there exists a 3-coloring of $G$ then, by assigning every color to a machine, at each time step there will not be two jobs with nonzero processing time on the same machine; thus the makespan will be 1 and the total solution will have cost $m$. Conversely, if the total solution has cost $m$, then at every time step the makespan was 1, and by assigning the same color to the jobs of every machine we get a 3-coloring of $G$. Hence, the multi-instance variation of the problem is strongly NP-hard.
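The correspondence between proper colorings and unit-makespan schedules can be simulated directly; the helper below is an illustrative sketch of the reduction, not code from the paper.

```python
def total_makespan(assignment, edges, machines=3):
    # each edge (u, v) yields an instance where jobs u and v have
    # processing time 1 and all other jobs 0; the makespan of one
    # instance is the maximum machine load under the fixed assignment
    total = 0
    for (u, v) in edges:
        loads = [0] * machines
        loads[assignment[u]] += 1
        loads[assignment[v]] += 1
        total += max(loads)
    return total

# a 5-cycle is 3-colorable; a proper coloring gives makespan 1 per edge
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
coloring = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
assert all(coloring[u] != coloring[v] for u, v in edges)
assert total_makespan(coloring, edges) == len(edges)
```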
Appendix B A polynomial-time vanishing regret algorithm for GKP (Theorem 6)
B.1 Generalized follow the perturbed leader
For the sake of completeness, we introduce the generalized FTPL (GFTPL) method of Dudik et al. [7], which can be used to achieve a vanishing regret for nonlinear objective functions for some discrete problems. The key idea of the GFTPL algorithm is to use common randomness for every feasible action but to apply it in a different way for each one. This concept was referred to by the authors of [7] as shared randomness. In their algorithm, the regularization term of the FTPL algorithm is substituted by the inner product of a random vector with a vector corresponding to the action. In FTPL it was sufficient to take the action itself as this vector, but in this general setting it must be the row of a translation matrix that corresponds to the action.
Definition 4 (Admissible Matrix [7]).
A matrix is admissible if its rows are distinct. It is $(\kappa,\delta)$-admissible if it is admissible and also (i) the number of distinct elements within each column is at most $\kappa$ and (ii) the distinct elements within each column differ by at least $\delta$.
Definition 5 (Translation Matrix [7]).
A translation matrix is a $(\kappa,\delta)$-admissible matrix with one row per feasible action and $N$ columns. Since the number of rows is equal to the number of feasible actions, we denote by $\Gamma_x$ the row corresponding to action $x$. In the general case, a dedicated parameter is used to denote the diameter of the translation matrix.
From the definition of the translation matrix it becomes clear that the action space needs to be finite. Note that the number of feasible actions can be exponential in the input size, since we do not need to compute the translation matrix explicitly. The generalized FTPL algorithm for a maximization problem is presented in Algorithm 3. At each time step, the algorithm selects as perturbed leader the action that maximizes the total payoff on the observed history plus some noise given by the inner product of the action's translation vector $\Gamma_x$ and the perturbation vector. Note that in [7] the algorithm only needs an oracle with a small additive error. We will see later that it also works for a multiplicative error (more precisely, for an FPTAS).
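Algorithm 3 is not reproduced here, but its main loop can be sketched as follows, assuming an exact maximization oracle over a small explicit action set; the names `payoff`, `gamma_row` and the uniform noise scale `eta` are illustrative choices, not the exact notation of [7].

```python
import random

def gftpl(actions, payoff, gamma_row, horizon, eta=1.0, n_cols=8):
    """Sketch of the generalized FTPL loop for a maximization problem.

    payoff(x, t) is the payoff of action x on the instance of round t,
    gamma_row(x) is the row of the translation matrix for action x.
    """
    # shared randomness: one perturbation vector, drawn once
    alpha = [random.uniform(0.0, eta) for _ in range(n_cols)]
    history, plays = [], []
    for t in range(horizon):
        # perturbed leader: history payoff plus <Gamma_x, alpha>
        best = max(
            actions,
            key=lambda x: sum(payoff(x, s) for s in history)
            + sum(g * a for g, a in zip(gamma_row(x), alpha)),
        )
        plays.append(best)
        history.append(t)  # round t's instance is now observed
    return plays
```

With two actions whose rows differ, the algorithm locks onto the better action as soon as the observed history dominates the bounded perturbation.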
Let us also denote by a dedicated parameter the diameter of the objective function, i.e., the maximum payoff difference over feasible actions and instances.
Theorem 7 ([7]).
By using an appropriate distribution to draw the random vector, the regret of the generalized FTPL algorithm is:
By setting the parameters appropriately, this clearly gives a vanishing regret.
Let us point out two difficulties in using this algorithm. First, the oracle has to solve a problem where the objective function is the sum of a multi-instance version of the offline problem and the perturbation. We will see in Appendix B.2 how we can implement the perturbation mechanism as the payoff of the action on a set of (random) observations of the problem.
Second, if the multi-instance version is NP-hard, having an efficient algorithm solving the oracle with an additive error is quite improbable. We remark that the assumption of an additive error can be replaced by the assumption of the existence of an FPTAS for the oracle. Namely, let us consider a modification of Algorithm 3 where at each time we compute a solution such that:
(1) 
Then, denoting the maximum payoff over actions and instances, by applying the same analysis as in [7] we can show that, by fixing the FPTAS precision appropriately, we are guaranteed to get an action whose total perturbed payoff is at least the one that an oracle with an additive optimization error would guarantee. The computation is polynomial if we use an FPTAS. Then, we can still get a vanishing regret (considering all parameters of the problem as constants).
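The multiplicative-to-additive conversion behind this remark can be made explicit; the symbols below ($\varepsilon'$ for the FPTAS precision, $P$ for the maximum payoff, $T$ for the horizon, $C$ for a bound on the perturbation term) are illustrative and follow the sketch above rather than the exact notation of [7].

```latex
% An FPTAS solution x_t with multiplicative error eps' satisfies
f_t(x_t) \;\ge\; (1-\varepsilon')\,\max_{x} f_t(x)
        \;\ge\; \max_{x} f_t(x) \;-\; \varepsilon'\,(T P + C),
% since the total perturbed payoff after at most T steps is at most
% T P plus the bounded perturbation term C.  Choosing
%   eps' = eps / (T P + C)
% recovers the additive-eps oracle guarantee, and the running time of
% the FPTAS stays polynomial in T, P and 1/eps.
```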
As a corollary, we can achieve a vanishing regret for any online learning problem in our setting by assuming access to an oracle OPT that can compute (for any $\epsilon>0$) in polynomial time a decision satisfying Equation (1).
B.2 Distinguisher sets and a translation matrix for GKP
As noted above, an important issue in the method arises from the perturbation. Until now, the translation matrix could be any admissible matrix as long as it had one distinct row for every feasible action. However, this matrix has to be considered by the oracle in order to decide the perturbed leader. In [7] the authors introduce the concept of implementability to overcome this problem. We present a simplified version of this property.
Definition 6 (Distinguisher Set).
A distinguisher set for an offline problem P is a set of instances such that for any two distinct feasible actions:
This means that a distinguisher set is a set of instances that "forces" any two different actions to differ in at least one of their payoffs over the instances in the set. If we can determine such a set, then we can construct a translation matrix that significantly simplifies our assumptions on the oracle.
Let us fix a distinguisher set for our problem. Then, for every feasible action we can construct the corresponding row of the translation matrix such that:
Since we used a distinguisher set, the translation matrix is guaranteed to be admissible. Furthermore, from the distinguisher set we can always determine the $\kappa$ and $\delta$ parameters of the translation matrix. By implementing the perturbation using a distinguisher set, the expression we need to (approximately) maximize at each round can be written as:
This shows that the perturbations transform into a set of weighted instances, where the weights are randomly drawn from a uniform distribution. This is already a significant improvement, since now the oracle has to consider only weighted instances of the offline problem and not the arbitrary perturbation we were assuming until now. Furthermore, for a variety of problems (including GKP), we can construct a distinguisher set with an additional property. If this property holds, then we can shift the random weights of the oracle inside the instances:
Thus, if we have a distinguisher set for a given problem, to apply GFTPL all we need is an FPTAS for optimizing the total payoff over a set of weighted instances.
We now provide a distinguisher set for the generalized knapsack problem. Consider a set of $n$ instances of the problem such that in instance $j$, item $j$ has nonzero profit, all other items have profit 0, and the knapsack capacity is the total weight of all items. Since the total weight of a set of items can never exceed this capacity, it is easy to see that:
For any two different assignments, there is at least one item that they do not have in common. It is easy to see that in the corresponding instance the two assignments will have different total profits. Thus, the proposed set of instances is indeed a distinguisher set for the generalized knapsack problem. We use this set of instances to implement the translation matrix. Then, every column of the matrix will have exactly 2 distinct values, making the translation matrix admissible. As a result, in order to achieve a vanishing regret for online learning GKP, all we need is an FPTAS for the multi-instance generalized knapsack problem.
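Assuming unit profits in the distinguisher instances (an illustrative choice), the rows of the resulting translation matrix can be built and checked for distinctness as follows:

```python
def translation_row(item_set, n_items):
    # row of the translation matrix for an item set: entry j is the
    # payoff of the set on distinguisher instance j, where only item j
    # has nonzero profit (taken as 1 here) and every set fits
    return tuple(1 if j in item_set else 0 for j in range(n_items))

n = 3
subsets = [frozenset(s) for s in
           ([], [0], [1], [0, 1], [2], [0, 2], [1, 2], [0, 1, 2])]
rows = {translation_row(s, n) for s in subsets}

# distinct rows for distinct item sets, i.e., an admissible matrix
assert len(rows) == len(subsets)
```

Each column then takes exactly two values (0 and the item's profit), which is what makes the matrix admissible in the sense of Definition 4.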
References
 Agarwal et al. [2019] Naman Agarwal, Alon Gonen, and Elad Hazan. Learning in nonconvex games with an optimization oracle. In Proc. 32nd Conference on Learning Theory, volume 99, pages 18–29, 2019.
 Antoniadis et al. [2013] Antonios Antoniadis, Chien-Chung Huang, Sebastian Ott, and José Verschae. How to pack your items when you have to buy your knapsack. In Proc. Mathematical Foundations of Computer Science, pages 62–73, 2013.
 Awerbuch and Kleinberg [2008] Baruch Awerbuch and Robert Kleinberg. Online linear optimization and adaptive routing. Journal of Computer and System Sciences, 74(1):97–114, 2008.
 Barman et al. [2012] Siddharth Barman, Seeun Umboh, Shuchi Chawla, and David L. Malec. Secretary problems with convex costs. In Proc. 39th Colloquium on Automata, Languages, and Programming, pages 75–87, 2012.
 Cesa-Bianchi et al. [2015] N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. IEEE Transactions on Information Theory, 61(1):549–564, Jan 2015.
 Daskalakis and Syrgkanis [2016] Constantinos Daskalakis and Vasilis Syrgkanis. Learning in auctions: Regret is hard, envy is easy. In Proc. 57th Symposium on Foundations of Computer Science, pages 219–228, 2016.
 Dudik et al. [2017] Miroslav Dudik, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, and Jennifer Wortman Vaughan. Oracle-efficient online learning and auction design. In Proc. 58th Symposium on Foundations of Computer Science (FOCS), pages 528–539, 2017.
 Freund and Schapire [1997] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
 Garber [2017] Dan Garber. Efficient online linear optimization with approximation algorithms. In Advances in Neural Information Processing Systems, pages 627–635, 2017.
 Hannan [1957] James Hannan. Approximation to bayes risk in repeated play. Contributions to the Theory of Games, 3:97–139, 1957.
 Hazan [2016] Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016.

 Hazan and Kale [2012] Elad Hazan and Satyen Kale. Online submodular minimization. Journal of Machine Learning Research, 13:2903–2922, 2012.
 Hazan and Koren [2016] Elad Hazan and Tomer Koren. The computational power of optimization in online learning. In Proc. 48th Symposium on Theory of Computing, pages 128–141, 2016.
 Kakade et al. [2009] Sham M. Kakade, Adam Tauman Kalai, and Katrina Ligett. Playing games with approximation algorithms. SIAM Journal on Computing, 39(3):1088–1106, 2009.
 Kalai and Vempala [2005] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
 Khot and Regev [2008] Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within 2-epsilon. J. Comput. Syst. Sci., 74(3):335–349, 2008.
 Khot et al. [2018] Subhash Khot, Dor Minzer, and Muli Safra. Pseudorandom sets in Grassmann graph have near-perfect expansion. In Proc. 59th Symposium on Foundations of Computer Science, pages 592–601, 2018.
 Littlestone and Warmuth [1994] Nick Littlestone and Manfred K Warmuth. The weighted majority algorithm. Information and computation, 108(2):212–261, 1994.
 Raz and Safra [1997] Ran Raz and Shmuel Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proc. 29th Symposium on the Theory of Computing, pages 475–484, 1997.
 Sahni [1976] Sartaj K. Sahni. Algorithms for scheduling independent tasks. J. ACM, 23(1):116–127, 1976.
 ShalevShwartz et al. [2012] Shai ShalevShwartz et al. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.
 Syrgkanis et al. [2016] Vasilis Syrgkanis, Akshay Krishnamurthy, and Robert Schapire. Efficient algorithms for adversarial contextual learning. In International Conference on Machine Learning, pages 2159–2168, 2016.
 Woeginger [2000] Gerhard J. Woeginger. When does a dynamic programming formulation guarantee the existence of a fully polynomial time approximation scheme (FPTAS)? INFORMS J. on Computing, 12(1), January 2000.
 Zinkevich [2003] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proc. 10th International Conference on Machine Learning, pages 928–935, 2003.