1 Introduction
One of the most fundamental and well-studied questions in learning theory is whether one can learn a given problem using an optimization oracle. For online learning in games, it was shown by Kalai and Vempala (2005) that an optimization oracle giving the best decision in hindsight is sufficient for attaining optimal regret.
However, in many nonconvex settings, such an optimization oracle is either unavailable or NP-hard to compute. In contrast, in many such cases, efficient approximation algorithms are known and are guaranteed to return a solution within a certain multiplicative factor of the optimum. These include not only combinatorial optimization problems such as Max Cut, Weighted Set Cover, Metric Traveling Salesman Problem, and Set Packing, but also machine learning problems such as Low Rank Matrix Completion.
Kakade et al. (2009) considered the question of whether an approximation algorithm is sufficient to obtain vanishing regret compared with an approximation to the best solution in hindsight. They gave an algorithm for this offline-to-online conversion. However, their reduction is inefficient in the number of per-iteration queries to the approximation oracle, which grows linearly with time. Ideally, an efficient reduction should call the oracle only a constant number of times per iteration and guarantee optimal regret at the same time, and this was considered an open question in the literature.
Various authors have improved upon this original offline-to-online reduction in certain cases, as we survey below. Recently, Garber (2017) made significant progress by giving a more efficient reduction, which improves the number of oracle calls in both the full information and the bandit settings. He explicitly asked whether a near-optimal reduction with only logarithmically many calls per iteration exists.
1.1 Our Results
In this paper we resolve this question in the affirmative, and in a more general setting. We give two different algorithms in the full information setting, one based on the online mirror descent (OMD) method and another based on the continuous multiplicative weight update (CMWU) algorithm, which give optimal regret and are oracle-efficient. Furthermore, our algorithms apply to more general loss vectors. Our results are summarized in the table below.
Algorithm | $\alpha$-regret in $T$ rounds | oracle complexity per round | loss vectors
Kakade et al. (2009) |  |  | general
Garber (2017) |  |  | nonnegative
Algorithm 1 (OMD) |  |  | PNIP property
Algorithm 5 (CMWU) |  |  | general
In addition to these two algorithms, we give an improved bandit algorithm based on OMD: it attains the same regret as in (Kakade et al., 2009; Garber, 2017) with a lower computational cost: our method requires oracle calls over all the game iterations, as opposed to in the previous best method.
Besides the improved oracle complexity, our methods have the following additional advantages:

While the algorithm in (Garber, 2017) requires nonnegative loss vectors, our second algorithm, based on CMWU, can work with general loss vectors. Furthermore, our OMD-based algorithm can also work with loss vectors from any convex cone satisfying the pairwise nonnegative inner product (PNIP) property defined in Definition 4.1 (together with an appropriately chosen regularizer), which is more general than the nonnegative orthant.

Our methods apply to a general online improper learning setting, in which the predictions can be from a potentially different set from the target set to compete against. Previous work considered this different set to be a constant multiple of the target set, which makes sense primarily for combinatorial optimization problems.
However, in many interesting problems, such as Low Rank Matrix Completion, the natural approximation algorithm returns a matrix of higher rank. This is not in a constant multiple of the set of all low rank matrices, and our additional generality allows us to obtain meaningful results even for this case.

Our first algorithm is based on the general OMD methodology, and thus allows any strongly convex regularizer. This can give better regret bounds, in terms of the space geometry, compared with the previous algorithm of (Garber, 2017) that is based on online gradient descent and Euclidean regularization. The improvement in regret bounds can be as large as the dimension.

Our bandit algorithm is based on OMD with a new regularizer that is inspired by the construction of barycentric spanners, and may be of independent interest.
1.2 Our Techniques
The more general of our two algorithms is based on a completely different methodology compared with previous online-to-offline reductions. It is a variant of the continuous multiplicative weight update (CMWU) algorithm, also known as the continuous hedge algorithm. Our idea is to apply CMWU over a superset of the target set, and in every iteration the algorithm tries to play the mean of a log-linear distribution. To check feasibility of this mean, we show how to design a separation-or-decomposition oracle, which either certifies that the mean is infeasible, in which case it provides a separating hyperplane between the mean and the target set and thus gives a more refined superset of the target set, or provides a distribution over feasible points whose average is superior to the mean in terms of the regret. Using this approach, the more oracle calls the algorithm makes, the tighter the superset it can obtain, and we show an interesting tradeoff between the oracle complexity and the regret bound.
The other algorithm follows the line of Garber (2017). We show how to significantly speed up Garber’s infeasible projection oracle, and to generalize Garber’s algorithm from online gradient descent (OGD) to online mirror descent (OMD). This additional generality is crucial in our bandit algorithm, where we make use of a novel regularizer in OMD, called the barycentric regularizer, in order to have a low-variance unbiased estimator of the loss vector. This geometric regularization may be of independent interest.
1.3 Related Work
The reduction from online learning to offline approximation algorithms was already considered by Kalai and Vempala (2005). Their scheme, based on the follow-the-perturbed-leader (FTPL) algorithm, requires a very strong approximation guarantee from the approximation oracle, namely, a fully polynomial time approximation scheme (FPTAS), and requires an approximation that improves with time. Balcan and Blum (2006) used the same approach in the context of mechanism design.
Kalai and Vempala (2005) also proposed a specialized reduction that works under certain conditions on the approximation oracle, satisfied by some known algorithms for problems such as MAXCUT. Fujita et al. (2013) further gave more general reductions that apply to problems whose approximation algorithms are based on convex relaxations of mathematical programs. Their scheme is also based on the FTPL method.
Recent advancements on black-box online-to-offline reductions were made in (Kakade et al., 2009; Dudík et al., 2016; Garber, 2017). Hazan and Koren (2016) showed that efficient reductions are in general impossible, unless special structure is present. In the settings we consider, this special structure is a linear cost function over the space.
Our algorithms fall into one of two templates. The first is the online mirror descent method, which is an adaptive version of the follow-the-regularized-leader (FTRL) algorithm. The second is the continuous multiplicative weight update method, which dates back to Cover’s portfolio selection method (Cover, 1991) and Vovk’s aggregating algorithm (Vovk, 1990). The reader is referred to the books (Cesa-Bianchi and Lugosi, 2006; Shalev-Shwartz, 2012; Hazan, 2016) for details and background on these prediction frameworks. We also make use of polynomial-time algorithms for sampling from log-concave distributions (Lovász and Vempala, 2007).
2 Preliminaries
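We use $\|x\|$ to denote the Euclidean norm of a vector $x$. For $x \in \mathbb{R}^d$ and $r > 0$, denote by $\mathcal{B}(x, r)$ the Euclidean ball in $\mathbb{R}^d$ of radius $r$ centered at $x$, i.e., $\mathcal{B}(x, r) = \{y \in \mathbb{R}^d : \|x - y\| \le r\}$. For , , and , define , , , and . The convex hull of $S \subseteq \mathbb{R}^d$ is denoted by $\mathrm{conv}(S)$. Denote by $\mathrm{Vol}(S)$ the volume (Lebesgue measure) of a set $S \subseteq \mathbb{R}^d$. Denote by $\Delta_n$ the probability simplex in $\mathbb{R}^n$, i.e., $\Delta_n = \{p \in \mathbb{R}^n : p_i \ge 0 \text{ for all } i, \ \sum_{i=1}^n p_i = 1\}$.
A set $\mathcal{W} \subseteq \mathbb{R}^d$ is called a cone if for any $w \in \mathcal{W}$ and $\beta \ge 0$ we have $\beta w \in \mathcal{W}$. For any $\mathcal{W} \subseteq \mathbb{R}^d$, define the dual cone of $\mathcal{W}$ as $\mathcal{W}^\circ = \{y \in \mathbb{R}^d : \langle x, y \rangle \ge 0 \text{ for all } x \in \mathcal{W}\}$. $\mathcal{W}^\circ$ is always a convex cone, even when $\mathcal{W}$ is neither convex nor a cone.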
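For any closed set $\mathcal{K} \subseteq \mathbb{R}^d$, define $\Pi_{\mathcal{K}}$ to be the projection onto $\mathcal{K}$, namely $\Pi_{\mathcal{K}}(x) = \arg\min_{y \in \mathcal{K}} \|x - y\|$. The well-known Pythagorean theorem characterizes an important property of projections onto convex sets: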
Lemma 2.1 (Pythagorean theorem).
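For any closed convex set $\mathcal{K} \subseteq \mathbb{R}^d$, $x \in \mathbb{R}^d$ and $z \in \mathcal{K}$, we have $\langle x - \Pi_{\mathcal{K}}(x), z - \Pi_{\mathcal{K}}(x) \rangle \le 0$, or equivalently, $\|x - z\|^2 \ge \|x - \Pi_{\mathcal{K}}(x)\|^2 + \|z - \Pi_{\mathcal{K}}(x)\|^2$.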
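The Pythagorean property can be checked numerically on a simple convex set. The sketch below is only an illustration under stated assumptions: it takes the convex set to be a Euclidean ball centered at the origin, and the helper names (`project_onto_ball`, `pythagorean_gap`) are hypothetical. It verifies that the squared distance from a point to any point of the set is at least the sum of the squared distances through the projection.

```python
import math
import random


def project_onto_ball(x, radius=1.0):
    """Euclidean projection of x onto the ball of the given radius centered at 0."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius * v / norm for v in x]


def sq_dist(a, b):
    """Squared Euclidean distance between a and b."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def pythagorean_gap(x, z, radius=1.0):
    """||x - z||^2 - ||x - p||^2 - ||z - p||^2, where p is the projection of x.

    The Pythagorean theorem says this is nonnegative whenever z lies in the ball.
    """
    p = project_onto_ball(x, radius)
    return sq_dist(x, z) - sq_dist(x, p) - sq_dist(z, p)


rng = random.Random(0)
gaps = []
for _ in range(1000):
    x = [rng.uniform(-3.0, 3.0) for _ in range(4)]
    # Projecting an arbitrary point gives us some point z of the ball.
    z = project_onto_ball([rng.uniform(-3.0, 3.0) for _ in range(4)])
    gaps.append(pythagorean_gap(x, z))
min_gap = min(gaps)  # nonnegative up to floating-point rounding
```

The ball is used only because its projection has a closed form; the statement itself holds for any closed convex set.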
Definition 2.2.
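A function $\phi: \mathcal{D} \to \mathbb{R}$ ($\mathcal{D} \subseteq \mathbb{R}^d$) is Legendre if

$\mathcal{D}$ is convex;

$\phi$ is strictly convex with continuous gradient defined over $\mathcal{D}$’s interior $\mathrm{int}(\mathcal{D})$;

for any sequence $x_1, x_2, \ldots \in \mathrm{int}(\mathcal{D})$ converging to a boundary point of $\mathcal{D}$, $\lim_{n \to \infty} \|\nabla \phi(x_n)\| = \infty$.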
Definition 2.3.
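For a Legendre function $\phi: \mathcal{D} \to \mathbb{R}$, the Bregman divergence with respect to $\phi$ is defined as $D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y), x - y \rangle$ ($x \in \mathcal{D}$, $y \in \mathrm{int}(\mathcal{D})$).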
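As a concrete illustration of the Bregman divergence, the following sketch evaluates the standard formula $\phi(x) - \phi(y) - \langle \nabla\phi(y), x - y\rangle$ for two common choices of $\phi$. The function names and the two example regularizers are assumptions made for illustration: half the squared Euclidean norm gives half the squared distance, and the (unnormalized) negative entropy gives the KL divergence when both points lie on the probability simplex.

```python
import math


def bregman(phi, grad_phi, x, y):
    """Bregman divergence phi(x) - phi(y) - <grad phi(y), x - y>."""
    linear = sum(g * (a - b) for g, a, b in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - linear


def half_sq_norm(x):
    return 0.5 * sum(v * v for v in x)


def half_sq_norm_grad(x):
    return list(x)


def neg_entropy(x):
    # Unnormalized negative entropy, defined for strictly positive vectors.
    return sum(v * math.log(v) - v for v in x)


def neg_entropy_grad(x):
    return [math.log(v) for v in x]


x = [0.2, 0.3, 0.5]
y = [0.4, 0.4, 0.2]

d_euclid = bregman(half_sq_norm, half_sq_norm_grad, x, y)   # 0.5 * ||x - y||^2
d_entropy = bregman(neg_entropy, neg_entropy_grad, x, y)    # KL(x || y) on the simplex
kl = sum(a * math.log(a / b) for a, b in zip(x, y))
```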
The Pythagorean theorem can be generalized to projections with respect to a Bregman divergence (see e.g. Lemma 11.3 in (Cesa-Bianchi and Lugosi, 2006)):
Lemma 2.4 (Generalized Pythagorean theorem).
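For any closed convex set $\mathcal{K} \subseteq \mathbb{R}^d$, $x \in \mathrm{int}(\mathcal{D})$, $z \in \mathcal{K} \cap \mathcal{D}$, and any Legendre function $\phi: \mathcal{D} \to \mathbb{R}$, letting $x' = \arg\min_{y \in \mathcal{K}} D_\phi(y, x)$, we must have $D_\phi(z, x) \ge D_\phi(z, x') + D_\phi(x', x)$.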
Log-concave distributions.
A distribution over $\mathbb{R}^d$ with a density function $p$ is said to be log-concave if $\log p$ is a concave function. For a convex set $\mathcal{K}$ equipped with a membership oracle, there exist polynomial-time algorithms for sampling from any log-concave distribution over $\mathcal{K}$ (Lovász and Vempala, 2007). This can be used to approximately compute the mean of any log-concave distribution.
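To make the "sample, then average" idea concrete, here is a one-dimensional sketch. It is not the Lovász and Vempala sampler: it assumes a log-linear (hence log-concave) density proportional to $e^{-\theta x}$ on the interval $[0, 1]$, for which inverse-CDF sampling and the exact mean are both available in closed form. The function names and the parameter `theta` are hypothetical.

```python
import math
import random


def sample_truncated_exp(theta, rng):
    """Draw one sample from the density proportional to exp(-theta * x) on [0, 1].

    Uses inverse-CDF sampling; theta must be positive.
    """
    u = rng.random()
    return -math.log(1.0 - u * (1.0 - math.exp(-theta))) / theta


def exact_mean(theta):
    """Closed-form mean of the same distribution."""
    return 1.0 / theta - 1.0 / math.expm1(theta)


def estimate_mean(theta, n, seed=0):
    """Monte Carlo estimate of the mean from n independent samples."""
    rng = random.Random(seed)
    return sum(sample_truncated_exp(theta, rng) for _ in range(n)) / n


estimate = estimate_mean(2.0, 100000)
truth = exact_mean(2.0)
```

In higher dimensions no closed form is available in general, which is where a polynomial-time log-concave sampler over a set with a membership oracle is needed.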
We have the following classical result which says that every halfspace close enough to the mean of a log-concave distribution must contain at least constant probability mass. For simplicity, we only state and prove the result for isotropic (i.e., identity covariance) log-concave distributions, but the result can be easily generalized to allow arbitrary covariance.
Lemma 2.5.
Consider any isotropic (identity covariance) log-concave distribution over with mean . Then for any halfspace such that , we have .
The proof of Lemma 2.5 is given in Appendix A. As an implication, we have the following lemma regarding mean computation of a log-concave distribution, which is useful in this paper.
Lemma 2.6.
For any log-concave distribution in with mean , whose support is in (), and any and , it is possible to compute a point in time such that with probability at least we have:

;

for any half space containing , .
For our purpose in this paper, it always suffices to choose and ($T$ being the total number of rounds) without hurting our regret bounds. Therefore, for ease of presentation, we will assume that we can compute the mean of bounded-supported log-concave distributions exactly.
3 Online Improper Linear Optimization with an Improper Optimization Oracle
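Now we describe the problem setting we consider in this paper. Let $\mathcal{K}, \mathcal{K}^* \subseteq \mathcal{B}(0, R)$ ($R > 0$) be two compact subsets of $\mathbb{R}^d$, and let $\mathcal{W} \subseteq \mathbb{R}^d$ be a convex cone. Suppose we have an improper linear optimization oracle $\mathcal{O}_{\mathcal{K}, \mathcal{K}^*}$, which given an input $f \in \mathcal{W}$ can output a point $\mathcal{O}_{\mathcal{K}, \mathcal{K}^*}(f) \in \mathcal{K}$ such that
$$f^\top \mathcal{O}_{\mathcal{K}, \mathcal{K}^*}(f) \le \min_{x^* \in \mathcal{K}^*} f^\top x^*.$$
In other words, it performs linear optimization over $\mathcal{K}^*$ but is allowed to output a point from a (possibly different) set $\mathcal{K}$. Note that this implicitly requires that $\mathcal{K}$ “dominates” $\mathcal{K}^*$ in all directions in $\mathcal{W}$, that is, for all $f \in \mathcal{W}$ we must have $\min_{x \in \mathcal{K}} f^\top x \le \min_{x^* \in \mathcal{K}^*} f^\top x^*$.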
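As a toy illustration of the oracle guarantee, the following sketch uses two small finite sets in the plane and the nonnegative orthant as the cone of loss directions. The sets `K` and `K_STAR`, the function names, and the brute-force oracle are all illustrative assumptions; a real approximation algorithm would replace the brute-force minimization.

```python
import random


def dot(f, x):
    return sum(a * b for a, b in zip(f, x))


# Hypothetical sets: the oracle may answer from K, while the benchmark is K_STAR.
K = [(1.0, 0.5), (0.5, 1.0)]
K_STAR = [(1.0, 1.0)]


def improper_oracle(f):
    """Toy oracle: return the point of K with the smallest loss in direction f."""
    return min(K, key=lambda x: dot(f, x))


def satisfies_guarantee(oracle, f, k_star, tol=1e-12):
    """Check that the oracle output is at least as good as the best point of k_star."""
    return dot(f, oracle(f)) <= min(dot(f, xs) for xs in k_star) + tol


rng = random.Random(0)
# Directions drawn from the nonnegative orthant, playing the role of the cone.
directions = [(rng.random(), rng.random()) for _ in range(1000)]
all_ok = all(satisfies_guarantee(improper_oracle, f, K_STAR) for f in directions)

# Outside the cone the guarantee may fail: here K does not dominate K_STAR.
outside_cone_ok = satisfies_guarantee(improper_oracle, (-1.0, -1.0), K_STAR)
```

The last check shows why the cone matters: the domination requirement is only imposed for directions inside it.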
Online improper linear optimization.
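Consider a repeated game with $T$ rounds. In round $t$, the player chooses a point $x_t \in \mathcal{K}$ while an adversary chooses a loss vector $f_t \in \mathcal{W}$ ($\|f_t\| \le L$), and then the player incurs a loss $f_t^\top x_t$. The goal for the player is to have a cumulative loss that is comparable to that of the best single decision in hindsight.
We assume that the player only has access to the optimization oracle $\mathcal{O}_{\mathcal{K}, \mathcal{K}^*}$. Therefore, it is only fair to compare with the best decision in $\mathcal{K}^*$ in hindsight. The (improper) regret over $T$ rounds is defined as
$$\mathrm{Regret}_T = \sum_{t=1}^{T} f_t^\top x_t - \min_{x^* \in \mathcal{K}^*} \sum_{t=1}^{T} f_t^\top x^*.$$
We sometimes treat $f_t$ as a function on $\mathbb{R}^d$, i.e., $f_t(x) = f_t^\top x$.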
Full information and bandit settings.
We consider both full information and bandit settings. In the full information setting, after the player makes her choice in round $t$, the entire loss vector $f_t$ is revealed to the player; in the bandit setting, only the loss value $f_t^\top x_t$ is revealed to the player.
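$\alpha$-regret minimization with an approximation oracle.
The problem of online linear optimization with an approximation oracle considered by Kakade et al. (2009) and Garber (2017) is a special instance in our online improper linear optimization framework. In this problem, the player has access to an approximate linear optimization oracle $\mathcal{O}_\alpha$ over $\mathcal{K}$ ($\alpha \ge 1$), which given a direction $f \in \mathcal{W}$ as input can output a point $\mathcal{O}_\alpha(f) \in \mathcal{K}$ such that
$$f^\top \mathcal{O}_\alpha(f) \le \alpha \min_{x \in \mathcal{K}} f^\top x.$$
In this setting we will consider $\mathcal{K} \subseteq \mathbb{R}^d_+$ and $\mathcal{W} = \mathbb{R}^d_+$; many combinatorial optimization problems with efficient approximation algorithms fall into this regime. The goal in the online problem is therefore to minimize the $\alpha$-regret, defined as
$$\alpha\text{-}\mathrm{Regret}_T = \sum_{t=1}^{T} f_t^\top x_t - \alpha \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t^\top x.$$
To see why this is a special case of online improper linear optimization, note that we can take $\mathcal{K}^* = \alpha \mathcal{K} = \{\alpha x : x \in \mathcal{K}\}$, and then the approximation oracle $\mathcal{O}_\alpha$ is equivalent to $\mathcal{O}_{\mathcal{K}, \mathcal{K}^*}$ and the $\alpha$-regret is equal to the improper regret $\mathrm{Regret}_T$.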
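The reduction from the approximation-oracle setting to the improper setting can be checked on a small example. The sketch below is only an illustration under stated assumptions: the decision set `K`, the factor `ALPHA`, the loss sequence and the plays are made-up data, and the scaled set `K_STAR` is taken to be every point of `K` multiplied by `ALPHA`. It computes the regret against `ALPHA` times the best fixed point of `K`, and the improper regret against the best fixed point of `K_STAR`, and confirms that they coincide.

```python
def dot(f, x):
    return sum(a * b for a, b in zip(f, x))


def improper_regret(losses, plays, k_star):
    """Player's cumulative loss minus the loss of the best fixed point of k_star."""
    player = sum(dot(f, x) for f, x in zip(losses, plays))
    best = min(sum(dot(f, xs) for f in losses) for xs in k_star)
    return player - best


def alpha_regret(losses, plays, k, alpha):
    """Player's cumulative loss minus alpha times the loss of the best fixed point of k."""
    player = sum(dot(f, x) for f, x in zip(losses, plays))
    best = min(sum(dot(f, x) for f in losses) for x in k)
    return player - alpha * best


# Hypothetical data: nonnegative decision points and nonnegative losses.
K = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
ALPHA = 2.0
K_STAR = [tuple(ALPHA * v for v in x) for x in K]

losses = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.25)]
plays = [K[0], K[1], K[2], K[0]]

r_alpha = alpha_regret(losses, plays, K, ALPHA)
r_improper = improper_regret(losses, plays, K_STAR)
```

Note that both quantities can be negative, since the player is compared against a benchmark that is weaker than the best point of `K` itself.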
4 Efficient Online Improper Linear Optimization via Online Mirror Descent
In this section, we give an efficient online improper linear optimization algorithm (in the full information setting) based on online mirror descent (OMD) equipped with a strongly convex regularizer $\phi$, which achieves optimal regret when the regularizer $\phi$ and the domain $\mathcal{W}$ of linear loss functions satisfy the pairwise nonnegative inner product (PNIP) property (Definition 4.1). This property holds for many interesting domains with appropriately chosen regularizers. Notable examples include the nonnegative orthant $\mathbb{R}^d_+$, the positive semidefinite matrix cone, and the Lorentz cone $\{(x, z) \in \mathbb{R}^{d-1} \times \mathbb{R} : \|x\| \le z\}$.
Definition 4.1 (Pairwise nonnegative inner product).
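For a twice-differentiable Legendre function $\phi$ with domain $\mathcal{D}$ and a convex cone $\mathcal{W}$, we say $(\phi, \mathcal{W})$ satisfies the pairwise nonnegative inner product (PNIP) property, if for all $w, w' \in \mathcal{W}$ and $x \in \mathrm{int}(\mathcal{D})$, where $H = \nabla^2 \phi(x)$, it holds that $w^\top H^{-1} w' \ge 0$.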
Examples.
$(\phi, \mathcal{W})$ satisfies the PNIP property if:

(with domain ) and ;

(with domain ) and ;

(with domain ), where ,
is an invertible matrix, and
. This is useful in our bandit algorithm in Section 6.
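A quick numerical sanity check of the PNIP idea can be sketched as follows. This is an illustration under explicit assumptions: the PNIP condition is taken to be $w^\top H^{-1} w' \ge 0$ for all $w, w'$ in the cone, with $H$ the Hessian of the regularizer at an interior point. Two standard pairs are tested: the unnormalized negative entropy with the nonnegative orthant (where $H^{-1}$ is the diagonal matrix of the coordinates of $x$), and the squared Euclidean norm (where $H$ is the identity) with the Lorentz cone. All function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def pnip_form_entropy(w, w_prime, x):
    """w^T H^{-1} w' where H = diag(1 / x_i) is the Hessian of sum_i (x_i log x_i - x_i)."""
    return sum(a * xi * b for a, xi, b in zip(w, x, w_prime))


def random_orthant_point(dim, rng):
    return [rng.random() for _ in range(dim)]


def random_lorentz_point(dim, rng):
    """A point (u, z) with z >= ||u||, i.e. a point of the Lorentz cone."""
    u = [rng.uniform(-1.0, 1.0) for _ in range(dim - 1)]
    norm = math.sqrt(sum(v * v for v in u))
    return u + [norm + rng.random()]


rng = random.Random(0)

entropy_ok = all(
    pnip_form_entropy(
        random_orthant_point(3, rng),
        random_orthant_point(3, rng),
        [rng.uniform(0.1, 5.0) for _ in range(3)],  # interior point of the domain
    ) >= 0.0
    for _ in range(1000)
)

# With the Euclidean regularizer, H is the identity, so the form is a plain inner product.
lorentz_ok = all(
    dot(random_lorentz_point(4, rng), random_lorentz_point(4, rng)) >= -1e-12
    for _ in range(1000)
)
```

Random testing of this kind can only falsify the property, not prove it; the Lorentz case follows from the Cauchy-Schwarz inequality, and the entropy case from the nonnegativity of every term in the sum.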
4.1 Online Mirror Descent with a Projection-and-Decomposition Oracle
We first show that assuming the availability of a projection-and-decomposition (PAD) oracle (Definition 4.2), we can implement a variant of the OMD algorithm that achieves optimal regret. In Section 4.2, we show how to construct a PAD oracle using the oracle $\mathcal{O}_{\mathcal{K}, \mathcal{K}^*}$. In Section 4.3, we bound the number of oracle calls to $\mathcal{O}_{\mathcal{K}, \mathcal{K}^*}$ in our algorithm.
Definition 4.2 (Projection-and-decomposition oracle).
A projection-and-decomposition (PAD) oracle onto , , is defined as a procedure that given , , a convex cone and a Legendre function produces a tuple , where , and , such that:

is “closer” to than with respect to the Bregman divergence of (and hence is an “infeasible projection”): ;

, and is a point that “almost dominates” in all directions in . In other words, there exists such that .
The purpose of the PAD oracle is the following. Suppose the OMD algorithm tells us to play a point . Since might not be in the feasible set , we can call the PAD oracle to find another point as well as a distribution over points . The first property in Definition 4.2 is sufficient to ensure that playing also gives low regret, and the second property further ensures that we have a distribution of points in that suffers less loss than for every possible loss function so we can play according to that distribution.
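The role of the second property can be illustrated with a small numerical sketch. It assumes an exact version of the domination condition: the mean of the returned distribution plus some vector from the dual cone equals the infeasible point. The cone is taken to be the nonnegative orthant (which is its own dual cone), and the point `y_prime`, the support `points` and the weights `probs` are made-up data. Under these assumptions, the expected loss of a point sampled from the distribution is no larger than the loss of `y_prime` for every nonnegative loss vector.

```python
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def mixture_mean(points, probs):
    """Mean of the distribution that puts weight probs[i] on points[i]."""
    dim = len(points[0])
    return [sum(p * v[i] for p, v in zip(probs, points)) for i in range(dim)]


def sample_point(points, probs, rng):
    """Play one of the points at random according to probs."""
    return rng.choices(points, weights=probs, k=1)[0]


# Hypothetical output of a PAD-style oracle.
y_prime = (1.0, 1.0)                 # the "infeasible projection"
points = [(0.0, 1.0), (1.0, 0.5)]    # feasible points in the support
probs = [0.5, 0.5]

mean = mixture_mean(points, probs)
# The slack y_prime - mean must lie in the dual cone (here: be nonnegative).
slack = [a - b for a, b in zip(y_prime, mean)]

rng = random.Random(0)
losses = [(rng.random(), rng.random()) for _ in range(1000)]
dominated = all(dot(f, mean) <= dot(f, y_prime) + 1e-12 for f in losses)

played = sample_point(points, probs, rng)
```

By linearity of the loss, the expected loss of `played` equals the loss of `mean`, which is why comparing `mean` with `y_prime` suffices.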
Theorem 4.3.
Suppose satisfies the PNIP property (Definition 4.1). Then for any , Algorithm 1 satisfies the following regret guarantee:
In particular, if is strongly convex and , setting and , we have
Proof.
First, for any fixed round , let be the output of in this round. We know by the second property of the PAD oracle that there exists such that . Since is equal to with probability , letting , we have
(1) 
We make use of the following properties of Bregman divergence, which can be verified easily (see e.g. Section 11.2 in (Cesa-Bianchi and Lugosi, 2006)):
(2) 
Consider any . We have
(3)  
(by algorithm definition)  
(by (2))  
(by property of the PAD oracle)  
(by telescoping) 
Combining (1) and (3), we can bound the expected improper regret of Algorithm 1 as
(4)  
By the optimality condition , we have
(5) 
Plugging (5) into (4) and noting , we finish the proof of the first regret bound.
When is strongly convex, we have the following well-known property (see http://xingyuzhou.org/blog/notes/strongconvexity for a proof):
Then by the definition in Algorithm 1 we have
(6) 
From the above inequality and the choices of parameters and , we have
For the problem of regret minimization using an approximation oracle, we have the following regret guarantee, which is an immediate corollary of Theorem 4.3.
Corollary 4.4.
If , , , , setting , , Algorithm 1 has the following regret guarantee:
4.2 Construction of the Projection-and-Decomposition Oracle
Now we show how to construct the PAD oracle using the improper linear optimization oracle $\mathcal{O}_{\mathcal{K}, \mathcal{K}^*}$. Our construction is given in Algorithm 2.
Theorem 4.5.
We break the proof of Theorem 4.5 into several lemmas.
Lemma 4.6.
Proof.
Since we have , by the KKT condition, we have
for some . On the other hand, note that , for some , where . Therefore, for all we have . This means . ∎
Proof.
According to the algorithm, for each , is the Bregman projection of onto a halfspace containing , since the oracle ensures for all . Then by the generalized Pythagorean theorem (Lemma 2.4) we know for all and . Therefore we have for all and .
Let . Then there exists such that for all , where the last inequality is due to the strong convexity of . This implies for all . Therefore, when , we must have , which means the loop must have terminated at this time. This proves the lemma. ∎
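The key step in the argument above, projecting onto a halfspace that contains the comparator set, can be sketched in the Euclidean case, where the Bregman projection is the ordinary projection and has a closed form. This is an assumption-laden illustration, not the paper's Algorithm 2: the regularizer is taken to be half the squared Euclidean norm, the halfspace is written as all points `z` with `dot(w, z) <= b`, and all names are hypothetical. The check confirms that the projection never moves farther from any point of the halfspace.

```python
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def project_onto_halfspace(y, w, b):
    """Euclidean projection of y onto the halfspace of points z with <w, z> <= b.

    w must be nonzero. Points already inside the halfspace are returned unchanged.
    """
    violation = dot(w, y) - b
    if violation <= 0:
        return list(y)
    scale = violation / dot(w, w)
    return [yi - scale * wi for yi, wi in zip(y, w)]


w, b = (1.0, 1.0), 2.0
y = (2.0, 2.0)
y_next = project_onto_halfspace(y, w, b)

rng = random.Random(0)
closer = True
for _ in range(1000):
    # Draw a random point and project it, which yields some point z of the halfspace.
    z = project_onto_halfspace((rng.uniform(-5, 5), rng.uniform(-5, 5)), w, b)
    if sq_dist(z, y_next) > sq_dist(z, y) + 1e-9:
        closer = False
```

For a general Legendre regularizer the projection has no such closed form and is defined through the Bregman divergence instead, but the same monotonicity follows from the generalized Pythagorean theorem.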
Lemma 4.8.
Under the setting of Theorem 4.5, for all , there exists such that .
Proof.
We assume for contradiction that there exists a unit vector such that . Note that . Letting , we have
Since for , we have .
By the algorithm, we know that for all , there exists such that . Notice that from Lemma 4.6 we know for all . Thus for all there exists such that . In other words, we have
Therefore, we must have . We also have for each from Lemma 2.6, since is the intersection of with a halfspace that does not contain ’s centroid in the interior. Then we have
where the last step is due to , which is true according to the termination condition of the loop. Therefore we have a contradiction. ∎
We need the following basic property of projection onto a convex cone. The proof is given in Appendix B.
Lemma 4.9.
For any closed convex cone and any , we have .
The following lemma is a more general version of Lemma 6 in (Garber, 2017).
Lemma 4.10.
Given , and a convex cone , for any , the following two statements are equivalent:

There exists and such that .

For all , , there exists such that .
Geometric interpretation of Lemma 4.10.
Before proving Lemma 4.10, we discuss its geometric intuition. For simplicity of illustration, we only consider here. First we look at the case where . In this case the lemma simply degenerates to the fact
In the general case where is an arbitrary convex cone, Lemma 4.10 becomes
Denote . For the “$\Rightarrow$” side, if , it is clear that for all we must have for some . For the “$\Leftarrow$” side, if , then satisfies for all . Moreover it is easy to see , which completes the proof. See Figure 1 for a graphic illustration.
Proof of Lemma 4.10.
Suppose (A) holds. Then for any , , we have