Small-loss bounds for online learning with partial information
We consider the problem of adversarial (non-stochastic) online learning with partial-information feedback, where at each stage a decision maker picks an action from a finite set of possible actions. We develop a black-box approach to such problems, in which the learner observes as feedback only the losses of a subset of the actions that includes the selected action. Specifically, when the losses of actions are non-negative, under the graph-based feedback model introduced by Mannor and Shamir, we offer algorithms that attain the so-called "small-loss" regret bounds with high probability. Prior to our work, there was no data-dependent guarantee for general feedback graphs even for pseudo-regret (i.e., a guarantee without dependence on the number of actions, taking advantage of the increased information feedback). Addressing this, we provide a high-probability small-loss guarantee. Taking advantage of the black-box nature of our technique, we show applications yielding high-probability small-loss guarantees for semi-bandits (including routing in networks) and contextual bandits, covering possibly infinite comparator classes (such as infinitely many strategies in contextual bandits) as well as learning with slowly changing (shifting) comparators. In the special case of the classical bandit and semi-bandit problems, we provide optimal small-loss, high-probability guarantees of O(√(d L*)) for the actual regret, where d is the number of arms and L* is the loss of the best arm or action, answering open questions of Neu. Previous work for bandits and semi-bandits offered analogous regret guarantees only for pseudo-regret and only in expectation.
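For concreteness, here is a minimal LaTeX sketch of the quantities behind the stated bound, assuming standard online-learning notation (the per-round loss symbol ℓ_t and horizon T are our labels, not taken from the abstract):

\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \ell_t(a_t) \;-\; \min_{a}\sum_{t=1}^{T} \ell_t(a),
  \qquad
  L^{*} \;=\; \min_{a}\sum_{t=1}^{T} \ell_t(a).
\]

% A small-loss (first-order) bound replaces the worst-case dependence on T
% with a dependence on the best arm's cumulative loss L^*:
\[
  \mathrm{Regret}_T \;=\; O\!\left(\sqrt{d\,L^{*}}\right)
  \quad \text{with high probability.}
\]

Since L* ≤ T for losses in [0, 1], such a bound is never worse than the standard O(√(dT)) rate and is much stronger when the best action incurs small cumulative loss.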