Online Subset Selection using α-Core with no Augmented Regret

09/28/2022
by Sourav Sahoo, et al.
We consider the problem of sequential sparse subset selection in an online learning setup. Assume that the set [N] consists of N distinct elements. On the t-th round, the learner selects (perhaps randomly) a subset S_t ⊆ [N] of k elements (k ≤ N) before the reward function for that round is revealed. A monotone reward function f_t: 2^[N] → ℝ_+, which assigns a non-negative reward to each subset of [N], is then revealed to the learner, and the learner receives a reward of f_t(S_t) for its choice. The learner's goal is to design an online subset selection policy that maximizes its expected cumulative reward over a given time horizon. To this end, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The SCore policy is based on a new concept of α-Core, which generalizes the notion of Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called α-augmented regret, in which the power of the offline benchmark is suitably augmented relative to the online policy. We give several illustrative examples to show that a broad class of reward functions, including submodular functions, can be learned efficiently with the SCore policy. We also outline how the SCore policy can be used under a semi-bandit feedback model and conclude the paper with a number of open problems.
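The interaction protocol described above can be illustrated with a minimal Python sketch. It is not the SCore policy, whose details the abstract does not give. The learner here is a uniform-random baseline, the coverage reward is a hypothetical example of a monotone submodular function, and all names (`coverage_reward`, `uniform_policy`, `play`, `best_fixed_subset`) are assumptions made for illustration. The brute-force benchmark is the ordinary best fixed k-subset in hindsight, not the α-augmented benchmark mentioned in the abstract.

```python
import itertools
import random
from typing import Callable, FrozenSet, Sequence, Tuple

RewardFn = Callable[[FrozenSet[int]], float]


def coverage_reward(covers: Sequence[FrozenSet[int]]) -> RewardFn:
    """Hypothetical monotone reward: number of items covered by the chosen elements."""
    def f(subset: FrozenSet[int]) -> float:
        covered = set()
        for i in subset:
            covered |= covers[i]
        return float(len(covered))
    return f


def uniform_policy(n: int, k: int, rng: random.Random) -> FrozenSet[int]:
    """Placeholder learner (an assumption): pick k of the n elements uniformly at random."""
    return frozenset(rng.sample(range(n), k))


def play(n: int, k: int, rewards: Sequence[RewardFn], rng: random.Random) -> float:
    """Run the protocol: S_t is committed before f_t is evaluated on it."""
    total = 0.0
    for f_t in rewards:
        s_t = uniform_policy(n, k, rng)  # chosen without looking at f_t
        total += f_t(s_t)                # reward f_t(S_t) collected after the reveal
    return total


def best_fixed_subset(
    n: int, k: int, rewards: Sequence[RewardFn]
) -> Tuple[FrozenSet[int], float]:
    """Brute-force static benchmark (tiny n only): best single k-subset in hindsight."""
    best, best_value = frozenset(), float("-inf")
    for combo in itertools.combinations(range(n), k):
        s = frozenset(combo)
        value = sum(f(s) for f in rewards)
        if value > best_value:
            best, best_value = s, value
    return best, best_value


if __name__ == "__main__":
    covers = [frozenset({0, 1}), frozenset({1, 2}), frozenset({3}), frozenset()]
    rounds = [coverage_reward(covers)] * 5
    print("online reward:", play(4, 2, rounds, random.Random(0)))
    print("best fixed subset and reward:", best_fixed_subset(4, 2, rounds))
```

The brute-force benchmark enumerates all k-subsets of [N], so it is only practical for very small N. It is included to make the notion of an offline comparator concrete.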
