Online Subset Selection using α-Core with no Augmented Regret

09/28/2022
by   Sourav Sahoo, et al.

We consider the problem of sequential sparse subset selection in an online learning setup. Assume that the set [N] consists of N distinct elements. On the t^th round, a monotone reward function f_t: 2^[N]→ℝ_+, which assigns a non-negative reward to each subset of [N], is revealed to a learner. The learner selects (perhaps randomly) a subset S_t ⊆ [N] of k elements before the reward function f_t for that round is revealed (k ≤ N). As a consequence of its choice, the learner receives a reward of f_t(S_t) on the t^th round. The learner's goal is to design an online subset selection policy that maximizes its expected cumulative reward accrued over a given time horizon. To this end, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The proposed SCore policy is based on a new concept of α-Core, which is a generalization of the notion of Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called α-augmented regret, in which the power of the offline benchmark is suitably augmented compared to the online policy. We give several illustrative examples to show that a broad class of reward functions, including submodular functions, can be efficiently learned with the SCore policy. We also outline how the SCore policy can be used under a semi-bandit feedback model and conclude the paper with a number of open problems.
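To make the interaction protocol concrete, the following is a minimal Python sketch of the online subset selection loop described above. The policy shown is a uniform-random placeholder, not the paper's SCore policy, and the reward functions are hypothetical monotone (weighted-coverage) examples; everything here is illustrative only.

```python
import random

# Sketch of the online subset selection protocol: on each round the learner
# commits to a k-subset S_t, then the monotone reward f_t is revealed and
# the learner collects f_t(S_t). The policy below is a random baseline,
# standing in for the actual SCore policy from the paper.

N = 10   # size of the ground set [N] = {0, ..., N-1}
k = 3    # cardinality budget per round
T = 100  # time horizon

def random_policy(history):
    """Placeholder policy: pick k elements uniformly at random."""
    return set(random.sample(range(N), k))

def make_reward():
    """Hypothetical monotone reward: a random modular (additive) function."""
    weights = [random.random() for _ in range(N)]
    return lambda S: sum(weights[i] for i in S)

history = []
total_reward = 0.0
for t in range(T):
    S_t = random_policy(history)     # learner chooses S_t first ...
    f_t = make_reward()              # ... then f_t is revealed
    r_t = f_t(S_t)                   # reward received on round t
    total_reward += r_t
    history.append((S_t, r_t))       # full-information feedback

print(f"cumulative reward over {T} rounds: {total_reward:.2f}")
```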
