Bandit Learning for Dynamic Colonel Blotto Game with a Budget Constraint

03/23/2021
by   Vincent Leon, et al.

We consider a dynamic Colonel Blotto game (CBG) in which one player, the learner, has a limited number of troops (a budget) to allocate over a finite time horizon. At each stage, the learner decides, based on past observations, how much of the budget to spend and how to distribute it among the battlefields. The other player, the adversary, draws its budget allocation strategies at random from a fixed but unknown distribution. The learner's objective is to minimize the regret, defined as the difference between the payoff of the best dynamic policy and the payoff realized by following a learning algorithm. The dynamic CBG is analyzed under the frameworks of combinatorial bandits and bandits with knapsacks. We first convert the dynamic CBG with the budget constraint to a path planning problem on a graph. We then devise an efficient dynamic policy for the learner that uses the combinatorial bandit algorithm Edge on the path planning graph as a subroutine for another algorithm, LagrangeBwK. We derive a high-probability regret bound and show that, under the proposed policy, the learner's regret in the budget-constrained dynamic CBG matches (up to a logarithmic factor) that of the repeated CBG without budget constraints.
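
To make the path planning conversion concrete, the following is a minimal sketch of one common layered-graph encoding of a single-stage allocation. It is not taken from the paper. It assumes integer troops and a fixed per-stage budget (the paper lets the learner choose the per-stage budget), and the names build_layered_graph and allocations_from_paths are hypothetical. A node (i, used) means that the first i battlefields have received `used` troops in total, so each source-to-sink path corresponds to exactly one allocation.

```python
from math import comb


def build_layered_graph(n_battlefields, budget):
    """Return edges ((i, used), (i + 1, used + a)), read as
    'send a troops to battlefield i' (assumed encoding, integer troops)."""
    edges = []
    for i in range(n_battlefields):
        is_last = i == n_battlefields - 1
        # Only (0, 0) exists in the first layer: nothing has been spent yet.
        starts = [0] if i == 0 else range(budget + 1)
        for used in starts:
            for a in range(budget - used + 1):
                # The last battlefield must absorb all remaining troops,
                # so every path ends at the single sink (n, budget).
                if is_last and used + a != budget:
                    continue
                edges.append(((i, used), (i + 1, used + a)))
    return edges


def allocations_from_paths(n_battlefields, budget):
    """Enumerate all source-to-sink paths and decode each into an allocation."""
    successors = {}
    for u, v in build_layered_graph(n_battlefields, budget):
        successors.setdefault(u, []).append(v)

    sink = (n_battlefields, budget)
    allocations = []

    def walk(node, alloc):
        if node == sink:
            allocations.append(tuple(alloc))
            return
        for nxt in successors.get(node, []):
            # Troops sent along this edge = increase in cumulative spend.
            walk(nxt, alloc + [nxt[1] - node[1]])

    walk((0, 0), [])
    return allocations


if __name__ == "__main__":
    allocs = allocations_from_paths(3, 4)
    # Stars-and-bars count of ways to split 4 troops over 3 battlefields.
    print(len(allocs), comb(4 + 3 - 1, 3 - 1))
```

Under this encoding, the number of edges grows polynomially in the budget and the number of battlefields, while the number of paths (allocations) grows combinatorially. That gap is why an edge-level bandit algorithm on the graph can be more tractable than treating every allocation as a separate arm.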

research
09/11/2019

Combinatorial Bandits for Sequential Learning in Colonel Blotto Games

The Colonel Blotto game is a renowned resource allocation problem with a...
research
06/16/2023

Understanding the Role of Feedback in Online Learning with Switching Costs

In this paper, we study the role of feedback in online learning with swi...
research
04/09/2012

Knapsack based Optimal Policies for Budget-Limited Multi-Armed Bandits

In budget-limited multi-armed bandit (MAB) problems, the learner's actio...
research
02/09/2022

Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

We consider the combinatorial bandits problem with semi-bandit feedback ...
research
05/27/2019

Colonel Blotto and Hide-and-Seek Games as Path Planning Problems with Side Observations

Resource allocation games such as the famous Colonel Blotto (CB) and Hid...
research
11/19/2019

Path Planning Problems with Side Observations-When Colonels Play Hide-and-Seek

Resource allocation games such as the famous Colonel Blotto (CB) and Hid...
research
05/27/2019

Colonel Blotto Games and Hide-and-Seek Games as Path Planning Problems with Side Observations

Resource allocation games such as the famous Colonel Blotto (CB) and Hid...
