Policy Choice and Best Arm Identification: Comments on "Adaptive Treatment Assignment in Experiments for Policy Choice"

09/16/2021
by Kaito Ariu, et al.

Adaptive experimental design for efficient decision-making is an important problem in economics. This paper connects the "policy choice" problem, proposed in Kasy and Sautmann (2021) as an instance of adaptive experimental design, to the frontiers of the bandit literature in machine learning. We discuss how the policy choice problem can be framed so that it is identical to the "best arm identification" (BAI) problem. By connecting the two literatures, we find that the asymptotic optimality of policy choice algorithms tackled in Kasy and Sautmann (2021) is a long-standing open question in the literature. While Kasy and Sautmann (2021) present an interesting and important empirical study, this connection unfortunately highlights several major issues with their theoretical results. In particular, we show that Theorem 1 in Kasy and Sautmann (2021) is false. The proofs of statements (1) and (2) of Theorem 1 are incorrect; the statements themselves may be true, but the proofs are non-trivial to fix. Statement (3), by contrast, is itself false, along with its proof, which we show using existing theoretical results from the bandit literature. Because this question is critically important and has garnered much interest within the bandit community over the last decade, we provide a review of recent developments in the BAI literature. We hope this review highlights the relevance of BAI to economic problems and stimulates methodological and theoretical developments in the econometric community.
