Statistical Consequences of Dueling Bandits

10/16/2021
by Nayan Saxena, et al.

Multi-Armed Bandit frameworks have often been used by researchers to assess educational interventions; however, recent work has shown that it is more beneficial for a student to provide qualitative feedback through preference elicitation between different alternatives, making a dueling bandits framework more appropriate. In this paper, we explore the statistical quality of data under this framework by comparing traditional uniform sampling to a dueling bandit algorithm, and we find that dueling bandit algorithms perform well at cumulative regret minimisation but lead to inflated Type-I error rates and reduced power under certain circumstances. Through these results, we provide insight into the challenges and opportunities of using dueling bandit algorithms to run adaptive experiments.
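
To illustrate the kind of comparison the abstract describes, the sketch below simulates a small preference-based experiment under two allocation policies: uniform sampling of arm pairs and a simple Thompson-sampling-style dueling heuristic (Beta posteriors per arm; the two highest posterior draws duel). It then estimates the Type-I error rate of a Wald-style test of "no preference" for one fixed pair under a null where all arms are identical. The arm values, horizon, test, and heuristic here are illustrative assumptions, not the authors' exact algorithm or analysis, and the printed numbers should not be read as reproducing the paper's results.

# Hypothetical simulation sketch: uniform pair sampling vs. a
# Thompson-sampling-style dueling heuristic. Under the null (all arms
# equally preferred) we check how often a Wald test on the observed win
# rate of the pair (0, 1) rejects P(win) = 0.5, i.e. the Type-I error rate.
import numpy as np
from itertools import combinations

def true_pref(i, j, p):
    """Probability that arm i beats arm j given latent arm values p."""
    return 0.5 + (p[i] - p[j]) / 2.0

def run_experiment(p, horizon, policy, rng):
    k = len(p)
    wins = np.ones(k)      # Beta(1, 1) prior pseudo-counts per arm
    losses = np.ones(k)
    duel_wins = np.zeros((k, k))    # duel_wins[i, j]: times i beat j
    duel_counts = np.zeros((k, k))  # symmetric count of duels per pair
    pairs = list(combinations(range(k), 2))
    regret = 0.0
    best = np.argmax(p)
    for _ in range(horizon):
        if policy == "uniform":
            i, j = pairs[rng.integers(len(pairs))]
        else:  # dueling heuristic: the two arms with highest posterior draws duel
            theta = rng.beta(wins, losses)
            i, j = np.argsort(theta)[-2:]
        i_beats_j = rng.random() < true_pref(i, j, p)
        winner, loser = (i, j) if i_beats_j else (j, i)
        wins[winner] += 1
        losses[loser] += 1
        duel_wins[winner, loser] += 1
        duel_counts[i, j] += 1
        duel_counts[j, i] += 1
        # cumulative dueling-bandit regret relative to the best arm
        regret += true_pref(best, i, p) + true_pref(best, j, p) - 1.0
    return duel_wins, duel_counts, regret

def rejects_null(duel_wins, duel_counts, i=0, j=1, z_crit=1.96):
    """Wald test of H0: P(arm i beats arm j) = 0.5, using that pair's duels."""
    n = duel_counts[i, j]
    if n == 0:
        return False
    p_hat = duel_wins[i, j] / n
    se = np.sqrt(0.25 / n)  # standard error of a fair-coin win rate
    return abs(p_hat - 0.5) / se > z_crit

rng = np.random.default_rng(0)
reps, horizon = 500, 400
p_null = np.array([0.5, 0.5, 0.5])  # no true preference differences
for policy in ("uniform", "dueling"):
    rejections = sum(
        rejects_null(*run_experiment(p_null, horizon, policy, rng)[:2])
        for _ in range(reps)
    )
    print(f"{policy}: Type-I error rate ~ {rejections / reps:.3f}")

Because the adaptive policy concentrates duels on arms that look promising, the fixed pair used by the test can receive few, unevenly collected comparisons, which is the mechanism by which adaptive allocation can distort error rates relative to uniform sampling.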

Related research

Incentives in the Dark: Multi-armed Bandits for Evolving Users with Unknown Type (03/11/2018)
Design of incentives or recommendations to users is becoming more common...

Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling (10/30/2021)
Using bandit algorithms to conduct adaptive randomised experiments can m...

A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit (10/02/2015)
Adaptive and sequential experiment design is a well-studied area in nume...

Getting too personal(ized): The importance of feature choice in online adaptive algorithms (09/06/2023)
Digital educational technologies offer the potential to customize studen...

Dueling Bandits with Qualitative Feedback (09/14/2018)
We formulate and study a novel multi-armed bandit problem called the qua...

Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models (08/13/2021)
How to explore efficiently is a central problem in multi-armed bandits. ...

Adaptivity and Confounding in Multi-Armed Bandit Experiments (02/18/2022)
We explore a new model of bandit experiments where a potentially nonstat...
