DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret

05/06/2020
by Yichun Hu, et al.

Dynamic treatment regimes (DTRs) are personalized, sequential treatment plans that adapt the treatment decisions to an individual's time-varying features and intermediate outcomes at each treatment stage. While the existing literature mostly focuses on learning the optimal DTR from sequentially randomized data, we study the problem of developing the optimal DTR in an online manner, where the decisions in each round affect both our cumulative reward and the data we collect for future learning. We term this the DTR bandit problem. We propose a novel algorithm that, by carefully balancing exploration and exploitation, achieves rate-optimal regret when the transition and reward models are linear. We demonstrate the empirical success of our algorithm both on synthetic data and on data from a real-world randomized trial for major depressive disorder.
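To make the setting concrete, the following is a minimal sketch of a two-stage, two-arm problem with a linear transition model and a linear reward model. The learner is a plain epsilon-greedy Q-learning rule with per-arm least squares, used here only as a stand-in; it is not the algorithm proposed in the paper. The function name `run_two_stage_bandit`, the parameters `B` and `theta`, the noise levels, and the exploration rate are all assumptions made for illustration.

```python
import numpy as np


def run_two_stage_bandit(T=500, d=3, eps=0.1, seed=0):
    """Simulate a two-stage problem with linear transition and reward models.

    Learner: epsilon-greedy Q-learning with linear working models
    (an illustrative stand-in, NOT the paper's algorithm).
    """
    rng = np.random.default_rng(seed)
    # Hypothetical ground-truth parameters, drawn at random for illustration.
    B = rng.normal(size=(2, d, d)) / np.sqrt(d)  # stage-1 arm -> transition matrix
    theta = rng.normal(size=(2, d))              # stage-2 arm -> reward vector

    X1 = [[], []]  # stage-1 features, grouped by stage-1 arm
    Z1 = [[], []]  # resulting stage-2 features, grouped by stage-1 arm
    X2 = [[], []]  # stage-2 features, grouped by stage-2 arm
    Y2 = [[], []]  # observed rewards, grouped by stage-2 arm
    beta1 = np.zeros((2, d))  # stage-1 Q-function estimates
    beta2 = np.zeros((2, d))  # stage-2 Q-function estimates
    rewards = np.empty(T)

    def fit(X, y):
        # Lightly ridge-regularized least squares, so the solve is
        # well-posed even with very few samples.
        X, y = np.asarray(X), np.asarray(y)
        return np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ y)

    for t in range(T):
        # Stage 1: observe features, act, and see the intermediate outcome.
        x1 = rng.normal(size=d)
        explore1 = rng.random() < eps
        a1 = int(rng.integers(2)) if explore1 else int(np.argmax(beta1 @ x1))
        x2 = B[a1] @ x1 + 0.1 * rng.normal(size=d)  # linear transition

        # Stage 2: act on the updated features and observe the reward.
        explore2 = rng.random() < eps
        a2 = int(rng.integers(2)) if explore2 else int(np.argmax(beta2 @ x2))
        y = theta[a2] @ x2 + 0.1 * rng.normal()     # linear reward
        rewards[t] = y

        # Backward update: refit the stage-2 model for the arm just played.
        X2[a2].append(x2)
        Y2[a2].append(y)
        beta2[a2] = fit(X2[a2], Y2[a2])

        # Stage-1 pseudo-outcome: best estimated stage-2 value at each x2.
        # Simplification: only the stage-1 arm just played is refit.
        X1[a1].append(x1)
        Z1[a1].append(x2)
        pseudo = (np.asarray(Z1[a1]) @ beta2.T).max(axis=1)
        beta1[a1] = fit(X1[a1], pseudo)

    return rewards, beta2, theta


rewards, beta2_hat, theta_true = run_two_stage_bandit(T=300, seed=0)
```

The stage-1 choice changes the distribution of the stage-2 features, so exploring at stage 1 affects which data are available for learning at stage 2. That coupling is what separates this setting from a standard contextual bandit. Note that the stage-1 working model is linear only as an approximation, since the expected maximum over stage-2 arms is generally not linear in the stage-1 features.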

research
01/29/2023

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

In this paper, we study the problem of optimal data collection for polic...
research
04/14/2022

Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine

Dynamic treatment regimes (DTRs) are used in medicine to tailor sequenti...
research
02/21/2020

Online Batch Decision-Making with High-Dimensional Covariates

We propose and investigate a class of new algorithms for sequential deci...
research
03/24/2022

Making SMART decisions in prophylaxis and treatment studies

The optimal prophylaxis, and treatment if the prophylaxis fails, for a d...
research
11/01/2020

Experimental Design for Regret Minimization in Linear Bandits

In this paper we propose a novel experimental design-based algorithm to ...
research
10/14/2022

A Reinforcement Learning Approach to Estimating Long-term Treatment Effects

Randomized experiments (a.k.a. A/B tests) are a powerful tool for estima...
research
11/11/2019

A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data

Precision oncology, the genetic sequencing of tumors to identify druggab...