Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks

09/13/2017
by Yingfei Wang, et al.

We consider the problem of sequentially making decisions that are rewarded by "successes" and "failures", which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success in either the offline (training) or the online (testing) phase. Our problem is motivated by real-world applications where observations are time-consuming and/or expensive. We develop a knowledge gradient (KG) policy that uses an online Bayesian linear classifier to guide the experiments by maximizing the expected value of information of labeling each alternative. We provide a finite-time analysis of the estimation error and show that the maximum likelihood estimator based on the samples selected by the KG policy is consistent and asymptotically normal. We also show that the KG policy is asymptotically optimal in the offline setting. This work further extends the knowledge gradient to the setting of contextual bandits. We report the results of a series of experiments that demonstrate its efficiency.
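
The abstract names the policy without spelling it out. Below is a minimal Python sketch of a knowledge-gradient-style choice under binary feedback. It is not the authors' algorithm. All of the following are assumptions made for illustration:

- The belief over the linear classifier's weights is a diagonal Gaussian.
- That belief is updated after each label by a Laplace approximation.
- Predicted success probabilities use the probit approximation to the logistic-Gaussian integral.
- The value of labeling an alternative is the expected increase in the best predicted success probability after one more label (the offline objective).

The helper names `predict`, `update` and `knowledge_gradient` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(m: Vector, q: Vector, x: Vector) -> float:
    """Approximate P(success | x) under the belief N(m, diag(1/q)) (assumed belief model)."""
    mu = sum(mi * xi for mi, xi in zip(m, x))
    var = sum(xi * xi / qi for qi, xi in zip(q, x))
    # Probit-style approximation of the logistic-Gaussian integral.
    return sigmoid(mu / math.sqrt(1.0 + math.pi * var / 8.0))


def update(m: Vector, q: Vector, x: Vector, y: int) -> Tuple[List[float], List[float]]:
    """Laplace update of the diagonal Gaussian belief after label y in {+1, -1} (assumption)."""
    a = y * sum(mi * xi for mi, xi in zip(m, x))
    v = sum(xi * xi / qi for qi, xi in zip(q, x))
    # The posterior mode is m + y*c*x/q, where the scalar c solves c = sigmoid(-a - c*v).
    # c - sigmoid(-a - c*v) is increasing on (0, 1) and changes sign there, so bisect.
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid - sigmoid(-a - mid * v) < 0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    new_m = [mi + y * c * xi / qi for mi, qi, xi in zip(m, q, x)]
    p = sigmoid(sum(wi * xi for wi, xi in zip(new_m, x)))
    # Curvature of the log-likelihood at the mode adds precision along x.
    new_q = [qi + xi * xi * p * (1.0 - p) for qi, xi in zip(q, x)]
    return new_m, new_q


def knowledge_gradient(m: Vector, q: Vector, alternatives: Sequence[Vector]) -> List[float]:
    """One-step lookahead value of labeling each alternative (offline objective)."""
    best_now = max(predict(m, q, x) for x in alternatives)
    values = []
    for x in alternatives:
        p_success = predict(m, q, x)
        expected_best = 0.0
        # Average the post-update best success probability over both possible labels.
        for y, prob_y in ((+1, p_success), (-1, 1.0 - p_success)):
            m_y, q_y = update(m, q, x, y)
            expected_best += prob_y * max(predict(m_y, q_y, xp) for xp in alternatives)
        values.append(expected_best - best_now)
    return values


if __name__ == "__main__":
    alts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    m0, q0 = [0.0, 0.0], [1.0, 1.0]
    kg = knowledge_gradient(m0, q0, alts)
    choice = max(range(len(alts)), key=lambda i: kg[i])
    print(choice, [round(v, 4) for v in kg])
```

Under these assumptions, the next sample to label is the alternative with the largest value. With an exact Bayesian update these values would be nonnegative. The Laplace step is only approximate, so small negative values are possible.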

research
10/08/2015

The Knowledge Gradient with Logistic Belief Models for Binary Classification

We consider sequential decision making problems for binary classificatio...
research
05/06/2023

Efficient Learning for Selecting Top-m Context-Dependent Designs

We consider a simulation optimization problem for a context-dependent de...
research
07/06/2016

An optimal learning method for developing personalized treatment regimes

A treatment regime is a function that maps individual patient informatio...
research
10/14/2020

Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting

Online decision-making problem requires us to make a sequence of decisio...
research
11/27/2021

Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization

Offline policy learning (OPL) leverages existing data collected a priori...
research
10/14/2020

Statistical Inference for Online Decision Making via Stochastic Gradient Descent

Online decision making aims to learn the optimal decision rule by making...
research
01/02/2020

Adversarial Policies in Learning Systems with Malicious Experts

We consider a learning system based on the conventional multiplicative w...