HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare

02/18/2023
by Ge Gao, et al.

Reinforcement learning (RL) has been extensively researched for enhancing human-environment interactions in various human-centric tasks, including e-learning and healthcare. Since deploying and evaluating policies online is high-stakes in such tasks, off-policy evaluation (OPE) is crucial for inducing effective policies. In human-centric environments, however, OPE is challenging because the underlying state is often unobservable and only aggregate rewards can be observed (e.g., students' test scores, or whether a patient is eventually released from the hospital). In this work, we propose a human-centric OPE (HOPE) approach to handle partial observability and aggregated rewards in such environments. Specifically, we reconstruct immediate rewards from the aggregated rewards, taking partial observability into account, in order to estimate expected total returns. We provide a theoretical bound for the proposed method and conduct extensive experiments on real-world human-centric tasks, including sepsis treatment and an intelligent tutoring system. Our approach reliably predicts the returns of different policies and outperforms state-of-the-art benchmarks under both standard validation methods and human-centric significance tests.
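
To make the idea of reconstructing immediate rewards from an aggregate signal more concrete, the following is a minimal Python sketch. It is not the HOPE method itself: it assumes a simple linear reward model fitted by least squares, treats the observations as given per-step feature vectors (so it does not model partial observability), and uses ordinary importance sampling for the return estimate. The function names `reconstruct_rewards` and `importance_sampling_return`, and all of their parameters, are hypothetical and chosen only for illustration.

```python
import numpy as np

def reconstruct_rewards(step_features, aggregate_rewards):
    """Assumed linear model: fit w so that the per-step estimates
    phi_t @ w sum to each trajectory's observed aggregate reward."""
    # Sum the per-step features of each trajectory: (n_trajectories, n_features).
    summed = np.stack([f.sum(axis=0) for f in step_features])
    targets = np.asarray(aggregate_rewards, dtype=float)
    w, *_ = np.linalg.lstsq(summed, targets, rcond=None)
    # Per-step reward estimates, one array per trajectory.
    return [f @ w for f in step_features]

def importance_sampling_return(rewards, target_probs, behavior_probs, gamma=1.0):
    """Ordinary (trajectory-wise) importance-sampling estimate of the
    expected return of the target policy from behavior-policy data."""
    estimates = []
    for r, p_t, p_b in zip(rewards, target_probs, behavior_probs):
        # Product of per-step action-probability ratios.
        weight = np.prod(np.asarray(p_t, dtype=float) / np.asarray(p_b, dtype=float))
        discounts = gamma ** np.arange(len(r))
        estimates.append(weight * np.sum(discounts * r))
    return float(np.mean(estimates))

# Toy data (made up for illustration): two trajectories, two steps each.
features = [np.array([[1.0, 0.0], [0.0, 1.0]]),
            np.array([[1.0, 0.0], [1.0, 0.0]])]
aggregates = [4.0, 2.0]
step_rewards = reconstruct_rewards(features, aggregates)

# With identical target and behavior probabilities, all weights equal 1.
probs = [[0.5, 0.5], [0.5, 0.5]]
estimate = importance_sampling_return(step_rewards, probs, probs)
```

In this toy setup the reconstructed per-step rewards sum back to the observed aggregates, which is the constraint any reward reconstruction of this kind has to satisfy; how the rewards are distributed within a trajectory depends entirely on the assumed reward model.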


