The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making

05/29/2018
by Luchen Li, et al.

Off-policy reinforcement learning enables learning near-optimal policies from suboptimal experience, thereby opening opportunities for artificial intelligence applications in healthcare. Previous works have mainly framed patient-clinician interactions as Markov decision processes, even though true physiological states are not necessarily fully observable from clinical data. We capture this situation with a partially observable Markov decision process (POMDP), in which an agent optimises its actions over a belief, represented as a distribution over patient states inferred from each individual's history trajectory. A Gaussian mixture model is fitted to the observed data. Moreover, since nuances in pharmaceutical dosage can result in significantly different effects, we model a continuous policy through a Gaussian approximator directly in the policy space, i.e. the actor. To address the challenge that the infinite number of possible belief states renders exact value iteration intractable, we evaluate and plan only for beliefs actually encountered, using a heuristic search tree that maintains tight lower and upper bounds on the true value of each belief. We further resort to function approximation to update the value-bound estimates, i.e. the critic, so that the tree search is improved through more compact bounds at the fringe nodes, which are back-propagated to the root. Both actor and critic parameters are learned via gradient-based approaches. Our proposed policy, trained on real intensive care unit data, is capable of dictating dosing of vasopressors and intravenous fluids for sepsis patients so as to lead to the best patient outcomes.
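The continuous Gaussian actor described above can be sketched in miniature. This is an illustrative toy, not the authors' implementation: the linear mean on belief features, the fixed exploration standard deviation, and the REINFORCE-style update with an advantage signal are all simplifying assumptions.

```python
import numpy as np

class GaussianActor:
    """Toy continuous-action actor: a Gaussian policy over a scalar dose,
    with mean linear in belief features (hypothetical architecture)."""

    def __init__(self, n_features, lr=0.01, sigma=0.5):
        self.w = np.zeros(n_features)  # parameters of the policy mean
        self.sigma = sigma             # fixed exploration std-dev
        self.lr = lr                   # gradient step size

    def act(self, belief, rng):
        # Sample a continuous action (e.g. a dosage) from N(mu(belief), sigma^2)
        mu = belief @ self.w
        return rng.normal(mu, self.sigma)

    def update(self, belief, action, advantage):
        # Policy-gradient step: grad log pi(a|b) = (a - mu) / sigma^2 * belief
        mu = belief @ self.w
        grad_log_pi = (action - mu) / self.sigma**2 * belief
        self.w += self.lr * advantage * grad_log_pi
```

With a positive advantage, the update shifts the policy mean toward the sampled action, which is the basic mechanism a gradient-based actor relies on regardless of how the belief features are produced.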


