Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs

05/17/2019
by Luchen Li, et al.

Health-related data are noisy and only stochastically reflect patients' true physiological states, so a single-moment observation carries limited information for sequential clinical decision making. We model patient-clinician interactions as partially observable Markov decision processes (POMDPs) and optimize sequential treatment based on belief states inferred from the history sequence. To facilitate inference, we build a variational generative model and strengthen the state representation with a recurrent neural network (RNN), incorporating an auxiliary loss from sequence auto-encoding. We optimize a continuous policy over drug levels with an actor-critic method in which policy gradients are obtained from a stabilized off-policy estimate of the advantage function, with the value of each belief state backed up by parallel best-first suffix trees. We apply the methodology to optimizing dosages of vasopressor and intravenous fluid for sepsis patients using a retrospective intensive care dataset, and evaluate the learned policy with off-policy policy evaluation (OPPE). The results demonstrate that modelling the problem as a POMDP yields better performance than modelling it as an MDP, and that incorporating heuristic search improves sample efficiency.
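
To illustrate what a belief state inferred from history means, here is a minimal sketch of an exact Bayes-filter belief update in a small discrete POMDP. This is not the paper's method: the abstract describes a variational generative model with an RNN, whereas this sketch swaps in a tabular filter purely for intuition. The function name `belief_update`, the arrays `T` and `O`, and the two-state toy numbers are all hypothetical.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """One step of exact Bayes filtering in a discrete POMDP (illustrative only).

    belief: shape (S,), current probability over hidden states
    T:      shape (A, S, S), T[a, s, s2] = P(s2 | s, a)   (assumed known)
    O:      shape (A, S, Z), O[a, s2, z] = P(z | s2, a)   (assumed known)
    """
    # Predict: push the belief through the transition model for this action.
    predicted = belief @ T[action]
    # Correct: weight each next state by the likelihood of the observation.
    unnormalized = predicted * O[action, :, observation]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total

# Hypothetical toy model: hidden states 0 = stable, 1 = deteriorating;
# a single action; observations 0 = normal reading, 1 = abnormal reading.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])

b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, action=0, observation=1, T=T, O=O)
```

After one abnormal reading, the belief shifts toward the deteriorating state without committing to it, which is the kind of history-dependent summary a POMDP policy acts on in place of the raw observation.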


research
05/29/2018

The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making

Off-policy reinforcement learning enables near-optimal policy from subop...
research
07/04/2012

A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies

We consider the estimation of the policy gradient in partially observabl...
research
03/13/2020

Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation

Our aim is to establish a framework where reinforcement learning (RL) of...
research
09/14/2015

Optimization of anemia treatment in hemodialysis patients via reinforcement learning

Objective: Anemia is a frequent comorbidity in hemodialysis patients tha...
research
05/23/2022

Flow-based Recurrent Belief State Learning for POMDPs

Partially Observable Markov Decision Process (POMDP) provides a principl...
research
10/08/2021

Medical Dead-ends and Learning to Identify High-risk States and Treatments

Machine learning has successfully framed many sequential decision making...
research
12/17/2021

Sequential decision making for a class of hidden Markov processes, application to medical treatment optimisation

Motivated by a medical decision making problem, this paper focuses on an...
