Nested Policy Reinforcement Learning

10/06/2021
by Aishwarya Mandyam, et al.

Off-policy reinforcement learning (RL) has proven to be a powerful framework for guiding agents' actions in environments with stochastic rewards and unknown or noisy state dynamics. In many real-world settings, these agents must operate in multiple environments, each with slightly different dynamics. For example, we may be interested in developing policies to guide medical treatment for patients with and without a given disease, or policies to navigate curriculum design for students with and without a learning disability. Here, we introduce nested policy fitted Q-iteration (NFQI), an RL framework that finds optimal policies in environments that exhibit such a structure. Our approach develops a nested Q-value function that takes advantage of the shared structure between two groups of observations from two separate environments while allowing their policies to be distinct from one another. We find that NFQI yields policies that rely on relevant features and perform at least as well as a policy that does not consider group structure. We demonstrate NFQI's performance using an OpenAI Gym environment and a clinical decision making RL task. Our results suggest that NFQI can develop policies that are better suited to many real-world clinical environments.
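To make the idea of a nested Q-value function concrete, the following is a minimal sketch, not the authors' implementation. It assumes a discrete action set, a binary group indicator z (0 for one environment, 1 for the other), and a linear Q-function of the form Q(s, a, z) = phi(s, a)·w_shared + z · phi(s, a)·w_group, so that both groups share one component while the second group receives an additive correction. The function names (`features`, `q_values`, `nested_fqi`), the linear form, and the least-squares regressor are all illustrative assumptions; the abstract does not specify the function approximator.

```python
import numpy as np


def features(states, actions, groups, n_actions):
    """Nested features [phi(s, a), z * phi(s, a)] (hypothetical linear form).

    phi(s, a) places the bias-augmented state in the block belonging to
    action a; the second half is zeroed out for the group with z = 0.
    """
    n, d = states.shape
    aug = np.hstack([states, np.ones((n, 1))])  # append a bias term
    phi = np.zeros((n, n_actions * (d + 1)))
    for a in range(n_actions):
        mask = actions == a
        phi[mask, a * (d + 1):(a + 1) * (d + 1)] = aug[mask]
    return np.hstack([phi, groups[:, None] * phi])


def q_values(weights, states, groups, n_actions):
    """Return an (n, n_actions) array of Q(s, a, z) for every action."""
    n = states.shape[0]
    q = np.empty((n, n_actions))
    for a in range(n_actions):
        acts = np.full(n, a)
        q[:, a] = features(states, acts, groups, n_actions) @ weights
    return q


def nested_fqi(states, actions, rewards, next_states, groups,
               n_actions, gamma=0.9, n_iters=50):
    """Fitted Q-iteration on a fixed batch of transitions (off-policy).

    Each iteration regresses the Bellman targets r + gamma * max_a' Q(s', a', z)
    onto the nested features; the first half of `weights` is shared by both
    groups, the second half applies only where z = 1.
    """
    x = features(states, actions, groups, n_actions)
    weights = np.zeros(x.shape[1])
    for _ in range(n_iters):
        next_q = q_values(weights, next_states, groups, n_actions)
        targets = rewards + gamma * next_q.max(axis=1)
        weights, *_ = np.linalg.lstsq(x, targets, rcond=None)
    return weights
```

A group-specific greedy policy can then be read off as `q_values(weights, states, groups, n_actions).argmax(axis=1)`. Because the group term is additive, the two groups can end up with different greedy actions in the same state while still sharing the first block of weights.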
