Active Measure Reinforcement Learning for Observation Cost Minimization

05/26/2020
by Colin Bellinger, et al.

Standard reinforcement learning (RL) algorithms assume that the observation of the next state arrives instantaneously and at no cost. In a wide variety of sequential decision-making tasks, ranging from medical treatment to scientific discovery, however, multiple classes of state observations are possible, each with an associated cost. We propose the active measure RL framework (Amrl) as an initial solution to this problem, in which the agent learns to maximize the costed return, which we define as the discounted sum of rewards minus the sum of observation costs. Our empirical evaluation demonstrates that Amrl-Q agents are able to learn a policy and a state estimator in parallel during online training. During training, the agent naturally shifts from relying on costly measurements of the environment to relying on its state estimator in order to increase its reward, and it does so without harming the learned policy. Our results show that the Amrl-Q agent learns at a rate similar to standard Q-learning and Dyna-Q. Critically, by utilizing an active strategy, Amrl-Q achieves a higher costed return.
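
The costed return defined above can be illustrated with a minimal Python sketch. It follows the abstract's wording literally: rewards are discounted and observation costs are summed without discounting (the full paper may treat the costs differently). The function name `costed_return`, its parameters, and the default discount factor are hypothetical, chosen here for illustration only.

```python
from typing import Sequence


def costed_return(
    rewards: Sequence[float],
    costs: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> float:
    """Discounted sum of rewards minus the (undiscounted) sum of observation costs.

    rewards[t] is the reward received at step t; costs[t] is the cost paid
    at step t for observing the state (0.0 when the agent chose not to
    measure and relied on its state estimator instead).
    """
    if len(rewards) != len(costs):
        raise ValueError("rewards and costs must have the same length")
    discounted_rewards = sum(gamma**t * r for t, r in enumerate(rewards))
    return discounted_rewards - sum(costs)


# Hypothetical 3-step episode: the agent measures at steps 0 and 2 only.
example = costed_return([1.0, 1.0, 1.0], [0.5, 0.0, 0.5], gamma=0.5)
```

Under this definition, an agent that skips a measurement at a step where its estimate is reliable keeps the same reward while paying no cost for that step, which is the incentive the abstract describes.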

