Guided Policy Exploration for Markov Decision Processes using an Uncertainty-Based Value-of-Information Criterion

02/05/2018
by Isaac J. Sledge, et al.

Reinforcement learning in environments with many state-action pairs is challenging. At issue is the number of episodes needed to thoroughly search the policy space. Most conventional heuristics address this search problem in a purely stochastic manner, which can leave large portions of the policy space unvisited during the early stages of training. In this paper, we propose an uncertainty-based, information-theoretic approach for performing guided stochastic searches that cover the policy space more effectively. Our approach is based on the value of information, a criterion that provides the optimal trade-off between expected costs and the granularity of the search process. The value of information yields a stochastic routine for choosing actions during learning that can explore the policy space in a coarse-to-fine manner. We augment this criterion with a state-transition uncertainty factor, which guides the search process into previously unexplored regions of the policy space.
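
The abstract does not give the update equations, so the following is only a rough sketch of what a stochastic action-selection routine with an uncertainty factor could look like, not the authors' method. It assumes a soft-max (Gibbs) distribution over action values weighted by a prior, with a temperature-like parameter tau that moves the search from coarse (near the prior) to fine (near-greedy), and it stands in for the state-transition uncertainty factor with a simple visit-count bonus. All names here (voi_style_policy, uncertainty_bonus, choose_action, tau, scale) are hypothetical.

```python
import numpy as np


def voi_style_policy(q_values, prior, tau):
    """Assumed form: p(a) proportional to prior(a) * exp(q(a) / tau).

    Large tau keeps the distribution close to the prior (coarse search);
    small tau concentrates it on the best-valued action (fine search).
    The prior is assumed to be strictly positive.
    """
    logits = np.log(prior) + np.asarray(q_values, dtype=float) / tau
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def uncertainty_bonus(visit_counts, scale=1.0):
    """Hypothetical stand-in for an uncertainty factor.

    Rarely tried actions receive a larger bonus, which shrinks as the
    visit count grows.
    """
    return scale / np.sqrt(1.0 + np.asarray(visit_counts, dtype=float))


def choose_action(rng, q_values, prior, visit_counts, tau, scale=1.0):
    """Sample an action from the bonus-adjusted soft-max distribution."""
    adjusted = np.asarray(q_values, dtype=float) + uncertainty_bonus(visit_counts, scale)
    probs = voi_style_policy(adjusted, prior, tau)
    action = int(rng.choice(len(probs), p=probs))
    return action, probs
```

A caller would typically pass rng = np.random.default_rng(seed) and lower tau over the course of training, so that early episodes sample broadly and later episodes favor high-value actions. How tau should be scheduled, and how the uncertainty factor is actually defined, are questions the full text would need to answer.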


research
02/28/2017

Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

Conventional reinforcement learning methods for Markov decision processe...
research
10/08/2017

Using the Value of Information to Explore Stochastic, Discrete Multi-Armed Bandits

In this paper, we propose an information-theoretic exploration strategy ...
research
03/21/2019

Reduction of Markov Chains using a Value-of-Information-Based Approach

In this paper, we propose an approach to obtain reduced-order models of ...
research
10/28/2021

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

Temporal-Difference (TD) learning methods, such as Q-Learning, have prov...
research
04/14/2020

A Demonstration of Issues with Value-Based Multiobjective Reinforcement Learning Under Stochastic State Transitions

We report a previously unidentified issue with model-free, value-based a...
research
09/16/2022

Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes

Lying on the heart of intelligent decision-making systems, how policy is...
research
06/28/2021

Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment

Many transfer problems require re-using previously optimal decisions for...
