Inverse Policy Evaluation for Value-based Sequential Decision-making

08/26/2020
by   Alan Chan, et al.
11

Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., Q-learning), and acting greedily with respect to the estimates with an arbitrary degree of entropy to ensure that the state-space is sufficiently explored. Behavior based on explicit greedification assumes that the values reflect those of some policy, over which the greedy policy will be an improvement. However, value-iteration can produce value functions that do not correspond to any policy. This is especially relevant in the function-approximation regime, when the true value function can't be perfectly represented. In this work, we explore the use of inverse policy evaluation, the process of solving for a likely policy given a value function, for deriving behavior from a value function. We provide theoretical and empirical results to show that inverse policy evaluation, combined with an approximate value iteration algorithm, is a feasible method for value-based control.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/22/2021

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

We study the off-policy evaluation (OPE) problem in reinforcement learni...
research
07/02/2014

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

Tackling large approximate dynamic programming or reinforcement learning...
research
07/30/2021

Maximum Entropy Dueling Network Architecture

In recent years, there have been many deep structures for Reinforcement ...
research
11/19/2014

Compress and Control

This paper describes a new information-theoretic policy evaluation techn...
research
08/28/2018

High-confidence error estimates for learned value functions

Estimating the value function for a fixed policy is a fundamental proble...
research
03/04/2021

Inverse Reinforcement Learning with Explicit Policy Estimates

Various methods for solving the inverse reinforcement learning (IRL) pro...
research
03/28/2018

One-step dispatching policy improvement in multiple-server queueing systems with Poisson arrivals

Policy iteration techniques for multiple-server dispatching rely on the ...

Please sign up or login with your details

Forgot password? Click here to reset