Avoiding Confusion between Predictors and Inhibitors in Value Function Approximation

12/19/2013
by Patrick C. Connor, et al.

In reinforcement learning, the goal is to seek rewards and avoid punishments. A single scalar captures the value of a state or of taking an action: expected future rewards increase this quantity and expected punishments decrease it. Naturally, an agent should learn to predict this quantity in order to take beneficial actions, and many value function approximators exist for this purpose. In the present work, however, we show how value function approximators can confuse predictors of an outcome of one valence (e.g., a signal of reward) with inhibitors of the opposite valence (e.g., a signal canceling an expectation of punishment). We show this to be a problem for both linear and non-linear value function approximators, especially when the amount of data (or experience) is limited. We propose and evaluate a simple resolution: predict reward and punishment values separately, then rectify and add them to obtain the value needed for decision making. We evaluate several function approximators within this slightly modified value function approximation architecture and show that it circumvents the confusion, achieving lower value-prediction errors.
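To make the confusion concrete, here is a minimal toy sketch (not the paper's experiments or code) using linear least-squares value approximators. One cue predicts reward and another inhibits punishment; a single net-value approximator gives both cues positive weights and so overestimates a novel state containing both, while a split-and-rectify architecture in the spirit of the proposal keeps the two roles apart. The feature layout and outcome values are illustrative assumptions.

```python
import numpy as np

# Toy setup (illustrative assumption, not from the paper): cue A predicts
# reward, cue B inhibits (cancels) punishment. Features are [A, B, bias].
X = np.array([[1., 0., 1.],   # A present: reward follows
              [0., 1., 1.],   # B present: the usual punishment is canceled
              [0., 0., 1.]])  # no cue: punishment follows
r_pos = np.array([1., 0., 0.])    # reward component of each outcome
r_neg = np.array([0., 0., -1.])   # punishment component of each outcome

def fit(X, y):
    # Least-squares fit of a linear value approximator.
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Conventional architecture: one approximator for the net value.
w_net = fit(X, r_pos + r_neg)

# Split architecture: separate reward and punishment approximators,
# rectified and then added to form the decision-making value.
w_pos, w_neg = fit(X, r_pos), fit(X, r_neg)

def v_split(x):
    return max(x @ w_pos, 0.) + min(x @ w_neg, 0.)

both = np.array([1., 1., 1.])  # novel state with both cues present
print(both @ w_net)   # 2.0: the inhibitor B is confused with a reward predictor
print(v_split(both))  # 1.0: reward predicted, punishment correctly still canceled
```

In this sketch, rectification is what prevents the punishment stream's positive (inhibitory) prediction from leaking into the net value as spurious reward.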

Related research

Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning (09/30/2018)
Learning in sparse reward settings remains a challenge in Reinforcement ...

A compact, hierarchical Q-function decomposition (06/27/2012)
Previous work in hierarchical reinforcement learning has faced a dilemma...

Q-Networks for Binary Vector Actions (12/04/2015)
In this paper reinforcement learning with binary vector actions was inve...

Value-driven Hindsight Modelling (02/19/2020)
Value estimation is a critical component of the reinforcement learning (...

Seeking entropy: complex behavior from intrinsic motivation to occupy action-state path space (05/20/2022)
Intrinsic motivation generates behaviors that do not necessarily lead to...

Scale Invariant Solutions for Overdetermined Linear Systems with Applications to Reinforcement Learning (04/15/2021)
Overdetermined linear systems are common in reinforcement learning, e.g....

CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification (11/13/2022)
Data valuation, or the valuation of individual datum contributions, has ...
