# Unsupervised Basis Function Adaptation for Reinforcement Learning

When using reinforcement learning (RL) algorithms to evaluate a policy it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on the accuracy of the VF estimate, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is a large amount of interest in the potential for allowing RL algorithms to adaptively generate (i.e. to learn) approximation architectures. We investigate a method of adapting approximation architectures which uses feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. We introduce an algorithm based upon this idea which adapts a state aggregation approximation architecture on-line. Assuming S states, we demonstrate theoretically that - provided the following relatively non-restrictive assumptions are satisfied: (a) the number of cells X in the state aggregation architecture is of order √S log₂ S or greater, (b) the policy and transition function are close to deterministic, and (c) the prior for the transition function is uniformly distributed - our algorithm can guarantee, assuming we use an appropriate scoring function to measure VF error, error which is arbitrarily close to zero as S becomes large. It is able to do this despite having only O(X log₂ S) space complexity (and negligible time complexity). We conclude by generating a set of empirical results which support the theoretical results.


## 1 Introduction

Traditional reinforcement learning (RL) algorithms such as TD(λ) (Sutton, 1988) or Q-learning (Watkins and Dayan, 1992) can generate optimal policies when dealing with small state and action spaces. However, when environments are complex (with large or continuous state or action spaces), using such algorithms directly becomes too computationally demanding. As a result it is common to introduce some form of architecture with which to approximate the value function (VF), for example a parametrised set of functions (Sutton and Barto, 1998; Bertsekas and Tsitsiklis, 1996). One issue when introducing VF approximation, however, is that the accuracy of the algorithm’s VF estimate, and as a consequence its performance, is highly dependent upon the exact form of the architecture chosen (it may be, for example, that no element of the chosen set of parametrised functions closely fits the VF). Accordingly, a number of authors have explored the possibility of allowing the approximation architecture to be learned by the agent, rather than pre-set manually by the designer—see Busoniu et al. (2010) for an overview. It is hoped that, by doing this, we can design algorithms which will perform well within a more general class of environment whilst requiring less explicit input from designers.1 Introducing the ability to adapt an approximation architecture is in some ways similar to simply adding additional parameters to an approximation architecture. However separating parameters into two sets, those adjusted by the underlying RL algorithm and those adjusted by the adaptation method, permits us scope to, amongst other things, specify two distinct update rules.

A simple and perhaps, as yet, under-explored method of adapting an approximation architecture involves using an estimate of the frequency with which an agent has visited certain states to determine which states should have their values approximated in greater detail. We might be interested in such methods since, intuitively, we would suspect that areas which are visited more regularly are, for a number of reasons, more “important” in relation to determining a policy. Such a method can be contrasted with the more commonly explored method of explicitly measuring VF error and using this error as feedback to adapt an approximation architecture. We will refer to methods which adapt approximation architectures using visit frequency estimates as being unsupervised in the sense that no direct reference is made to reward or to any estimate of the VF.

Our intention in this article is to provide—in the setting of problems with large or continuous state spaces, where reward and transition functions are unknown, and where our task is to maximise reward—an exploration of unsupervised methods along with a discussion of their potential merits and drawbacks. We will do this primarily by introducing an algorithm, PASA, which represents an attempt to implement an unsupervised method in a manner which is as simple and obvious as possible. The algorithm will form the principal focus of our theoretical and experimental analysis.

It will turn out that unsupervised techniques have a number of advantages which may not be offered by other more commonly used methods of adapting approximation architectures. In particular, we will argue that unsupervised methods have (a) low computational overheads and (b) a tendency to require less sampling in order to converge. We will also argue that the methods can, under suitable conditions, (c) decrease VF error, in some cases significantly, with minimal input from the designer, and, as a consequence, (d) boost performance. The methods will be most effective in environments which satisfy certain conditions, however these conditions are likely to be satisfied by many of the environments we encounter most commonly in practice. The fact that unsupervised methods are cheap and simple, yet still have significant potential to enhance performance, makes them appear a promising, if perhaps somewhat overlooked, means of adapting approximation architectures.

### 1.1 Article overview

Our article is structured as follows. Following some short introductory sections we will offer an informal discussion of the potential merits of unsupervised methods in order to motivate and give a rationale for our exploration (Section 1.5). We will then propose (in Section 2) our new algorithm, PASA, short for “Probabilistic Adaptive State Aggregation”. The algorithm is designed to be used in conjunction with SARSA, and adapts a state aggregation approximation architecture on-line.

Section 3 is devoted to a theoretical analysis of the properties of PASA. Sections 3.1 to 3.3 relate to finite state spaces. We will demonstrate in Section 3.1 that PASA has a time complexity (considered as a function of the state and action space sizes, S and A) of the same order as its SARSA counterpart. It has space complexity of O(X log₂ S), where X is the number of cells in the state aggregation architecture, compared to O(XA) for its SARSA counterpart. This means that PASA is computationally cheap: it does not carry significant computational cost beyond SARSA with fixed state aggregation.

In Section 3.2 we investigate PASA in the context where an agent’s policy is held fixed and prove that the algorithm converges. This implies that, unlike for non-linear architectures in general, SARSA combined with PASA will have the same convergence properties as SARSA with a fixed linear approximation architecture (i.e. the VF estimate may, assuming the policy is updated, “chatter”, or fail to converge, but will never diverge).

In Section 3.3 we will use PASA’s convergence properties to obtain a theorem, again where the policy is held fixed, regarding the impact PASA will have on VF error. This theorem guarantees that VF error will be arbitrarily low as measured by routinely used scoring functions provided certain conditions are met, conditions which require primarily that the agent spends a large amount of the time in a small subset of the state space. This result permits us to argue informally that PASA will also, assuming the policy is updated, improve performance given similar conditions.

In Section 3.4 we extend the finite state space concepts to continuous state spaces. We will demonstrate that, assuming we employ an initial arbitrarily complex discrete approximation of the agent’s continuous input, all of our discrete case results have a straightforward continuous state space analogue, such that PASA can be used to reduce VF error (at low computational cost) in a manner substantially equivalent to the discrete case.

In Section 3.5 we outline some examples to help illustrate the types of environments in which our stated conditions are likely to be satisfied. We will see that, even for apparently highly unstructured environments where prior knowledge of the transition function is largely absent, the necessary conditions potentially exist to guarantee that employing PASA will result in low VF error. In a key example, we will show that for environments with large state spaces and where there is no prior knowledge of the transition function, PASA will permit SARSA to generate a VF estimate with error which is arbitrarily low with arbitrarily high probability provided the transition function and policy are sufficiently close to deterministic and the algorithm has on the order of √S log₂ S cells available in its adaptive state aggregation architecture.

To corroborate our theoretical analysis, and to further address the more complex question of whether PASA will improve overall performance, we outline some experimental results in Section 4. We explore three different types of environment: a GARNET environment,2 An environment with a discrete state space where the transition function is deterministic and generated uniformly at random. For more details refer to Sections 3.5 and 4.1. a “Gridworld” type environment, and an environment representative of a logistics problem.

Our experimental results suggest that PASA, and potentially, by extension, techniques based on similar principles, can significantly boost performance when compared to SARSA with fixed state aggregation. The addition of PASA improved performance in all of our experiments, and regularly doubled or even tripled the average reward obtained. Indeed, in some of the environments we tested, PASA was also able to outperform SARSA with no state abstraction, the potential reasons for which we discuss in Section 4.4. This is despite minimal input from the designer with respect to tailoring the algorithm to each distinct environment type.3 For each problem, with the exception of the number of cells available to the state aggregation architecture, the PASA parameters were left unchanged. Furthermore, in each case the additional processing time and resources required by PASA are measured and shown to be minimal, as predicted.

### 1.2 Related works

The concept of using visit frequencies in an unsupervised manner is not completely new; however, it remains relatively unexplored compared to methods which seek to measure the error in the VF estimate explicitly and then use this error as feedback. We are aware of only three papers in the literature which investigate a method similar in concept to the one that we propose, though the algorithms analysed in these three papers differ from PASA in some key respects.

Moreover there has been little by way of theoretical analysis of unsupervised techniques. The results we derive in relation to the PASA algorithm are all original, and we are not aware of any published theoretical analysis which is closely comparable.

In the first of the three papers just mentioned, Menache et al. (2005) provide a brief evaluation of an unsupervised algorithm which uses the frequency with which an agent has visited certain states to fit the centroid and scale parameters of a set of Gaussian basis functions. Their study was limited to an experimental analysis, and to the setting of policy evaluation. The unsupervised algorithm was not the main focus of their paper, but rather was used to provide a comparison with two more complex adaptation algorithms which used information regarding the VF as feedback.4 Their paper actually found the unsupervised method performed unfavourably compared to the alternative approaches they proposed. However they tested performance in only one type of environment, a type of environment which we will argue is not well suited to the methods we are discussing here (see Section 3.5).

In the second paper, Nouri and Littman (2009) examined using a regression tree approximation architecture to approximate the VF for continuous multi-dimensional state spaces. Each node in the regression tree represents a unique and disjoint subset of the state space. Once a particular node has been visited a fixed number of times, the subset it represents is split (“once-and-for-all”) along one of its dimensions, thereby creating two new tree nodes. The manner in which the VF estimate is updated5 The paper proposes more than one algorithm. We refer here to the fitted Q-iteration algorithm. is such that incentive is given to the agent to visit areas of the state space which are relatively unexplored. The most important differences between their algorithm and ours are that, in their algorithm, (a) cell-splits are permanent, i.e. once new cells are created, they are never re-merged and (b) a permanent record is kept of each state visited (this helps the algorithm to calculate the number of times newly created cells have already been visited). With reference to (a), the capacity of PASA to re-adapt is, in practice, one of its critical elements (see Section 3). With reference to (b), the fact that PASA does not retain such information has important implications for its space complexity. The paper also provides theoretical results in relation to the optimality of their algorithm. Their guarantees apply in the limit of arbitrarily precise VF representation, and are restricted to model-based settings (where reward and transition functions are known). In these and other aspects their analysis differs significantly from our own.

In the third paper, which is somewhat similar in approach and spirit to the second (and which also considers continuous state spaces), Bernstein and Shimkin (2010) examined an algorithm wherein a set of kernels are progressively split (again “once-and-for-all”) based on the visit frequency for each kernel. Their algorithm also incorporates knowledge of uncertainty in the VF estimate, to encourage exploration. The same two differences to PASA (a) and (b) listed in the paragraph above also apply to this algorithm. Another key difference is that their algorithm maintains a distinct set of kernels for each action, which implies increased algorithm space complexity. The authors provide a theoretical analysis in which they establish a linear relationship between policy-mistake count6 Defined, in essence, as the number of time steps in which the algorithm executes a non-optimal policy. and maximum cell size in an approximation of a continuous state space.7 See, in particular, their Theorems 4 and 5. The results they provide are akin to other PAC (“probably approximately correct”) analyses undertaken by several authors under a range of varying assumptions—see, for example, Strehl et al. (2009) or, more recently, Jin et al. (2018). Their theoretical analysis differs from ours in many fundamental respects. Unlike our theoretical results in Section 3, they have the advantage that they are not dependent upon characteristics of the environment and pertain to performance, not just VF error. However, similar to Nouri and Littman (2009) above, they carry the significant limitation that there is no guarantee of arbitrarily low policy-mistake count in the absence of an arbitrarily precise approximation architecture, which is equivalent in this context to arbitrarily large computational resources.8 Our results, in contrast, provide guarantees relating to maximally reduced VF error under conditions where resources may be limited.

There is a much larger body of work less directly related to this article, but which has central features in common, and is therefore worth mentioning briefly. Two important threads of research can be identified.

Second, given that the PASA algorithm functions by updating a state aggregation architecture, it is worth noting that a number of principally theoretical works exist in relation to state aggregation methods. These works typically address the question of how states in a Markov decision process (MDP) can be aggregated, usually based on “closeness” of the transition and reward function for groups of states, such that the MDP can be solved efficiently. Examples of papers on this topic include Hutter (2016) and Abel et al. (2017) (the results of the former apply with generality beyond just MDPs). Notwithstanding being focussed on the question of how to create effective state aggregation approximation architectures, these works differ fundamentally from ours in terms of their assumptions and overall objective. Though there are exceptions—see, for example, Ortner (2013)11 This paper explores the possibility of aggregating states based on learned estimates of the transition and reward function, and as such the techniques it explores differ quite significantly from those we are investigating.—the results typically assume knowledge of the MDP (i.e. the environment) whereas our work assumes no such knowledge. Moreover the techniques analysed often use the VF, or a VF estimate, to generate a state aggregation, which is contrary to the unsupervised nature of the approaches we are investigating.

### 1.3 Formal framework

We assume that we have an agent which interacts with an environment over a sequence of iterations t ∈ {1, 2, 3, …}.12 The formal framework we assume in this article is a special case of a Markov decision process. For more general MDP definitions see, for example, Chapter 2 of Puterman (2014). We will assume throughout this article (with the exception of Section 3.4) that we have a finite set 𝒮 of states of size S (Section 3.4 relates to continuous state spaces and contains its own formal definitions where required). We also assume we have a discrete set 𝒜 of actions of size A. Since 𝒮 and 𝒜 are finite, we can, using arbitrarily assigned indices, label each state s_i (1 ≤ i ≤ S) and each action a_j (1 ≤ j ≤ A).

For each t the agent will be in a particular state s^(t) and will take a particular action a^(t). Each action is taken according to a policy π, whereby the probability the agent takes action a_j in state s_i is denoted as π(a_j|s_i).

The transition function P defines how the agent’s state evolves over time. If the agent is in state s_i and takes an action a_j in iteration t, then the probability it will transition to the state s_{i′} in iteration t + 1 is given by P(s_{i′}|s_i, a_j). The transition function must be constrained such that ∑_{i′=1}^{S} P(s_{i′}|s_i, a_j) = 1 for all i and j.

Denote as 𝒫 the space of all probability distributions defined on the real line. The reward function R is a mapping from each state-action pair (s_i, a_j) to a real-valued random variable R(s_i, a_j), where each R(s_i, a_j) is defined by a cumulative distribution function F_{R(s_i,a_j)} ∈ 𝒫, such that if the agent is in state s_i and takes action a_j in iteration t, then it will receive a real-valued reward r^(t) in that iteration distributed according to F_{R(s_i,a_j)}. Some of our key results will require that the magnitude of the expected value of R(s_i, a_j) is bounded above by a single constant for all i and j, in which case we use R_m to denote the maximum magnitude of the expected value of R(s_i, a_j) over all i and j.

Prior to the point at which an agent begins interacting with an environment, both P and R are taken as being unknown. However we may assume in general that we are given a prior distribution for both. Our overarching objective is to design an algorithm to adjust π during the course of the agent’s interaction with its environment so that total reward is maximised over some interval (for example, in the case of our experiments in Section 4, this will be a finite interval towards the end of each trial).
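The interaction model just described can be sketched as a simple simulation loop. This is only an illustrative sketch: the state and action space sizes, the environment arrays P and R, and the policy pi below are all hypothetical placeholders, not quantities defined by the formal framework.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 10, 3  # hypothetical state and action space sizes

# Environment (unknown to the agent): P[i, j] is a probability
# distribution over next states, R[i, j] an expected reward for (s_i, a_j).
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.standard_normal((S, A))

# A fixed stochastic policy: pi[i, j] = pi(a_j | s_i).
pi = rng.dirichlet(np.ones(A), size=S)

total_reward = 0.0
s = 0
for t in range(1000):
    a = rng.choice(A, p=pi[s])     # draw action from pi(. | s)
    total_reward += R[s, a]        # collect reward (expected value, for simplicity)
    s = rng.choice(S, p=P[s, a])   # transition according to P(. | s, a)
```

In a real trial the rewards would be drawn from the distributions F_{R(s_i,a_j)} rather than taken as their expected values; the loop structure is otherwise the same.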

### 1.4 Scoring functions

Whilst our overarching objective is to maximise performance, an important step towards achieving this objective involves reducing error in an algorithm’s VF estimate. This is based on the assumption that more accurate VF estimates will lead to better directed policy updates, and therefore better performance. A large part of our theoretical analysis in Section 3 will be directed at assessing the extent to which VF error will be reduced under different circumstances.

Error in a VF estimate for a fixed policy is typically measured using a scoring function. It is possible to define many different types of scoring function, and in this section we will describe some of the most commonly used types.13 Sutton and Barto (2018) provide a detailed discussion of different methods of scoring VF estimates. We first need a definition of the VF itself. We formally define the value function Q^π_γ for a particular policy π, which maps each of the S × A state-action pairs to a real value, as follows:

 Q^\pi_\gamma(s_i, a_j) \coloneqq \mathbb{E}\left(\sum_{t=1}^{\infty} \gamma^{t-1} R\big(s^{(t)}, a^{(t)}\big) \,\middle|\, s^{(1)} = s_i,\ a^{(1)} = a_j\right),

where the expectation is taken over the distributions of R, P and π (i.e. for particular instances of P and R, not over their prior distributions) and where γ ∈ [0, 1) is known as a discount factor. We will sometimes omit the subscript γ. We have used superscript brackets to indicate dependency on the iteration t. Initially the VF is unknown.

Suppose that Q̂ is an estimate of the VF. One commonly used scoring function is the squared error in the VF estimate for each state-action pair, weighted by some arbitrary function w which satisfies w(s_i, a_j) ≥ 0 for all i and j. We will refer to this as the mean squared error (MSE):

 \mathrm{MSE}_\gamma \coloneqq \sum_{i=1}^{S} \sum_{j=1}^{A} w(s_i, a_j) \big(Q^\pi_\gamma(s_i, a_j) - \hat{Q}(s_i, a_j)\big)^2. \qquad (1)

Note that the true VF Q^π_γ, which is unknown, appears in (1). Many approximation architecture adaptation algorithms use a scoring function as a form of feedback to help guide how the approximation architecture should be updated. In such cases it is important that the score is something which can be measured by the algorithm. In that spirit, another commonly used scoring function (which, unlike MSE, is not a function of Q^π_γ) uses T^π, the Bellman operator, to obtain an approximation of the MSE. This scoring function we denote as L_γ. It is a weighted sum of the Bellman error at each state-action pair:14 Note that this scoring function also depends on a discount factor γ, inherited from the Bellman error definition. It is effectively analogous to the constant γ used in the definition of MSE.

 L_\gamma \coloneqq \sum_{i=1}^{S} \sum_{j=1}^{A} w(s_i, a_j) \big(T^\pi \hat{Q}(s_i, a_j) - \hat{Q}(s_i, a_j)\big)^2,

where:

 T^\pi \hat{Q}(s_i, a_j) \coloneqq \mathbb{E}\big[R(s_i, a_j)\big] + \gamma \sum_{i'=1}^{S} P(s_{i'}|s_i, a_j) \sum_{j'=1}^{A} \pi(a_{j'}|s_{i'}) \hat{Q}(s_{i'}, a_{j'}).

The value L_γ still relies on an expectation within the squared term, and hence there may still be challenges estimating L_γ empirically. A third alternative scoring function L̃_γ, which steps around this problem, can be defined as follows:

 \tilde{L}_\gamma \coloneqq \sum_{i=1}^{S} \sum_{j=1}^{A} w(s_i, a_j) \sum_{i'=1}^{S} P(s_{i'}|s_i, a_j) \sum_{j'=1}^{A} \pi(a_{j'}|s_{i'}) \int_{\mathbb{R}} \big(r + \gamma \hat{Q}(s_{i'}, a_{j'}) - \hat{Q}(s_i, a_j)\big)^2 \, dF_{R(s_i, a_j)}(r).

These three different scoring functions are arguably the most commonly used scoring functions, and we will state results in Section 3 in relation to all three. Scoring functions which involve a projection onto the space of possible VF estimates are also commonly used. We will not consider such scoring functions explicitly, however our results below will apply to these error functions, since, for the architectures we consider, scoring functions with and without a projection are equivalent.
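For concreteness, the first two scoring functions can be computed directly when (hypothetical) model arrays are available. The sketch below assumes deterministic rewards, so the expectation over rewards in the Bellman operator is trivial; all array names and sizes are illustrative placeholders.

```python
import numpy as np

def mse_score(Q, Q_hat, w):
    """Weighted squared VF error: sum_ij w[i,j] * (Q[i,j] - Q_hat[i,j])^2."""
    return float(np.sum(w * (Q - Q_hat) ** 2))

def bellman_score(Q_hat, P, R, pi, w, gamma):
    """Weighted squared Bellman error, assuming deterministic rewards
    R[i,j] so that E[R(s_i, a_j)] = R[i,j]."""
    V = (pi * Q_hat).sum(axis=1)   # V[i'] = sum_j' pi(a_j'|s_i') Q_hat[i',j']
    T_Q = R + gamma * P.dot(V)     # Bellman operator applied to Q_hat
    return float(np.sum(w * (T_Q - Q_hat) ** 2))

# Hypothetical arrays for S = 3 states and A = 2 actions.
rng = np.random.default_rng(0)
S, A = 3, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[i, j, i'] = P(s_i' | s_i, a_j)
R = rng.standard_normal((S, A))
pi = rng.dirichlet(np.ones(A), size=S)
w = np.full((S, A), 1 / (S * A))             # uniform weighting

Q_hat = rng.standard_normal((S, A))
score = bellman_score(Q_hat, P, R, pi, w, gamma=0.9)
```

Note that computing either score exactly requires quantities (Q^π, or P and R) which are unknown to the agent; this is precisely the measurement problem discussed above.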

We will need to consider some special cases of the weighting function w. Towards that end we define what we will term the stable state probability vector ψ, of dimension S, as follows:

 \psi_i \coloneqq \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}\{s^{(t)} = s_i\},

where 𝕀 is the indicator function for a logical statement X, such that 𝕀{X} = 1 if X is true. The value ψ_i represents the proportion of the time the agent will spend in the state s_i as T → ∞ provided it follows the fixed policy π. In the case where a transition matrix obtained from P and π is irreducible and aperiodic, ψ will be the stationary distribution associated with that matrix. None of the results in this paper relating to finite state spaces require that a transition matrix obtained from P and π be irreducible; however, in order to avoid possible ambiguity, we will assume unless otherwise stated that ψ, whenever referred to, is the same for all initial states s^(1).
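In practice ψ is exactly the quantity an unsupervised method can estimate cheaply, by simple visit counting over an observed trajectory. A minimal sketch (the trajectory below is a hypothetical stand-in for the states an agent actually visits):

```python
import numpy as np

def estimate_psi(trajectory, S):
    """Empirical estimate of psi: the fraction of iterations spent in
    each state, (1/T) * sum_t I{s(t) = s_i}."""
    counts = np.bincount(trajectory, minlength=S)
    return counts / len(trajectory)

# Hypothetical observed trajectory over S = 4 states.
traj = [0, 0, 1, 0, 2, 0, 0, 1]
psi = estimate_psi(traj, S=4)  # -> array([0.625, 0.25, 0.125, 0.])
```

The estimate converges to ψ as the trajectory grows (under the stationarity assumption above), and requires only one counter per state.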

Perhaps the most natural, and also most commonly used, weighting coefficient is w(s_i, a_j) = ψ_i π(a_j|s_i), such that each error term is weighted in proportion to how frequently the particular state-action pair occurs (Menache et al., 2005; Yu and Bertsekas, 2009; Di Castro and Mannor, 2010). A slightly more general set of weightings is made up of those which satisfy w(s_i, a_j) = ψ_i w̃(s_i, a_j), where w̃(s_i, a_j) ≥ 0 and ∑_{j=1}^{A} w̃(s_i, a_j) = 1 for all i and j. All of our theoretical results will require that w(s_i, a_j) = ψ_i w̃(s_i, a_j), and some will also require that w̃(s_i, a_j) = π(a_j|s_i).15 It is worth noting that weighting by ψ and π is not necessarily the only valid choice for w. It would be possible, for example, to set a uniform weighting over all i and j, depending on the purpose for which the scoring function has been defined.
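This natural weighting is just an element-wise product of ψ with the policy's action probabilities. A small sketch with hypothetical values:

```python
import numpy as np

# Hypothetical stable state probabilities psi and policy pi for
# S = 3 states and A = 2 actions.
psi = np.array([0.5, 0.3, 0.2])
pi = np.array([[0.9, 0.1],
               [0.5, 0.5],
               [0.2, 0.8]])

# w[i, j] = psi_i * pi(a_j | s_i): each state-action pair is weighted
# by how often it occurs under the fixed policy.
w = psi[:, None] * pi
```

Because each row of pi sums to one and psi sums to one, these weights sum to one over all state-action pairs.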

### 1.5 A motivating discussion

The principle we are exploring in this article is that frequently visited states should have their values approximated with greater precision. Why would we employ such a strategy? There is a natural intuition which says that states which the agent is visiting frequently are more important, either because they are intrinsically more prevalent in the environment, or because the agent is behaving in a way that makes them more prevalent, and should therefore be more accurately represented.

However it may be possible to pinpoint reasons related to efficient algorithm design which might make us particularly interested in such approaches. The thinking behind unsupervised approaches from this perspective can be summarised (informally) in the set of points which we now outline. Our arguments are based principally around the objective of minimising VF error (we will focus our arguments on MSE, though similar points could be made in relation to L_γ or L̃_γ). We will note at the end of this section, however, circumstances under which the arguments will potentially translate to benefits where policies are updated as well.

It will be critical to our arguments that the scoring function is weighted by ψ. Accordingly we begin by assuming that, in measuring VF error using MSE, we adopt w(s_i, a_j) = ψ_i w̃(s_i, a_j), where w̃ is stored by the algorithm and is not a function of the environment (for example, w̃(s_i, a_j) = π(a_j|s_i) or w̃(s_i, a_j) = 1/A for all i and j). Now consider:

1. Our goal is to find an architecture which will permit us to generate a VF estimate with low error. We can see, referring to equation (1), that we have a sum of terms of the form:

 \psi_i \tilde{w}(s_i, a_j) \big(Q^\pi(s_i, a_j) - \hat{Q}(s_i, a_j)\big)^2.

Suppose Q̂_MSE represents the value of Q̂ for which MSE is minimised subject to the constraints of a particular architecture. Assuming we can obtain a VF estimate Q̂ ≈ Q̂_MSE (e.g. using a standard RL algorithm), each term in (1) will be of the form:

 \psi_i \tilde{w}(s_i, a_j) \big(Q^\pi(s_i, a_j) - \hat{Q}_{\mathrm{MSE}}(s_i, a_j)\big)^2.

In order to reduce MSE we will want to focus on ensuring that our architecture avoids the occurrence of large terms of this form. A term may be large either because ψ_i is large, because w̃(s_i, a_j) is large, or because Q^π(s_i, a_j) − Q̂_MSE(s_i, a_j) has large magnitude. It is likely that any adaptation method we propose will involve directly or indirectly sampling one or more of these quantities in order to generate an estimate which can then be provided as feedback to update the architecture. Since w̃ is assumed to be already stored by the algorithm, we focus our attention on the other two factors.

2. Whilst both ψ_i and Q^π(s_i, a_j) − Q̂_MSE(s_i, a_j) influence the size of each term, in a range of important circumstances generating an accurate estimate of ψ_i will be easier and cheaper than generating an accurate estimate of Q^π(s_i, a_j) − Q̂_MSE(s_i, a_j). We would argue this for three reasons:

1. An estimate of Q^π(s_i, a_j) − Q̂_MSE(s_i, a_j) can only be generated with accuracy once an accurate estimate of Q^π exists. The latter will typically be generated by the underlying RL algorithm, and may require a substantial amount of training time to generate, particularly if γ is close to one;16 Whilst the underlying RL algorithm will store an estimate of Q^π, having an estimate of Q^π is not the same as having an estimate of Q^π − Q̂_MSE. If we want to estimate the latter, we should consider it in general as being estimated from scratch. The distinction is explored, for example, from a gradient descent perspective in Baird (1995). See also Chapter 11 in Sutton and Barto (2018).

2. The value Q^π(s_i, a_j) may also depend on trajectories followed by the agent consisting of many states and actions (again particularly if γ is near one), and it may take many sample trajectories and therefore a long training time to obtain a good estimate, even once π is known;

3. For each single value ψ_i there are A terms containing distinct values for Q^π(s_i, a_j) − Q̂_MSE(s_i, a_j) in the MSE. This suggests that ψ_i can be more quickly estimated in cases where w̃(s_i, a_j) > 0 for more than one index j. Furthermore, the space required to store an estimate, if required, is reduced by a factor of A.

3. If we accept that it is easier and quicker to estimate ψ_i than Q^π(s_i, a_j) − Q̂_MSE(s_i, a_j), we need to ask whether measuring the former and not the latter will provide us with sufficient information in order to make helpful adjustments to the approximation architecture. If ψ_i is roughly the same value for all i, then our approach may not work. However in practice there are many environments which (in some cases subject to the policy) are such that there will be a large amount of variance in the terms of ψ, with the implication that ψ can provide critical feedback with respect to reducing MSE. This will be illustrated most clearly through examples in Section 3.5.

4. Finally, from a practical, implementation-oriented perspective we note that, for fixed π, the value Q^π(s_i, a_j) − Q̂(s_i, a_j) is a function of the approximation architecture. This is not the case for ψ_i. If we determine our approximation architecture with reference to VF error, we may find it more difficult to ensure our adaptation method converges.17 This is because we are likely to adjust the approximation architecture so that the approximation architecture is capable of more precision for state-action pairs where the error is large. But, in doing this, we will presumably remove precision from other state-action pairs, resulting in increasing error for these pairs, which could then result in us re-adjusting the architecture to give more precision to these pairs. This could create cyclical behaviour. This could force us, for example, to employ a form of gradient descent (thereby, amongst other things,18 Gradient descent using the Bellman error is also known to be slow to converge and may require additional computational resources (Baird, 1995). limiting us to architectures expressible via differential parameters, and forcing architecture changes to occur gradually) or to make “once-and-for-all” changes to the approximation architecture (removing any subsequent capacity for our architecture to adapt, which is critical if we expect, under more general settings, the policy to change with time).19 As we saw in Section 1.2, most methods which use VF feedback explored to date in the literature do indeed employ one of these two approaches.

To summarise: in many important instances, visit frequency may lose little compared to other metrics when used to assess the importance of an area of the VF, and the simplicity of unsupervised methods allows for fast calculation and flexible implementation.

The above points focus on the problem of policy evaluation. All of our arguments will extend, however, to the policy learning setting, provided that our third point above consistently holds as each update is made. Whether this is the case will depend primarily on the type of environment with which the agent is interacting. This will be explored further in Section 3.5 and Section 4.

Having discussed, albeit informally, some of the potential advantages of unsupervised approaches to adapting approximation architectures, we would now like to implement these ideas in an algorithm. This will let us test them theoretically and empirically in a more precise, rigorous setting.

## 2 The PASA algorithm

Our Probabilistic Adaptive State Aggregation (PASA) algorithm is designed to work in conjunction with SARSA (though certainly there may be potential to use it alongside other, similar, RL algorithms). In effect PASA provides a means of allowing a state aggregation approximation architecture to be adapted on-line. In order to describe in detail how the algorithm functions it will be helpful to initially provide a brief review of SARSA, and of state aggregation approximation architectures.

### 2.1 SARSA with fixed state aggregation

In its tabular form SARSA (short for "state-action-reward-state-action"; first proposed by Rummery and Niranjan, 1994) stores an array Q̂ of dimension S × A. SARSA has a more general formulation, SARSA(λ), which incorporates an eligibility trace; any reference here to SARSA should be interpreted as a reference to SARSA(0). The algorithm performs an update to this array in each iteration as follows:

 \hat{Q}^{(t+1)}(s^{(t)},a^{(t)}) = \hat{Q}^{(t)}(s^{(t)},a^{(t)}) + \Delta\hat{Q}^{(t)}(s^{(t)},a^{(t)}),

where (noting that the discount parameter appearing in equation (2) is a parameter of the algorithm, distinct from the corresponding parameter used in the scoring function definitions, though a correspondence between the two will be made clearer below):

 \Delta\hat{Q}^{(t)}(s^{(t)},a^{(t)}) = \eta\left(R(s^{(t)},a^{(t)}) + \gamma\hat{Q}^{(t)}(s^{(t+1)},a^{(t+1)}) - \hat{Q}^{(t)}(s^{(t)},a^{(t)})\right) (2)

and where η is a fixed step size parameter. (In the literature, η is generally permitted to change over time, i.e. η = η^{(t)}; however throughout this article we assume η is a fixed value.) In the tabular case, SARSA has some well known and helpful convergence properties (Bertsekas and Tsitsiklis, 1996).
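As a concrete reference point, the tabular update in equation (2) can be sketched in a few lines. This is a minimal illustration; the function and parameter names (`sarsa_update`, `eta`, `gamma`) are our own, not part of the formal development:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.1, gamma=0.9):
    """One tabular SARSA(0) update, following equation (2).

    Q is an S x A array of action-value estimates; eta is the fixed
    step size and gamma the discount factor."""
    # Temporal-difference term: R + gamma * Q(s', a') - Q(s, a).
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += eta * delta
    return Q
```

Note that only the single entry for the visited pair (s, a) changes; all other entries are untouched, which is what makes the tabular form expensive in space for large S.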

It is possible to use different types of approximation architecture in conjunction with SARSA. Parametrised value function approximation involves generating an approximation of the VF using a parametrised set of functions. The approximate VF is denoted Q̂, and, assuming we are approximating over the state space only and not the action space, this function is parametrised by a matrix θ of dimension D × A (where, by assumption, D ≪ S). Such an approximation architecture is linear if Q̂ can be expressed in the form Q̂(s, a_j) = φ_j(s)ᵀθ_j, where θ_j is the jth column of θ and φ_j(s) is a fixed vector of dimension D for each pair (s, a_j). The D distinct vectors of dimension S obtained by evaluating each component of φ_j at every state are called basis functions. It is common to assume that φ_j = φ for all j, in which case we have only D distinct basis functions, and Q̂(s, a_j) = φ(s)ᵀθ_j. If we assume that the approximation architecture being adapted is linear then the method of adapting an approximation architecture is known as basis function adaptation. Hence we refer to the adaptation of a linear approximation architecture using an unsupervised approach as unsupervised basis function adaptation.

Suppose that Ξ is a partition of the state space, containing X elements, where we refer to each element as a cell. Indexing the cells using k, where 1 ≤ k ≤ X, we will denote X_k as the set of states in the kth cell. A state aggregation approximation architecture (see, for example, Singh et al., 1995, and Whiteson et al., 2007) is a simple linear parametrised approximation architecture which can be defined using any such partition Ξ. The parametrised VF approximation is expressed in the following form: Q̂(s, a_j) = Σ_{k=1}^{X} I{s ∈ X_k} θ_{kj}.

SARSA can be extended to operate in conjunction with a state aggregation approximation architecture if we update θ in each iteration as follows (this algorithm is a special case of a more general variant of the SARSA algorithm, one which employs stochastic semi-gradient descent and which can be applied to any set of linear basis functions):

 \theta^{(t+1)}_{kj} = \theta^{(t)}_{kj} + \eta\, I\{s^{(t)} \in X_k\}\, I\{a^{(t)} = a_j\}\left(R(s^{(t)},a^{(t)}) + \gamma d^{(t)} - \theta^{(t)}_{kj}\right), (3)

where:

 d^{(t)} \coloneqq \sum_{k'=1}^{X} \sum_{j'=1}^{A} I\{s^{(t+1)} \in X_{k'}\}\, I\{a^{(t+1)} = a_{j'}\}\, \theta^{(t)}_{k'j'}. (4)

We will say that a state aggregation architecture is fixed if Ξ (which in general can be a function of t) is the same for all t. For convenience we will refer to SARSA with fixed state aggregation as SARSA-F. We will assume (unless we explicitly state that the policy is held fixed) that SARSA updates its policy by adopting the ε-greedy policy at each iteration t.

Given a state aggregation approximation architecture, if the policy is held fixed then the value θ generated by SARSA can be shown to converge; this can be shown, for example, using much more general results from the theory of stochastic approximation algorithms, and is examined more formally in Section 3. Note that the same is true for SARSA when used in conjunction with any linear approximation architecture. Approximation architectures which are non-linear, by way of contrast, cannot be guaranteed to converge even when a policy is held fixed, and may in fact diverge. Often the employment of a non-linear architecture will demand additional measures be taken to ensure stability (see, for example, Mnih et al., 2015). Given that the underlying approximation architecture is linear, unsupervised basis function adaptation methods typically do not require any such additional measures. If, on the other hand, we allow the policy to be updated, then this convergence guarantee begins to erode. In particular, any policy update method based on periodically switching to an ε-greedy policy will not, in general, converge. However, whilst the values θ (and hence Q̂) generated by SARSA with fixed state aggregation may oscillate, they will remain bounded (Gordon, 1996, 2001).
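The aggregated update in equations (3) and (4) reduces, in code, to updating a single entry of θ per iteration. The sketch below is ours, not a prescribed implementation; in particular, the precomputed `cell_of` mapping from states to cell indices is an illustrative device:

```python
import numpy as np

def aggregated_sarsa_update(theta, cell_of, s, a, r, s_next, a_next,
                            eta=0.1, gamma=0.9):
    """One SARSA update under a state aggregation architecture,
    following equations (3) and (4). theta is an X x A array and
    cell_of maps each state index to its cell index."""
    k, k_next = cell_of[s], cell_of[s_next]
    # Equation (4): d is the estimate stored for the successor pair.
    d = theta[k_next, a_next]
    # Equation (3): only the entry for the visited (cell, action) changes.
    theta[k, a] += eta * (r + gamma * d - theta[k, a])
    return theta
```

The indicator functions in equation (3) appear here implicitly: looking up `cell_of[s]` and updating only `theta[k, a]` is equivalent to multiplying by I{s ∈ X_k} I{a = a_j}.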

### 2.2 The principles of how PASA works

PASA is an attempt to implement the idea of unsupervised basis function adaptation in a manner which is as simple and obvious as possible without compromising computational efficiency. The underlying idea of the algorithm is to make the VF representation comparatively detailed for frequently visited regions of the state space whilst allowing the representation to be coarser over the remainder of the state space. It will do this by progressively updating Ξ. Whilst the partition progressively changes, it will always contain a fixed number of cells X. We will refer to SARSA combined with PASA as SARSA-P (to distinguish it from SARSA-F described above).

The algorithm is set out in Algorithms 1 and 2. Before describing the precise details of the algorithm, however, we will attempt to describe informally how it works. PASA will update Ξ only infrequently. We must choose the value of an update-interval parameter, which in practice will be large (a suitable value is chosen for our experiments in Section 4). In each iteration at which t is a multiple of this interval, PASA will update Ξ; otherwise Ξ remains fixed.

PASA updates Ξ as follows. We must first define a fixed set of X_0 base cells, with X_0 < X, which together form a partition Ξ_0 of the state space. Suppose we have an estimate of how frequently the agent visits each of these base cells based on its recent behaviour. We can define a new partition Ξ_1 by "splitting" the most frequently visited cell into two cells containing a roughly equal number of states (the notion of a cell "split" is described more precisely below). If we now have a similar visit frequency estimate for each of the cells in the newly created partition, we could again split the most frequently visited cell, giving us yet another partition Ξ_2. If we repeat this process a total of X − X_0 times, then we will have generated a partition of the state space with X cells. Moreover, provided our visit frequency estimates are accurate, those areas of the state space which are visited more frequently will have a more detailed representation of the VF.

For this process to work effectively, PASA needs to have access to an accurate estimate of the visit frequency of each cell at each stage of the splitting process. We could, at first glance, provide this by storing an estimate of the visit frequency of every individual state. We could then estimate cell visit frequencies by summing the estimates for individual states as required. However S is, by assumption, very large, and storing S distinct real values is implicitly difficult or impossible. Accordingly, PASA instead stores an estimate of the visit frequency of each base cell, and an estimate of the visit frequency of one of the two cells defined each time a cell is split. This allows PASA to calculate an estimate of the visit frequency of every cell at every stage of the process described in the paragraph above whilst storing only X distinct values. It does this by subtracting certain estimates from others (also described in more detail below).

There is a trade-off involved when estimating visit frequencies in such a way. Suppose that at some update the partition Ξ is replaced by a new partition Ξ′. The visit frequency estimate for a cell in Ξ′ is only likely to be accurate if the same cell was an element of Ξ, or if the cell is a union of cells which were elements of Ξ. Cells in Ξ′ which do not fall into one of these categories will need time for an accurate estimate of visit frequency to be obtained. The consequence is that it may take longer for the algorithm to converge (assuming a fixed policy) than would be the case if an estimate of the visit frequency of every state were available. This will be shown more clearly in Section 3.2. The impact of this trade-off in practice, however, does not appear to be significant.
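The splitting principle described above can be sketched as follows, under the idealised assumption that exact per-state visit frequencies are available. PASA itself only maintains cell-level estimates (as Sections 2.3 and 2.4 describe), so this is an illustration of the principle, not the algorithm proper; all names are ours:

```python
def greedy_split_partition(base_cells, visit_freq, X):
    """Repeatedly split the most frequently visited cell until X
    cells exist. base_cells is a list of lists of state indices;
    visit_freq maps a state index to its (assumed known) visit
    frequency."""
    cells = [sorted(c) for c in base_cells]
    while len(cells) < X:
        # Find the cell with the highest total visit frequency.
        i = max(range(len(cells)),
                key=lambda j: sum(visit_freq[s] for s in cells[j]))
        cell = cells[i]
        if len(cell) < 2:          # never split a singleton cell
            break
        mid = len(cell) // 2       # split near the middle of the cell
        cells[i], new = cell[:mid], cell[mid:]
        cells.append(new)
    return cells
```

With frequencies concentrated on a few states, the resulting partition gives those states small (eventually singleton) cells while rarely visited states share coarse cells, which is exactly the behaviour the paragraph above motivates.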

### 2.3 Some additional terminology relating to state aggregation

In this subsection we will introduce some formal concepts, including the concept of “splitting” a cell, which will allow us, in the next subsection, to formally describe the PASA algorithm.

Our formalism is such that the state space is finite. (In the case of continuous state spaces we assume that we have a finite set of "atomic cells" which are analogous to the finite set of states discussed here; see Section 3.4.) This means that, for any problem, we can arbitrarily index each state from s_1 to s_S. Suppose we have a partition defined on the state space with X_0 elements. We will say that the partition is ordered if every cell can be expressed as an interval of the form:

 X_{j,0} \coloneqq \{s_i : L_{j,0} \le i \le U_{j,0}\},

where L_{j,0} and U_{j,0} are integers and L_{j,0} ≤ U_{j,0}. Starting with an ordered partition Ξ_0, we can formalise the notion of splitting one of its cells X_{j,0}, via which we can create a new partition Ξ_1. The new partition will be such that:

 X_1 = X_0 + 1,
 X_{j,1} = \{s_i : L_{j,0} \le i \le L_{j,0} + \lfloor (U_{j,0} - L_{j,0} - 1)/2 \rfloor\},
 X_{X_0+1,1} = \{s_i : L_{j,0} + \lfloor (U_{j,0} - L_{j,0} - 1)/2 \rfloor < i \le U_{j,0}\},
 X_{k,1} = X_{k,0} \text{ for } k \notin \{j, X_0+1\}.

The effect is that we are splitting the interval associated with X_{j,0} as near to the "middle" of the cell as possible. This creates two new intervals: the lower interval replaces the existing cell, and the upper interval becomes a new cell (with index X_0 + 1). The new partition is also an ordered partition. Note that the splitting procedure is only defined for cells with cardinality of two or more. For the remainder of this subsection our discussion will assume that this condition holds every time a particular cell is split. When we apply the procedure in practice we will take measures to ensure this condition is always satisfied.
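Under the interval representation just defined, a split is a short operation. The sketch below treats each cell as an inclusive index pair (L, U), which is our own representational choice:

```python
def split_cell(cells, j):
    """Split cell j of an ordered partition, per the formal definition
    above. Each cell is an inclusive index interval (L, U); the lower
    half keeps index j and the upper half is appended as a new cell."""
    L, U = cells[j]
    assert U > L, "only cells with two or more states can be split"
    m = L + (U - L - 1) // 2      # last state index of the lower half
    cells[j] = (L, m)
    cells.append((m + 1, U))
    return cells
```

Because the two halves are again intervals, the result is still an ordered partition and can itself be split, which is what allows the procedure to be reapplied recursively.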

Starting with any initial ordered partition, we can recursively reapply this splitting procedure as many times as we like. Note that each time a split occurs, we specify the index of the cell we are splitting. This means, given an initial ordered partition Ξ_0 (with X_0 cells), we can specify a final partition by providing a vector ρ of integers, or split vector, which is a list of the indices of cells to split. The split vector must be such that, for each k, we have the constraint that ρ_k ≤ X_0 + k − 1 (so that each element of ρ refers to a valid cell index). Assuming we want a partition composed of X cells exactly, then ρ must be of dimension X − X_0.

We require one more definition. For each partition Ξ_k defined above, where 0 ≤ k ≤ X − X_0, we introduce a collection of subsets of the state space denoted Ξ̄_k. Each element of Ξ̄_k is defined as follows:

 \bar{X}_{j,k} \coloneqq \begin{cases} \{s_i : s_i \in X_{j,0}\} & \text{if } 1 \le j \le X_0, \\ \{s_i : s_i \in X_{j,\,j-X_0}\} & \text{if } X_0 < j \le X_0 + k. \end{cases}

The effect of the definition is that, for 1 ≤ j ≤ X_0, we simply have X̄_{j,k} = X_{j,0} for all k, whilst for j > X_0, X̄_{j,k} will contain all of the states which are contained in X_{j,j−X_0}, which is the first cell created during the overall splitting process which had an index of j. Note that Ξ̄_k is not a partition, with the single exception of Ξ̄_0, which is equal to Ξ_0. The notation just outlined will be important when we set out the manner in which PASA estimates the frequency with which different cells are visited.

### 2.4 Details of the algorithm

We now add the necessary final details to formally define the PASA algorithm. We assume we have a fixed ordered partition Ξ_0 containing X_0 cells; in general, therefore, Ξ_0 is a parameter of PASA. The manner in which Ξ_0 is constructed does not need to be prescribed as part of the PASA algorithm. (The reason we do not simply take X_0 = 1 is that a larger X_0 can help to ensure that the values stored by PASA tend to remain more stable. In practice, it often makes sense to simply take Ξ_0 to be the ordered partition consisting of X_0 cells which are as close as possible to equal size. See Section 4.) PASA stores a split vector ρ of dimension X − X_0. This vector in combination with Ξ_0 defines a partition, which will represent the state aggregation architecture used by the underlying SARSA algorithm. The vector ρ, and correspondingly the partition, will be updated at fixed intervals, governed by the update-interval parameter noted above. The interval between updates permits PASA to learn visit frequency estimates, which will be used when updating ρ. We assume that each ρ_k is initialised so that no attempt will be made to split a cell containing only one state (a singleton cell).

Recall that we used X_k to denote a cell in a state aggregation architecture in Section 2.1. We will use the convention that X_k \coloneqq X_{k,X−X_0}. We also adopt the analogous shorthand Ξ \coloneqq Ξ_{X−X_0} and X̄_j \coloneqq X̄_{j,X−X_0}.

To assist in updating ρ, the algorithm will store a vector ū of real values of dimension X (initialised as a vector of zeroes). We update ū in each iteration as follows (i.e. using a simple stochastic approximation algorithm):

 \bar{u}^{(t+1)}_j = \bar{u}^{(t)}_j + \varsigma\left(I\{s^{(t)} \in \bar{X}_j\} - \bar{u}^{(t)}_j\right), (5)

where ς is a constant step size parameter. In this way, ū will record the approximate frequency with which each of the sets in Ξ̄ has been visited by the agent. (Hence, when estimating how frequently the agent visits certain sets of states, the PASA algorithm implicitly weights recent visits more heavily, using a series of coefficients which decay geometrically. The rate of this decay depends on ς.) We also store an X dimensional boolean vector Σ. This keeps track of whether a particular cell has only one state, as we do not want the algorithm to try to split singleton cells.
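Equation (5) is a standard exponentially weighted (geometric decay) estimate; a minimal sketch, with our own names for the step size and the set membership test:

```python
def update_visit_estimates(u_bar, visited_sets, step=0.001):
    """Equation (5): for each tracked set, nudge its visit frequency
    estimate towards 1 if the current state lies in the set and
    towards 0 otherwise. visited_sets is the collection of indices j
    with the current state in the j-th tracked set; step plays the
    role of the constant step size."""
    for j in range(len(u_bar)):
        indicator = 1.0 if j in visited_sets else 0.0
        u_bar[j] += step * (indicator - u_bar[j])
    return u_bar
```

Each estimate is a geometrically decaying average of past indicator values, so a smaller step size gives a longer effective memory, matching the remark above about the decay rate depending on ς.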

To update ρ, the PASA algorithm periodically (at the fixed update interval described above) performs a sequence of operations. A temporary copy of ū is made, which we call u. The vector u is intended to estimate the approximate frequency with which each of the cells in the successive partitions Ξ_0, Ξ_1, … has been visited by the agent. The elements of u will be updated as part of the sequence of operations which we will presently describe. We set the entries u_j to ū_j for 1 ≤ j ≤ X_0 at the start of the sequence (the remaining entries can be set to zero). At each stage 1 ≤ k ≤ X − X_0 of the sequence we update ρ_k as follows:

 \rho_k = \begin{cases} j & \text{if } (1 - \Sigma_{\rho_k})\, u_{\rho_k} < u_j - \vartheta, \\ \rho_k & \text{otherwise,} \end{cases}

where:

 j = \operatorname{argmax}_i \{u_i : i \le X_0 + k - 1,\ \Sigma_i = 0\}

(if multiple indices satisfy the function, we take the lowest index) and where ϑ is a constant designed to ensure that a (typically small) threshold must be exceeded before ρ_k is adjusted. In this way, in each step in the sequence the non-singleton cell with the highest value u_i (over the range i ≤ X_0 + k − 1, and subject to the threshold ϑ) will be identified, via the update to ρ_k, as the next cell to split. In each step of the sequence we also update u and Σ:

 u_{X_0+k} = \bar{u}_{X_0+k}, \qquad u_{\rho_k} = u_{\rho_k} - u_{X_0+k}, \qquad \Sigma_j = I\{|X_{j,k}| \le 1\} \ \text{for } 1 \le j \le X_0 + k - 1.

The reason we update u as shown above is that each time the operation is applied we thereby obtain an estimate of the visit frequency of X_{ρ_k,k}, which is the freshly updated value of u_{ρ_k}, and an estimate of the visit frequency of the cell X_{X_0+k,k}, which is ū_{X_0+k} (since X̄_{X_0+k} = X_{X_0+k,k} at step k). This is shown visually in Figure 1.

Once ρ has been generated, we implicitly have a new partition Ξ. The PASA algorithm is outlined in Algorithm 1. Note that the algorithm calls a procedure called Split, which is outlined in Algorithm 2. Algorithm 1 operates such that the cell splitting process (to generate each Ξ_k) occurs concurrently with the update to ρ, such that, as each element of ρ is updated, a corresponding temporary partition is constructed. Also note that the algorithm makes reference to objects L and U. To avoid storing each L_{j,k} and U_{j,k} for every stage k, we instead recursively update L and U such that L_j = L_{j,k} and U_j = U_{j,k} at the kth stage of the splitting process. A diagram illustrating the main steps is at Figure 1.
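Putting the pieces together, the ρ-update sequence can be sketched as below. This is our own compressed rendering, not Algorithms 1 and 2 themselves: cells are represented as inclusive index intervals, singleton detection stands in for Σ, and the subtraction bookkeeping on u follows the description above:

```python
def pasa_rho_update(rho, u_bar, base_cells, threshold=0.01):
    """Sketch of the rho-update sequence. At each stage k, the
    non-singleton cell with the highest estimated visit frequency
    replaces rho[k], provided the current choice is beaten by more
    than the threshold. u_bar holds the tracked visit frequency
    estimates; base_cells is the ordered base partition as inclusive
    index intervals."""
    X0 = len(base_cells)
    cells = list(base_cells)                       # working partition
    u = [u_bar[j] for j in range(X0)] + [0.0] * len(rho)
    for k in range(len(rho)):
        singleton = [L == U for (L, U) in cells]
        candidates = [i for i in range(len(cells)) if not singleton[i]]
        j = max(candidates, key=lambda i: u[i])    # best cell to split
        cur = rho[k]
        if singleton[cur] or u[cur] < u[j] - threshold:
            rho[k] = j
        # Split the chosen cell near its middle.
        L, U = cells[rho[k]]
        m = L + (U - L - 1) // 2
        cells[rho[k]] = (L, m)
        cells.append((m + 1, U))
        # Bookkeeping on u: the new cell's estimate comes from u_bar,
        # and is subtracted from its parent's estimate.
        u[X0 + k] = u_bar[X0 + k]
        u[rho[k]] -= u[X0 + k]
    return rho, cells
```

The threshold term means ρ only changes when a competing cell is clearly busier, which keeps the partition (and hence the SARSA architecture) stable between updates.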