Adapting the Function Approximation Architecture in Online Reinforcement Learning

06/17/2021
by John D. Martin, et al.

The performance of a reinforcement learning (RL) system depends on the computational architecture used to approximate a value function. Deep learning methods provide both optimization techniques and architectures for approximating nonlinear functions from noisy, high-dimensional observations. However, prevailing optimization techniques are not designed for strictly incremental online updates. Nor are standard architectures designed for observations with an a priori unknown structure: for example, light sensors randomly dispersed in space. This paper proposes an online RL prediction algorithm with an adaptive architecture that efficiently finds useful nonlinear features. The algorithm is evaluated in a spatial domain with high-dimensional, stochastic observations. The algorithm outperforms non-adaptive baseline architectures and approaches the performance of an architecture given side-channel information. These results are a step towards scalable RL algorithms for more general problems, where the observation structure is not available.

research
01/16/2019

Representation Learning on Graphs: A Reinforcement Learning Application

In this work, we study value function approximation in reinforcement lea...
research
10/30/2010

Predictive State Temporal Difference Learning

We propose a new approach to value function approximation which combines...
research
06/25/2017

Count-Based Exploration in Feature Space for Reinforcement Learning

We introduce a new count-based optimistic exploration algorithm for Rein...
research
02/26/2019

Diagnosing Bottlenecks in Deep Q-learning Algorithms

Q-learning methods represent a commonly used class of algorithms in rein...
research
05/16/2023

Coagent Networks: Generalized and Scaled

Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011...
research
03/14/2016

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

High-dimensional observations and complex real-world dynamics present ma...
research
08/08/2017

Demixing Structured Superposition Signals from Periodic and Aperiodic Nonlinear Observations

We consider the demixing problem of two (or more) structured high-dimens...
