Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

07/18/2013
by Zheng Wen, et al.

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function lies within a given hypothesis class, OCP selects optimal actions over all but at most K episodes, where K is the eluder dimension of the given hypothesis class. We establish further efficiency and asymptotic performance guarantees that apply even if the true value function does not lie in the given hypothesis class, for the special case where the hypothesis class is the span of pre-specified indicator functions over disjoint sets. We also discuss the computational complexity of OCP and present computational results involving two illustrative examples.
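
To make the special case in the abstract concrete, here is a minimal sketch of a hypothesis class that is the span of indicator functions over disjoint sets. Any function in such a span is constant on each set, so it is fully described by one weight per set. The sketch is not OCP itself. The helper `make_aggregated_q`, the toy states and actions, and the weights are all hypothetical names and values chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

StateAction = Tuple[Hashable, Hashable]


def make_aggregated_q(
    partition: Dict[StateAction, int], weights: Sequence[float]
) -> Callable[[Hashable, Hashable], float]:
    """Build Q(s, a) = sum_k weights[k] * 1[(s, a) in set k].

    `partition` maps each state-action pair to the index of the single
    disjoint set that contains it, so exactly one indicator is active and
    the sum reduces to a lookup. (Hypothetical helper, not from the paper.)
    """

    def q(state: Hashable, action: Hashable) -> float:
        return weights[partition[(state, action)]]

    return q


# Hypothetical toy problem: two states, two actions, two disjoint sets.
partition = {
    ("s0", "left"): 0,
    ("s1", "left"): 0,
    ("s0", "right"): 1,
    ("s1", "right"): 1,
}
q = make_aggregated_q(partition, weights=[0.5, 2.0])

# Pairs in the same set share a value; a greedy action maximizes Q at a state.
greedy = max(["left", "right"], key=lambda a: q("s0", a))
```

In this toy setup, `("s0", "left")` and `("s1", "left")` necessarily receive the same value, which is the kind of generalization across state-action pairs that such a hypothesis class imposes.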
