Reducing Planning Complexity of General Reinforcement Learning with Non-Markovian Abstractions

12/26/2021
by Sultan J. Majeed, et al.

The field of General Reinforcement Learning (GRL) formulates the problem of sequential decision-making from the ground up. The history of interaction constitutes a "ground" state of the system, which never repeats. On the one hand, this generality allows GRL to model almost every domain possible, e.g. bandits, MDPs, POMDPs, PSRs, and history-based environments. On the other hand, in general, the near-optimal policies in GRL are functions of the complete history, which hinders not only learning but also planning in GRL. The usual workaround for the planning part is to give the agent a Markovian abstraction of the underlying process, so that it can use any MDP planning algorithm to find a near-optimal policy. The Extreme State Aggregation (ESA) framework has extended this idea to non-Markovian abstractions without compromising on the possibility of planning through a (surrogate) MDP. A distinguishing feature of ESA is that it proves an upper bound of O(ε^-A · (1-γ)^-2A) on the number of states required for the surrogate MDP (where A is the number of actions, γ is the discount factor, and ε is the optimality gap), which holds uniformly for all domains. While the possibility of a universal bound is quite remarkable, we show that this bound is very loose. We propose a novel non-MDP abstraction that allows for a much better upper bound of O(ε^-1 · (1-γ)^-2 · A · 2^A). Furthermore, we show that this bound can be improved further to O(ε^-1 · (1-γ)^-2 · log^3 A) by using an action-sequentialization method.
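The action-sequentialization idea behind the final log^3 A bound can be illustrated with a minimal sketch (the function names below are hypothetical, not from the paper): a single choice among A actions is replaced by a sequence of roughly log2 A binary decisions, so the surrogate process only ever branches over two "micro-actions" at a time.

```python
import math

def sequentialize_action(action_index, num_actions):
    """Encode one of `num_actions` actions as a sequence of binary decisions.

    Illustrative sketch of action sequentialization: instead of choosing
    among A actions at once, the agent makes ceil(log2 A) binary choices
    in sequence (most significant bit first).
    """
    num_bits = max(1, math.ceil(math.log2(num_actions)))
    return [(action_index >> b) & 1 for b in reversed(range(num_bits))]

def desequentialize(bits):
    """Decode a sequence of binary decisions back into an action index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index

# A choice among 8 actions becomes 3 binary decisions:
print(sequentialize_action(5, 8))            # → [1, 0, 1]
print(desequentialize([1, 0, 1]))            # → 5
```

In this encoding the branching factor at every decision point is 2 rather than A, which is the structural reason the exponential dependence on A can be traded for a polylogarithmic one.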

