
Exact Reduction of Huge Action Spaces in General Reinforcement Learning
The reinforcement learning (RL) framework formalizes the notion of learning with interactions. Many real-world problems have large state-spaces and/or action-spaces, such as in Go, StarCraft, protein folding, and robotics, or are non-Markovian, which poses significant challenges to RL algorithms. In this work we address the large action-space problem by sequentializing actions, which can reduce the action-space size significantly, even down to two actions, at the expense of an increased planning horizon. We provide explicit and exact constructions and equivalence proofs for all quantities of interest for arbitrary history-based processes. In the case of MDPs, this could help RL algorithms that bootstrap. In this work we show how action-binarization in the non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds. ESA allows casting any (non-MDP, non-ergodic, history-based) RL problem into a fixed-sized non-Markovian state-space with the help of a surrogate Markovian process. On the upside, ESA enjoys similar optimality guarantees as Markovian models do. But a downside is that the size of the aggregated state-space becomes exponential in the size of the action-space. In this work, we patch this issue by binarizing the action-space. We provide an upper bound on the number of states of this binarized ESA that is logarithmic in the original action-space size, a double-exponential improvement.