
Exact Reduction of Huge Action Spaces in General Reinforcement Learning
The reinforcement learning (RL) framework formalizes the notion of learn...

Solving Continual Combinatorial Selection via Deep Reinforcement Learning
We consider the Markov Decision Process (MDP) of selecting a subset of i...

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
The problem of reinforcement learning in an unknown and discrete Markov ...

Improved Exploration in Factored Average-Reward MDPs
We consider a regret minimization task under the average-reward criterio...

Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure
We consider provably efficient reinforcement learning (RL) in non-episod...

Exploration in Structured Reinforcement Learning
We address reinforcement learning problems with finite state and action ...

Learning Good State and Action Representations via Tensor Decomposition
The transition kernel of a continuous state-action Markov decision proce...
Model-Based Reinforcement Learning Exploiting State-Action Equivalence
Leveraging an equivalence property in the state space of a Markov Decision Process (MDP) has been investigated in several studies. This paper studies equivalence structure in the reinforcement learning (RL) setup, where transition distributions are no longer assumed to be known. We present a notion of similarity between transition probabilities of various state-action pairs of an MDP, which naturally defines an equivalence structure in the state-action space. We present equivalence-aware confidence sets for the case where the learner knows the underlying structure in advance. These sets are provably smaller than their equivalence-oblivious counterparts. In the more challenging case of an unknown equivalence structure, we present an algorithm called ApproxEquivalence that seeks to find an (approximate) equivalence structure, and define confidence sets using the approximate equivalence. To illustrate the efficacy of the presented confidence sets, we present C-UCRL, a natural modification of UCRL2 for RL in undiscounted MDPs. In the case of a known equivalence structure, we show that C-UCRL improves over UCRL2 in terms of regret by a factor of √(SA/C) in any communicating MDP with S states, A actions, and C classes, which corresponds to a massive improvement when C ≪ SA. To the best of our knowledge, this is the first work providing regret bounds for RL when an equivalence structure in the MDP is efficiently exploited. In the case of an unknown equivalence structure, we show through numerical experiments that C-UCRL combined with ApproxEquivalence outperforms UCRL2 in ergodic MDPs.
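The intuition behind equivalence-aware confidence sets can be illustrated with a toy sketch: pooling the transition counts of all state-action pairs in a class yields more samples per estimated distribution, and hence a smaller L1 confidence radius. The sketch below is illustrative only, not the paper's construction: the MDP size, the class assignment, and the simplification that equivalent pairs share an identical next-state distribution (rather than equality up to a permutation, as in the paper) are all assumptions made for the example, and the radius is a generic Weissman-style bound.

```python
import numpy as np

rng = np.random.default_rng(0)

S, A = 6, 2                          # toy MDP: 6 states, 2 actions
C = 3                                # number of equivalence classes
# Illustrative class assignment: 4 state-action pairs per class.
classes = (np.arange(S * A) % C).reshape(S, A)

# True kernel: pairs in the same class share the same next-state
# distribution (a simplification of the paper's notion, which only
# requires equality up to a permutation of next states).
class_dists = rng.dirichlet(np.ones(S), size=C)
P = class_dists[classes]             # shape (S, A, S)

# Simulate n visits to every pair and count next-state outcomes.
n = 50
counts = np.stack([
    [np.bincount(rng.choice(S, size=n, p=P[s, a]), minlength=S)
     for a in range(A)] for s in range(S)
])                                   # shape (S, A, S)

delta = 0.05
def l1_radius(n_visits):
    # Generic Weissman-style L1 confidence radius for an
    # S-outcome empirical distribution built from n_visits samples.
    return np.sqrt(2 * S * np.log(2 / delta) / n_visits)

# Equivalence-oblivious: each pair uses only its own n samples.
r_pair = l1_radius(n)

# Equivalence-aware: pool counts within each class before estimating.
pooled = np.zeros((C, S))
n_class = np.zeros(C)
for s in range(S):
    for a in range(A):
        pooled[classes[s, a]] += counts[s, a]
        n_class[classes[s, a]] += n
r_class = l1_radius(n_class)         # one (smaller) radius per class

print(f"per-pair radius: {r_pair:.3f}")
print(f"per-class radii: {np.round(r_class, 3)}")
```

With 4 pairs per class, each class accumulates 4n samples, so the pooled radius shrinks by a factor of √4 = 2 relative to the per-pair radius; this mirrors how larger classes (small C relative to SA) drive the regret improvement.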