Expert-Guided Symmetry Detection in Markov Decision Processes

by Giorgio Angelotti et al.

Learning a Markov Decision Process (MDP) from a fixed batch of trajectories is a non-trivial task, and the quality of the result depends on both the amount and the diversity of the sampled regions of the state-action space. Yet many MDPs have reward and transition functions that are invariant under certain transformations of the current state and action. Being able to detect and exploit these structures could benefit not only the learning of the MDP but also the computation of its subsequent optimal control policy. In this work we propose a paradigm, based on density estimation methods, that aims to detect whether some expert-hypothesized transformations of the state-action space leave the MDP dynamics invariant. We tested the proposed approach in a discrete toroidal grid environment and in two well-known environments of OpenAI's Gym learning suite. The results demonstrate that the distributional shift of the learned model is reduced when the dataset is augmented with the data obtained by applying the detected symmetries, allowing for a more thorough and data-efficient learning of the transition functions.
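The core idea can be illustrated with a minimal sketch on a toy 1D toroidal grid. This is a hypothetical setup, not the paper's actual environments or estimators: a count-based empirical model stands in for the density estimator, and the candidate symmetry (spatial reflection) is assumed to be supplied by an expert. The score compares the average log-likelihood of the transformed batch against that of the original batch under the same learned model; near-equal scores support invariance.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8  # size of the toy toroidal grid (hypothetical, for illustration only)

# Deterministic toroidal dynamics: action a in {-1, +1} moves the agent with wrap-around.
def step(s, a):
    return (s + a) % N

# Sample a fixed batch of transitions (s, a, s'), as in the offline setting.
batch = []
for _ in range(500):
    s = int(rng.integers(N))
    a = int(rng.choice([-1, 1]))
    batch.append((s, a, step(s, a)))

# Count-based empirical estimate of the transition model p(s' | s, a);
# a placeholder for the paper's density estimation step.
counts = np.zeros((N, 2, N))
for s, a, s2 in batch:
    counts[s, (a + 1) // 2, s2] += 1
probs = counts / np.clip(counts.sum(axis=2, keepdims=True), 1, None)

# Expert-hypothesized symmetry: spatial reflection, s -> -s mod N, a -> -a.
def reflect(s, a, s2):
    return (-s) % N, -a, (-s2) % N

# Average log-likelihood of a set of transitions under the learned model.
def avg_loglik(transitions):
    eps = 1e-12  # guard against log(0) for unvisited transitions
    return float(np.mean([np.log(probs[s, (a + 1) // 2, s2] + eps)
                          for s, a, s2 in transitions]))

orig_ll = avg_loglik(batch)
sym_ll = avg_loglik([reflect(*t) for t in batch])
print(orig_ll, sym_ll)
```

If the scores match, the transformed transitions can be appended to the batch, augmenting the dataset with symmetric copies exactly as the abstract describes; a transformation that is not a true symmetry would instead yield a much lower likelihood for the transformed batch.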






