Continuous MDP Homomorphisms and Homomorphic Policy Gradient

09/15/2022
by Sahand Rezaei-Shoshtari, et al.

Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this paper, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximate symmetries of the environment for policy optimization. Based on this theorem, we propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. We demonstrate the effectiveness of our method on benchmark tasks in the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance when learning from pixel observations.
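The homomorphism conditions can be illustrated on a hypothetical toy problem. The sketch below is not the authors' algorithm. It assumes deterministic dynamics, so the transition condition for a homomorphism h = (f, g_s) reduces to f(T(s, a)) = T̄(f(s), g_s(a)), and the reward condition is R(s, a) = R̄(f(s), g_s(a)). The 1-D system, the hand-written maps `f` and `g`, the abstract policy `pi_bar`, and every function name are illustrative assumptions. In the paper the map is learned together with the policy using the lax bisimulation metric; here it is written by hand. The sketch also lifts an abstract policy back to the original MDP and checks that both MDPs give the same discounted return. This value equivalence is what motivates optimizing the policy on the abstract MDP.

```python
import math
import random

# Hypothetical toy 1-D continuous MDP (illustration only):
# state s in R, action a in R, deterministic dynamics, quadratic reward.
def T(s: float, a: float) -> float:
    return s + a

def R(s: float, a: float) -> float:
    return -s * s - a * a

# The toy MDP is symmetric under the reflection (s, a) -> (-s, -a).
# A hand-written homomorphism folds that symmetry away:
#   state map  f(s)   = |s|
#   action map g_s(a) = sign(s) * a   (depends on the state)
def sign(s: float) -> float:
    return 1.0 if s >= 0 else -1.0

def f(s: float) -> float:
    return abs(s)

def g(s: float, a: float) -> float:
    return sign(s) * a

# Abstract MDP defined on s_bar >= 0.
def T_bar(s_bar: float, a_bar: float) -> float:
    return abs(s_bar + a_bar)

def R_bar(s_bar: float, a_bar: float) -> float:
    return -s_bar * s_bar - a_bar * a_bar

def check_homomorphism(n: int = 1000, seed: int = 0) -> bool:
    """Test reward invariance and transition equivariance on random (s, a) samples."""
    rng = random.Random(seed)
    for _ in range(n):
        s, a = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        if not math.isclose(R(s, a), R_bar(f(s), g(s, a))):
            return False
        if not math.isclose(f(T(s, a)), T_bar(f(s), g(s, a))):
            return False
    return True

# Hypothetical abstract policy: push the abstract state toward 0.
def pi_bar(s_bar: float) -> float:
    return -0.5 * s_bar

def pi_lifted(s: float) -> float:
    # Lift by inverting g_s; g_s is its own inverse since sign(s) is +1 or -1.
    return sign(s) * pi_bar(f(s))

def rollout(step, reward, policy, s0: float, horizon: int = 20, gamma: float = 0.99) -> float:
    """Discounted return of a deterministic policy from s0."""
    s, ret = s0, 0.0
    for t in range(horizon):
        a = policy(s)
        ret += gamma ** t * reward(s, a)
        s = step(s, a)
    return ret

if __name__ == "__main__":
    print(check_homomorphism())  # True
    print(rollout(T, R, pi_lifted, -3.0), rollout(T_bar, R_bar, pi_bar, f(-3.0)))
```

The two printed returns agree because each trajectory of the lifted policy in the original MDP maps step by step onto a trajectory of `pi_bar` in the abstract MDP. A state and its reflection are handled by a single abstract policy.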


Related research

05/09/2023: Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Reinforcement learning on high-dimensional and complex problems relies o...

11/22/2018: An Off-policy Policy Gradient Theorem Using Emphatic Weightings
Policy gradient methods are widely used for control in reinforcement lea...

09/14/2022: A Simple Approach for State-Action Abstraction using a Learned MDP Homomorphism
Animals are able to rapidly infer from limited experience when sets of s...

11/16/2021: Off-Policy Actor-Critic with Emphatic Weightings
A variety of theoretically-sound policy gradient algorithms exist for th...

05/06/2022: Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization
We extend the idea underlying the success of green simulation assisted p...

07/20/2023: Exploring reinforcement learning techniques for discrete and continuous control tasks in the MuJoCo environment
We leverage the fast physics simulator, MuJoCo to run tasks in a continu...

06/04/2020: Policy Learning of MDPs with Mixed Continuous/Discrete Variables: A Case Study on Model-Free Control of Markovian Jump Systems
Markovian jump linear systems (MJLS) are an important class of dynamical...
