Policy Dispersion in Non-Markovian Environment

02/28/2023
by Bohao Qu, et al.

The Markov decision process (MDP) provides a mathematical framework for formulating the learning processes of agents in reinforcement learning. MDPs are limited by the Markovian assumption that a reward depends only on the immediate state and action. However, a reward sometimes depends on the history of states and actions, which can make the decision process non-Markovian. In such environments, agents receive rewards sparsely, only after temporally extended behaviors, and the policies they learn may be similar. As a result, agents that acquire similar policies tend to overfit to the given task and cannot quickly adapt to perturbations of the environment. To resolve this problem, this paper learns diverse policies from the history of state-action pairs in a non-Markovian environment, using a policy dispersion scheme designed to seek diverse policy representations. Specifically, we first adopt a transformer-based method to learn policy embeddings. Then, we stack the policy embeddings to construct a dispersion matrix that induces a set of diverse policies. Finally, we prove that if the dispersion matrix is positive definite, the dispersed embeddings can effectively enlarge the disagreements across policies, yielding a diverse expression of the original policy embedding distribution. Experimental results show that this dispersion scheme obtains more expressive diverse policies, which in turn yield more robust performance than recent learning baselines across various learning environments.
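
The abstract does not say exactly how the dispersion matrix is built from the stacked embeddings, so the sketch below is illustrative only and is not the authors' method. It assumes the dispersion matrix is the Gram (cosine-similarity) matrix of unit-normalized policy embeddings. It also scores diversity with the matrix's log-determinant, a determinantal-point-process-style measure swapped in for illustration. The transformer encoder is omitted, and embeddings are passed in as plain lists. The names `dispersion_matrix`, `cholesky`, and `dispersion_score` are hypothetical. Under these assumptions, the matrix is positive definite exactly when no embedding is a linear combination of the others, and its determinant is the squared volume the embeddings span. A higher score therefore means the policies point in more different directions.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def normalize(v: List[float]) -> List[float]:
    """Scale a policy embedding to unit L2 norm so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dispersion_matrix(embeddings: List[List[float]]) -> Matrix:
    """Hypothetical construction (assumption): Gram matrix of unit-normalized embeddings.

    Entry (i, j) is the cosine similarity between policy i and policy j.
    """
    unit = [normalize(e) for e in embeddings]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def cholesky(m: Matrix, tol: float = 1e-12) -> Optional[Matrix]:
    """Return lower-triangular L with m = L L^T, or None if m is not positive definite."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= tol:
                    return None  # non-positive pivot: not (numerically) positive definite
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def dispersion_score(embeddings: List[List[float]]) -> float:
    """Illustrative diversity score: log det of the dispersion matrix.

    Returns -inf when the matrix is not positive definite (e.g. duplicated policies).
    det(L L^T) = prod(L[i][i])^2, so log det = 2 * sum(log L[i][i]).
    """
    L = cholesky(dispersion_matrix(embeddings))
    if L is None:
        return float("-inf")
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))


# Toy embeddings (made up for illustration, not from the paper).
similar = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
duplicate = [[1.0, 2.0], [2.0, 4.0]]
```

With these toy inputs, the orthogonal `diverse` set reaches the maximum score of 0.0 for a unit-diagonal Gram matrix, and the nearly aligned `similar` set scores lower. The `duplicate` pair yields a singular matrix and a score of -inf, since two parallel embeddings express the same policy direction.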
