PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators

by Anish Agarwal, et al.

We consider offline reinforcement learning (RL) with heterogeneous agents under severe data scarcity, i.e., we only observe a single historical trajectory for every agent under an unknown, potentially sub-optimal policy. We find that the performance of state-of-the-art offline and model-based RL methods degrades significantly given such limited data availability, even for commonly perceived "solved" benchmark settings such as "MountainCar" and "CartPole". To address this challenge, we propose a model-based offline RL approach, called PerSim, where we first learn a personalized simulator for each agent by collectively using the historical trajectories across all agents prior to learning a policy. We do so by positing that the transition dynamics across agents can be represented as a latent function of latent factors associated with agents, states, and actions; subsequently, we theoretically establish that this function is well-approximated by a "low-rank" decomposition of separable agent, state, and action latent functions. This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data. We perform extensive experiments across several benchmark environments and RL methods. The consistent improvement of our approach, measured in terms of state dynamics prediction and eventual reward, confirms the efficacy of our framework in leveraging limited historical data to simultaneously learn personalized policies across agents.
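To make the "low-rank" decomposition concrete, the sketch below shows the core idea: each coordinate of the predicted next state is a sum, over a small rank, of products of separable agent, state, and action latent factors. This is a minimal illustration, not the paper's implementation; the factor tables, state featurizer, and readout map here are random placeholders standing in for the regularized neural networks PerSim would actually learn, and all names (`agent_factors`, `predict_next_state`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, state_dim, n_actions, rank = 5, 2, 3, 4

# Hypothetical latent factors. In PerSim these would be learned jointly
# from all agents' trajectories; here they are random placeholders.
agent_factors = rng.normal(size=(n_agents, rank))    # theta(a), one row per agent
action_factors = rng.normal(size=(n_actions, rank))  # omega(u), one row per action

def state_factors(s):
    """Toy featurization of a continuous state into rank latent factors.

    A fixed linear map plus tanh stands in for the learned state network.
    """
    W = np.linspace(-1.0, 1.0, rank * len(s)).reshape(rank, len(s))
    return np.tanh(W @ s)  # rho(s), shape (rank,)

def predict_next_state(agent_id, state, action_id):
    """Low-rank dynamics prediction.

    Each next-state coordinate is a sum over rank-many products of the
    agent, state, and action latent factors (a separable decomposition).
    """
    combined = (agent_factors[agent_id]
                * state_factors(state)
                * action_factors[action_id])  # elementwise product, shape (rank,)
    # Fixed readout mapping the rank-dimensional interaction back to
    # state space (also learned in the real model).
    readout = np.ones((state_dim, rank)) / rank
    return readout @ combined

s = np.array([0.1, -0.2])
s_next = predict_next_state(agent_id=0, state=s, action_id=1)
print(s_next.shape)  # -> (2,)
```

Because the agent factor enters multiplicatively, two agents in the same state taking the same action generally get different predicted transitions, which is what makes the simulator "personalized" while still pooling data across agents to fit the shared state and action factors.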

