MoCoDA: Model-based Counterfactual Data Augmentation

by Silviu Pitis et al.

The number of states in a dynamic process is exponential in the number of objects, making reinforcement learning (RL) difficult in complex, multi-object domains. For agents to scale to the real world, they will need to react to and reason about unseen combinations of objects. We argue that the ability to recognize and use local factorization in transition dynamics is a key element in unlocking the power of multi-object reasoning. To this end, we show that (1) known local structure in the environment transitions is sufficient for an exponential reduction in the sample complexity of training a dynamics model, and (2) a locally factored dynamics model provably generalizes out-of-distribution to unseen states and actions. Knowing the local structure also allows us to predict which unseen states and actions this dynamics model will generalize to. We propose to leverage these observations in a novel Model-based Counterfactual Data Augmentation (MoCoDA) framework. MoCoDA applies a learned locally factored dynamics model to an augmented distribution of states and actions to generate counterfactual transitions for RL. MoCoDA works with a broader set of local structures than prior work and allows for direct control over the augmented training distribution. We show that MoCoDA enables RL agents to learn policies that generalize to unseen states and actions. We use MoCoDA to train an offline RL agent to solve an out-of-distribution robotics manipulation task on which standard offline RL algorithms fail.
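The core mechanism described above — applying a locally factored dynamics model componentwise to (state, action) pairs drawn from an augmented distribution, so as to produce counterfactual transitions over unseen object combinations — can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the two-object dynamics, the parent sets, and all function names (`factored_dynamics`, `mocoda_augment`, `augmented_sa`) are assumptions made for the example.

```python
import numpy as np

def factored_dynamics(state, action, parents, component_models):
    """Predict each next-state component from only its parent variables.

    Because each component model sees only its local parents, the model
    composes correctly on (state, action) combinations never seen jointly.
    """
    inp = np.concatenate([state, action])
    return np.array([
        component_models[i](inp[parents[i]]) for i in range(len(parents))
    ])

def mocoda_augment(parents, component_models, augmented_sa, n=100, seed=0):
    """Generate counterfactual transitions by applying the factored model
    to samples from an augmented (state, action) distribution."""
    rng = np.random.default_rng(seed)
    transitions = []
    for _ in range(n):
        s, a = augmented_sa(rng)
        s_next = factored_dynamics(s, a, parents, component_models)
        transitions.append((s, a, s_next))
    return transitions

# Toy setting: two objects whose next positions each depend only on their
# own position and their own action dimension (a fully local factorization).
# Indices refer to the concatenated vector [s0, s1, a0, a1].
parents = [np.array([0, 2]), np.array([1, 3])]
component_models = [lambda x: x[0] + 0.1 * x[1]] * 2  # next pos = pos + 0.1 * action

def augmented_sa(rng):
    # Sample object states independently, yielding unseen object combinations.
    return rng.uniform(-1, 1, size=2), rng.uniform(-1, 1, size=2)

data = mocoda_augment(parents, component_models, augmented_sa, n=5)
```

The resulting `(s, a, s_next)` tuples would then be added to an offline RL agent's replay buffer alongside real data, which is how the augmented distribution gives direct control over the training distribution.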



Related research:

- Counterfactual Data Augmentation using Locally Factored Dynamics
- Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation
- Systematic Generalization for Predictive Control in Multivariate Time Series
- S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning
- Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
- Causal Dynamics Learning for Task-Independent State Abstraction
- Recover Triggered States: Protect Model Against Backdoor Attack in Reinforcement Learning
