Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

06/04/2023
by Miguel Suau et al.

Reinforcement learning agents may sometimes develop habits that are effective only when specific policies are followed. After an initial exploration phase in which agents try out different actions, they eventually converge toward a particular policy. When this occurs, the distribution of state-action trajectories becomes narrower, and agents start experiencing the same transitions again and again. At this point, spurious correlations may arise. Agents may then pick up on these correlations and learn state representations that do not generalize beyond the agent's trajectory distribution. In this paper, we provide a mathematical characterization of this phenomenon, which we refer to as policy confounding, and show, through a series of examples, when and how it occurs in practice.
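
To make the idea concrete, here is a minimal toy sketch (our own illustration, not code from the paper) of how a spurious feature can look perfectly predictive under a converged policy yet break under a slightly perturbed one. The corridor environment, the elapsed-time feature, and the always-right policy are hypothetical choices made purely for illustration.

```python
# Toy illustration (not from the paper): once the policy is fixed, the elapsed
# time step coincides exactly with reaching the goal, so it becomes a spurious
# but perfectly predictive feature; perturbing the policy breaks the shortcut.
import random

N_CELLS = 5  # corridor of 5 cells; reward only at the rightmost cell

def rollout(policy, max_steps=20):
    """Run one episode; return a list of (time_step, position, reward) tuples."""
    pos, data = 0, []
    for t in range(max_steps):
        pos = min(max(pos + policy(pos), 0), N_CELLS - 1)  # move left/right, clipped
        reward = 1.0 if pos == N_CELLS - 1 else 0.0        # reward only at the goal
        data.append((t, pos, reward))
        if reward > 0:
            break
    return data

always_right = lambda pos: +1                           # the converged policy
mostly_right = lambda pos: random.choice([+1, +1, -1])  # a slightly perturbed policy

def shortcut_accuracy(policy, episodes=500):
    """How often does the spurious rule 'reward iff t == N_CELLS - 2' hold?"""
    hits = total = 0
    for _ in range(episodes):
        for t, _, r in rollout(policy):
            total += 1
            hits += int((t == N_CELLS - 2) == (r > 0))
    return hits / total

print("converged policy:", shortcut_accuracy(always_right))  # ~1.0: time fully predicts reward
print("perturbed policy:", shortcut_accuracy(mostly_right))  # < 1.0: the correlation breaks
```

Under the fixed policy the learner could safely rely on the time-step shortcut; as soon as the trajectory distribution widens, the shortcut stops working, which is the out-of-trajectory generalization failure described above.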

Related research

06/10/2021 · Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions
    In multi-agent reinforcement learning, the inherent non-stationarity of ...

03/11/2021 · Analyzing the Hidden Activations of Deep Policy Networks: Why Representation Matters
    We analyze the hidden activations of neural network policies of deep rei...

04/06/2022 · Federated Reinforcement Learning with Environment Heterogeneity
    We study a Federated Reinforcement Learning (FedRL) problem in which n a...

07/15/2022 · Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning
    Deep Reinforcement Learning (RL) agents often overfit the training envir...

01/21/2022 · Reinforcement Learning Your Way: Agent Characterization through Policy Regularization
    The increased complexity of state-of-the-art reinforcement learning (RL)...

01/09/2018 · A Deterministic Protocol for Sequential Asymptotic Learning
    In the classic herding model, agents receive private signals about an un...

04/26/2023 · Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories
    In this paper, we define, evaluate, and improve the “relay-generalizatio...
