A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning

05/11/2022
by Archit Sharma, et al.

While reinforcement learning (RL) provides a framework for learning through trial and error, translating RL algorithms into the real world has remained challenging. A major hurdle to real-world application arises from the development of algorithms in an episodic setting where the environment is reset after every trial, in contrast with the continual and non-episodic nature of the real world encountered by embodied agents such as humans and robots. Prior works have considered an alternating approach where a forward policy learns to solve the task and a backward policy learns to reset the environment, but what initial state distribution should the backward policy reset the agent to? Assuming access to a few demonstrations, we propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations. This keeps the agent close to the task-relevant states, allowing for a mix of easy and difficult starting states for the forward policy. Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks from the EARL benchmark, with 40% gains on the hardest task, while making fewer assumptions than prior works.
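As a rough illustration of what "matching the state distribution in the provided demonstrations" could look like in code, the sketch below trains a classifier to tell demonstration states from states visited by the backward policy, then rewards the backward policy for reaching states the classifier attributes to the demonstrations. The abstract does not specify this mechanism: the linear classifier, the reward form `-log(1 - p)`, the synthetic 2-D states, and the names `train_state_classifier` and `backward_reward` are all assumptions made for illustration.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _with_bias(states):
    # Append a constant feature so the classifier can learn an offset.
    return np.hstack([states, np.ones((len(states), 1))])


def train_state_classifier(demo_states, policy_states, steps=2000, lr=0.1):
    """Fit logistic regression by gradient descent.

    Label 1 = state from the demonstrations,
    label 0 = state visited by the backward policy.
    (Hypothetical helper; a linear model is a simplification.)
    """
    x = _with_bias(np.vstack([demo_states, policy_states]))
    y = np.concatenate([np.ones(len(demo_states)), np.zeros(len(policy_states))])
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = _sigmoid(x @ w)
        w -= lr * x.T @ (p - y) / len(y)  # gradient of mean cross-entropy
    return w


def backward_reward(states, w, eps=1e-6):
    """Assumed reward: larger when the classifier thinks a state is demo-like."""
    p = _sigmoid(_with_bias(states) @ w)
    return -np.log(1.0 - p + eps)


# Synthetic data: demonstrations cluster near the origin, while the
# backward policy currently visits states far from them.
rng = np.random.default_rng(0)
demo_states = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
policy_states = rng.normal(loc=3.0, scale=0.5, size=(200, 2))

w = train_state_classifier(demo_states, policy_states)
near = backward_reward(np.array([[0.0, 0.0]]), w)[0]  # demo-like state
far = backward_reward(np.array([[3.0, 3.0]]), w)[0]   # state far from demos
```

In this toy setup the demo-like state receives the higher reward, so a backward policy maximizing it would be pulled back toward the states covered by the demonstrations. In a full non-episodic loop the classifier would be retrained as the backward policy's visited states change, alternating with the forward policy's attempts at the task.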

