Learning from Trajectories via Subgoal Discovery

11/03/2019
by Sujoy Paul, et al.

Learning to solve complex goal-oriented tasks with sparse, terminal-only rewards often requires an enormous number of samples. In such cases, a set of expert trajectories can help the agent learn faster. However, Imitation Learning (IL) via supervised pre-training on these trajectories may not perform well, and it generally requires additional fine-tuning with an expert in the loop. In this paper, we propose an approach that uses the expert trajectories to learn a decomposition of the complex main task into smaller sub-goals. We learn a function that partitions the state space into sub-goals, which can then be used to design an extrinsic reward function. The agent first learns from the trajectories using IL and then switches to Reinforcement Learning (RL) with the identified sub-goals, which alleviates the errors of the IL step. To deal with states that are under-represented in the trajectory set, we also learn a function that modulates the sub-goal predictions. We show that our method solves complex goal-oriented tasks that other RL and IL methods, and their combinations in the literature, are unable to solve.
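To make the idea of a sub-goal-based extrinsic reward concrete, the following is a minimal sketch and not the authors' implementation. It assumes that sub-goals are ordered integers (0 for the earliest stage of the task), that initial sub-goal labels come from splitting each expert trajectory into equal contiguous segments, and that the reward is shaped with a potential equal to the sub-goal index. None of these details are stated in the abstract, and the names `equipartition_labels`, `shaped_reward`, and `subgoal_of` are hypothetical.

```python
from typing import Callable, List


def equipartition_labels(trajectory_len: int, num_subgoals: int) -> List[int]:
    """Label each time step of one expert trajectory with a sub-goal index.

    Assumption: the trajectory is split into `num_subgoals` contiguous
    segments of (nearly) equal length, in temporal order.
    """
    return [
        min(t * num_subgoals // trajectory_len, num_subgoals - 1)
        for t in range(trajectory_len)
    ]


def shaped_reward(
    env_reward: float,
    state,
    next_state,
    subgoal_of: Callable[[object], int],
    gamma: float = 0.99,
) -> float:
    """Add a sub-goal-based bonus to the sparse environment reward.

    Assumption: potential-based shaping with the sub-goal index as the
    potential, i.e. r' = r + gamma * phi(s') - phi(s).
    `subgoal_of` stands in for the learned state-space partition.
    """
    return env_reward + gamma * subgoal_of(next_state) - subgoal_of(state)


if __name__ == "__main__":
    # Toy 1-D task: states below 5 belong to sub-goal 0, the rest to sub-goal 1.
    toy_partition = lambda s: int(s >= 5)
    print(equipartition_labels(6, 3))
    print(shaped_reward(0.0, 4, 5, toy_partition, gamma=1.0))
```

Under these assumptions the agent receives a positive signal whenever it crosses into a later sub-goal, even while the environment reward is still zero, which is the kind of denser feedback the abstract describes.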

research
04/06/2019

Reinforced Imitation in Heterogeneous Action Space

Imitation learning is an effective alternative approach to learn a polic...
research
05/17/2023

Goal-Conditioned Supervised Learning with Sub-Goal Prediction

Recently, a simple yet effective algorithm – goal-conditioned supervised...
research
05/28/2022

Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories

Combinatorial optimisation problems framed as mixed integer linear progr...
research
02/13/2023

Imitation from Observation With Bootstrapped Contrastive Learning

Imitation from observation (IfO) is a learning paradigm that consists of...
research
06/24/2023

Learning from Pixels with Expert Observations

In reinforcement learning (RL), sparse rewards can present a significant...
research
11/28/2018

Trajectory-based Learning for Ball-in-Maze Games

Deep Reinforcement Learning has shown tremendous success in solving seve...
research
12/04/2019

Learning from Interventions using Hierarchical Policies for Safe Learning

Learning from Demonstrations (LfD) via Behavior Cloning (BC) works well ...
