Model Learning for Look-ahead Exploration in Continuous Control

11/20/2018
by Arpit Agarwal, et al.

We propose an exploration method that incorporates look-ahead search over basic learned skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies. Our skills are multi-goal policies learned in isolation in simpler environments using existing multi-goal RL formulations, analogous to options or macro-actions. Coarse skill dynamics, i.e., the state transition caused by a (complete) skill execution, are learned and unrolled forward during look-ahead search. Policy search benefits from temporal abstraction during exploration but itself operates over low-level primitive actions, so the resulting policies do not suffer from the suboptimality and inflexibility caused by coarse skill chaining. We show that the proposed exploration strategy learns complex manipulation policies faster than current state-of-the-art RL methods, and converges to better policies than methods that use options or parametrized skills as building blocks of the policy itself, as opposed to guiding exploration.
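The idea of unrolling coarse skill dynamics during look-ahead search can be illustrated with a minimal sketch. This is not the authors' implementation: the skill names, the hand-written toy dynamics (stand-ins for learned models), the Euclidean distance-to-goal score, and the exhaustive depth-limited search are all assumptions chosen to keep the example small.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]
# Hypothetical coarse skill-dynamics model: predicts the state reached
# after one complete execution of a skill (a stand-in for a learned model).
SkillModel = Callable[[State], State]


def lookahead(state: State, goal: State,
              skills: Dict[str, SkillModel], depth: int) -> Tuple[List[str], float]:
    """Unroll every skill sequence up to `depth` through the skill models
    and return the sequence whose predicted end state is closest to `goal`."""
    def dist(s: State) -> float:
        return ((s[0] - goal[0]) ** 2 + (s[1] - goal[1]) ** 2) ** 0.5

    best_plan: List[str] = []
    best_dist = dist(state)
    for d in range(1, depth + 1):
        for plan in product(skills, repeat=d):
            s = state
            for name in plan:
                s = skills[name](s)  # one coarse transition per skill
            # Keep a plan only if it is strictly better, so shorter and
            # earlier-enumerated plans win ties.
            if dist(s) < best_dist - 1e-12:
                best_plan, best_dist = list(plan), dist(s)
    return best_plan, best_dist


# Toy skill models on a 2-D point (assumed for illustration only).
toy_skills: Dict[str, SkillModel] = {
    "right": lambda s: (s[0] + 1.0, s[1]),
    "up": lambda s: (s[0], s[1] + 1.0),
    "left": lambda s: (s[0] - 1.0, s[1]),
}

plan, final_dist = lookahead((0.0, 0.0), (2.0, 1.0), toy_skills, depth=3)
print(plan, final_dist)
```

In this sketch the search only proposes a skill sequence. In the method described above, such a proposal would guide exploration while the policy itself still acts through low-level primitive actions.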


research
11/04/2022

Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

Skill-based reinforcement learning (RL) has emerged as a promising strat...
research
06/11/2015

Bootstrapping Skills

The monolithic approach to policy representation in Markov Decision Proc...
research
06/14/2023

Skill-Critic: Refining Learned Skills for Reinforcement Learning

Hierarchical reinforcement learning (RL) can accelerate long-horizon dec...
research
02/10/2022

SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition

Though many reinforcement learning (RL) problems involve learning polici...
research
05/27/2022

Non-Markovian policies occupancy measures

A central object of study in Reinforcement Learning (RL) is the Markovia...
research
11/24/2022

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

The ability to effectively reuse prior knowledge is a key requirement wh...
research
01/27/2022

Rethinking Learning Dynamics in RL using Adversarial Networks

We present a learning mechanism for reinforcement learning of closely re...
