Learnings Options End-to-End for Continuous Action Tasks

11/30/2017
by Martin Klissarov, et al.

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.
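To make the two ingredients named above more concrete, here is a minimal sketch, not the authors' implementation. It assumes the standard PPO clipped surrogate and a deliberation cost that is added to the option advantage when updating termination functions. The function names, the `clip_eps` default, and the sample numbers are hypothetical and chosen only for illustration.

```python
def ppo_clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    ratios: new-policy / old-policy probability ratios, one per sample.
    advantages: advantage estimates, one per sample.
    clip_eps: clipping range (hypothetical default, not from the source).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the ratio to [1 - eps, 1 + eps], then take the pessimistic
        # (smaller) of the clipped and unclipped terms.
        clipped_r = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def termination_advantage(q_option, v_state, deliberation_cost):
    """Signal assumed to drive the termination update.

    The option advantage Q(s', w) - V(s') is shifted by a deliberation
    cost, so a positive cost makes continuing the current option look
    better and discourages frequent switching.
    """
    return q_option - v_state + deliberation_cost


if __name__ == "__main__":
    # Hypothetical numbers, only to show the call pattern.
    print(ppo_clipped_surrogate([1.5, 0.5], [1.0, -1.0]))
    print(termination_advantage(1.0, 1.5, 0.25))
```

In this sketch the clipping bounds how far a single update can move each intra-option policy away from the one that collected the data, while the deliberation cost acts as a margin that the alternative options must beat before the current option is terminated.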


