Hierarchical Average Reward Policy Gradient Algorithms

11/20/2019
by Akshay Dharmavaram, et al.

Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address long-term credit assignment by leveraging temporal abstractions. However, over extended timescales, discounting future rewards can lead to incorrect credit assignment. In this work, we address this issue by extending the hierarchical option-critic policy gradient theorem to the average reward criterion. Our proposed framework aims to maximize the long-term reward obtained in the steady state of the Markov chain defined by the agent's policy. Furthermore, we use an ordinary differential equation based approach for our convergence analysis and prove that the parameters of the intra-option policies, termination functions, and value functions converge to their corresponding optimal values with probability one. Finally, we illustrate the competitive advantage of learning options in the average reward setting on a grid-world environment with sparse rewards.
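
To make the average reward criterion in the abstract concrete, here is a minimal sketch of the quantity being maximized: the expected reward under the stationary distribution of the Markov chain induced by a fixed policy. It is an illustration only and not the paper's algorithm. The two-state transition matrix `P`, the reward vector `r`, and the function names are made up for this example, and the chain is assumed to be ergodic so that power iteration converges.

```python
def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution d (with d = d P) by power iteration.

    P is a row-stochastic transition matrix given as a list of lists.
    Assumes the chain is ergodic; P here is a hypothetical example.
    """
    n = len(P)
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of d <- d P
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d


def average_reward(P, r, iters=1000):
    """Average reward rho = sum_s d(s) * r(s) under the stationary distribution."""
    d = stationary_distribution(P, iters)
    return sum(d_s * r_s for d_s, r_s in zip(d, r))


# Hypothetical chain induced by some fixed policy: state 1 is the only rewarding state.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [0.0, 1.0]

rho = average_reward(P, r)
print(rho)  # close to 1/6, since the stationary distribution is (5/6, 1/6)
```

Unlike a discounted return, this value does not depend on a discount factor or on the start state, which is why it suits the long-horizon setting the abstract describes.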

research
12/31/2019

On the Role of Weight Sharing During Deep Option Learning

The options framework is a popular approach for building temporally exte...
research
12/11/2017

The Eigenoption-Critic Framework

Eigenoptions (EOs) have been recently introduced as a promising idea for...
research
10/27/2018

Learning Abstract Options

Building systems that autonomously create temporal abstractions from dat...
research
10/23/2020

Learning Guidance Rewards with Trajectory-space Smoothing

Long-term temporal credit assignment is an important challenge in deep r...
research
10/26/2021

Average-Reward Learning and Planning with Options

We extend the options framework for temporal abstraction in reinforcemen...
research
12/04/2018

Natural Option Critic

The recently proposed option-critic architecture Bacon et al. provide a ...
research
04/05/2021

A Dual-Critic Reinforcement Learning Framework for Frame-level Bit Allocation in HEVC/H.265

This paper introduces a dual-critic reinforcement learning (RL) framewor...
