Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

06/03/2019
by Harm van Seijen, et al.

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis, which identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
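The hypothesis about uneven action-gaps can be illustrated with a toy calculation. The sketch below is not the method from the paper; it assumes a hypothetical deterministic chain in which the agent sits d steps from a goal, receives a reward of 1 on arrival, and can either move forward or stay in place. The function name `chain_action_gaps` and the chain itself are illustrative assumptions. Under these assumptions the optimal action-gap shrinks geometrically with distance from the goal, while the gap between the logarithms of the same values is identical in every state.

```python
import math


def chain_action_gaps(gamma, n_states):
    """Action-gaps in a hypothetical deterministic chain (toy assumption).

    A state at distance d from the goal has two actions:
      forward: reach the goal in d steps, reward 1 on arrival
      stay:    waste one step, then act optimally
    Returns a list of (d, gap, log_gap) tuples.
    """
    rows = []
    for d in range(1, n_states + 1):
        q_forward = gamma ** (d - 1)  # reward 1 discounted over d - 1 earlier steps
        q_stay = gamma * q_forward    # one extra step of discounting
        gap = q_forward - q_stay      # regular action-gap: gamma^(d-1) * (1 - gamma)
        log_gap = math.log(q_forward) - math.log(q_stay)  # always -log(gamma)
        rows.append((d, gap, log_gap))
    return rows


for d, gap, log_gap in chain_action_gaps(gamma=0.5, n_states=10):
    print(f"d={d:2d}  gap={gap:.6f}  log-space gap={log_gap:.6f}")
```

With a low discount factor the regular gap differs by orders of magnitude between states near the goal and states far from it, whereas the log-space gap stays constant. This only illustrates why a logarithmic mapping can make gaps more homogeneous; the paper's actual algorithm and its convergence conditions are in the full text.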

research
05/27/2019

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

We study model-based reinforcement learning in an unknown finite communi...
research
03/31/2020

Exploration in Action Space

Parameter space exploration methods with black-box optimization have rec...
research
01/10/2013

Planning by Prioritized Sweeping with Small Backups

Efficient planning plays a crucial role in model-based reinforcement lea...
research
03/07/2022

Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive Reinforcement Learning

In this paper, we study gap-dependent regret guarantees for risk-sensiti...
research
12/06/2021

Hierarchical Reinforcement Learning with Timed Subgoals

Hierarchical reinforcement learning (HRL) holds great potential for samp...
research
06/13/2021

A new soft computing method for integration of expert's knowledge in reinforcement learning problems

This paper proposes a novel fuzzy action selection method to leverage hu...
