Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement Learning

04/13/2021
by Takato Okudo, et al.

Reinforcement learning, which acquires a policy that maximizes long-term rewards, has been actively studied. Unfortunately, it is often too slow and difficult to use in practical situations because the state-action space becomes huge in real environments. Many studies have incorporated human knowledge into reinforcement learning. Human knowledge of trajectories is often used, but this requires a human to control the AI agent, which can be difficult. Knowledge of subgoals may lessen this requirement because humans need only consider a few representative states on an optimal trajectory. Rewards are an essential factor in learning efficiency, and potential-based reward shaping is a basic method for enriching them. However, it is often difficult to incorporate subgoals into potential-based reward shaping to accelerate learning, because appropriate potentials are not intuitive for humans. We extend potential-based reward shaping and propose subgoal-based reward shaping, which makes it easier for human trainers to share their knowledge of subgoals. To evaluate our method, we obtained a subgoal series from participants and conducted experiments in three domains: four-rooms (discrete states and discrete actions), pinball (continuous states and discrete actions), and picking (continuous states and continuous actions). We compared our method with a baseline reinforcement learning algorithm and other subgoal-based methods, including random subgoals and naive subgoal-based reward shaping. Our reward shaping outperformed all other methods in learning efficiency.
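As a rough illustration of the potential-based reward shaping that the abstract builds on, the sketch below adds the standard shaping term gamma * Phi(s') - Phi(s) to an environment reward. The subgoal potential shown here is a hypothetical stand-in chosen for illustration (it simply ranks an ordered list of subgoal states); it is not the method proposed in the paper, and the names `shaped_reward` and `make_subgoal_potential` are assumptions introduced for this example.

```python
from typing import Callable, Hashable, Sequence

State = Hashable
Potential = Callable[[State], float]


def shaped_reward(reward: float, state: State, next_state: State,
                  potential: Potential, gamma: float = 0.99) -> float:
    """Return the reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


def make_subgoal_potential(subgoals: Sequence[State],
                           scale: float = 1.0) -> Potential:
    """Hypothetical potential: the i-th subgoal (1-based) gets scale * i.

    Every state that is not a listed subgoal gets potential 0. This is an
    illustrative assumption, not the potential defined in the paper.
    """
    rank = {s: i + 1 for i, s in enumerate(subgoals)}

    def potential(state: State) -> float:
        return scale * rank.get(state, 0)

    return potential


def discounted_shaping_sum(states: Sequence[State], potential: Potential,
                           gamma: float) -> float:
    """Discounted sum of the shaping terms alone along a state sequence."""
    total = 0.0
    for t in range(len(states) - 1):
        bonus = shaped_reward(0.0, states[t], states[t + 1], potential, gamma)
        total += (gamma ** t) * bonus
    return total
```

Along any trajectory the discounted shaping terms telescope to gamma^T * Phi(s_T) - Phi(s_0), so the bonus depends only on the start and end states. This is the property that lets potential-based shaping enrich rewards without changing which policies are optimal.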
