Reward Shaping Using Convolutional Neural Network

10/30/2022
by Hani Sami, et al.

In this paper, we propose Value Iteration Network for Reward Shaping (VIN-RS), a potential-based reward shaping mechanism that uses a Convolutional Neural Network (CNN). VIN-RS embeds a CNN trained on labels computed with the message passing mechanism of the Hidden Markov Model. The CNN processes images or graphs of the environment to predict the shaping values. Recent reward shaping methods remain limited in two respects: training on a representation of the Markov Decision Process (MDP) and building an estimate of the transition matrix. The advantage of VIN-RS is that it constructs an effective potential function from an estimated MDP while automatically inferring the environment transition matrix. VIN-RS estimates the transition matrix through a self-learned convolution filter while extracting environment details from the input frames or sampled graphs. Motivated by (1) the previous success of message passing for reward shaping and (2) the planning behavior of CNNs, we use these messages to train the CNN of VIN-RS. Experiments are performed on tabular games, Atari 2600, and MuJoCo, covering both discrete and continuous action spaces. Our results show promising improvements in learning speed and maximum cumulative reward compared to the state of the art.
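The abstract does not spell out the shaping formula, so the following is only a minimal sketch of generic potential-based reward shaping, assuming the standard form F(s, s') = γΦ(s') − Φ(s). The potential values below are hypothetical numbers standing in for whatever a trained network such as the VIN-RS CNN would predict; the function names are illustrative and not taken from the paper.

```python
# Generic potential-based reward shaping (illustrative sketch, not the paper's code).
# Assumption: the shaping term is F(s, s') = gamma * phi(s') - phi(s).

def shaping_term(phi_s, phi_next, gamma):
    """Shaping bonus for a transition from s to s'."""
    return gamma * phi_next - phi_s


def shaped_reward(reward, phi_s, phi_next, gamma):
    """Environment reward plus the potential-based shaping bonus."""
    return reward + shaping_term(phi_s, phi_next, gamma)


def discounted_shaping_sum(potentials, gamma):
    """Discounted sum of shaping bonuses along one trajectory of potentials."""
    total = 0.0
    for t in range(len(potentials) - 1):
        total += (gamma ** t) * shaping_term(potentials[t], potentials[t + 1], gamma)
    return total


# Hypothetical potentials along a 4-state trajectory (placeholders for model output).
potentials = [0.0, 0.2, 0.5, 1.0]
gamma = 0.9

total = discounted_shaping_sum(potentials, gamma)
# The sum telescopes, so it depends only on the first and last potentials.
expected = gamma ** 3 * potentials[-1] - potentials[0]
```

Because the discounted bonuses telescope, the total shaping along a trajectory depends only on its endpoints, which is why shaping of this form changes learning speed without changing which policies are optimal.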

research
11/08/2017

Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis

We analyze linear regression problem with a nonconvex regularization cal...
research
06/17/2023

FP-IRL: Fokker-Planck-based Inverse Reinforcement Learning – A Physics-Constrained Approach to Markov Decision Processes

Inverse Reinforcement Learning (IRL) is a compelling technique for revea...
research
11/19/2021

Expert-Guided Symmetry Detection in Markov Decision Processes

Learning a Markov Decision Process (MDP) from a fixed batch of trajector...
research
11/02/2016

CRF-CNN: Modeling Structured Information in Human Pose Estimation

Deep convolutional neural networks (CNN) have achieved great success. On...
research
07/14/2021

Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks

In high-dimensional state spaces, the usefulness of Reinforcement Learni...
research
10/06/2020

Reward Propagation Using Graph Convolutional Networks

Potential-based reward shaping provides an approach for designing good r...
research
06/05/2018

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

In this paper, we tackle the singing voice phoneme segmentation problem ...
