Reward Shaping via Diffusion Process in Reinforcement Learning

06/20/2023
by Peeyush Kumar, et al.

Reinforcement Learning (RL) models have continually evolved to navigate the exploration-exploitation trade-off in uncertain Markov Decision Processes (MDPs). In this study, I leverage the principles of stochastic thermodynamics and system dynamics to explore reward shaping via diffusion processes. This provides an elegant framework for thinking about the exploration-exploitation trade-off. This article sheds light on the relationships between information entropy and stochastic system dynamics, and on how both influence entropy production. This exploration allows us to construct a dual-pronged framework that can be interpreted either as a maximum entropy program for deriving efficient policies or as a modified cost optimization program that accounts for informational costs and benefits. This work presents a novel perspective on the physical nature of information and its implications for online learning in MDPs, and consequently provides a better understanding of information-oriented formulations in RL.
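As a generic reference point for the "maximum entropy program" reading mentioned above, the sketch below implements standard entropy-regularized ("soft") value iteration on a small finite MDP. It is not the author's diffusion-based method; the function name `soft_value_iteration`, the temperature `tau`, and the toy transition and reward tables are illustrative assumptions. The temperature acts as a price on information: because the policy entropy satisfies H(π) = log|A| − KL(π‖uniform), maximizing reward plus τ·H(π) is equivalent, up to an additive constant, to maximizing reward minus an information cost τ·KL(π‖uniform), which loosely mirrors the dual maximum-entropy and cost-optimization views.

```python
import math


def soft_value_iteration(P, R, gamma=0.9, tau=0.5, tol=1e-10, max_iter=10_000):
    """Entropy-regularized (soft) value iteration on a finite MDP.

    Hypothetical interface (not from the paper):
      P[s][a] -- list of (next_state, probability) pairs
      R[s][a] -- scalar reward for taking action a in state s
      tau     -- temperature; larger tau weights the entropy bonus more
    Returns the soft state values V and the Boltzmann policy pi[s][a].
    """
    n_states = len(R)
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Soft Bellman backup: Q(s,a) = r(s,a) + gamma * E[V(s')].
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
              for a in range(len(R[s]))]
             for s in range(n_states)]
        # V(s) = tau * log sum_a exp(Q(s,a) / tau), computed stably by
        # factoring out the row maximum before exponentiating.
        V_new = []
        for q in Q:
            m = max(q)
            V_new.append(m + tau * math.log(sum(math.exp((x - m) / tau) for x in q)))
        delta = max(abs(old - new) for old, new in zip(V, V_new))
        V = V_new
        if delta < tol:
            break
    # Boltzmann policy pi(a|s) = exp((Q(s,a) - V(s)) / tau); each row sums to 1
    # because V was computed from the same Q by log-sum-exp.
    policy = [[math.exp((x - V[s]) / tau) for x in Q[s]] for s in range(n_states)]
    return V, policy


# Hypothetical 2-state, 2-action MDP: in state 0, action 1 moves to state 1
# and pays 1.0; in state 1, action 1 stays put and pays 0.5 per step.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(0, 1.0)], [(1, 1.0)]]]
R = [[0.0, 1.0],
     [0.0, 0.5]]

V, pi = soft_value_iteration(P, R, gamma=0.9, tau=0.5)
for s, row in enumerate(pi):
    print(f"state {s}: V={V[s]:.3f}, pi={[round(p, 3) for p in row]}")
```

Lowering `tau` drives the Boltzmann policy toward the greedy one (pure exploitation), while raising it spreads probability across actions (more exploration); the soft values are always at least the unregularized optimal values, since log-sum-exp upper-bounds the maximum.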
