A Theoretical Connection Between Statistical Physics and Reinforcement Learning

Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function Z, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and Q-functions can be derived from this partition function and interpreted via average energies, the Z-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for Z is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these Z-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account, favoring states from which many outcomes are possible.




1 Introduction

One of the central challenges in the pursuit of machine intelligence is robust sequential decision making. In a stochastic and uncertain environment, an agent must capture information about the distribution over ways it may act and move through the state space. Indeed, the algorithmic process of planning and learning itself can lead to a well-defined distribution over state/action trajectories. This observation has led to a variety of connections between reinforcement learning (RL) and inference in probabilistic graphical models (PGMs) (Levine, 2018). In some ways this connection is unsurprising: belief propagation (and its relatives such as the sum-product algorithm) is understood to be an example of dynamic programming (Koller and Friedman, 2009), and dynamic programming was developed to solve control problems (Bellman, 1966; Bertsekas, 1995). Nevertheless, the exploration of the connection between control and inference has yielded fruitful insights into sequential decision making algorithms (Kalman, 1960; Attias, 2003; Ziebart, 2010; Kappen, 2011; Levine, 2018).

In this work, we present another point of view on reinforcement learning as a distribution over trajectories, one in which we draw upon useful abstractions from statistical physics. This view is in some ways a natural continuation of the agenda of connecting control to inference, as many insights in probabilistic graphical models have deep connections to, e.g., spin glass systems (Hopfield, 1982; Yedidia et al., 2001; Zdeborová and Krzakala, 2016). More generally, physics has often been a source of inspiration for ideas in machine learning (MacKay, 2003; Mezard and Montanari, 2009). Boltzmann machines (Ackley et al., 1985), Hamiltonian Monte Carlo (Duane et al., 1987; Neal et al., 2011; Betancourt, 2017) and, more recently, tensor networks (Stoudenmire and Schwab, 2016) are a few examples. In addition to direct inspiration, physics provides a compelling framework for reasoning about certain problems. The terms momentum, energy, entropy, and phase transition are ubiquitous in machine learning. However, abstractions from physics have so far not been as helpful for understanding reinforcement learning models and algorithms. That is not to say there is a lack of interaction: RL is being used in some experimental physics domains, but physics has not yet informed RL as directly as it has, e.g., graphical models (Carleo et al., 2019).

Nevertheless, we should expect deep connections between reinforcement learning and physics: an RL agent is trying to find a policy that maximizes expected reward, and many natural phenomena can be viewed through a minimization principle. For example, in classical mechanics or electrodynamics, a mass or light will follow a path that minimizes a physical quantity called the action, a property known as the principle of least action. Similarly, in thermodynamics, a system with many degrees of freedom, such as a gas, will explore its configuration space in search of a configuration that minimizes its free energy. In reinforcement learning, rewards and value functions have a very similar flavor to energies: they are extensive quantities, and the agent is trying to find a path that maximizes them. In RL, however, value functions are often treated as the central object of study. This stands in contrast to statistical physics formulations of such problems, in which the partition function is the primary abstraction, from which all the relevant thermodynamic quantities (average energy, entropy, heat capacity) can be derived. It is natural to ask, then: is there a theoretical framework for reinforcement learning that is centered on a partition function, in which value functions can be interpreted via average energies?

In this work, we show how to construct a partition function for a reinforcement learning problem. In a deterministic environment (Section 2), the construction is elementary and very natural. We explicitly identify the link between the underlying average energies associated with these partition functions and value functions of Boltzmann-like stochastic policies. As in the inference-based view on RL, moving from deterministic to stochastic environments introduces complications. In Section 3.2, we propose a construction for stochastic environments that results in realistic policies. Finally, in Section 4, we show how the partition function approach leads to an alternative model-free reinforcement learning algorithm that does not explicitly represent value functions.

We model the agent’s sequential decision-making task as a Markov decision process (MDP), as is typical. The agent selects actions in order to maximize its cumulative expected reward until a final state is reached. The MDP is defined by the objects (S, A, p, r). S and A are the sets of states and actions, respectively. p(s′ | s, a) is the probability of landing in state s′ after taking action a from state s, and r(s, a, s′) is the reward resulting from this transition. We also make the following additional assumptions: 1) S is finite, 2) all rewards r(s, a, s′) are bounded from above by R_max and deterministic, and 3) the number of available actions is uniformly bounded over all states by A_max. We also allow terminal states to have rewards even though there are no further actions and transitions; we denote these final-state rewards by r_f. By shifting all rewards by R_max we can assume without loss of generality that R_max = 0, making all transition rewards r(s, a, s′) non-positive. The final-state rewards r_f are still allowed to be positive, however.

2 Partition Functions for Deterministic MDPs

Our starting point is to consider deterministic Markov decision processes, those in which each transition probability distribution assigns all of its mass to a single state. Deterministic MDPs are a widely studied special case (Madani, 2002; Wen and Van Roy, 2013; Dekel and Hazan, 2013), and they are realistic models of many practical control problems, such as robotic manipulation and locomotion, drone maneuvering, or machine-controlled scientific experimentation. For the deterministic setting, we will use s_a to denote the state that follows the taking of action a in state s. Similarly, we will denote the reward more concisely as r(s, a).

2.1 Construction of State-Dependent Partition Functions

To construct a partition function, two ingredients are needed: a statistical ensemble, and an energy function E on that ensemble. We construct our ensembles from trajectories through the MDP; a trajectory τ is a sequence of tuples (s_t, a_t, r_t) whose last state is a terminal state. We use the notation s_t(τ), a_t(τ), and r_t(τ) to indicate the state, action, and reward, respectively, of trajectory τ at step t. Each state-dependent ensemble T_s is then the set of all trajectories that start at s, i.e., for which s_0(τ) = s. We will use these ensembles to construct a partition function for each state s. Taking L(τ) to be the length of the trajectory, we write the energy function as

  E(τ) = −∑_{t=0}^{L(τ)−1} r_t(τ) − r_f(s_{L(τ)}) = −∑_{t=0}^{L(τ)} r_t(τ).

The form on the right takes the notational shortcut of writing r_{L(τ)} for the reward of the terminal state. Since the agent is trying to maximize its cumulative reward, E(τ) is a reasonable measure of the agent’s preference for a trajectory, in the sense that lower-energy trajectories accumulate higher rewards. Note in particular that the ground-state configurations are the most rewarding trajectories for the agent. With the ingredients T_s and E defined, we get the following partition function:

  Z_β(s) = ∑_{τ ∈ T_s} e^{−β E(τ)}.    (2)

In this expression, β > 0 is a hyperparameter that can be interpreted as an inverse temperature. (This interpretation comes from statistical physics, where β = 1/(k_B T) and k_B is the Boltzmann constant.) This partition function does not distinguish between two trajectories that have identical cumulative rewards but different lengths. However, among equally rewarding trajectories, it seems natural to prefer shorter ones. One way to encode this preference is to add an explicit penalty μ > 0 on the length L(τ) of a trajectory, leading to the partition function

  Z_{β,μ}(s) = ∑_{τ ∈ T_s} e^{−β E(τ) − μ L(τ)}.    (3)

In statistical physics, μ is called a chemical potential; it measures the tendency of a system (such as a gas) to accept new particles. Since it is sometimes inconvenient to reason about systems with a fixed number of particles, adding a chemical potential offers a way to relax that constraint, allowing a system to have a varying number of particles while keeping the average fixed.

Note that since MDPs can allow for both infinitely long trajectories and infinite sets of finite trajectories, Z_{β,μ}(s) can be infinite even in relatively simple settings. In Appendix A.1, we find that a sufficient condition for Z_{β,μ} to be well defined is taking μ > log A_max. As written, the partition function in Eq. 3 is ambiguous for final states; for clarity we define Z_{β,μ}(s_f) = e^{β r_f(s_f)} for a terminal state s_f. We will refer to these values as the boundary conditions.

Mathematically, the parameter μ plays a similar role to that of γ, the discount factor commonly used in reinforcement learning problems. Both make infinite series convergent in an infinite-horizon setting, and both ensure that the Bellman operators are contractions in their respective frameworks (Appendices A.3 and B.3). However, when using γ, the order in which the rewards are observed can have an impact on the learned policy, which does not happen when μ is used. This can be a desirable property for some problems, as it decouples rewards from the preference for shorter paths.

2.2 A Bellman Equation for Z

As we have defined an ensemble T_s for each state s, there is a partition function Z_{β,μ}(s) defined for each state. These partition functions are all related through a Bellman-like recursion:

  Z_{β,μ}(s) = ∑_a e^{β r(s,a) − μ} Z_{β,μ}(s_a),    (4)

where, as before, s_a indicates the state deterministically following from taking action a in state s. This Bellman equation can be easily derived by decomposing each trajectory τ ∈ T_s into two parts: the first transition, resulting from taking initial action a, and the remainder of the trajectory τ′, which is a member of T_{s_a}. The total energy and length can also be decomposed in the same way, E(τ) = −r(s, a) + E(τ′) and L(τ) = 1 + L(τ′), so that:

  Z_{β,μ}(s) = ∑_a ∑_{τ′ ∈ T_{s_a}} e^{β r(s,a) − μ} e^{−β E(τ′) − μ L(τ′)} = ∑_a e^{β r(s,a) − μ} Z_{β,μ}(s_a).

Note in particular that this Bellman recursion is linear in Z.
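To make the recursion concrete, here is a minimal sketch in Python on a hypothetical acyclic deterministic MDP; the states, actions, and rewards are our own illustrative choices, not an example from the paper. Because the MDP is acyclic, the linear Bellman recursion can be evaluated by straightforward recursion from the boundary conditions:

```python
import math

beta, mu = 1.0, 2.0

# Hypothetical acyclic deterministic MDP (our own toy example):
# next_state[s][a] and reward[s][a] encode the dynamics; "T" is terminal.
next_state = {"s0": {"a": "s1", "b": "s2"}, "s1": {"a": "T"}, "s2": {"a": "T"}}
reward     = {"s0": {"a": -1.0, "b": -2.0}, "s1": {"a": 0.0}, "s2": {"a": 0.0}}
r_final = {"T": 0.0}

def Z(s):
    """Partition function via the linear Bellman recursion (Eq. 4)."""
    if s in r_final:                  # boundary condition: Z(s_f) = exp(beta * r_f)
        return math.exp(beta * r_final[s])
    return sum(math.exp(beta * reward[s][a] - mu) * Z(next_state[s][a])
               for a in next_state[s])

def policy(s):
    """Induced policy pi(a|s) = exp(beta*r(s,a) - mu) * Z(s_a) / Z(s)."""
    w = {a: math.exp(beta * reward[s][a] - mu) * Z(next_state[s][a])
         for a in next_state[s]}
    total = sum(w.values())
    return {a: p / total for a, p in w.items()}

print(Z("s0"), policy("s0"))
```

The recursion bottoms out at the boundary conditions, exactly as value iteration would on an acyclic MDP, but every operation here is a sum or product of positive numbers rather than a max.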

2.3 The Underlying Value Function and Policy

The partition function can be used to compute an average energy, which sheds light on the behavior of the system. This average is computed under the Boltzmann (Gibbs) distribution induced by the energy on the ensemble of trajectories T_s:

  p_s(τ) = e^{−β E(τ) − μ L(τ)} / Z_{β,μ}(s).

In probabilistic machine learning, this is usually how one sees the partition function: as the normalizer for an energy-based learning model or an undirected graphical model (see, e.g., Murray and Ghahramani (2004)). Under this probability distribution, high-reward trajectories are the most likely, but sub-optimal ones can still be sampled. This approach is closely related to the soft-optimality approach to RL (Levine, 2018). This distribution over trajectories allows us to compute an average energy for state s, either as an explicit expectation or as the partial derivative of the log partition function with respect to the inverse temperature:

  ⟨E⟩_s = ∑_{τ ∈ T_s} E(τ) p_s(τ) = −∂ log Z_{β,μ}(s) / ∂β.

The negative of the average energy is the value function: V(s) = −⟨E⟩_s. This is an intuitive result: recall that the energy E(τ) is low when the trajectory τ accumulates greater rewards, so lower average energy indicates that the expected cumulative reward, i.e., the value, is greater. Since the partition functions Z_{β,μ}(s) are connected by a Bellman equation, we expect that the underlying value functions V(s) should be connected in a similar way, and there is indeed a nonlinear Bellman recursion.

The derivative rule for the natural log gives us ∂ log Z/∂β = (1/Z) ∂Z/∂β, so differentiating the Bellman equation (4) yields:

  V(s) = (1 / Z_{β,μ}(s)) ∑_a e^{β r(s,a) − μ} Z_{β,μ}(s_a) [ r(s,a) + V(s_a) ].    (7)

Note that the quantities e^{β r(s,a) − μ} Z_{β,μ}(s_a) / Z_{β,μ}(s) inside the summation of Eq. 7 are positive and sum to 1 due to the Bellman recursion for Z from Eq. 4. Thus we can view this Bellman equation for V as an expectation under a distribution on actions, i.e., a policy:

  V(s) = ∑_a π_{β,μ}(a | s) [ r(s,a) + V(s_a) ],   where   π_{β,μ}(a | s) = e^{β r(s,a) − μ} Z_{β,μ}(s_a) / Z_{β,μ}(s).
The policy π_{β,μ} resembles a Boltzmann policy, but strictly speaking it is not one. A Boltzmann policy π_B selects actions proportionally to the exponential of their expected optimal cumulative reward: π_B(a | s) ∝ e^{β Q*(s,a)}. In particular, π_B does not take entropy into account: if two actions have the same expected optimal value, they will be picked with equal probability, regardless of the possibility that one of them could achieve this optimality in a larger number of ways. In the partition function view, π_{β,μ} does take entropy into account, and to clarify this difference we will look at the two extreme cases β → 0 and β → ∞. When β → 0, so that the temperature of the system is infinite, rewards become irrelevant and we find that π(a | s) ∝ Z_{0,μ}(s_a). This means that the policy picks action a proportionally to the number of trajectories that begin with a. Here the counting of trajectories happens in a weighted way: longer trajectories contribute less than shorter ones. This is different from a Boltzmann policy, which would pick actions uniformly at random.


Figure 1: Decision Tree MDP

When β → ∞, the low-temperature limit, we find in Appendix A.2 that Z_{β,μ}(s) ≈ N_μ(s) e^{β V*(s)}, where N_μ(s) is a weighted count of the number of optimal trajectories that begin at the state s. Boltzmann policies completely ignore the N_μ entropic factor.

To illustrate this difference more clearly, we consider the deterministic decision-tree MDP shown in Figure 1, where the root is the initial state and the leaves are the final states. The arrows represent the actions available at each state; there are no transition rewards, and the boundary conditions are set by the final-state rewards of the leaves. Computing the Z-functions at the intermediate states and propagating them back to the root gives the policy for picking the first action.

When β → 0, the resulting policy is biased towards the heavier subtree, while a Boltzmann policy would pick the three actions with equal probability. When β → ∞, the policy prefers actions from which many optimal trajectories are possible, while a Boltzmann policy would split its probability equally among the optimal actions.
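Since Figure 1's specific labels and values are not reproduced here, the following sketch builds a hypothetical decision tree of our own: the root's first action enters a subtree with two leaves, while the other two actions lead directly to single leaves. With no rewards, a small length penalty, and β = 0, the induced policy favors the heavier subtree, whereas a Boltzmann policy would be uniform:

```python
import math

beta, mu = 0.0, 0.1   # beta -> 0 limit: rewards are irrelevant

# Hypothetical decision tree (our own, not Figure 1's exact instance): the
# root's first action enters a two-leaf subtree; the other two actions lead
# directly to single leaves.  All rewards, including final ones, are zero.
children = {"s0": ["n1", "leaf2", "leaf3"], "n1": ["leaf0", "leaf1"]}
leaves = {"leaf0", "leaf1", "leaf2", "leaf3"}

def Z(s):
    if s in leaves:
        return 1.0                                 # Z(leaf) = exp(beta * 0) = 1
    return sum(math.exp(-mu) * Z(c) for c in children[s])  # zero rewards, beta = 0

# pi(first action) = exp(-mu) * Z(child) / Z(root)
probs = [math.exp(-mu) * Z(c) / Z("s0") for c in children["s0"]]
print(probs)  # the first entry (heavier subtree) is the largest
```

Note the role of μ here: with a large length penalty the longer paths through the subtree would be discounted away, so the entropic preference for the heavier subtree is most visible when μ is small.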

2.4 A Planning Algorithm

When the dynamics of the environment are known, it is possible to learn Z_{β,μ} by exploiting the Bellman equation (4). We write s → s′ to denote the property that there exists an action that takes an agent from state s to state s′, and we denote the reward associated with this transition by r(s, s′). Let z = (Z_{β,μ}(s))_{s ∈ S} be the vector of all partition functions and M be the matrix with entries

  M_{s,s′} = e^{β r(s,s′) − μ} if s → s′, M_{f,f} = 1 for each final state f, and 0 otherwise.

M is a matrix representation of the Bellman operator in Eq. 4. With these notations, the Bellman equations in (4) can be compactly written as z = M z, highlighting the fact that z is a fixed point of the map F: x ↦ M x. In Appendix A.3, we show that F is a contraction, which makes it possible to learn z by starting with an initial vector having compatible boundary conditions and successively iterating the map F. We could also interpret z as an eigenvector of M with eigenvalue 1; in this context, the algorithm is simply doing a power method.

Interestingly, we can also learn z by solving the underdetermined linear system (M − I) z = 0 subject to the boundary conditions. We show in Appendix A.2 that the learned policies are related to Boltzmann policies, which produce nonlinear Bellman equations at the value-function level:

  V(s) = ∑_a [ e^{β (r(s,a) + γ V(s_a))} / N(s) ] ( r(s,a) + γ V(s_a) ),   N(s) = ∑_a e^{β (r(s,a) + γ V(s_a))},

where γ is the discount factor and the normalization constant N is different from Z_{β,μ}. By working with partition functions we have transformed a nonlinear problem into a linear one. This remarkable property is reminiscent of linearly solvable MDPs (Todorov, 2007).

Once Z_{β,μ} is learned, the agent’s policy is given by π(a | s) = e^{β r(s,a) − μ} Z_{β,μ}(s_a) / Z_{β,μ}(s).
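A small sketch of this planning algorithm, on a hypothetical deterministic MDP of our own that contains a cycle (so the backward recursion used earlier would not terminate, but fixed-point iteration of z ← Mz does). Terminal entries are held at their boundary values between iterations:

```python
import numpy as np

beta, mu = 1.0, 2.0

# Hypothetical deterministic MDP with a cycle: s0 -> s1, and from s1 either
# back to s0 or out to the terminal state T.  (Our own toy example.)
states = ["s0", "s1", "T"]
idx = {s: i for i, s in enumerate(states)}
edges = [("s0", "s1", -1.0), ("s1", "s0", -1.0), ("s1", "T", 0.0)]  # (s, s', r)

# Matrix representation of the Bellman operator: M[s, s'] = exp(beta*r - mu)
M = np.zeros((3, 3))
for s, s2, r in edges:
    M[idx[s], idx[s2]] += np.exp(beta * r - mu)

z = np.ones(3)
z[idx["T"]] = np.exp(beta * 0.0)    # boundary condition: Z(T) = exp(beta * r_f(T))
for _ in range(200):                # iterate the contraction z <- M z
    z_next = M @ z
    z_next[idx["T"]] = z[idx["T"]]  # terminal entries stay at their boundary values
    z = z_next

print(z[idx["s0"]], z[idx["s1"]])
```

With non-positive rewards and μ > log A_max, each non-terminal row of M sums to less than 1, so the iteration converges geometrically, in line with the contraction argument of Appendix A.3.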

3 Partition Functions for Stochastic MDPs

We now move to the more general MDP setting, in which the dynamics of the environment can be stochastic. However, as mentioned at the end of the introduction, we still assume that given an initial state s, an action a, and a landing state s′, the reward r(s, a, s′) is deterministic.

3.1 A First Attempt: Averaging the Bellman Equation

A first approach to incorporating the stochasticity of the environment is to average the right-hand side of the Bellman equation (4) and define Z̃_{β,μ} as the solution of:

  Z̃_{β,μ}(s) = ∑_a ∑_{s′} p(s′ | s, a) e^{β r(s,a,s′) − μ} Z̃_{β,μ}(s′).    (12)

Interestingly, the solution of this equation can be constructed in the same spirit as Section 2.1 by summing a functional over the set of trajectories. If we define ℓ(τ) to be the log-likelihood of a trajectory, ℓ(τ) = ∑_t log p(s_{t+1} | s_t, a_t), then the function

  Z̃_{β,μ}(s) = ∑_{τ ∈ T_s} e^{−β E(τ) − μ L(τ) + ℓ(τ)}    (13)

satisfies the Bellman equation (12). The proof can be found in Appendix B.1. In Appendix B.2 we derive the Bellman equation satisfied by the underlying value function, and we find:


This Bellman equation does not correspond to a realistic policy: the policy depends on the landing state s′, which is a random variable, so the agent’s policy and the environment’s transitions cannot be decoupled. This is not surprising; from Eq. 13 we see that Z̃_{β,μ} puts rewards and transition probabilities on an equal footing. As a result, an agent behaves as if it could choose any available transition as long as it is willing to pay the price in log probability. This encourages risky behavior: the agent is encouraged to bet on highly unlikely but beneficial transitions. These observations were also noted in Levine (2018).

3.2 A Variational Approach

Constructing a partition function for a stochastic MDP is not straightforward because there are two sources of randomness: the first comes from the agent’s policy and the second from the stochasticity of the environment. Mixing these two sources of randomness can lead to unrealistic policies, as we saw in Section 3.1. A more principled approach is needed.

We construct a new deterministic MDP M̃ from M. We take its state space to be the space of probability distributions ρ over S, similar to belief-state representations for partially observable MDPs (Astrom, 1965; Sondik, 1978; Kaelbling et al., 1998). We make the assumption that the available actions are the same for all states, and take the action set of M̃ to be A. For a distribution ρ and an action a we define the deterministic transition ρ ↦ ρ P_a, where P_a is the transition matrix corresponding to choosing action a in the original MDP. We define the associated reward as the expected reward r̃(ρ, a) = ∑_{s,s′} ρ(s) p(s′ | s, a) r(s, a, s′).

S being finite, it has a finite number n_f of final states, which we denote f_1, …, f_{n_f}. The final states of M̃ are of the form ρ_f = ∑_i λ_i δ_{f_i}, where the λ_i ≥ 0 satisfy ∑_i λ_i = 1 and δ_{f_i} is a Dirac delta function at state f_i. The intrinsic value of such a final state is then given by r̃_f(ρ_f) = ∑_i λ_i r_f(f_i). This leads to the boundary conditions:

  Z̃_{β,μ}(ρ_f) = e^{β ∑_i λ_i r_f(f_i)}.    (15)
This new MDP M̃ is deterministic, so we can follow the same approach as in Section 2 and construct a partition function Z̃_{β,μ} on M̃; the original Z_{β,μ}(s) can be recovered by evaluating Z̃_{β,μ} at the Dirac distribution δ_s. From this construction we also get that Z̃_{β,μ} satisfies the following Bellman equation:

  Z̃_{β,μ}(ρ) = ∑_a e^{β r̃(ρ, a) − μ} Z̃_{β,μ}(ρ P_a).    (16)

Just as is the case for deterministic MDPs, the Bellman operator associated with this equation is a contraction; this is proved in Appendix B.3. However, the state space of M̃ is now infinite, which makes solving Eq. 16 intractable. We adopt a variational approach, which consists in finding the best approximation of Z̃_{β,μ} within a parametric family. We measure the fitness of a candidate through the following loss function:


For illustration purposes, and inspired by the form of the boundary conditions (15), we consider a simple parametric family given by the partition functions of the form Z_w(ρ) = e^{β ⟨w, ρ⟩}, where w ∈ R^{|S|}. The optimal w can be found using standard optimization techniques such as gradient descent. By evaluating Z_w at the final distributions ρ_f we see that we must have w(f_i) = r_f(f_i), and consequently the boundary conditions are satisfied. The optimal solution satisfies the following Bellman equation:

  e^{β w(s)} = ∑_a e^{β r̃(δ_s, a) − μ} e^{β ⟨w, δ_s P_a⟩}.

The underlying value function satisfies a Bellman equation in which actions are drawn from a policy of the same form as in Section 2.3, with Z_w in place of Z_{β,μ}. This approach leads to a realistic policy: its only dependency is on the current state, not a future one, unlike the policies arising from Eq. 14.
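As a rough illustration, the sketch below fits a family assumed to have the exponential form Z_w(ρ) = e^{β⟨w,ρ⟩}; instead of running gradient descent on the variational loss, it iterates the induced Bellman equation at the one-hot beliefs δ_s, which is a cruder but simpler fit. The three-state stochastic MDP and its expected rewards are our own illustrative choices:

```python
import numpy as np

beta, mu = 1.0, 2.0

# Hypothetical stochastic MDP (our own): states 0 and 1, absorbing terminal 2.
# P[a] is the transition matrix of action a; R[a][s] is the expected reward
# r~(delta_s, a) of taking action a from state s.
P = [np.array([[0.0, 0.8, 0.2], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]),
     np.array([[0.0, 0.2, 0.8], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])]
R = [np.array([-1.0, -0.5, 0.0]),
     np.array([-2.0, -0.5, 0.0])]

def bellman_map(w):
    """Apply w_s <- (1/beta) log sum_a exp(beta*(R[a][s] + <w, P_a[s]>) - mu)
    at the one-hot beliefs; the terminal coordinate stays pinned at r_f = 0."""
    new = w.copy()
    for s in range(2):  # states 0 and 1 are non-terminal
        vals = [beta * (R[a][s] + P[a][s] @ w) - mu for a in range(2)]
        new[s] = np.logaddexp(vals[0], vals[1]) / beta
    return new

w = np.zeros(3)  # w[2] = r_f(terminal) = 0 is the boundary condition
for _ in range(50):
    w = bellman_map(w)

print(w)  # w plays a value-like role; Z_w(rho) = exp(beta * rho @ w)
```

In log space the iteration is a softmax-style value update, which makes visible why the fitted w behaves like a (soft) value function over the original states.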

4 The Model-Free Case

4.1 Construction of State-Action-Dependent Partition Functions

In a model-free setting, where the transition dynamics are unknown, state-only value functions such as V(s) are less useful than state-action value functions such as Q(s, a). Consequently, we extend our construction to state-action partition functions Z_{β,μ}(s, a). For a deterministic environment, we extend the construction in Section 2 and define

  Z_{β,μ}(s, a) = ∑_{τ ∈ T_{s,a}} e^{−β E(τ) − μ L(τ)},

where T_{s,a} denotes the set of trajectories having s_0(τ) = s and a_0(τ) = a. Since T_s = ∪_a T_{s,a}, we have Z_{β,μ}(s) = ∑_a Z_{β,μ}(s, a). As a consequence of this construction, Z_{β,μ}(s, a) satisfies the following linear Bellman equation:

  Z_{β,μ}(s, a) = e^{β r(s,a) − μ} ∑_{a′} Z_{β,μ}(s_a, a′).

This Bellman equation can be easily derived by decomposing each trajectory τ ∈ T_{s,a} into two parts: the first transition, resulting from taking the initial action a, and the remainder of the trajectory τ′, which is a member of T_{s_a, a′} for some action a′. The total energy and length can then be decomposed in the same way as in Section 2.2.
In the same spirit as Section 2.3, one can show that the underlying average-energy value function satisfies a Bellman equation in which it can be reinterpreted as the Q-function of the policy π(a | s) = Z_{β,μ}(s, a) / Z_{β,μ}(s). Similarly to the results of Section 2.3 and Appendix A.2, this policy can be thought of as a Boltzmann policy of parameter β that takes entropy into account. The construction can be extended to stochastic environments by following the same approach used in Section 3.2.
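A quick numerical check of this construction on a hypothetical deterministic MDP of our own: computing Z(s, a) through its linear Bellman equation and verifying that summing over first actions recovers the state partition function Z(s):

```python
import math

beta, mu = 1.0, 2.0

# Hypothetical deterministic MDP (our own): two non-terminal states, terminal T.
next_state = {"s0": {"a": "s1", "b": "T"}, "s1": {"a": "T"}}
reward     = {"s0": {"a": -1.0, "b": -3.0}, "s1": {"a": 0.0}}
r_final = {"T": 0.0}

def Z_sa(s, a):
    """State-action partition function via its linear Bellman equation."""
    s2 = next_state[s][a]
    factor = math.exp(beta * reward[s][a] - mu)
    if s2 in r_final:                    # boundary: no further actions after s2
        return factor * math.exp(beta * r_final[s2])
    return factor * sum(Z_sa(s2, a2) for a2 in next_state[s2])

def Z(s):
    """State partition function of Section 2, for comparison."""
    if s in r_final:
        return math.exp(beta * r_final[s])
    return sum(math.exp(beta * reward[s][a] - mu) * Z(next_state[s][a])
               for a in next_state[s])

# Consistency check: Z(s) = sum_a Z(s, a); the policy is pi(a|s) = Z(s,a)/Z(s).
total = sum(Z_sa("s0", a) for a in next_state["s0"])
print(total, Z("s0"))
```

The identity Z(s) = Σ_a Z(s, a) is what makes π(a | s) = Z(s, a)/Z(s) a proper probability distribution over actions.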

In the following, we show how learning the state-action partition function Z_{β,μ}(s, a) leads to an alternative approach to model-free reinforcement learning that does not explicitly represent value functions.

4.2 A Learning Algorithm


In Q-learning, the update rule typically consists of a linear interpolation between the current value estimate and the one arising a posteriori:

  Q(s, a) ← (1 − α) Q(s, a) + α ( r(s, a) + γ max_{a′} Q(s′, a′) ),

where α is the learning rate and γ is the discount factor. For Z-functions, we replace the linear interpolation with a geometric one. We take the update rule for Z-functions to be the following:

  Z_{β,μ}(s, a) ← Z_{β,μ}(s, a)^{1−α} ( e^{β r(s,a) − μ} ∑_{a′} Z_{β,μ}(s′, a′) )^{α}.
To understand what this update rule is doing, it is insightful to look at how the underlying Q-function is updated. Taking logarithms, the geometric update is a linear interpolation in log space:

  log Z_{β,μ}(s, a) ← (1 − α) log Z_{β,μ}(s, a) + α ( β r(s, a) − μ + log ∑_{a′} Z_{β,μ}(s′, a′) ).
We see that we recover a weighted version of the SARSA update rule, referred to as expected SARSA. Expected SARSA is known to reduce the variance of the updates by exploiting knowledge about the stochasticity of the behavior policy, and hence is considered an improvement over vanilla SARSA (Van Seijen et al., 2009).

Since the underlying update rule is equivalent to the expected SARSA update rule, we can use any exploration strategy that works for expected SARSA. One exploration strategy is ε-greedy, which consists in taking the action with the largest Z_{β,μ}(s, a) with probability 1 − ε and picking an action uniformly at random with probability ε. Another possibility is Boltzmann-like exploration, which consists in taking action a with probability proportional to Z_{β,μ}(s, a).

We would like to emphasize that even though the expected SARSA update is not novel, the policies learned through this update rule are particular to the partition-function approach. In particular, the learned policies are Boltzmann-like policies with the entropic preference properties described in Section 2.3 and Appendix A.2.
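Putting the pieces together, here is a sketch of tabular Z-learning with the geometric update and ε-greedy exploration on a hypothetical three-state chain; the states, rewards, and hyperparameters are our own choices:

```python
import math
import random

random.seed(0)
beta, mu, alpha, eps = 1.0, 0.5, 0.1, 0.2

# Hypothetical deterministic chain (our own): a two-step path to the goal G
# and a costly one-step shortcut.
next_state = {"s0": {"step": "s1", "jump": "G"}, "s1": {"step": "G"}}
reward     = {"s0": {"step": -1.0, "jump": -5.0}, "s1": {"step": -1.0}}
terminal_reward = {"G": 0.0}

Z = {(s, a): 1.0 for s in next_state for a in next_state[s]}

def z_state(s):
    """Z(s) = sum_a Z(s, a); boundary condition at terminal states."""
    if s in terminal_reward:
        return math.exp(beta * terminal_reward[s])
    return sum(Z[(s, a)] for a in next_state[s])

def act(s):
    """Epsilon-greedy exploration on Z(s, .)."""
    if random.random() < eps:
        return random.choice(list(next_state[s]))
    return max(next_state[s], key=lambda a: Z[(s, a)])

for episode in range(500):
    s = "s0"
    while s not in terminal_reward:
        a = act(s)
        s2, r = next_state[s][a], reward[s][a]
        target = math.exp(beta * r - mu) * z_state(s2)
        # geometric interpolation in place of Q-learning's linear one
        Z[(s, a)] = Z[(s, a)] ** (1 - alpha) * target ** alpha
        s = s2

policy = {a: Z[("s0", a)] / z_state("s0") for a in next_state["s0"]}
print(policy)
```

Because the geometric update is a linear interpolation of log Z, the estimates converge to the fixed points of the Z-function Bellman equation, and the derived policy π(a | s) = Z(s, a)/Z(s) concentrates on the cheaper two-step route.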

5 Conclusion

In this article we discussed how planning and reinforcement learning problems can be approached through the tools and abstractions of statistical physics. We started by constructing partition functions for each state of a deterministic MDP and then showed how to extend that definition to the more general stochastic MDP setting through a variational approach. Interestingly, these partition functions have their own Bellman equation, making it possible to solve planning and model-free RL problems without explicit reference to value functions. Nevertheless, conventional value functions can be derived from our partition function and interpreted via average energies. Computing the implied value functions can also shed some light on the policies arising from these algorithms. We found that the learned policies are closely related to Boltzmann policies, with the additional interesting feature that they take entropy into consideration by favoring states from which many trajectories are possible. Finally, we observed that working with partition functions is more natural in some settings: in a deterministic environment, for example, the Bellman equations of these Boltzmann-like policies become linear, which is not the case in a value-function-centric approach.

6 Acknowledgments

We would like to thank Alex Beatson, Weinan E, Karthik Narasimhan and Geoffrey Roeder for helpful discussions and feedback. This work was funded by a Princeton SEAS Innovation Grant and the Alfred P. Sloan Foundation.


  • Ackley et al. (1985) D Ackley, G Hinton, and T Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169, 1985.
  • Astrom (1965) Karl J Astrom. Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications, 10(1):174–205, 1965.
  • Attias (2003) Hagai Attias. Planning by probabilistic inference. In International Conference on Artificial Intelligence and Statistics, 2003.
  • Bellman (1966) Richard Bellman. Dynamic programming. Science, 153(3731):34–37, 1966.
  • Bertsekas (1995) Dimitri P Bertsekas. Dynamic programming and optimal control, volume 1. Athena scientific Belmont, MA, 1995.
  • Betancourt (2017) Michael Betancourt. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434, 2017.
  • Carleo et al. (2019) Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, and Lenka Zdeborová. Machine learning and the physical sciences, 2019.
  • Dekel and Hazan (2013) Ofer Dekel and Elad Hazan. Better rates for any adversarial deterministic MDP. In International Conference on Machine Learning, pages 675–683, 2013.
  • Duane et al. (1987) Simon Duane, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
  • Hopfield (1982) John J Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.
  • Kaelbling et al. (1998) Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. Artificial intelligence, 101(1-2):99–134, 1998.
  • Kalman (1960) Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960.
  • Kappen (2011) Hilbert J Kappen. Optimal control theory and the linear Bellman equation. In D Barber, A T Cemgil, and S Chiappa, editors, Bayesian Time Series Models. Cambridge University Press, 2011.
  • Koller and Friedman (2009) Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.
  • Levine (2018) Sergey Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review, 2018.
  • MacKay (2003) David JC MacKay. Information theory, inference and learning algorithms. Cambridge university press, 2003.
  • Madani (2002) Omid Madani. Polynomial value iteration algorithms for deterministic MDPs. In Proceedings of the Eighteenth conference on Uncertainty in Artificial Intelligence, pages 311–318. Morgan Kaufmann Publishers Inc., 2002.
  • Mezard and Montanari (2009) Marc Mezard and Andrea Montanari. Information, physics, and computation. Oxford University Press, 2009.
  • Murray and Ghahramani (2004) Iain Murray and Zoubin Ghahramani. Bayesian learning in undirected graphical models: approximate MCMC algorithms. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pages 392–399. AUAI Press, 2004.
  • Neal et al. (2011) Radford M Neal et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011.
  • Sondik (1978) Edward J Sondik. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 26(2):282–304, 1978.
  • Stoudenmire and Schwab (2016) Edwin Stoudenmire and David J Schwab. Supervised learning with tensor networks. In Advances in Neural Information Processing Systems, pages 4799–4807, 2016.
  • Todorov (2007) Emanuel Todorov. Linearly-solvable Markov decision problems. In Advances in Neural Information Processing Systems, pages 1369–1376, 2007.
  • Van Seijen et al. (2009) Harm Van Seijen, Hado Van Hasselt, Shimon Whiteson, and Marco Wiering. A theoretical and empirical analysis of expected SARSA. In 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pages 177–184. IEEE, 2009.
  • Wen and Van Roy (2013) Zheng Wen and Benjamin Van Roy. Efficient exploration and value function generalization in deterministic systems. In Advances in Neural Information Processing Systems, pages 3021–3029, 2013.
  • Yedidia et al. (2001) Jonathan S Yedidia, William T Freeman, and Yair Weiss. Generalized belief propagation. In Advances in Neural Information Processing Systems, pages 689–695, 2001.
  • Zdeborová and Krzakala (2016) Lenka Zdeborová and Florent Krzakala. Statistical physics of inference: Thresholds and algorithms. Advances in Physics, 65(5):453–552, 2016.
  • Ziebart (2010) Brian D Ziebart. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy. PhD thesis, University of Washington, 2010.

Appendix A Deterministic MDPs

A.1 Z_{β,μ} is well defined

Proposition 1.

Z_{β,μ} is well defined for μ > log A_max.


The MDP being finite, it has a finite number of final states; we can then find a constant C such that, for all final states f, we have Z_{β,μ}(f) ≤ C. Bounding each trajectory's contribution, and using the fact that all rewards r(s, a) are non-positive and that the number of available actions at each state is bounded by A_max, the partition function is dominated by a geometric series with ratio A_max e^{−μ}. When μ > log A_max, this sum becomes convergent and Z_{β,μ} is well defined. ∎

Remark 1.

μ > log A_max is a sufficient condition, but not a necessary one. Z_{β,μ} could be well defined for all values of μ. This happens, for instance, when T_s is finite for all s.

A.2 The underlying policy is Boltzmann-like

For high values of β, the sum defining Z_{β,μ}(s) becomes dominated by the contribution of a few of its terms. As β → ∞, the sum is dominated by the contribution of the paths with the largest cumulative reward, so that (1/β) log Z_{β,μ}(s) → V*(s).

Since the MDP is finite and deterministic, it has a finite number of transitions and rewards. Consequently, the set of achievable cumulative rewards takes discrete values; in particular, there is a finite gap Δ between the maximum value of this set and its second-largest value. Denote by T*_s the set of trajectories that achieve this maximum, and by N_μ(s) its weighted size.

N_μ(s) counts the trajectories of T*_s in a weighted way: longer trajectories contribute less than shorter ones. It is a measure of the size of T*_s that takes into account our preference for shorter trajectories. Putting everything together, we get Z_{β,μ}(s) ≈ N_μ(s) e^{β V*(s)} as β → ∞, which results in the following policy:

  π(a | s) ∝ N_μ(s_a) e^{β ( r(s,a) + V*(s_a) )}.

This differs from a traditional Boltzmann policy in the following way: if we have two actions a_1 and a_2 such that r(s, a_1) + V*(s_{a_1}) = r(s, a_2) + V*(s_{a_2}), but there are twice as many optimal trajectories spanning from s_{a_1} as there are from s_{a_2}, then action a_1 will be chosen twice as often as a_2. This is in contrast to the usual Boltzmann policy, which would pick a_1 and a_2 with equal probability. When N_μ(s) is the same for all s, we recover a Boltzmann policy. When β → ∞, the policy converges to an optimal policy and (1/β) log Z_{β,μ} converges to the optimal value function.

A.3 The Bellman operator is a contraction

Proposition 2.

Let μ > log A_max and let Ω be the set of all partition-function vectors with compatible boundary conditions. The map F, defined on Ω by applying the right-hand side of the Bellman equation (4) at non-terminal states while keeping the boundary values fixed, is a contraction for the sup-norm: ‖F(z) − F(z′)‖_∞ ≤ A_max e^{−μ} ‖z − z′‖_∞.


Here Ω is the set of all possible partition functions with compatible boundary conditions, and the matrix M is defined entrywise by M_{s,s′} = e^{β r(s,s′) − μ} when s → s′ and 0 otherwise. Because the boundary values are held fixed when s is a final state, the map F is well defined (i.e., F(Ω) ⊆ Ω). Since the MDP is finite, it has a finite number of final states, so there exists a constant C such that, for all final states f, we have