Efficient Inference in Markov Control Problems

02/14/2012
by Thomas Furmston, et al.

Markov control algorithms that perform smooth, non-greedy updates of the policy have been shown to be very general and versatile, with policy gradient and Expectation Maximisation algorithms being particularly popular. For these algorithms, marginal inference of the reward weighted trajectory distribution is required to perform policy updates. We discuss a new exact inference algorithm for these marginals in the finite horizon case that is more efficient than the standard approach based on classical forward-backward recursions. We also provide a principled extension to infinite horizon Markov Decision Problems that explicitly accounts for an infinite horizon. This extension provides a novel algorithm for both policy gradients and Expectation Maximisation in infinite horizon problems.
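
To make the quantity in question concrete, here is a minimal sketch of how reward-weighted state-action marginals can be computed in a small finite horizon problem with one forward pass and one backward pass. It illustrates the marginals only and is not the algorithm proposed in the paper. The function name, the argument layout (`p1`, `P`, `pi`, `R`, `H`), and the assumptions of a stationary policy, stationary transitions and non-negative rewards are all choices made for this example, not details from the abstract.

```python
def reward_weighted_marginals(p1, P, pi, R, H):
    """Unnormalised reward-weighted state-action marginals for t = 0..H-1.

    Hypothetical layout, chosen for this sketch:
      p1[s]        initial state distribution
      P[s][a][s2]  transition probability from s to s2 under action a
      pi[s][a]     stationary policy, probability of a in s
      R[s][a]      non-negative reward
      H            number of time steps
    """
    S, A = len(p1), len(pi[0])

    # Forward pass: alpha[t][s] is the probability of being in s at time t.
    alpha = [list(p1)]
    for _ in range(1, H):
        prev = alpha[-1]
        alpha.append([
            sum(prev[s] * pi[s][a] * P[s][a][s2]
                for s in range(S) for a in range(A))
            for s2 in range(S)
        ])

    # Backward pass: beta[t][s][a] is the expected reward collected from
    # time t to the end of the horizon, given (s, a) at time t.
    beta = [None] * H
    beta[H - 1] = [list(R[s]) for s in range(S)]
    for t in range(H - 2, -1, -1):
        # value[s2]: expected reward-to-go from s2 at time t + 1.
        value = [sum(pi[s2][a2] * beta[t + 1][s2][a2] for a2 in range(A))
                 for s2 in range(S)]
        beta[t] = [[R[s][a] + sum(P[s][a][s2] * value[s2] for s2 in range(S))
                    for a in range(A)]
                   for s in range(S)]

    # Combine: q[t][s][a] = alpha[t][s] * pi[s][a] * beta[t][s][a].
    q = [[[alpha[t][s] * pi[s][a] * beta[t][s][a] for a in range(A)]
          for s in range(S)]
         for t in range(H)]

    # Summing q at the first time step gives the total expected reward.
    expected_return = sum(q[0][s][a] for s in range(S) for a in range(A))
    return q, expected_return
```

In this sketch, dividing `q` by `expected_return` (when it is positive) yields normalised weights of the kind an Expectation Maximisation style policy update would consume. The cost is linear in `H` here only because the policy and transitions are assumed not to depend on time.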

research
09/05/2023

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

In this paper, we consider an infinite horizon average reward Markov Dec...
research
10/10/2022

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

The infinite horizon setting is widely adopted for problems of reinforce...
research
08/22/2022

Deriving time-averaged active inference from control principles

Active inference offers a principled account of behavior as minimizing a...
research
06/03/2011

Experiments with Infinite-Horizon, Policy-Gradient Estimation

In this paper, we present algorithms that perform gradient ascent of the...
research
10/19/2021

Planning for Package Deliveries in Risky Environments Over Multiple Epochs

We study a risk-aware robot planning problem where a dispatcher must con...
research
08/31/2022

Partial Counterfactual Identification for Infinite Horizon Partially Observable Markov Decision Process

This paper investigates the problem of bounding possible output from a c...
research
10/14/2021

The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

We consider the problem of finding the best memoryless stochastic policy...
