Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

04/07/2016
by Jordi Grau-Moya, et al.

Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.
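To make the nested structure concrete, here is a minimal sketch of a value iteration of this kind. It is not the authors' implementation. It assumes a finite set of candidate transition models with prior weights `mu`, a reference policy `rho`, and two inverse temperatures, `alpha` for the policy and `beta` for the model, that enter through a nested log-sum-exp backup. All names, the data layout, and the exact form of the backup are assumptions made for illustration. In this sketch, `beta` near 0 averages over models (a Bayesian-style backup), a strongly negative `beta` approaches the worst-case model (a robust-style backup), and a single model with a large `alpha` approaches standard value iteration.

```python
import math


def soft_expectation(values, weights, inv_temp):
    """Return (1/inv_temp) * log(sum_i w_i * exp(inv_temp * v_i)), computed stably.

    As inv_temp -> 0 this is the weighted mean; large positive (negative)
    inv_temp approaches the max (min). Weights are assumed strictly positive.
    """
    if abs(inv_temp) < 1e-12:
        return sum(w * v for w, v in zip(weights, values))
    scaled = [inv_temp * v for v in values]
    top = max(scaled)  # subtract the largest exponent to avoid overflow
    total = sum(w * math.exp(z - top) for w, z in zip(weights, scaled))
    return (top + math.log(total)) / inv_temp


def generalized_value_iteration(T, R, mu, rho, alpha, beta,
                                gamma=0.9, tol=1e-8, max_iter=10000):
    """Hypothetical two-temperature value iteration (illustrative only).

    T[m][s][a][t]: transition probability s -> t under action a in model m
    R[s][a]:       reward
    mu[m]:         prior weight of model m
    rho[s][a]:     reference policy
    """
    n_models, n_states, n_actions = len(T), len(R), len(R[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = []
        for s in range(n_states):
            q = []
            for a in range(n_actions):
                # Expected one-step return under each candidate model.
                per_model = [
                    R[s][a] + gamma * sum(T[m][s][a][t] * V[t]
                                          for t in range(n_states))
                    for m in range(n_models)
                ]
                # Inner soft aggregation over models (temperature beta).
                q.append(soft_expectation(per_model, mu, beta))
            # Outer soft aggregation over actions (temperature alpha).
            V_new.append(soft_expectation(q, rho[s], alpha))
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    return V
```

Because each log-sum-exp aggregation is non-expansive in the sup norm, the backup in this sketch is a contraction with factor `gamma`, so the loop converges for any `gamma < 1`.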

