Parameterized MDPs and Reinforcement Learning Problems – A Maximum Entropy Principle Based Framework

06/17/2020
by Amber Srivastava, et al.

We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems, modelled as infinite horizon Markov Decision Processes (MDPs) with and without an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon entropy of the trajectories under the MDP and to determine the stochastic policy that maximizes this entropy while guaranteeing a low expected cost along a trajectory. The resulting policy enhances the quality of exploration early in the learning process and consequently allows faster convergence and robust solutions even in the presence of noisy data, as demonstrated in our comparisons with popular algorithms such as Q-learning, Double Q-learning and entropy-regularized Soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can be non-convex with multiple poor local minima. Simulation results on a 5G small cell network problem demonstrate successful determination of communication routes and small cell locations. We also obtain sensitivity measures with respect to problem parameters and robustness to noisy environment data.
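The core mechanism described above, maximizing trajectory entropy subject to a bound on expected cost, leads to a Boltzmann-type stochastic policy and a "soft" Bellman recursion. The following is a minimal sketch of entropy-regularized (soft) value iteration on a toy cost-based MDP; the 3-state, 2-action MDP, the inverse temperature `beta`, and all variable names are illustrative assumptions, not the paper's actual benchmark or implementation.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, random transitions and costs.
rng = np.random.default_rng(0)
nS, nA = 3, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] = next-state distribution
C = rng.uniform(0.0, 1.0, size=(nS, nA))        # per-step costs
gamma, beta = 0.9, 5.0                          # discount factor, inverse temperature

# Soft value iteration: the log-sum-exp acts as a "soft minimum" over actions.
# As beta -> infinity it recovers the hard min (standard value iteration);
# for finite beta the induced policy keeps entropy high, encouraging exploration.
V = np.zeros(nS)
for _ in range(500):
    Q = C + gamma * P @ V                       # Q[s, a]; (nS,nA,nS) @ (nS,) -> (nS,nA)
    V_new = -np.log(np.exp(-beta * Q).sum(axis=1)) / beta
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# Maximum-entropy stochastic policy: Boltzmann distribution over action values.
pi = np.exp(-beta * (C + gamma * P @ V))
pi /= pi.sum(axis=1, keepdims=True)
```

Note the design choice: for small `beta` the policy approaches uniform (maximal exploration), while for large `beta` it concentrates on the cost-minimizing action, so annealing `beta` during learning trades exploration for exploitation.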

Related research

- Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning (09/19/2017)
- Robust Entropy-regularized Markov Decision Processes (12/31/2021)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning (06/06/2021)
- Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning (01/31/2019)
- Path Consistency Learning in Tsallis Entropy Regularized MDPs (02/10/2018)
- Thompson Sampling for Learning Parameterized Markov Decision Processes (06/29/2014)
- Smoother Entropy for Active State Trajectory Estimation and Obfuscation in POMDPs (08/19/2021)
