Contextual Markov Decision Processes

02/08/2015
by Assaf Hallak, et al.

We consider a planning problem in which the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called the Contextual Markov Decision Process (CMDP), can describe, for example, a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website's objective is to determine the customer's characteristics and to optimize its interaction with the customer. Our work focuses on one basic scenario: a finite horizon with a small, known number of possible contexts. We suggest a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs. Bounds are obtained for specific naive implementations, and extensions of the framework are discussed, laying the groundwork for future research.
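To make the model concrete, the following is a minimal sketch of a CMDP as a finite collection of MDPs indexed by context, with a finite-horizon planner applied to each one. It is not the algorithm proposed in the paper: it only illustrates the planning step that becomes possible once a context has been identified, and it skips the learning of models and latent contexts entirely. The context names, the toy transition and reward numbers, and the function `finite_horizon_plan` are all hypothetical.

```python
# Toy CMDP (all numbers are illustrative assumptions, not from the paper).
# Each context has its own MDP over the same 2 states and 2 actions.
# P[c][s][a] lists next-state probabilities; R[c][s][a] is the reward.
P = {
    "ctx_a": [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]],
    "ctx_b": [[[0.1, 0.9], [0.7, 0.3]], [[0.6, 0.4], [0.9, 0.1]]],
}
R = {
    "ctx_a": [[0.0, 1.0], [0.5, 2.0]],
    "ctx_b": [[1.0, 0.0], [2.0, 0.5]],
}


def finite_horizon_plan(p, r, horizon):
    """Backward induction for one context's MDP.

    Returns (values, policy): values[s] is the optimal expected total
    reward from state s with `horizon` steps left, and policy[t][s] is
    the greedy action at time step t in state s.
    """
    n_states = len(p)
    values = [0.0] * n_states  # value with zero steps remaining
    policy = []
    for _ in range(horizon):
        new_values = []
        greedy = []
        for s in range(n_states):
            best_action, best_q = 0, float("-inf")
            for a in range(len(p[s])):
                # Immediate reward plus expected value of the next state.
                q = r[s][a] + sum(
                    prob * values[s2] for s2, prob in enumerate(p[s][a])
                )
                if q > best_q:
                    best_action, best_q = a, q
            new_values.append(best_q)
            greedy.append(best_action)
        values = new_values
        policy.insert(0, greedy)  # earlier time steps go to the front
    return values, policy


# One plan per context: the best behavior differs with the context.
plans = {c: finite_horizon_plan(P[c], R[c], horizon=3) for c in P}
```

In the setting described above the context is hidden, so a learner cannot simply look up `plans[c]`; it must first infer which context it is facing from observed behavior, which is the part this sketch leaves out.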
