Reinforcement Learning with History-Dependent Dynamic Contexts

02/04/2023
by   Guy Tennenholtz, et al.
0

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/09/2020

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

In this work, we propose KeRNS: an algorithm for episodic reinforcement ...
research
12/27/2022

Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

We study model-based reinforcement learning (RL) for episodic Markov dec...
research
03/14/2019

Contextual Markov Decision Processes using Generalized Linear Models

We consider the recently proposed reinforcement learning (RL) framework ...
research
11/06/2022

On learning history based policies for controlling Markov decision processes

Reinforcementlearning(RL)folkloresuggeststhathistory-basedfunctionapprox...
research
11/03/2019

Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs

In order to make good decision under uncertainty an agent must learn fro...
research
02/03/2018

Adaptive Representation Selection in Contextual Bandit with Unlabeled History

We consider an extension of the contextual bandit setting, motivated by ...
research
04/20/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

The upper confidence reinforcement learning (UCRL2) strategy introduced ...

Please sign up or login with your details

Forgot password? Click here to reset