PAC Reinforcement Learning with Rich Observations

02/08/2016
by   Akshay Krishnamurthy, et al.
0

We propose and study a new model for reinforcement learning with rich observations, generalizing contextual bandits to sequential decision making. These models require an agent to take actions based on observations (features) with the goal of achieving long-term performance competitive with a large set of policies. To avoid barriers to sample-efficient learning associated with large observation spaces and general POMDPs, we focus on problems that can be summarized by a small number of hidden states and have long-term rewards that are predictable by a reactive function class. In this setting, we design and analyze a new reinforcement learning algorithm, Least Squares Value Elimination by Exploration. We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/29/2016

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

This paper studies systematic exploration for reinforcement learning wit...
research
03/15/2020

Provably Efficient Exploration for RL with Unsupervised Learning

We study how to use unsupervised learning for efficient exploration in r...
research
03/01/2018

On Polynomial Time PAC Reinforcement Learning with Rich Observations

We study the computational tractability of provably sample-efficient (PA...
research
11/20/2019

Corruption Robust Exploration in Episodic Reinforcement Learning

We initiate the study of multi-stage episodic reinforcement learning und...
research
02/07/2020

Causally Correct Partial Models for Reinforcement Learning

In reinforcement learning, we can learn a model of future observations a...
research
02/28/2017

Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

Conventional reinforcement learning methods for Markov decision processe...
research
03/07/2022

Influencing Long-Term Behavior in Multiagent Reinforcement Learning

The main challenge of multiagent reinforcement learning is the difficult...

Please sign up or login with your details

Forgot password? Click here to reset