Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

02/10/2021
by Shi Dong, et al.

We design a simple reinforcement learning agent that, with a specification only of agent state dynamics and a reward function, can operate with some degree of competence in any environment. The agent maintains only visitation counts and value estimates for each agent-state-action pair. The value function is updated incrementally in response to temporal differences and optimistic boosts that encourage exploration. The agent executes actions that are greedy with respect to this value function. We establish a regret bound demonstrating convergence to near-optimal per-period performance, where the time taken to achieve near-optimality is polynomial in the number of agent states and actions, as well as the reward mixing time of the best policy within the reference policy class, which consists of those policies that depend on history only through agent state. Notably, there is no further dependence on the number of environment states or on mixing times associated with other policies or statistics of history. Our result sheds light on the potential benefits of (deep) representation learning, which has demonstrated the capability to extract compact and relevant features from high-dimensional interaction histories.
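To make the description above concrete, here is a minimal Python sketch of an agent of this general shape: per-(agent state, action) visitation counts and value estimates, an incremental temporal-difference update with an optimistic boost, and greedy action selection. It is an illustration only, not the paper's algorithm. The class name `OptimisticQAgent`, the discount factor, the 1/n step size, and the 1/sqrt(n) form of the boost are all assumptions chosen for simplicity; the abstract does not specify them.

```python
import math
from collections import defaultdict


class OptimisticQAgent:
    """Illustrative count-based optimistic value learner (hypothetical names)."""

    def __init__(self, num_actions, discount=0.9, bonus_scale=1.0):
        self.num_actions = num_actions
        self.discount = discount        # assumed discount factor
        self.bonus_scale = bonus_scale  # assumed scale of the optimistic boost
        self.counts = defaultdict(int)    # visits per (agent_state, action)
        self.values = defaultdict(float)  # value estimate per (agent_state, action)

    def act(self, agent_state):
        # Greedy with respect to the current value estimates;
        # ties resolve to the lowest-indexed action.
        return max(range(self.num_actions),
                   key=lambda a: self.values[(agent_state, a)])

    def update(self, agent_state, action, reward, next_agent_state):
        key = (agent_state, action)
        self.counts[key] += 1
        n = self.counts[key]
        step = 1.0 / n                            # assumed step-size schedule
        bonus = self.bonus_scale / math.sqrt(n)   # assumed boost, shrinks with visits
        next_value = max(self.values[(next_agent_state, a)]
                         for a in range(self.num_actions))
        td_error = reward + self.discount * next_value - self.values[key]
        # Incremental update driven by the temporal difference plus the boost.
        self.values[key] += step * (td_error + bonus)
```

In use, the caller would compute the next agent state from the specified agent state dynamics, call `act` to choose an action, and call `update` after observing the reward. Because the boost decays as a pair is visited more often, rarely tried pairs keep inflated values, which is what steers the greedy policy toward exploration.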

research
10/07/2021

Reinforcement Learning in Reward-Mixing MDPs

Learning a near optimal policy in a partially observable system remains ...
research
04/06/2017

Geometry of Policy Improvement

We investigate the geometry of optimal memoryless time independent decis...
research
02/07/2020

Reward-Free Exploration for Reinforcement Learning

Exploration is widely regarded as one of the most challenging aspects of...
research
06/28/2020

Image Classification by Reinforcement Learning with Two-State Q-Learning

In this paper, a simple and efficient Hybrid Classifier is presented whi...
research
08/27/2019

Exploration-Enhanced POLITEX

We study algorithms for average-cost reinforcement learning problems wit...
research
11/29/2022

Posterior Sampling for Continuing Environments

We develop an extension of posterior sampling for reinforcement learning...
research
11/01/1997

Dynamic Non-Bayesian Decision Making

The model of a non-Bayesian agent who faces a repeated game with incompl...
