Efficient Inference and Exploration for Reinforcement Learning

10/12/2019
by Yi Zhu, et al.

Despite an ever-growing literature on reinforcement learning algorithms and applications, much less is known about their statistical inference. In this paper, we investigate the large-sample behavior of Q-value estimates and give closed-form characterizations of their asymptotic variances. This allows us to efficiently construct confidence regions for the Q-values and optimal value functions, and to design policies that minimize their estimation errors. It also leads to a policy exploration strategy based on estimating the relative discrepancies among the Q estimates. Numerical experiments show that our exploration strategy outperforms benchmark approaches.
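The abstract describes the approach only at a high level. As a rough illustration of the ideas involved, the sketch below shows one way plug-in confidence intervals for tabular Q-value estimates and a discrepancy-driven exploration choice could look. The function names, the per-(state, action) variance bookkeeping, and the selection rule are assumptions for illustration, not the paper's algorithm.

    # Minimal sketch (assumed interface, not the paper's implementation):
    # normal-approximation confidence intervals for tabular Q estimates and
    # an exploration rule that targets the action hardest to distinguish
    # from the current greedy action.
    import numpy as np

    def q_confidence_interval(q_hat, visit_counts, target_var, z=1.96):
        """CI via a plug-in asymptotic variance: Q_hat +/- z * sqrt(var / n)."""
        se = np.sqrt(target_var / np.maximum(visit_counts, 1))
        return q_hat - z * se, q_hat + z * se

    def discrepancy_exploration(q_hat_s, visit_counts_s, target_var_s):
        """Pick the non-greedy action with the smallest standardized gap to
        the greedy action, i.e. the comparison with the most ambiguity."""
        se = np.sqrt(target_var_s / np.maximum(visit_counts_s, 1))
        greedy = int(np.argmax(q_hat_s))
        gap = (q_hat_s[greedy] - q_hat_s) / np.sqrt(se**2 + se[greedy]**2 + 1e-12)
        gap[greedy] = np.inf  # exclude the greedy action itself
        return int(np.argmin(gap))

    # Toy usage with made-up statistics for one state and four actions.
    q = np.array([1.0, 0.9, 0.2, 0.95])
    n = np.array([50, 10, 40, 5])
    v = np.ones(4)
    lo, hi = q_confidence_interval(q, n, v)
    print("CIs:", list(zip(lo.round(2), hi.round(2))))
    print("explore action:", discrepancy_exploration(q, n, v))

In this toy example, the rule favors actions whose estimates remain statistically close to the greedy one (here the under-sampled fourth action), rather than exploring uniformly or by count alone.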

Related research

03/22/2017 · Deep Exploration via Randomized Value Functions
We study the use of randomized value functions to guide deep exploration...

02/10/2021 · Policy Augmentation: An Exploration Strategy for Faster Convergence of Deep Reinforcement Learning Algorithms
Despite advancements in deep reinforcement learning algorithms, developi...

06/05/2017 · UCB Exploration via Q-Ensembles
We show how an ensemble of Q^*-functions can be leveraged for more effec...

08/08/2021 · Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning
The recent emergence of reinforcement learning has created a demand for ...

06/01/2021 · An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning
Policy-based reinforcement learning methods suffer from the policy colla...

11/29/2021 · Dynamic Inference
Traditional statistical estimation, or statistical inference in general,...

10/06/2021 · Residual Overfit Method of Exploration
Exploration is a crucial aspect of bandit and reinforcement learning alg...
