Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

12/23/2019
by Tian Tan, et al.

It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm that learns a parameterized indexed value function through a distributional version of temporal difference learning in a tabular setting, and we prove a regret bound for it. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network to learn the indexed value function. Finally, we show the efficacy of PINs through computational experiments.
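To make the idea of an indexed value function concrete, here is a minimal tabular sketch. It is an illustration only, not the algorithm from the paper. It assumes the indexed value takes the form mean + z * std with a scalar index z drawn from a standard normal. The abstract does not state this form or this index distribution. The names `IndexedValueTable`, `q_values`, `act`, and `sample_index` are hypothetical, and the two arrays merely stand in for the mean network and the uncertainty network.

```python
import numpy as np


class IndexedValueTable:
    """Tabular stand-in for a mean/uncertainty pair (hypothetical names).

    Assumption: Q(s, a, z) = mean[s, a] + z * std[s, a] for a scalar index z.
    """

    def __init__(self, n_states, n_actions, init_std=1.0):
        # Point estimate of each action value.
        self.mean = np.zeros((n_states, n_actions))
        # Per-entry uncertainty scale; larger means less certain.
        self.std = np.full((n_states, n_actions), init_std)

    def q_values(self, state, z):
        # One sampled value function, selected by the index z.
        return self.mean[state] + z * self.std[state]

    def act(self, state, z):
        # Act greedily with respect to the sampled value function.
        return int(np.argmax(self.q_values(state, z)))


def sample_index(rng):
    # Assumption: the index is a scalar standard normal draw.
    return float(rng.standard_normal())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = IndexedValueTable(n_states=2, n_actions=3)
    table.mean[0] = [1.0, 0.5, 0.0]
    table.std[0] = [0.1, 0.1, 2.0]
    z = sample_index(rng)
    print("index:", z, "action:", table.act(0, z))
```

With z = 0 the agent acts greedily on the mean estimates. A positive z favors actions whose estimates are more uncertain, and this is how sampling an index can induce exploration in the sketch.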


