Adapting the Exploration Rate for Value-of-Information-Based Reinforcement Learning

12/20/2022
by Isaac J. Sledge, et al.

In this paper, we consider the problem of adjusting the exploration rate when using value-of-information-based exploration. We do this by converting the value-of-information optimization into a problem of finding equilibria of a flow for a changing exploration rate. We then develop an efficient path-following scheme for converging to these equilibria and hence uncovering optimal action-selection policies. Under this scheme, the exploration rate is automatically adapted according to the agent's experiences. Global convergence is theoretically assured. We first evaluate our exploration-rate adaptation on the Nintendo GameBoy games Centipede and Millipede. We demonstrate aspects of the search process, such as the hierarchy of state abstractions that it yields. We also show that our approach returns better policies in fewer episodes than conventional search strategies that rely on heuristic, annealing-based exploration-rate adjustments. We then illustrate that these trends hold for deep, value-of-information-based agents that learn to play ten simple games and over forty more complicated games for the Nintendo GameBoy system. We observe performance either near or well above the level of human play.
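
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of a heuristic, annealing-based exploration-rate adjustment. It is not the paper's path-following scheme. It assumes a Boltzmann (soft-max) action-selection rule and an exponential decay schedule, and the names `annealed_rate`, `softmax_policy`, `sample_action`, and the parameters `start`, `end`, and `decay` are hypothetical, chosen only for illustration.

```python
import math
import random


def annealed_rate(episode, start=1.0, end=0.05, decay=0.01):
    """Heuristic schedule (an assumption): decay the rate from `start` toward `end`."""
    return end + (start - end) * math.exp(-decay * episode)


def softmax_policy(q_values, rate):
    """Boltzmann action probabilities; a larger rate gives a more uniform policy."""
    peak = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - peak) / rate) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(q_values, rate, rng=random):
    """Draw one action index according to the soft-max probabilities."""
    probs = softmax_policy(q_values, rate)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A schedule like this fixes in advance how quickly exploration shrinks, independent of what the agent has experienced. The abstract's claim is that adapting the rate from the agent's experiences avoids having to hand-tune such a schedule.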


research
03/16/2023

Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning

Efficient exploration is critical in cooperative deep Multi-Agent Reinfo...
research
08/06/2019

Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

This paper provides an empirical evaluation of recently developed explor...
research
12/20/2022

Anticipatory Fictitious Play

Fictitious play is an algorithm for computing Nash equilibria of matrix ...
research
07/03/2015

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

Achieving efficient and scalable exploration in complex domains poses a ...
research
03/23/2022

Learning Efficient Exploration through Human Seeded Rapidly-exploring Random Trees

Modern day computer games have extremely large state and action spaces. ...
research
05/27/2019

Learning Policies from Human Data for Skat

Decision-making in large imperfect information games is difficult. Thank...
research
10/06/2021

No-Press Diplomacy from Scratch

Prior AI successes in complex games have largely focused on settings wit...
