Angrier Birds: Bayesian reinforcement learning

01/06/2016
by Imanol Arrieta Ibarra, et al.

We train a reinforcement learner to play a simplified version of the game Angry Birds. The learner receives the game state in a form similar to what computer vision algorithms could produce. We improve on the efficiency of regular ϵ-greedy Q-Learning with linear function approximation by using the more systematic exploration of Randomized Least Squares Value Iteration (RLSVI), an algorithm that samples its policy from a posterior distribution on optimal policies. As state-action spaces grow larger, efficient exploration becomes increasingly important, as evidenced by the faster learning of RLSVI.
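The contrast between the two exploration strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear Q-function over hand-made feature vectors and regression targets that are already computed, and it shows only the posterior-sampling step that RLSVI-style methods build on. All function names and the `noise_var` and `prior_var` parameters are hypothetical.

```python
import numpy as np


def sample_weights(features, targets, noise_var=1.0, prior_var=10.0, rng=None):
    """Draw one weight vector from the Gaussian posterior of a Bayesian
    linear regression with a zero-mean isotropic prior (an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    d = features.shape[1]
    # Posterior precision = data term + prior term.
    precision = features.T @ features / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return rng.multivariate_normal(mean, cov)


def epsilon_greedy_action(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon,
    otherwise the action with the highest estimated value."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def sampled_greedy_action(action_features, weights):
    """Act greedily with respect to one sampled weight vector.
    Row i of action_features is the feature vector of action i."""
    return int(np.argmax(action_features @ weights))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two one-hot features with true weights [1, -1].
    X = np.tile(np.eye(2), (500, 1))
    y = X @ np.array([1.0, -1.0])
    w = sample_weights(X, y, noise_var=0.01, rng=rng)
    print("sampled weights:", w)
    print("sampled-greedy action:", sampled_greedy_action(np.eye(2), w))
    print("epsilon-greedy action:", epsilon_greedy_action(np.eye(2) @ w, 0.1, rng))
```

In this sketch, ϵ-greedy explores by occasionally acting at random regardless of what has been learned, whereas the sampled-weights approach explores because weights that are still uncertain vary from draw to draw, so the greedy action under each draw varies too. A full RLSVI learner would additionally build its regression targets from observed rewards and sampled next-step values, which is omitted here.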

research
04/06/2019

Randomised Bayesian Least-Squares Policy Iteration

We introduce Bayesian least-squares policy iteration (BLSPI), an off-pol...
research
02/04/2014

Generalization and Exploration via Randomized Value Functions

We propose randomized least-squares value iteration (RLSVI) -- a new rei...
research
09/10/2016

Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

We consider scenarios from the real-time strategy game StarCraft as new ...
research
05/08/2013

Cover Tree Bayesian Reinforcement Learning

This paper proposes an online tree-based Bayesian approach for reinforce...
research
02/15/2016

Deep Exploration via Bootstrapped DQN

Efficient exploration in complex environments remains a major challenge ...
research
02/23/2023

Targeted Search Control in AlphaZero for Effective Policy Improvement

AlphaZero is a self-play reinforcement learning algorithm that achieves ...
research
02/08/2020

Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions

Bayesian reinforcement learning (BRL) offers a decision-theoretic soluti...
