Sampled Policy Gradient for Learning to Play the Game Agar.io

09/15/2018
by   Anton Orell Wiehe, et al.
0

In this paper, a new offline actor-critic learning algorithm is introduced: Sampled Policy Gradient (SPG). SPG samples in the action space to calculate an approximated policy gradient by using the critic to evaluate the samples. This sampling allows SPG to search the action-Q-value space more globally than deterministic policy gradient (DPG), enabling it to theoretically avoid more local optima. SPG is compared to Q-learning and the actor-critic algorithms CACLA and DPG in a pellet collection task and a self play environment in the game Agar.io. The online game Agar.io has become massively popular on the internet due to intuitive game design and the ability to instantly compete against players around the world. From the point of view of artificial intelligence this game is also very intriguing: The game has a continuous input and action space and allows to have diverse agents with complex strategies compete against each other. The experimental results show that Q-Learning and CACLA outperform a pre-programmed greedy bot in the pellet collection task, but all algorithms fail to outperform this bot in a fighting scenario. The SPG algorithm is analyzed to have great extendability through offline exploration and it matches DPG in performance even in its basic form without extensive sampling.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/09/2019

Investigation on the generalization of the Sampled Policy Gradient algorithm

The Sampled Policy Gradient (SPG) algorithm is a new offline actor-criti...
research
11/05/2016

Combining policy gradient and Q-learning

Policy gradient is an efficient technique for improving a policy in a re...
research
03/06/2017

Revisiting stochastic off-policy action-value gradients

Off-policy stochastic actor-critic methods rely on approximating the sto...
research
05/18/2018

Learning Permutations with Sinkhorn Policy Gradient

Many problems at the intersection of combinatorics and computer science ...
research
08/01/2022

Off-Policy Correction for Actor-Critic Algorithms in Deep Reinforcement Learning

Compared to on-policy policy gradient techniques, off-policy model-free ...
research
09/07/2019

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

This paper investigates trajectory tracking problem for a class of under...
research
01/15/2020

Continuous-action Reinforcement Learning for Playing Racing Games: Comparing SPG to PPO

In this paper, a novel racing environment for OpenAI Gym is introduced. ...

Please sign up or login with your details

Forgot password? Click here to reset