Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

07/04/2018
by   Alexandre Laterre, et al.

Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimization problems, such as the traveling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, which makes it more important to extend the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm, which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of a two-dimensional bin packing problem show that it outperforms generic Monte Carlo tree search, heuristic algorithms, and reinforcement learning algorithms that do not use ranked rewards.
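The ranking idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only states that rewards are ranked over multiple games, so the buffer size, the percentile cutoff, the +1/-1 output, the random tie-break, and the name `RankedRewardBuffer` are all assumptions made here for illustration.

```python
from collections import deque
import random


class RankedRewardBuffer:
    """Hypothetical helper: maps a raw single-player reward to +1 or -1
    by comparing it with the agent's own recent results."""

    def __init__(self, capacity=250, percentile=0.75, rng=None):
        # capacity and percentile are illustrative defaults, not values
        # taken from the abstract.
        self.rewards = deque(maxlen=capacity)
        self.percentile = percentile
        self.rng = rng or random.Random(0)

    def threshold(self):
        # Reward sitting at the chosen percentile of the recent games.
        ordered = sorted(self.rewards)
        index = min(int(self.percentile * len(ordered)), len(ordered) - 1)
        return ordered[index]

    def rank(self, reward):
        # Record the new game first, then compare it with the cutoff.
        self.rewards.append(reward)
        cutoff = self.threshold()
        if reward > cutoff:
            return 1   # better than the agent's recent standard: a "win"
        if reward < cutoff:
            return -1  # worse than the recent standard: a "loss"
        return self.rng.choice([1, -1])  # assumed tie-break: random outcome
```

Because the new reward is appended before the cutoff is computed, the very first game always ties with itself and gets a random outcome. As the agent improves, the cutoff rises, so a reward that once counted as a win can later count as a loss, which is the relative-performance effect the abstract describes.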

research
03/08/2019

Learning Self-Game-Play Agents for Combinatorial Optimization Problems

Recent progress in reinforcement learning (RL) using self-game-play has ...
research
06/14/2020

Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning

Morpion Solitaire is a popular single player game, performed with paper ...
research
06/07/2023

Policy-Based Self-Competition for Planning Problems

AlphaZero-type algorithms may stop improving on single-player tasks in c...
research
02/11/2020

Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization

Quantum hardware and quantum-inspired algorithms are becoming increasing...
research
05/18/2018

Solving the Rubik's Cube Without Human Knowledge

A generally intelligent agent must be able to teach itself how to solve ...
research
05/06/2023

A Novel Reward Shaping Function for Single-Player Mahjong

Mahjong is a complex game with an intractably large state space with ext...
research
04/26/2020

Warm-Start AlphaZero Self-Play Search Enhancements

Recently, AlphaZero has achieved landmark results in deep reinforcement ...
