Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies

07/10/2023
by Rubens O. Moraes, et al.

This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.
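To make the notion of a "set of reference strategies" concrete, here is a minimal sketch of how two of the baselines named above choose that set. It assumes rock-paper-scissors as a toy zero-sum game and replaces the synthesizer's local search with exhaustive enumeration of the three pure strategies. The names `search_signal`, `best_response`, `self_play`, `ibr_references`, and `fp_references` are hypothetical and do not come from the paper. 2L's own selection rule is not described in the abstract, so it is deliberately left out.

```python
# Toy zero-sum game (an assumption for illustration): rock-paper-scissors.
# PAYOFF[a][b] is the reward of playing a against b.
ROCK, PAPER, SCISSORS = 0, 1, 2
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def search_signal(candidate, references):
    """Average payoff of `candidate` against the reference strategies."""
    return sum(PAYOFF[candidate][r] for r in references) / len(references)


def best_response(references):
    """Exhaustive stand-in for the search: return the candidate with the
    highest search signal (ties go to the lowest-numbered strategy)."""
    return max(range(3), key=lambda c: search_signal(c, references))


def self_play(select_references, initial=ROCK, iterations=6):
    """Repeatedly synthesize a best response to the selected references."""
    history = [initial]
    for _ in range(iterations):
        history.append(best_response(select_references(history)))
    return history


def ibr_references(history):
    # Iterated Best Response: only the most recent strategy is a reference.
    return history[-1:]


def fp_references(history):
    # Fictitious Play: every strategy seen so far is a reference.
    return list(history)


ibr_history = self_play(ibr_references)
fp_history = self_play(fp_references)
print("IBR:", ibr_history)
print("FP: ", fp_history)
```

In this toy game, IBR cycles through rock, paper, scissors because each search only sees the previous strategy. FP averages over the full history, so the set of strategies it must evaluate grows with every iteration. These are small-scale versions of the two issues the abstract raises: missing information and computational cost.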

