Elo Ratings for Large Tournaments of Software Agents in Asymmetric Games

04/23/2021
by Ben Wise, et al.

The Elo rating system is used worldwide for individual and team sports, for example by the European Go Federation (EGF), the International Chess Federation (FIDE), and the International Federation of Association Football (FIFA), among many others. When evaluating the performance of artificial intelligence agents, it is natural to place them on the same Elo scale as humans, as with the rating of 5185 attributed to AlphaGo Zero. However, several fundamental differences between humans and AI suggest modifications to the system, which in turn require revisiting Elo's fundamental rationale. AI agents are typically trained on many more games than humans play, and we have little a priori information about newly created agents. Further, AI is being extended into games that are asymmetric between the players, and that may even have large, complex boards with a different setup in every game, such as commercial paper strategy games. We present a revised rating system, and guidelines for tournaments, that reflect these differences.
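
For orientation, the classic Elo update that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard symmetric formula only, not the revised system the paper proposes. The function names, the K-factor of 32, and the 400-point scale are assumptions chosen for illustration (conventional values, not taken from the paper).

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of player A against player B under classic Elo.

    `scale` is the conventional 400-point logistic scale (an assumption here).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a, rating_b, score_a, k=32.0):
    """Return the new (rating_a, rating_b) after one game.

    `score_a` is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    `k` is the K-factor; 32 is a common illustrative choice, not a value
    from the paper.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Classic Elo is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated players; A wins.
    print(update_ratings(1500.0, 1500.0, 1.0))
```

Note that this sketch treats both players identically, so it carries no notion of the side-dependent asymmetry that the abstract raises.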

