SC2EGSet: StarCraft II Esport Replay and Game-state Dataset

07/07/2022
by Andrzej Białecki, et al.

As a relatively new form of sport, esports offers unparalleled data availability. Although game engines generate vast amounts of data, extracting them and verifying their integrity for practical and scientific use can be challenging. Our work aims to open esports to a broader scientific community by supplying raw and pre-processed files from StarCraft II esports tournaments. These files can be used in statistical and machine learning modeling tasks and can be related to various laboratory-based measurements (e.g., behavioral tests, brain imaging). We gathered publicly available game-engine-generated "replays" of tournament matches and performed data extraction and cleanup using a low-level application programming interface (API) parser library. Additionally, we open-sourced and published all the custom tools developed in the process of creating our dataset. These tools include PyTorch and PyTorch Lightning API abstractions to load and model the data. Our dataset contains replays from major and premier StarCraft II tournaments since 2016. To prepare the dataset, we processed 55 tournament "replaypacks" that contained 17930 files with game-state information. Based on an initial investigation of available StarCraft II datasets, we observed that our dataset is the largest publicly available source of StarCraft II esports data at the time of its publication. Analysis of the extracted data holds promise for further Artificial Intelligence (AI), Machine Learning (ML), psychological, Human-Computer Interaction (HCI), and sports-related studies in a variety of supervised and self-supervised tasks.
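
To make the loading step concrete, here is a minimal sketch of how pre-processed per-replay files might be exposed through the map-style dataset protocol (`__len__` and `__getitem__`) that PyTorch data loading builds on. It is not the authors' published API. The class name `ReplayJsonDataset`, the assumption that each replay is stored as one JSON file, and the directory layout are all hypothetical and chosen only for illustration. The sketch uses the Python standard library alone.

```python
import json
from pathlib import Path


class ReplayJsonDataset:
    """Hypothetical map-style dataset over a directory of per-replay JSON files.

    Implements __len__ and __getitem__, the protocol used by map-style
    PyTorch datasets, without importing torch.
    """

    def __init__(self, root, transform=None):
        # Assumption: one JSON file per replay, anywhere under `root`.
        self.paths = sorted(Path(root).rglob("*.json"))
        # Optional callable applied to each parsed replay (e.g. feature extraction).
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Parse lazily so that only the requested replay is held in memory.
        with self.paths[index].open(encoding="utf-8") as handle:
            replay = json.load(handle)
        return self.transform(replay) if self.transform else replay
```

Parsing each file lazily in `__getitem__` keeps memory use flat across thousands of replays. The optional `transform` is where a task-specific step, such as turning game-state records into tensors, would plug in.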

research
04/10/2021

ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-based Type Inference

In this paper, we present ManyTypes4Py, a large Python dataset for machi...
research
12/29/2017

RedDwarfData: a simplified dataset of StarCraft matches

The game Starcraft is one of the most interesting arenas to test new mac...
research
02/05/2019

Dungeon Crawl Stone Soup as an Evaluation Domain for Artificial Intelligence

Dungeon Crawl Stone Soup is a popular, single-player, free and open-sour...
research
08/22/2023

The Software Heritage License Dataset (2022 Edition)

Context: When software is released publicly, it is common to include wit...
research
11/26/2018

What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning

Recent years have witnessed the rising popularity of Natural Language Pr...
research
11/06/2017

RoboCupSimData: A RoboCup soccer research dataset

RoboCup is an international scientific robot competition in which teams ...
research
08/07/2017

STARDATA: A StarCraft AI Research Dataset

We release a dataset of 65646 StarCraft replays that contains 1535 milli...
