Dungeons and Data: A Large-Scale NetHack Dataset

11/01/2022
by Eric Hambro, et al.

Recent breakthroughs in the development of agents to solve challenging sequential decision-making problems such as Go, StarCraft, or DOTA have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost of working with them. Here we present the NetHack Learning Dataset (NLD), a large and highly scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot winner of the NetHack Challenge 2021; and accompanying code for users to record, load, and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms, including online and offline RL as well as learning from demonstrations, showing that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision-making tasks.
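
As a rough sanity check on the scale quoted above, the short Python sketch below derives the average trajectory length implied by the two transition counts. The constants are the rounded figures from the abstract; the averages are back-of-the-envelope estimates, not statistics reported by the authors, and the names (`mean_length`, `human_mean`, `bot_mean`) are illustrative only.

```python
# Rounded figures quoted in the abstract (not exact dataset statistics).
HUMAN_TRANSITIONS = 10_000_000_000   # human play on the NAO server
HUMAN_TRAJECTORIES = 1_500_000
BOT_TRANSITIONS = 3_000_000_000      # symbolic bot, NetHack Challenge 2021 winner
BOT_TRAJECTORIES = 100_000


def mean_length(transitions: int, trajectories: int) -> float:
    """Average number of transitions per trajectory."""
    return transitions / trajectories


human_mean = mean_length(HUMAN_TRANSITIONS, HUMAN_TRAJECTORIES)
bot_mean = mean_length(BOT_TRANSITIONS, BOT_TRAJECTORIES)

print(f"human: ~{human_mean:,.0f} transitions per trajectory")
print(f"bot:   ~{bot_mean:,.0f} transitions per trajectory")
print(f"bot trajectories are ~{bot_mean / human_mean:.1f}x longer on average")
```

Under these rounded figures, a human trajectory averages several thousand transitions, while a bot trajectory averages tens of thousands. This gives a sense of why compact storage and streaming matter for a dataset of this size.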

research
07/29/2019

MineRL: A Large-Scale Dataset of Minecraft Demonstrations

The sample inefficiency of standard deep reinforcement learning methods ...
research
05/10/2020

Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Artificial behavioral agents are often evaluated based on their consiste...
research
07/13/2023

Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning

Offline reinforcement learning (RL) is a promising direction that allows...
research
06/10/2022

Large-Scale Retrieval for Reinforcement Learning

Effective decision making involves flexibly relating past experiences an...
research
04/22/2019

The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors

Though deep reinforcement learning has led to breakthroughs in many diff...
research
03/30/2021

Benchmarks for Deep Off-Policy Evaluation

Off-policy evaluation (OPE) holds the promise of being able to leverage ...
research
12/11/2020

OpenHoldem: An Open Toolkit for Large-Scale Imperfect-Information Game Research

Owing to the unremitting efforts by a few institutes, significant progr...
