Scaling Laws for Imitation Learning in NetHack

07/18/2023
by Jens Tuyls et al.

Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, while powerful, many works find that it often fails to fully recover the underlying expert behavior, and none of these works deeply investigate the role of scaling up model and data size. Inspired by recent work in Natural Language Processing (NLP), where "scaling up" has resulted in increasingly capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting. To demonstrate our findings, we focus on the game of NetHack, a challenging environment featuring procedural generation, stochasticity, long-term dependencies, and partial observability. We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, yielding power laws for training compute-optimal IL agents with respect to model size and number of samples. We forecast and train several NetHack agents with IL and find that they outperform prior state-of-the-art by at least 2x in all settings. Our work demonstrates both the scaling behavior of imitation learning in a challenging domain and the viability of scaling up current approaches to produce increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
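The power laws the abstract describes relate a compute budget to a compute-optimal model size (and, analogously, number of samples). A minimal sketch of how such a law can be fitted is below; the functional form N_opt = k * C^a is standard in scaling-law analyses, but the constants and data here are purely illustrative, not the paper's fitted values.

```python
import numpy as np

def fit_power_law(compute, optimal_size):
    """Fit N = k * C^a by linear regression in log-log space.

    Taking logs turns the power law into a line:
    log N = log k + a * log C, so a is the slope and
    log k the intercept of an ordinary least-squares fit.
    """
    a, log_k = np.polyfit(np.log(compute), np.log(optimal_size), 1)
    return np.exp(log_k), a

# Synthetic data generated from a known law (k = 2.0, a = 0.5),
# chosen only to show that the fit recovers the exponents.
C = np.logspace(15, 20, 6)   # hypothetical FLOPs budgets
N = 2.0 * C ** 0.5           # "compute-optimal" parameter counts
k, a = fit_power_law(C, N)
print(round(k, 3), round(a, 3))  # recovers k ≈ 2.0, a ≈ 0.5
```

With a fitted law in hand, forecasting (as the paper does for its largest runs) amounts to evaluating `k * C ** a` at a compute budget beyond the data used for fitting.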


research
11/16/2018

An Algorithmic Perspective on Imitation Learning

As robots and other intelligent agents move from simple environments and...
research
07/06/2020

Scaling Imitation Learning in Minecraft

Imitation learning is a powerful family of techniques for learning senso...
research
09/24/2021

Is the Number of Trainable Parameters All That Actually Matters?

Recent work has identified simple empirical scaling laws for language mo...
research
04/07/2021

Scaling Scaling Laws with Board Games

The largest experiments in machine learning now require resources far be...
research
10/02/2019

Learning Calibratable Policies using Programmatic Style-Consistency

We study the important and challenging problem of controllable generatio...
research
04/04/2023

Quantum Imitation Learning

Despite remarkable successes in solving various complex decision-making ...
research
12/01/2021

A General Language Assistant as a Laboratory for Alignment

Given the broad capabilities of large language models, it should be poss...
