ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games

05/24/2023
by Ruoyao Wang, et al.

In this work we examine the ability of language models to generate explicit world models of scientific and common-sense reasoning tasks by framing this as a problem of generating text-based games. To support this, we introduce ByteSized32, a corpus of 32 highly templated text games written in Python totaling 24k lines of code, each centered around a particular task, and paired with a set of 16 unseen text game specifications for evaluation. We propose a suite of automatic and manual metrics for assessing simulation validity, compliance with task specifications, playability, winnability, and alignment with the physical world. In a single-shot evaluation of GPT-4 on this simulation-as-code-generation task, we find it capable of producing runnable games in 27% of cases. We discuss areas of future improvement, including GPT-4's apparent capacity to perform well at simulating near-canonical task solutions, with performance dropping off as simulations include distractors or deviate from canonical solutions in the action space.
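To make the idea of a task-specific world model expressed as a text game concrete, the following is a minimal sketch of what such a game might look like in Python. It is not taken from the ByteSized32 corpus and does not follow its template; the class name `BoilWaterGame`, the action strings, and the temperature increments are all assumptions chosen for illustration.

```python
class BoilWaterGame:
    """Hypothetical toy world model: the task is to boil water in a pot."""

    def __init__(self):
        self.pot_has_water = False
        self.pot_on_stove = False
        self.stove_on = False
        self.water_temp_c = 20   # assumed room temperature
        self.won = False

    def valid_actions(self):
        """List the actions that make sense in the current state."""
        actions = ["look", "wait"]
        if not self.pot_has_water:
            actions.append("fill pot")
        if not self.pot_on_stove:
            actions.append("put pot on stove")
        actions.append("turn off stove" if self.stove_on else "turn on stove")
        return actions

    def _tick(self):
        """Advance the simulated physics by one step."""
        if self.stove_on and self.pot_on_stove and self.pot_has_water:
            # Assumed heating rate: 40 degrees per step, capped at boiling.
            self.water_temp_c = min(100, self.water_temp_c + 40)
        if self.water_temp_c >= 100:
            self.won = True

    def step(self, action):
        """Apply one action and return (observation, task_completed)."""
        if action not in self.valid_actions():
            return "That is not something you can do right now.", self.won
        if action == "fill pot":
            self.pot_has_water = True
            obs = "You fill the pot with water."
        elif action == "put pot on stove":
            self.pot_on_stove = True
            obs = "You put the pot on the stove."
        elif action == "turn on stove":
            self.stove_on = True
            obs = "The stove is now on."
        elif action == "turn off stove":
            self.stove_on = False
            obs = "The stove is now off."
        elif action == "wait":
            obs = "You wait."
        else:  # "look"
            obs = f"The water is at {self.water_temp_c} degrees Celsius."
        self._tick()
        return obs, self.won


if __name__ == "__main__":
    game = BoilWaterGame()
    for act in ["fill pot", "put pot on stove", "turn on stove", "wait"]:
        observation, done = game.step(act)
        print(f"> {act}\n{observation}")
    print("Task completed:", done)
```

In this sketch, the action sequence in the `__main__` block plays the role of a canonical solution, and an action outside `valid_actions()` is rejected without changing the state. Properties such as whether the game runs, can be played, and can be won correspond loosely to the kinds of metrics the abstract lists.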


