ScienceWorld: Is your Agent Smarter than a 5th Grader?

03/14/2022
by Ruoyao Wang, et al.

This paper presents a new benchmark, ScienceWorld, to test agents' scientific reasoning abilities in a new interactive text environment at the level of a standard elementary school science curriculum. Despite recent transformer-based progress in adjacent fields such as question answering, scientific text processing, and the wider area of natural language processing, we find that current state-of-the-art models are unable to reason about or explain learned science concepts in novel contexts. For instance, models can easily answer what the conductivity of a previously seen material is, but struggle when asked how they would conduct an experiment in a grounded, interactive environment to find the conductivity of an unknown material. This raises the question of whether current models are simply retrieving answers after seeing a large number of similar input examples, or whether they have learned to reason about concepts in a reusable manner. We hypothesize that agents need to be grounded in interactive environments to achieve such reasoning capabilities. Our experiments provide empirical evidence supporting this hypothesis, showing that a 1.5 million parameter agent trained interactively for 100k steps outperforms an 11 billion parameter model statically trained for scientific question answering and reasoning via millions of expert demonstrations.
