Learning to Solve Complex Tasks by Talking to Agents

10/16/2021
by Tushar Khot, et al.

Humans often solve complex problems by interacting (in natural language) with existing agents, such as AI assistants, that can solve simpler sub-tasks. These agents themselves can be powerful systems built using extensive resources and privately held data. In contrast, common NLP benchmarks aim for the development of self-sufficient models for every task. To address this gap and facilitate research towards “green” AI systems that build upon existing agents, we propose a new benchmark called CommaQA that contains three kinds of complex reasoning tasks designed to be solved by “talking” to four agents with different capabilities. We demonstrate that state-of-the-art black-box models, which are unable to leverage existing agents, struggle on CommaQA (exact match score of only 40 pts) even when given access to the agents' internal knowledge and gold fact supervision. On the other hand, models using gold question-decomposition supervision can indeed solve CommaQA to a high accuracy (over 96% exact match) by learning to utilize the agents. Even these models with additional supervision, however, fail on our compositional generalization test set. The end goal of learning to solve complex tasks by communicating with existing agents, without relying on any additional supervision, thus remains unsolved, and we hope CommaQA serves as a novel benchmark to enable the development of such systems.
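To make the decomposition idea concrete, here is a minimal sketch (not the CommaQA implementation) of how a controller might "talk" to simple agents: each agent answers one narrow kind of sub-question over its own data, and the controller composes their answers into the answer to a complex question. The agent names, toy facts, and the fixed two-step decomposition below are all illustrative assumptions.

```python
# Toy agents: each exposes a narrow capability over its own (private) data.
# Names, data, and query formats are hypothetical, for illustration only.

def movie_agent(query: str) -> list:
    """Answers sub-questions of the form 'movies directed by X'."""
    data = {"Ray": ["FilmA", "FilmB"]}  # the agent's private knowledge
    director = query.removeprefix("movies directed by ").strip()
    return data.get(director, [])

def award_agent(query: str) -> list:
    """Answers sub-questions of the form 'awards won by X'."""
    data = {"FilmA": ["Oscar"], "FilmB": []}
    movie = query.removeprefix("awards won by ").strip()
    return data.get(movie, [])

def answer_complex(director: str) -> list:
    """Complex task: which awards did movies directed by `director` win?
    Solved via a fixed decomposition into two rounds of agent calls."""
    awards = []
    # Step 1: ask one agent for the intermediate entities (movies).
    for movie in movie_agent(f"movies directed by {director}"):
        # Step 2: ask another agent about each intermediate entity.
        awards.extend(award_agent(f"awards won by {movie}"))
    return awards

print(answer_complex("Ray"))  # → ['Oscar']
```

The point of the sketch is that the controller never sees the agents' internal data; it only composes their capabilities, which is the setting the benchmark targets. In the actual task, the decomposition itself must be learned rather than hard-coded as it is here.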

