Evaluating Multimodal Interactive Agents

05/26/2022
by Josh Abramson, et al.

Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research. However, evaluating these interactions is challenging: collecting online human-agent interactions is slow and expensive, yet faster proxy metrics often do not correlate well with interactive evaluation. In this paper, we assess the merits of these existing evaluation metrics and present a novel approach to evaluation called the Standardised Test Suite (STS). The STS uses behavioural scenarios mined from real human interaction data. Agents see replayed scenario context, receive an instruction, and are then given control to complete the interaction offline. These agent continuations are recorded and sent to human annotators to mark as success or failure, and agents are ranked according to the proportion of continuations in which they succeed. The resulting STS is fast, controlled, interpretable, and representative of naturalistic interactions. Altogether, the STS consolidates much of what is desirable across many of our standard evaluation metrics, allowing us to accelerate research progress towards producing agents that can interact naturally with humans. https://youtu.be/YR1TngGORGQ
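The ranking step described in the abstract (agents ordered by the proportion of continuations that annotators mark as successful) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rank_agents`, the tuple layout, and the agent and scenario names are all hypothetical, and the example assumes one binary annotation per continuation.

```python
from collections import defaultdict


def rank_agents(annotations):
    """Rank agents by their fraction of successful continuations.

    `annotations` is an iterable of (agent, scenario_id, success) tuples,
    where `success` is the annotator's binary judgement. This layout is
    an assumption made for illustration only.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for agent, _scenario_id, success in annotations:
        totals[agent] += 1
        successes[agent] += int(success)
    # Success rate = successful continuations / all annotated continuations.
    rates = {agent: successes[agent] / totals[agent] for agent in totals}
    # Highest success rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical annotations: two agents evaluated on the same four scenarios.
example_annotations = [
    ("agent_a", "scenario_1", True),
    ("agent_a", "scenario_2", True),
    ("agent_a", "scenario_3", False),
    ("agent_a", "scenario_4", True),
    ("agent_b", "scenario_1", False),
    ("agent_b", "scenario_2", True),
    ("agent_b", "scenario_3", False),
    ("agent_b", "scenario_4", False),
]

ranking = rank_agents(example_annotations)
```

Because every agent is scored on the same fixed scenarios, the resulting success rates are directly comparable across agents, which is the sense in which the abstract calls the evaluation controlled.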
