Evaluating Understanding on Conceptual Abstraction Benchmarks

06/28/2022
by Victor Vikram Odouard, et al.

A long-held objective in AI is to build systems that understand concepts in a humanlike way. Setting aside the difficulty of building such a system, even evaluating one is a challenge, due to present-day AI's relative opacity and its proclivity for finding shortcut solutions. This is exacerbated by humans' tendency to anthropomorphize, assuming that a system that can recognize one instance of a concept must also understand other instances, as a human would. In this paper, we argue that understanding a concept requires the ability to use it in varied contexts. Accordingly, we propose systematic evaluations centered on concepts, probing a system's ability to use a given concept in many different instantiations. We present case studies of such evaluations in two domains, RAVEN (inspired by Raven's Progressive Matrices) and the Abstraction and Reasoning Corpus (ARC), that have been used to develop and assess abstraction abilities in AI systems. Our concept-based approach to evaluation reveals information about AI systems that conventional test sets would have left hidden.
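
As a rough illustration of how a concept-centered evaluation differs from a single aggregate score, the sketch below groups test outcomes by the concept each item instantiates. It is a minimal sketch and not the authors' evaluation code: the function name, the concept labels, and the outcome data are all hypothetical, chosen only to show how an overall accuracy can hide weak performance on one concept.

```python
from collections import defaultdict


def accuracy_by_concept(results):
    """Return per-concept accuracy.

    `results` is an iterable of (concept, is_correct) pairs, one pair per
    test item. Both the pair layout and the concept labels are assumptions
    made for this illustration.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for concept, is_correct in results:
        totals[concept] += 1
        correct[concept] += int(is_correct)
    return {concept: correct[concept] / totals[concept] for concept in totals}


# Hypothetical outcomes: four instantiations each of two made-up concepts.
results = [
    ("same_shape", True),
    ("same_shape", True),
    ("same_shape", True),
    ("same_shape", True),
    ("inside_outside", True),
    ("inside_outside", False),
    ("inside_outside", False),
    ("inside_outside", False),
]

# A conventional test-set score pools every item together.
overall = sum(ok for _, ok in results) / len(results)

# A concept-based view reports one score per concept.
per_concept = accuracy_by_concept(results)

print(f"overall accuracy: {overall:.3f}")
for concept, acc in per_concept.items():
    print(f"{concept}: {acc:.2f}")
```

In this made-up data the pooled score sits well above chance, while the per-concept breakdown shows that one concept is solved in only one of its four instantiations.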
