AI Evaluation: past, present and future

08/29/2014
by Jose Hernandez-Orallo, et al.

Artificial intelligence develops techniques and systems whose performance must be evaluated on a regular basis in order to certify and foster progress in the discipline. We describe and critically assess the different ways AI systems are evaluated. We first focus on the traditional task-oriented evaluation approach. We see that black-box (behavioural) evaluation is becoming more and more common, as AI systems are becoming more complex and unpredictable. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe the limitations of the many evaluation settings and competitions in these three categories and propose several ideas for a more systematic and robust evaluation. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory, or more general approaches under the perspective of universal psychometrics.

research
09/23/2011

Analysis of first prototype universal intelligence tests: evaluating and comparing AI algorithms and humans

Today, available methods that assess AI systems are focused on using emp...
research
05/09/2013

On the universality of cognitive tests

The analysis of the adaptive behaviour of many different kinds of system...
research
11/06/2014

The Limitations of Standardized Science Tests as Benchmarks for Artificial Intelligence Research: Position Paper

In this position paper, I argue that standardized tests for elementary s...
research
08/07/2023

Why We Don't Have AGI Yet

The original vision of AI was re-articulated in 2002 via the term 'Artif...
research
08/23/2023

A Theory of Intelligences: Concepts, Models, Implications

Intelligence is a human construct to represent the ability to achieve go...
research
08/10/2020

DQI: A Guide to Benchmark Evaluation

A 'state of the art' model A surpasses humans in a benchmark B, but fail...
research
12/08/2013

CLIC: A Framework for Distributed, On-Demand, Human-Machine Cognitive Systems

Traditional Artificial Cognitive Systems (for example, intelligent robot...
