The Limitations of Standardized Science Tests as Benchmarks for Artificial Intelligence Research: Position Paper

11/06/2014
by Ernest Davis, et al.

In this position paper, I argue that standardized tests for elementary science, such as the SAT or Regents tests, are not very good benchmarks for measuring the progress of artificial intelligence systems in understanding basic science. The primary problem is that these tests are designed to test aspects of knowledge and ability that are challenging for people; the aspects that are challenging for AI systems are very different. In particular, standardized tests do not test knowledge that is obvious to people, yet none of this knowledge can be assumed in AI systems. Individual standardized tests also have specific features that are not necessarily appropriate for an AI benchmark. I analyze the Physics subject SAT in some detail and the New York State Regents Science test more briefly. I also argue that the apparent advantages of using standardized tests are mostly either minor or illusory. The one major real advantage is that their significance is easily explained to the public, but I argue that even this is a somewhat mixed blessing. I conclude by arguing, first, that more appropriate collections of exam-style problems could be assembled and, second, that there are better kinds of benchmarks than exam-style problems. In an appendix, I present a collection of sample exam-style problems that test kinds of knowledge missing from the standardized tests.


