Metrology for AI: From Benchmarks to Instruments

11/05/2019
by Chris Welty, et al.

In this paper we present the first steps toward hardening the science of measuring AI systems by adopting metrology, the science of measurement and its application, and applying it to human (crowd)-powered evaluations. We begin with the intuitive observation that evaluating the performance of an AI system is a form of measurement. In all other science and engineering disciplines, the devices used to measure are called instruments, and all measurements are recorded with respect to the characteristics of the instruments used. One does not report the mass, speed, or length of a studied object, for example, without disclosing the precision (measurement variance) and resolution (smallest detectable change) of the instrument used. Yet it is extremely common in the AI literature to compare the performance of two systems using a crowd-sourced dataset as an instrument while failing to report whether the performance difference lies within that instrument's capability to measure. To illustrate how metrology applies to benchmark datasets, we use the word similarity benchmark WS353 and several previously published experiments that use it for evaluation.
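To make "is this difference within the instrument's resolution?" concrete, here is a minimal sketch, not the authors' method. It assumes a WS353-style setup in which a system is scored by Spearman's rank correlation between its similarity scores and the gold human ratings, and it swaps in a standard percentile bootstrap over benchmark items as a stand-in for the instrument's measurement variance. The function names (average_ranks, spearman, bootstrap_difference) are hypothetical, and resampling items captures only item-sampling variance, not disagreement among the crowd raters who produced the gold scores.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def bootstrap_difference(
    gold: Sequence[float],
    sys_a: Sequence[float],
    sys_b: Sequence[float],
    n_resamples: int = 1000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap interval for rho(gold, A) - rho(gold, B).

    Resamples benchmark items (e.g. word pairs) with replacement. This is an
    assumed stand-in for instrument variance; it ignores rater variance.
    """
    rng = random.Random(seed)
    n = len(gold)
    diffs: List[float] = []
    attempts = 0
    while len(diffs) < n_resamples:
        attempts += 1
        if attempts > 10 * n_resamples:
            raise ValueError("too many degenerate resamples; is the data constant?")
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        try:
            rho_a = spearman(g, [sys_a[i] for i in idx])
            rho_b = spearman(g, [sys_b[i] for i in idx])
        except ZeroDivisionError:
            continue  # a resample with a constant column has no defined rho
        diffs.append(rho_a - rho_b)
    diffs.sort()
    alpha = (1 - level) / 2
    lo = diffs[int(alpha * n_resamples)]
    hi = diffs[min(n_resamples - 1, int((1 - alpha) * n_resamples))]
    return lo, hi
```

Under these assumptions, if the returned interval contains 0, the gap between systems A and B is smaller than what the benchmark can reliably resolve at the chosen level, which is the kind of statement the abstract argues should accompany reported comparisons.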
