Towards A Measure Of General Machine Intelligence

To build increasingly general-purpose artificial intelligence systems that can deal with unknown variables across unknown domains, we need benchmarks that measure precisely how well these systems perform on tasks they have never seen before. A prerequisite for this is a measure of a task's generalization difficulty, that is, how dissimilar the task is from the system's prior knowledge and experience. If the skill of an intelligence system in a particular domain is defined as its ability to consistently generate a set of instructions (or programs) to solve tasks in that domain, then current benchmarks do not quantitatively measure the efficiency of acquiring new skills. This makes it possible to brute-force skill acquisition by training with unlimited amounts of data and compute. With this in mind, we first propose a common language of instruction: a programming language that allows programs to be expressed as directed acyclic graphs across a wide variety of real-world domains and computing platforms. Using programs generated in this language, we demonstrate a match-based method to both score performance and calculate the generalization difficulty of any given set of tasks. We use these to define a numeric benchmark called the g-index, which measures and compares the skill-acquisition efficiency of any intelligence system on a set of real-world tasks. Finally, we evaluate the suitability of some well-known models as general intelligence systems by calculating their g-index scores.
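
The abstract does not spell out the match-based method, so the following is only a rough illustrative sketch of the general idea, not the paper's actual scoring rule or g-index formula. It assumes a program is a directed acyclic graph given as a pair (node-id to operation-label mapping, set of edges), scores a generated program against a reference by the Jaccard overlap of operation labels and labeled edges, and treats generalization difficulty as one minus the best match against previously seen programs. The names `match_score` and `generalization_difficulty`, the operation labels, and the choice of Jaccard overlap are all assumptions made for illustration.

```python
# Illustrative sketch only: the representation and both formulas below are
# assumptions, not the definitions used in the paper.

def labeled_edges(nodes, edges):
    """Rewrite edges in terms of operation labels so node ids do not matter."""
    return {(nodes[src], nodes[dst]) for src, dst in edges}


def program_items(program):
    """Collect the operation labels and labeled edges of a DAG program."""
    nodes, edges = program
    return set(nodes.values()) | labeled_edges(nodes, edges)


def match_score(generated, reference):
    """Jaccard overlap between two programs, in the range [0, 1]."""
    g_items = program_items(generated)
    r_items = program_items(reference)
    if not g_items and not r_items:
        return 1.0  # two empty programs are treated as identical
    return len(g_items & r_items) / len(g_items | r_items)


def generalization_difficulty(task_program, seen_programs):
    """Assumed proxy: 1 minus the best match against prior experience."""
    if not seen_programs:
        return 1.0  # nothing seen before, so the task is maximally novel
    return 1.0 - max(match_score(task_program, p) for p in seen_programs)


# Hypothetical programs: (node id -> operation label, set of directed edges).
reference = (
    {"a": "load", "b": "resize", "c": "classify"},
    {("a", "b"), ("b", "c")},
)
generated = (
    {"x": "load", "y": "classify"},
    {("x", "y")},
)

score = match_score(generated, reference)
difficulty = generalization_difficulty(reference, [generated])
```

Comparing labels rather than node ids keeps the score independent of how each program happens to name its nodes; a real implementation would likely need a richer notion of graph matching than set overlap.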

research
11/05/2019

The Measure of Intelligence

To make deliberate progress towards more intelligent and more human-like...
research
07/20/2023

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

Evaluation of Large Language Models (LLMs) is challenging because aligni...
research
07/07/2023

Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models

In this perspective paper, we first comprehensively review existing eval...
research
03/07/2023

Toward Defining a Domain Complexity Measure Across Domains

Artificial Intelligence (AI) systems planned for deployment in real-worl...
research
04/01/2021

Towards General Purpose Vision Systems

A special purpose learning system assumes knowledge of admissible tasks ...
research
03/09/2020

BitTensor: An Intermodel Intelligence Measure

A purely inter-model version of a machine intelligence benchmark would a...
research
11/04/2021

Representation Edit Distance as a Measure of Novelty

Adaptation to novelty is viewed as learning to change and augment existi...
