Exploring and Analyzing Machine Commonsense Benchmarks

12/21/2020
by Henrique Santos et al.

Commonsense question-answering (QA) tasks, in the form of benchmarks, are constantly being introduced to challenge and compare commonsense QA systems. The benchmarks provide question sets that system developers can use to train and test new models before submitting their implementations to official leaderboards. Although these tasks are created to evaluate systems along identified dimensions (e.g., topic, reasoning type), this metadata is limited, largely presented in an unstructured format, or absent altogether. Because machine common sense (MCS) is a fast-paced field, the problem of fully assessing current benchmarks and systems with regard to these evaluation dimensions is aggravated. We argue that the lack of a common vocabulary for aligning these approaches' metadata limits researchers both in understanding systems' deficiencies and in making effective choices for future tasks. In this paper, we first discuss the MCS ecosystem in terms of its elements and their metadata. Then, we present how we are supporting the assessment of approaches by initially focusing on commonsense benchmarks. We describe our initial MCS Benchmark Ontology, an extensible common vocabulary that formalizes benchmark metadata, and showcase how it supports the development of a benchmark tool that enables benchmark exploration and analysis.
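To make the idea of a formalized benchmark vocabulary concrete, the minimal sketch below models a few benchmark metadata assertions as RDF and queries them with SPARQL, using Python's rdflib. The namespace, class, and property names (mcsb:Benchmark, mcsb:reasoningType, mcsb:answerFormat) and the dimension labels are illustrative placeholders, not the actual terms of the MCS Benchmark Ontology.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace; not the ontology's real IRI.
MCSB = Namespace("http://example.org/mcs-benchmark#")

g = Graph()
g.bind("mcsb", MCSB)

# Declare a benchmark class and describe one benchmark instance
# along two illustrative evaluation dimensions.
g.add((MCSB.Benchmark, RDF.type, RDFS.Class))
g.add((MCSB.CommonsenseQA, RDF.type, MCSB.Benchmark))
g.add((MCSB.CommonsenseQA, MCSB.answerFormat, Literal("multiple-choice")))
g.add((MCSB.CommonsenseQA, MCSB.reasoningType, Literal("causal")))

# Exploration query: which benchmarks target a given reasoning type?
query = """
PREFIX mcsb: <http://example.org/mcs-benchmark#>
SELECT ?benchmark WHERE {
    ?benchmark a mcsb:Benchmark ;
               mcsb:reasoningType "causal" .
}
"""
for row in g.query(query):
    print(row.benchmark)

Queries of this kind are, plausibly, what a benchmark exploration tool would issue against ontology-backed metadata so that researchers can filter and compare benchmarks along shared dimensions rather than reading unstructured task descriptions.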
