Experience and Prediction: A Metric of Hardness for a Novel Litmus Test

09/05/2023
by Nicos Isaak, et al.

Over the last decade, the Winograd Schema Challenge (WSC) has become a focal point for the research community as a novel litmus test. It has spurred research interest because it can be seen as a means of understanding human behavior. New techniques have made it possible to use Winograd schemas in various fields, such as the design of novel forms of CAPTCHAs. Prior work that established a baseline for adult human performance on the WSC has shown that not all schemas are equally difficult, which suggests that they could be categorized according to their perceived hardness for humans. Such a hardness metric could be used in future challenges, or in the WSC CAPTCHA service, to differentiate between Winograd schemas. Our recent work showed that this can be achieved with an automated system that outputs the hardness indexes of Winograd schemas, albeit with limitations on the number of schemas to which it could be applied. This paper extends that research by presenting a new system, based on Machine Learning (ML), that outputs the hardness of any Winograd schema faster and more accurately than previously used methods. The system, which supports two approaches, namely random forest and deep learning (LSTM-based), can be used as an extension of any other system that aims to differentiate between Winograd schemas according to their perceived hardness for humans. Alongside the system, we present the results of a large-scale experiment that shows how human performance varies across Winograd schemas.

