SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space

07/05/2023
by Lasha Abzianidze, et al.

While many natural language inference (NLI) datasets target certain semantic phenomena, e.g., negation, tense & aspect, monotonicity, and presupposition, to the best of our knowledge, there is no NLI dataset that involves diverse types of spatial expressions and reasoning. We fill this gap by semi-automatically creating an NLI dataset for spatial reasoning, called SpaceNLI. The data samples are automatically generated from a curated set of reasoning patterns, where the patterns are annotated with inference labels by experts. We test several state-of-the-art (SOTA) NLI systems on SpaceNLI to gauge the complexity of the dataset and the systems' capacity for spatial reasoning. Moreover, we introduce Pattern Accuracy and argue that it is a more reliable and stricter measure than standard accuracy for evaluating a system's performance on samples generated from patterns. The evaluation results show that the systems obtain moderate results on the spatial NLI problems but lack consistency per inference pattern. The results also reveal that non-projective spatial inferences (especially those due to the preposition "between") are the most challenging ones.
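To make the contrast between the two measures concrete, the following is a minimal sketch, not the authors' implementation. It assumes that Pattern Accuracy counts a pattern as solved only when the share of its correctly classified samples reaches a threshold, and then reports the fraction of solved patterns. The abstract does not spell out this definition, so the `threshold` parameter, the function names, and the toy records are all hypothetical.

```python
from collections import defaultdict


def sample_accuracy(records):
    """Plain accuracy: fraction of individual samples predicted correctly."""
    records = list(records)
    return sum(pred == gold for _, gold, pred in records) / len(records)


def pattern_accuracy(records, threshold=1.0):
    """Fraction of patterns whose per-pattern accuracy is >= threshold.

    ASSUMPTION: this thresholded definition is an illustration only;
    the abstract does not give the exact formula.
    """
    per_pattern = defaultdict(list)
    for pattern_id, gold, pred in records:
        per_pattern[pattern_id].append(pred == gold)
    solved = sum(
        sum(hits) / len(hits) >= threshold for hits in per_pattern.values()
    )
    return solved / len(per_pattern)


# Hypothetical toy data: (pattern id, gold label, predicted label).
records = [
    ("p1", "entailment", "entailment"),
    ("p1", "entailment", "entailment"),
    ("p1", "entailment", "neutral"),
    ("p2", "contradiction", "contradiction"),
    ("p2", "contradiction", "contradiction"),
    ("p2", "contradiction", "contradiction"),
]

print(sample_accuracy(records))   # 5 of 6 samples are correct
print(pattern_accuracy(records))  # only p2 is solved consistently
```

In this toy case a system gets five of six samples right, yet solves only one of the two patterns under the strict threshold, which is the kind of per-pattern inconsistency that plain accuracy hides.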

