Things not Written in Text: Exploring Spatial Commonsense from Visual Signals

03/15/2022
by   Xiao Liu, et al.

Spatial commonsense, the knowledge about spatial positions and relationships between objects (such as the relative size of a lion and a girl, or the position of a boy relative to a bicycle when cycling), is an important part of commonsense knowledge. Although pretrained language models (PLMs) succeed in many NLP tasks, they have been shown to be ineffective at spatial commonsense reasoning. Starting from the observation that images are more likely than texts to exhibit spatial commonsense, we explore whether models with visual signals learn more spatial commonsense than text-based PLMs. We propose a spatial commonsense benchmark that focuses on the relative scales of objects and on the positional relationships between people and objects under different actions. We probe PLMs and models with visual signals, including vision-language pretrained models and image synthesis models, on this benchmark, and find that image synthesis models are more capable of learning accurate and consistent spatial knowledge than the other models. The spatial knowledge from image synthesis models also helps in natural language understanding tasks that require spatial commonsense.
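The probing setup described above can be illustrated with a toy sketch. The item format, the `toy_model` stand-in, and the consistency metric below are illustrative assumptions, not the paper's actual benchmark or models: each item asks whether one object is larger than another, accuracy checks answers against labels, and consistency checks that flipping a question flips the answer.

```python
# Toy sketch of size-comparison probing, assuming a benchmark of
# (object_a, object_b, label) triples; `toy_model` is a stand-in for
# querying a real PLM or image synthesis model.

from typing import Callable, List, Tuple

# Each item asks: is object_a larger than object_b?
Item = Tuple[str, str, bool]

BENCHMARK: List[Item] = [
    ("lion", "girl", True),
    ("girl", "lion", False),
    ("bicycle", "boy", True),
    ("boy", "bicycle", False),
]

def toy_model(a: str, b: str) -> bool:
    """Stand-in model: ranks objects by a hard-coded size table (metres)."""
    size = {"girl": 1.4, "boy": 1.3, "lion": 2.0, "bicycle": 1.7}
    return size[a] > size[b]

def accuracy(model: Callable[[str, str], bool], items: List[Item]) -> float:
    """Fraction of items where the model's answer matches the gold label."""
    return sum(model(a, b) == y for a, b, y in items) / len(items)

def consistency(model: Callable[[str, str], bool], items: List[Item]) -> float:
    """Fraction of items where reversing the question reverses the answer."""
    return sum(model(a, b) != model(b, a) for a, b, _ in items) / len(items)

print(accuracy(toy_model, BENCHMARK))     # 1.0 for this toy model
print(consistency(toy_model, BENCHMARK))  # 1.0: answers invert under flipping
```

A real probe would replace `toy_model` with, for example, a masked-prompt query to a PLM or a measurement taken from a synthesized image; the two metrics separate getting individual answers right from holding a coherent relative ordering.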


Related research

04/02/2019
Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches
Commonsense knowledge and commonsense reasoning are some of the main bot...

09/20/2019
Teaching Pretrained Models with Commonsense Reasoning: A Preliminary KB-Based Approach
Recently, pretrained language models (e.g., BERT) have achieved great su...

06/12/2017
Verb Physics: Relative Physical Knowledge of Actions and Objects
Learning commonsense knowledge from natural language text is nontrivial ...

04/01/2021
Commonsense Spatial Reasoning for Visually Intelligent Agents
Service robots are expected to reliably make sense of complex, fast-chan...

11/13/2019
Can a Gorilla Ride a Camel? Learning Semantic Plausibility from Text
Modeling semantic plausibility requires commonsense knowledge about the ...

12/20/2022
Do language models have coherent mental models of everyday things?
When people think of everyday things like an "egg," they typically have ...

10/29/2020
"where is this relationship going?": Understanding Relationship Trajectories in Narrative Text
We examine a new commonsense reasoning task: given a narrative describin...
