Exploring the Grounding Issues in Image Caption

05/24/2023
by Pin-Er Chen, et al.

This paper explores the grounding issue in multimodal semantic representation from a computational cognitive-linguistic perspective. Five perceptual properties of groundedness are annotated and analyzed: Affordance, Perceptual salience, Object number, Gaze cueing, and Ecological Niche Association (ENA). We annotated selected images from the Flickr30k dataset and carried out exploratory analyses and statistical modeling of their captions. Our findings suggest that a comprehensive understanding of an object or event requires cognitive attention, semantic distinctions in linguistic expression, and multimodal construction. During this construction process, viewers integrate situated meaning and affordance into multimodal semantics, which is then consolidated into the image captions of image-text datasets that combine visual and textual elements. The findings further suggest that situated meaning and affordance grounding are critical for grounded natural language understanding systems to generate appropriate responses, and they show potential to advance the understanding of human construal in diverse situations.

research
12/05/2020

Neurosymbolic AI for Situated Language Understanding

In recent years, data-intensive AI, particularly the domain of natural l...
research
02/23/2023

HL Dataset: Grounding High-Level Linguistic Concepts in Vision

Current captioning datasets focus on object-centric captions, describin...
research
08/11/2023

Evidence of Human-Like Visual-Linguistic Integration in Multimodal Large Language Models During Predictive Language Processing

The advanced language processing abilities of large language models (LLM...
research
06/26/2023

Kosmos-2: Grounding Multimodal Large Language Models to the World

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enablin...
research
08/24/2018

A Visual Attention Grounding Neural Model for Multimodal Machine Translation

We introduce a novel multimodal machine translation model that utilizes ...
research
10/20/2015

Semantic, Cognitive, and Perceptual Computing: Advances toward Computing for Human Experience

The World Wide Web continues to evolve and serve as the infrastructure f...
research
05/22/2022

The Case for Perspective in Multimodal Datasets

This paper argues in favor of the adoption of annotation practices for m...
