Grounded Semantic Composition for Visual Scenes

06/30/2011
by P. Gorniak et al.

We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to natural language expressions for a large percentage of test cases. In an analysis of the system's successes and failures we reveal how visual context influences the semantics of utterances and propose future extensions to the model that take such context into account.
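To make the architecture described above concrete, here is a minimal Python sketch of how word-level visual groundings can be composed to pick out referents in a scene. Everything in it (SceneObject, Composer, LEXICON, interpret, the toy scene) is an illustrative assumption, not the paper's implementation; the point is only that each word is grounded as an operation on a set of candidate objects, so that later words are interpreted relative to the candidates left by earlier ones.

```python
# Hypothetical sketch of grounded semantic composition for referring
# expressions. All names here are illustrative, not the paper's code.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SceneObject:
    name: str
    color: str
    x: float  # horizontal position, 0.0 = far left
    y: float


# A composer maps the current candidate referents (given the full scene)
# to a narrowed candidate set.
Composer = Callable[[List[SceneObject], List[SceneObject]], List[SceneObject]]


def color_filter(color: str) -> Composer:
    """Ground a color term as a filter on a visual attribute."""
    return lambda candidates, scene: [o for o in candidates if o.color == color]


def leftmost() -> Composer:
    """Ground 'leftmost' relative to the current candidates, not the scene."""
    def g(candidates: List[SceneObject], scene: List[SceneObject]) -> List[SceneObject]:
        if not candidates:
            return []
        min_x = min(o.x for o in candidates)
        return [o for o in candidates if o.x == min_x]
    return g


LEXICON: Dict[str, Composer] = {
    "green": color_filter("green"),
    "purple": color_filter("purple"),
    "leftmost": leftmost(),
}


def interpret(words: List[str], scene: List[SceneObject]) -> List[SceneObject]:
    """Apply each grounded word meaning in turn, narrowing the referent set."""
    candidates = list(scene)
    for w in words:
        if w in LEXICON:  # skip words we have no grounding for
            candidates = LEXICON[w](candidates, scene)
    return candidates


if __name__ == "__main__":
    scene = [
        SceneObject("a", "green", 0.7, 0.5),
        SceneObject("b", "green", 0.9, 0.1),
        SceneObject("c", "purple", 0.1, 0.9),
    ]
    # "the green one on the far left": 'green' keeps a and b, 'leftmost'
    # then selects a, even though c is the leftmost object in the scene.
    print([o.name for o in interpret(["green", "leftmost"], scene)])
```

Because "leftmost" here operates on the candidates produced by "green" rather than on the whole scene, the same word can select different objects in different phrases; this is one simple way visual context can influence the effective meaning of an utterance, in the spirit of the analysis the abstract describes.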

Related research

11/22/2015
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
We propose a model to learn visually grounded word embeddings (vis-w2v) ...

09/07/2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
State-of-the-art visual grounding models can achieve high detection accu...

01/14/2021
Enabling Robots to Draw and Tell: Towards Visually Grounded Multimodal Description Generation
Socially competent robots should be equipped with the ability to perceiv...

10/07/2020
A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions
Recent models achieve promising results in visually grounded dialogues. ...

03/19/2019
When redundancy is rational: A Bayesian approach to 'overinformative' referring expressions
Referring is one of the most basic and prevalent uses of language. How d...

04/15/2019
Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics
Propelling, and propelled by, the "deep learning revolution", recent yea...

10/07/2020
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations
A major challenge in visually grounded language generation is to build r...
