Learning to generalize to new compositions in image understanding

08/27/2016
by Yuval Atzmon, et al.

Recurrent neural networks have recently been used for learning to describe images using natural language. However, it has been observed that these models generalize poorly to scenes that were not observed during training, possibly depending too strongly on the statistics of the text in the training data. Here we propose to describe images using short structured representations, aiming to capture the crux of a description. These structured representations allow us to tease out and evaluate separately two types of generalization: standard generalization to new images with similar scenes, and generalization to new combinations of known entities. We compare two learning approaches on the MS-COCO dataset: a state-of-the-art recurrent network based on an LSTM (Show, Attend and Tell), and a simple structured prediction model on top of a deep network. We find that the structured model generalizes to new compositions substantially better than the LSTM, achieving 7 times higher accuracy at predicting structured representations. By providing a concrete method to quantify generalization for unseen combinations, we argue that structured representations and compositional splits are a useful benchmark for image captioning, and advocate compositional models that capture linguistic and visual structure.
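The compositional split described in the abstract can be illustrated with a minimal sketch: hold out specific entity combinations from training while ensuring each entity still appears individually in other combinations. The triplet format, field names, and example data below are illustrative assumptions, not the paper's actual benchmark code:

```python
# Minimal sketch of a compositional split over (subject, relation, object)
# structured representations. Samples whose subject-object pair is in the
# held-out set go to test; all other samples go to train, so each entity
# is still seen during training, just not in the held-out combination.

def compositional_split(samples, held_out_pairs):
    """samples: list of dicts with a 'triplet' = (subject, relation, object)."""
    train, test = [], []
    for s in samples:
        subj, _rel, obj = s["triplet"]
        if (subj, obj) in held_out_pairs:
            test.append(s)
        else:
            train.append(s)
    return train, test

# Illustrative data (not from MS-COCO):
samples = [
    {"image": 1, "triplet": ("man", "riding", "horse")},
    {"image": 2, "triplet": ("dog", "riding", "surfboard")},
    {"image": 3, "triplet": ("man", "holding", "surfboard")},
    {"image": 4, "triplet": ("dog", "chasing", "horse")},
]

train, test = compositional_split(samples, held_out_pairs={("dog", "surfboard")})
# "dog" and "surfboard" each appear in train via other combinations,
# but the pair ("dog", "surfboard") is only seen at test time.
```

A model is then scored on how accurately it predicts the full structured representation for test images, which is what makes generalization to unseen combinations directly measurable.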

Related research:
- Compositional Generalization in Image Captioning (09/10/2019)
- Natural language description of images using hybrid recurrent neural network (02/25/2021)
- Improving Compositional Generalization with Self-Training for Data-to-Text Generation (10/16/2021)
- The Role of Syntactic Planning in Compositional Image Captioning (01/28/2021)
- SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics (06/02/2021)
- On the Generalization of Learned Structured Representations (04/25/2023)
