Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators

09/22/2019
by Kuang-Huei Lee, et al.
Google
Microsoft

Grounding language to visual relations is critical to various language-and-vision applications. In this work, we tackle two fundamental language-and-vision tasks, image-text matching and image captioning, and demonstrate that neural scene graph generators can learn effective visual relation features that facilitate grounding language to visual relations and, in turn, improve both end applications. Combining relation features with state-of-the-art models yields significant improvements on the standard Flickr30K and MSCOCO benchmarks. Our experimental results and analysis show that relation features improve downstream models' ability to capture visual relations in end vision-and-language applications. We also demonstrate that training scene graph generators on visually relevant relations is important to the effectiveness of relation features.

12/02/2021

Consensus Graph Representation Learning for Better Grounded Image Captioning

The contemporary visual captioning models frequently hallucinate objects...
05/23/2023

Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality

Contrastively trained vision-language models have achieved remarkable pr...
10/12/2018

Quantifying the amount of visual information used by neural caption generators

This paper addresses the sensitivity of neural image caption generators ...
02/11/2019

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

Many vision and language models suffer from poor visual grounding - ofte...
04/01/2020

More Grounded Image Captioning by Distilling Image-Text Matching Model

Visual attention not only improves the performance of image captioners, ...
11/21/2018

An Interpretable Model for Scene Graph Generation

We propose an efficient and interpretable scene graph generator. We cons...
09/28/2020

Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score

Scene graph parsing aims to detect objects in an image scene and recogni...