Topic Scene Graph Generation by Attention Distillation from Caption

10/12/2021
by W. Wang, et al.

If an image tells a story, the image caption is its briefest narrator. A scene graph tends to be an omniscient generalist, while an image caption is more of a specialist that outlines the gist. Many previous studies have found that a scene graph is not as practical as expected unless its trivial content and noise are reduced. In this respect, the image caption is a good tutor. We therefore let the scene graph borrow this ability from the image caption so that it can become a specialist while remaining comprehensive, resulting in the so-called Topic Scene Graph. What an image caption pays attention to is distilled and passed to the scene graph to estimate the importance of partial objects, relationships, and events. Specifically, during caption generation, the attention over individual objects at each time step is collected, pooled, and assembled to obtain the attention over relationships, which serves as weak supervision for regularizing the estimated importance scores of relationships. In addition, because this attention distillation process provides an opportunity to combine the generation of image captions and scene graphs, we further transform the scene graph into linguistic form with rich, free-form expressions by sharing a single generation model with image captioning. Experiments show that attention distillation brings significant improvements in mining important relationships without strong supervision, and that the topic scene graph shows great potential in subsequent applications.
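The collect, pool, and assemble steps described above can be illustrated with a minimal sketch. The abstract does not specify the operators, so everything concrete here is an assumption made for illustration: max-pooling over time steps, the product of subject and object attention as the relationship weight, and a KL-divergence regularizer. All function names are hypothetical and this is not the authors' implementation.

```python
import math

def pool_object_attention(step_attention):
    """Max-pool per-object attention over caption time steps (assumed pooling)."""
    num_objects = len(step_attention[0])
    return [max(step[i] for step in step_attention) for i in range(num_objects)]

def assemble_relationship_attention(object_attention, pairs):
    """Assumed assembly: multiply subject and object attention, then normalize."""
    raw = [object_attention[s] * object_attention[o] for s, o in pairs]
    total = sum(raw)
    if total == 0:
        # No attention signal at all: fall back to a uniform target.
        return [1.0 / len(pairs)] * len(pairs)
    return [r / total for r in raw]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(importance_logits, target):
    """Assumed weak-supervision term: KL(target || softmax(importance_logits))."""
    pred = softmax(importance_logits)
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)

# Hypothetical attention over 3 detected objects at 3 caption time steps.
step_attention = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
pairs = [(0, 1), (0, 2), (1, 2)]  # candidate (subject, object) index pairs

object_attention = pool_object_attention(step_attention)
target = assemble_relationship_attention(object_attention, pairs)
loss = distillation_loss([0.0, 0.0, 0.0], target)  # uniform predicted importance
```

In this toy setting, the pair whose subject and object both receive high caption attention gets the largest target weight, and the loss shrinks as the predicted importance scores move toward that distribution.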
