Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

07/26/2021
by Wentian Zhao, et al.

Entity-aware image captioning aims to describe named entities and events related to an image by utilizing the background knowledge in the associated article. This task remains challenging because the long-tail distribution of named entities makes it difficult to learn the association between named entities and visual cues. Furthermore, the complexity of the article makes it difficult to extract fine-grained relationships between entities and thus to generate informative event descriptions about the image. To tackle these challenges, we propose a novel approach that constructs a multi-modal knowledge graph, which associates the visual objects with named entities and simultaneously captures the relationships between entities, with the help of external knowledge collected from the web. Specifically, we build a text sub-graph by extracting named entities and their relationships from the article, and build an image sub-graph by detecting the objects in the image. To connect these two sub-graphs, we propose a cross-modal entity matching module trained using a knowledge base that contains Wikipedia entries and the corresponding images. Finally, the multi-modal knowledge graph is integrated into the captioning model via a graph attention mechanism. Extensive experiments on both the GoodNews and NYTimes800k datasets demonstrate the effectiveness of our method.
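The pipeline described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. Every name in it (`build_graph`, `attend`, the `"matches"` relation, the 0.8 threshold, the two-dimensional vectors, and the example entities) is a hypothetical chosen for illustration. Cross-modal matching is approximated here by a cosine-similarity threshold rather than a trained module, and plain dot-product softmax attention over all nodes stands in for the graph attention mechanism.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(triples, objects, entity_vecs, object_vecs, threshold=0.8):
    """Build a toy multi-modal graph as (nodes, edges).

    triples: (head, relation, tail) tuples from the article (text sub-graph).
    objects: identifiers of detected image regions (image sub-graph).
    The threshold-based matching is an assumption, not the paper's module.
    """
    nodes, edges = {}, []
    # Text sub-graph: named entities and the relations between them.
    for head, rel, tail in triples:
        nodes[head] = "entity"
        nodes[tail] = "entity"
        edges.append((head, rel, tail))
    # Image sub-graph: one node per detected object.
    for obj in objects:
        nodes[obj] = "object"
    # Cross-modal links: connect an object to every sufficiently similar entity.
    for obj in objects:
        for ent, vec in entity_vecs.items():
            if cosine(object_vecs[obj], vec) >= threshold:
                edges.append((obj, "matches", ent))
    return nodes, edges


def attend(query, node_vecs):
    """Softmax dot-product attention of a query over all node vectors."""
    names = list(node_vecs)
    scores = [sum(q * x for q, x in zip(query, node_vecs[n])) for n in names]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = {n: e / total for n, e in zip(names, exps)}
    context = [
        sum(weights[n] * node_vecs[n][i] for n in names)
        for i in range(len(query))
    ]
    return weights, context


# Hypothetical example data: one article triple and one detected person.
triples = [("Jane Doe", "visited", "Springfield")]
objects = ["person_0"]
entity_vecs = {"Jane Doe": [1.0, 0.0], "Springfield": [0.0, 1.0]}
object_vecs = {"person_0": [0.9, 0.1]}

nodes, edges = build_graph(triples, objects, entity_vecs, object_vecs)
weights, context = attend([1.0, 0.0], {**entity_vecs, **object_vecs})
```

In this sketch the detected region `person_0` is linked to the entity `Jane Doe` but not to `Springfield`, and a decoder query aligned with the first vector dimension puts more attention weight on `Jane Doe` than on `Springfield`. A real system would restrict attention to graph neighbours and learn both the embeddings and the matching function.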


