Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning

03/14/2019
by Dong-Jin Kim, et al.

Our goal in this work is to train an image captioning model that generates denser and more informative captions. We introduce "relational captioning," a novel image captioning task that aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both diversity and amount of information, leading to image understanding based on relationships. Part-of-speech (POS, i.e., subject-object-predicate categories) tags can be assigned to every English word. We leverage the POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet), which consists of three recurrent units for the respective POS and jointly performs POS prediction and captioning. We demonstrate that the proposed model generates more diverse and richer representations than several baselines and competing methods.
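To make the triple-stream idea concrete, the following is a minimal, untrained sketch of three recurrent units (one per POS category) whose hidden states feed two output heads, one for the next word and one for its POS tag. The abstract does not specify the cell type, the inputs to each stream, how the streams are fused, or how the two losses are weighted. The plain tanh cell, the per-stream feature vectors, the concatenation step, the `pos_weight` parameter, and all class and function names below are therefore illustrative assumptions, not the authors' implementation.

```python
import math
import random

POS_TAGS = ["subject", "predicate", "object"]  # the three POS categories named in the abstract


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class RecurrentUnit:
    """Plain tanh RNN cell (assumption: a stand-in for whatever cell the paper uses)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.w_in = make_matrix(hid_dim, in_dim, rng)
        self.w_h = make_matrix(hid_dim, hid_dim, rng)

    def step(self, x, h):
        a, b = matvec(self.w_in, x), matvec(self.w_h, h)
        return [math.tanh(p + q) for p, q in zip(a, b)]


class TripleStreamSketch:
    """Hypothetical triple-stream model: one recurrent unit per POS category."""

    def __init__(self, feat_dim, emb_dim, hid_dim, vocab_size, seed=0):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        self.embed = make_matrix(vocab_size, emb_dim, rng)
        self.streams = {t: RecurrentUnit(feat_dim + emb_dim, hid_dim, rng) for t in POS_TAGS}
        # Assumption: both heads read the concatenation of the three hidden states.
        self.word_head = make_matrix(vocab_size, 3 * hid_dim, rng)
        self.pos_head = make_matrix(len(POS_TAGS), 3 * hid_dim, rng)

    def forward(self, feats, prev_word_ids):
        """feats maps each POS tag to a feature vector (assumed input format)."""
        hidden = {t: [0.0] * self.hid_dim for t in POS_TAGS}
        word_probs, pos_probs = [], []
        for word_id in prev_word_ids:
            emb = self.embed[word_id]
            for t in POS_TAGS:
                hidden[t] = self.streams[t].step(feats[t] + emb, hidden[t])
            joint = [v for t in POS_TAGS for v in hidden[t]]
            word_probs.append(softmax(matvec(self.word_head, joint)))
            pos_probs.append(softmax(matvec(self.pos_head, joint)))
        return word_probs, pos_probs


def multitask_loss(word_probs, pos_probs, word_targets, pos_targets, pos_weight=1.0):
    """Captioning cross-entropy plus weighted POS cross-entropy (weighting is assumed)."""
    caption = -sum(math.log(p[t]) for p, t in zip(word_probs, word_targets))
    pos = -sum(math.log(p[t]) for p, t in zip(pos_probs, pos_targets))
    return caption + pos_weight * pos


# Toy usage with made-up dimensions and token ids.
model = TripleStreamSketch(feat_dim=4, emb_dim=3, hid_dim=5, vocab_size=6)
feats = {t: [0.1] * 4 for t in POS_TAGS}
word_probs, pos_probs = model.forward(feats, prev_word_ids=[0, 1, 2])
loss = multitask_loss(word_probs, pos_probs, word_targets=[1, 2, 3], pos_targets=[0, 1, 2])
```

At each step the sketch yields a distribution over the vocabulary and a distribution over the three POS categories, which is one simple way to read "jointly performs POS prediction and captioning."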


