Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning

05/10/2021
by   Dandan Guo, et al.
5

Observing a set of images and their corresponding paragraph-captions, a challenging task is to learn how to produce a semantically coherent paragraph to describe the visual content of an image. Inspired by recent successes in integrating semantic topics into this task, this paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework, which couples a visual extractor with a deep topic model to guide the learning of a language model. To capture the correlations between the image and text at multiple levels of abstraction and learn the semantic topics from images, we design a variational inference network to build the mapping from image features to textual captions. To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model, including Long Short-Term Memory (LSTM) and Transformer, and jointly optimized. Experiments on public dataset demonstrate that the proposed models, which are competitive with many state-of-the-art approaches in terms of standard evaluation metrics, can be used to both distill interpretable multi-layer topics and generate diverse and coherent captions.

READ FULL TEXT

page 4

page 12

page 14

page 16

research
08/01/2019

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

Image paragraph generation is the task of producing a coherent story (us...
research
06/01/2023

CapText: Large Language Model-based Caption Generation From Image Context and Description

While deep-learning models have been shown to perform well on image-to-t...
research
08/08/2019

Image Captioning using Facial Expression and Attention

Benefiting from advances in machine vision and natural language processi...
research
11/23/2016

Semantic Compositional Networks for Visual Captioning

A Semantic Compositional Network (SCN) is developed for image captioning...
research
06/28/2023

VisText: A Benchmark for Semantically Rich Chart Captioning

Captions that describe or explain charts help improve recall and compreh...
research
02/27/2019

Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning

Automatic generation of video captions is a fundamental challenge in com...
research
02/11/2022

Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation

This work aims at generating captions for soccer videos using deep learn...

Please sign up or login with your details

Forgot password? Click here to reset