Generating Descriptions for Sequential Images with Local-Object Attention and Global Semantic Context Modelling

12/02/2020
by Jing Su, et al.

In this paper, we propose an end-to-end CNN-LSTM model with a local-object attention mechanism for generating descriptions of sequential images. To produce coherent descriptions, we capture global semantic context with a multi-layer perceptron that learns the dependencies between the images in a sequence. A parallel LSTM network decodes the sequence of descriptions. Experimental results show that our model outperforms the baseline on three different evaluation metrics on the datasets published by Microsoft.
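The abstract only outlines the architecture, so the sketch below shows one plausible way such a model could be wired together in PyTorch: a CNN encoder per image, a local-object attention module over spatial regions, an MLP that pools the whole image sequence into a global context vector, and one decoder LSTM per image position. All module names, dimensions, and the simple convolutional encoder are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the described architecture (not the authors' code).
# Names, dimensions, and the toy CNN encoder are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalObjectAttention(nn.Module):
    """Attend over spatial CNN regions, conditioned on the decoder state."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)

    def forward(self, feats, h):
        # feats: (batch, regions, feat_dim); h: (batch, hidden_dim)
        h_exp = h.unsqueeze(1).expand(-1, feats.size(1), -1)
        alpha = F.softmax(self.score(torch.cat([feats, h_exp], dim=-1)), dim=1)
        return (alpha * feats).sum(dim=1)  # attended local feature


class SequentialImageCaptioner(nn.Module):
    """CNN-LSTM with local-object attention, a global-context MLP,
    and one (parallel) LSTM decoder per image in the sequence."""
    def __init__(self, vocab_size, seq_len=5, feat_dim=512,
                 hidden_dim=512, embed_dim=256):
        super().__init__()
        self.seq_len = seq_len
        # Small CNN stand-in for the paper's image encoder.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # MLP that mixes all images' global features into one context vector.
        self.global_mlp = nn.Sequential(
            nn.Linear(seq_len * feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.attn = LocalObjectAttention(feat_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One decoder LSTM cell per image position ("parallel" decoders).
        self.decoders = nn.ModuleList([
            nn.LSTMCell(embed_dim + feat_dim + hidden_dim, hidden_dim)
            for _ in range(seq_len)
        ])
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (batch, seq_len, 3, H, W); captions: (batch, seq_len, T) token ids
        b, n = images.shape[:2]
        fmaps = self.cnn(images.flatten(0, 1))                 # (b*n, C, h, w)
        regions = fmaps.flatten(2).transpose(1, 2)             # (b*n, h*w, C)
        regions = regions.reshape(b, n, *regions.shape[1:])    # (b, n, h*w, C)
        glob = regions.mean(dim=2)                              # (b, n, C)
        context = self.global_mlp(glob.flatten(1))              # (b, hidden_dim)

        logits = []
        for i, lstm in enumerate(self.decoders):
            h = c = context.new_zeros(b, context.size(1))
            step_logits = []
            for t in range(captions.size(2) - 1):               # teacher forcing
                att = self.attn(regions[:, i], h)                # local-object attention
                x = torch.cat([self.embed(captions[:, i, t]), att, context], dim=-1)
                h, c = lstm(x, (h, c))
                step_logits.append(self.out(h))
            logits.append(torch.stack(step_logits, dim=1))
        return torch.stack(logits, dim=1)  # (b, seq_len, T-1, vocab)


if __name__ == "__main__":
    model = SequentialImageCaptioner(vocab_size=1000)
    imgs = torch.randn(2, 5, 3, 64, 64)
    caps = torch.randint(0, 1000, (2, 5, 12))
    print(model(imgs, caps).shape)  # torch.Size([2, 5, 11, 1000])
```

In this reading, the per-image decoders run independently ("parallel") while coherence across the story comes from the shared global context vector produced by the MLP; the exact conditioning used in the paper may differ.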

