Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation

07/12/2020
by Aditya Mogadala, et al.

Generating longer textual sequences conditioned on visual information is an interesting problem to explore. The challenge goes beyond standard vision-conditioned sentence-level generation (e.g., image or video captioning), as it requires producing a brief yet coherent story describing the visual content. In this paper, we cast this Vision-to-Sequence task as a Graph-to-Sequence learning problem and approach it with the Transformer architecture. Specifically, we introduce the Sparse Graph-to-Sequence Transformer (SGST) for encoding the graph and decoding a sequence. The encoder aims to directly encode graph-level semantics, while the decoder is used to generate longer sequences. Experiments conducted on the benchmark image paragraph dataset show that our proposed approach achieves a 13.3-point improvement on the CIDEr evaluation measure compared to the previous state-of-the-art approach.
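The abstract does not spell out SGST's internals, but a common way to realize sparse graph-to-sequence learning is to restrict the encoder's self-attention to the edges of the input graph and pair it with a standard autoregressive Transformer decoder. Below is a minimal PyTorch sketch along those lines; the class names, the 2048-dimensional node features, and the adjacency-mask mechanism are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class SparseGraphEncoderLayer(nn.Module):
    """Encoder layer whose self-attention is masked by the graph's
    adjacency matrix, so each node attends only to itself and its
    neighbours (the "sparse" part of this sketch)."""

    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, nodes, adj):
        # nodes: (B, N, d_model); adj: (B, N, N), 1 where an edge exists.
        B, N, _ = nodes.shape
        eye = torch.eye(N, dtype=torch.bool, device=nodes.device)
        blocked = ~(adj.bool() | eye)  # True = this pair may NOT attend
        blocked = blocked.repeat_interleave(self.attn.num_heads, dim=0)
        h, _ = self.attn(nodes, nodes, nodes, attn_mask=blocked)
        nodes = self.norm1(nodes + h)
        return self.norm2(nodes + self.ff(nodes))


class SGSTSketch(nn.Module):
    """Hypothetical graph-to-sequence model: an edge-masked encoder
    stack over graph node features, plus a standard Transformer
    decoder for the paragraph (positional encodings omitted)."""

    def __init__(self, vocab_size, node_dim=2048, d_model=512,
                 nhead=8, num_layers=3):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, d_model)
        self.encoder = nn.ModuleList(
            [SparseGraphEncoderLayer(d_model, nhead)
             for _ in range(num_layers)]
        )
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead,
                                               batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, node_feats, adj, tgt_tokens):
        # node_feats: (B, N, node_dim); tgt_tokens: (B, T) token ids.
        h = self.node_proj(node_feats)
        for layer in self.encoder:
            h = layer(h, adj)  # graph-level semantics
        tgt = self.embed(tgt_tokens)
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        dec = self.decoder(tgt, memory=h, tgt_mask=causal)
        return self.out(dec)  # (B, T, vocab_size) logits
```

In a typical training setup, one would call `model(node_feats, adj, tgt_tokens[:, :-1])` and compute cross-entropy against `tgt_tokens[:, 1:]`; masking attention to graph edges keeps the encoder's cost proportional to the number of edges rather than all node pairs.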


