Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

by Jiacheng Li, et al.

Visual Storytelling (VIST) is the task of telling a narrative story about a certain topic from a given photo stream. Existing studies focus on designing complex models, which rely on a huge amount of human-annotated data. However, annotating VIST data is extremely costly, and many topics cannot be covered in the training dataset because of the long-tail topic distribution. In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting. Inspired by the way humans tell a story, we propose a topic-adaptive storyteller to model the ability of inter-topic generalization. In practice, we apply a gradient-based meta-learning algorithm to multi-modal seq2seq models to endow the model with the ability to adapt quickly from topic to topic. We further propose a prototype encoding structure to model the ability of intra-topic derivation. Specifically, we encode and restore the few training story texts to serve as a reference that guides generation at inference time. Experimental results show that topic adaptation and the prototype encoding structure mutually benefit the few-shot model on the BLEU and METEOR metrics. A further case study shows that the stories generated after few-shot adaptation are more relevant and expressive.
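To make the idea of gradient-based meta-learning across topics concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes a first-order approximation of the meta-gradient, and it replaces the multi-modal seq2seq model with a one-parameter linear model y = w * x. Each hypothetical "topic" is a different slope a, and the names sample_topic, adapt, and meta_train are illustrative only.

```python
import random


def grad_mse(w, data):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def adapt(w, support, inner_lr=0.1, steps=1):
    """Inner loop: a few gradient steps on one topic's support examples."""
    for _ in range(steps):
        w = w - inner_lr * grad_mse(w, support)
    return w


def sample_topic(rng, n=5):
    """Hypothetical 'topic': a slope a, with support and query sets from y = a * x."""
    a = rng.uniform(1.0, 3.0)

    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return [(x, a * x) for x in xs]

    return draw(), draw()


def meta_train(iterations=2000, meta_lr=0.05, seed=0):
    """Outer loop: learn an initial w that adapts quickly to a new topic."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(iterations):
        support, query = sample_topic(rng)
        w_adapted = adapt(w, support)
        # First-order approximation (an assumption of this sketch): update the
        # shared initialization with the query gradient taken at the adapted
        # weights, ignoring second-order terms.
        w -= meta_lr * grad_mse(w_adapted, query)
    return w


if __name__ == "__main__":
    w0 = meta_train()
    print("meta-learned initialization:", w0)
```

In this toy setting the learned initialization settles near the middle of the slope range, so a single inner step on a handful of support examples moves it toward any individual topic. The few-shot setting described above has the same two-level structure: a shared initialization trained across topics, followed by a short adaptation on the few stories available for a new topic.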





Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Personalizing a speech synthesis system is a highly desired application,...

Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

Visual storytelling aims to generate a narrative paragraph from a sequen...

TopNet: Learning from Neural Topic Model to Generate Long Stories

Long story generation (LSG) is one of the coveted goals in natural langu...

Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling

Visual storytelling includes two important parts: coherence between the ...

Universal-Prototype Augmentation for Few-Shot Object Detection

Few-shot object detection (FSOD) aims to strengthen the performance of n...

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

Image paragraph generation is the task of producing a coherent story (us...

Bubble Storytelling with Automated Animation: A Brexit Hashtag Activism Case Study

Hashtag data are common and easy to acquire. Thus, they are widely used ...