The ability to automatically describe videos in natural language (NL) enables many important applications including content-based video retrieval and video description for the visually impaired. The most effective recent methods [Venugopalan et al.2015a, Yao et al.2015]
use recurrent neural networks (RNN) and treat the problem as machine translation (MT) from video to natural language. Deep learning methods such as RNNs need large training corpora; however, there is a lack of high-quality paired video-sentence data. In contrast, raw text corpora are widely available and exhibit rich linguistic structure that can aid video description. Most work in statistical MT utilizes both a language model trained on a large corpus of monolingual target-language data and a translation model trained on more limited parallel bilingual data. This paper explores methods that incorporate knowledge from language corpora, capturing general linguistic regularities that aid video description.
This paper integrates linguistic information into a video-captioning model based on Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber1997] RNNs, which have shown state-of-the-art performance on the task. Further, LSTMs are also effective as language models (LMs) [Sundermeyer et al.2010]. Our first approach (early fusion) is to pre-train the network on plain text before training on parallel video-text corpora. Our next two approaches, inspired by recent MT work [Gulcehre et al.2015]
, integrate an LSTM LM with the existing video-to-text model. Furthermore, we also explore replacing the standard one-hot word encoding with distributional vectors trained on external corpora.
We present detailed comparisons between the approaches, evaluating them on a standard Youtube corpus and two recent large movie description datasets. The results demonstrate significant improvements in grammaticality of the descriptions (as determined by crowdsourced human evaluations) and more modest improvements in descriptive quality (as determined by both crowdsourced human judgements and standard automated comparison to human-generated descriptions). Our main contributions are 1) multiple ways to incorporate knowledge from external text into an existing captioning model, 2) extensive experiments comparing the methods on three large video-caption datasets, and 3) human judgements to show that external linguistic knowledge has a significant impact on grammar.
2 LSTM-based Video Description
We use the successful S2VT video description framework [Venugopalan et al.2015a] as our underlying model and describe it briefly here. S2VT uses a sequence-to-sequence approach [Sutskever et al.2014, Cho et al.2014] that maps an input sequence of video frame features to a fixed-dimensional vector and then decodes it into a sequence of output words.
As shown in Fig. 1, it employs a stack of two LSTM layers. The input to the first LSTM layer is a sequence of frame features obtained from the penultimate layer (fc7) of a CNN; the hidden state of this first layer is provided as input to a second LSTM layer. After viewing all the frames, the second LSTM layer learns to decode this state into a sequence of words. This can be viewed as using one LSTM layer to model the visual features, and a second LSTM layer to model language conditioned on the visual representation. We modify this architecture to incorporate linguistic knowledge at different stages of the training and generation process. Although our methods use S2VT, they are sufficiently general and could be incorporated into other CNN-RNN based captioning models.
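As a rough illustration of the two-layer encode/decode flow described above, consider the following sketch. A plain tanh recurrence stands in for the LSTM cells, the word-feedback path into the decoder is omitted for brevity, and all names and dimensions are hypothetical:

```python
import numpy as np

# Toy sketch of the S2VT-style two-layer encode/decode loop. A tanh
# recurrence stands in for the LSTM; shapes here are illustrative only.
rng = np.random.default_rng(0)
D_FRAME, D_HID, VOCAB = 8, 16, 10

def step(x, h, Wx, Wh):
    """One recurrent step: new hidden state from input x and previous state h."""
    return np.tanh(Wx @ x + Wh @ h)

Wx1 = rng.normal(scale=0.1, size=(D_HID, D_FRAME))
Wh1 = rng.normal(scale=0.1, size=(D_HID, D_HID))
Wx2 = rng.normal(scale=0.1, size=(D_HID, D_HID))
Wh2 = rng.normal(scale=0.1, size=(D_HID, D_HID))
W_out = rng.normal(scale=0.1, size=(VOCAB, D_HID))

frames = rng.normal(size=(5, D_FRAME))  # features for 5 video frames
h1, h2 = np.zeros(D_HID), np.zeros(D_HID)

# Encoding: layer 1 reads frame features; layer 2 reads layer 1's states.
for f in frames:
    h1 = step(f, h1, Wx1, Wh1)
    h2 = step(h1, h2, Wx2, Wh2)

# Decoding: after all frames, layer 2 emits one word distribution per step.
words = []
for _ in range(3):
    h1 = step(np.zeros(D_FRAME), h1, Wx1, Wh1)  # padded visual input
    h2 = step(h1, h2, Wx2, Wh2)
    logits = W_out @ h2
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    words.append(int(probs.argmax()))  # greedy word choice
```

The real model replaces `step` with LSTM cells and feeds the previously emitted word back into the second layer during decoding.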
3 Approach

Existing visual captioning models [Vinyals et al.2015, Donahue et al.2015] are trained solely on text from the caption datasets and tend to exhibit some linguistic irregularities associated with a restricted language model and a small vocabulary. Here, we investigate several techniques to integrate prior linguistic knowledge into a CNN/LSTM-based network for video to text (S2VT) and evaluate their effectiveness at improving the overall description.
Our first approach (early fusion) is to pre-train portions of the network modeling language on large corpora of raw NL text and then continue “fine-tuning” the parameters on the paired video-text corpus. An LSTM model learns to estimate the probability of an output sequence given an input sequence. To learn a language model, we train the LSTM layer to predict the next word given the previous words. Following the S2VT architecture, we embed one-hot encoded words in lower-dimensional vectors. The network is trained on web-scale text corpora and the parameters are learned through backpropagation using stochastic gradient descent (the LM was trained to achieve a perplexity of 120). The weights from this network are then used to initialize the embedding and weights of the LSTM layers of S2VT, which is then trained on video-text data. This trained LM is also used as the LSTM LM in the late and deep fusion models.
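The transfer step of early fusion amounts to initializing the caption model's embedding and recurrent parameters from the pre-trained LM before fine-tuning. A minimal sketch, where the parameter names and (toy) shapes are hypothetical stand-ins for the real tensors:

```python
import numpy as np

# Early-fusion sketch: weights pre-trained as an LSTM language model
# initialize the caption model before fine-tuning on video-text data.
# Names and tiny shapes are hypothetical (real shapes would be far larger).
rng = np.random.default_rng(1)
shapes = {"embed": (50, 8), "lstm_Wx": (32, 8), "lstm_Wh": (32, 32)}

# Pretend these were learned by next-word prediction on raw text corpora.
lm_params = {name: rng.normal(scale=0.01, size=s) for name, s in shapes.items()}

def init_from_lm(lm):
    """Copy the LM's weights into a fresh caption model (the early-fusion
    step); training then continues on the paired video-text corpus."""
    return {name: w.copy() for name, w in lm.items()}

caption_params = init_from_lm(lm_params)
```

Copying (rather than sharing) the arrays lets fine-tuning move the caption model's weights away from the LM initialization.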
Our late fusion approach is similar to how neural machine translation models incorporate a trained language model during decoding. At each step of sentence generation, the video caption model proposes a distribution over the vocabulary. We then use the language model to re-score the final output by taking a weighted average of the scores proposed by the LM and the S2VT video-description model (VM). More specifically, if y_t denotes the output at time step t, and p_VM and p_LM denote the proposal distributions of the video captioning model and the language model respectively, then for all words y' in the vocabulary we can recompute the score of each new word as:

p(y_t = y') = α · p_VM(y_t = y') + (1 − α) · p_LM(y_t = y')
Hyper-parameter α is tuned on the validation set.
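The late-fusion re-scoring at one decoding step is just a convex combination of the two word distributions. A minimal sketch with a hypothetical three-word vocabulary:

```python
import numpy as np

# Late fusion at one decoding step: combine the caption model's (VM) and
# language model's (LM) word distributions with a tuned weight alpha.
def late_fusion(p_vm, p_lm, alpha):
    """Weighted average of the two proposal distributions over the vocab."""
    p = alpha * p_vm + (1.0 - alpha) * p_lm
    return p / p.sum()  # renormalize for numerical safety

p_vm = np.array([0.6, 0.3, 0.1])  # toy 3-word vocabulary
p_lm = np.array([0.2, 0.5, 0.3])
p = late_fusion(p_vm, p_lm, alpha=0.7)
```

With alpha = 0.7 the combined scores are (0.48, 0.36, 0.16), so the caption model's preferred word survives re-scoring here; a smaller alpha would let the LM override it.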
In the deep fusion approach (Fig. 2), we integrate the LM a step deeper in the generation process by concatenating the hidden state of the language model LSTM (h_t^LM) with the hidden state of the S2VT video description model (h_t^VM), and use the combined latent vector to predict the output word. This is similar to the technique proposed by Gulcehre et al. (2015) for incorporating language models trained on monolingual corpora into machine translation. However, our approach differs in two key ways: (1) we only concatenate the hidden states of the S2VT LSTM and the language LSTM and do not use any additional context information; (2) we fix the weights of the LSTM language model but train the full video captioning network. In this case, the probability of the predicted word y_t at time step t is:
p(y_t | y_{&lt;t}, V) ∝ exp(W [h_t^VM ; h_t^LM] + b)

where V is the visual feature input, W is the weight matrix, and b the biases. We avoid tuning the LSTM LM to prevent overwriting the already learned weights of a strong language model, but we train the full video caption model so it learns to incorporate the LM outputs while training on the caption domain.
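One step of this combined prediction can be sketched numerically as follows; toy dimensions, and NumPy arrays in place of actual LSTM states:

```python
import numpy as np

# Deep fusion at one step: concatenate the video model's and the (frozen)
# LM's hidden states and predict the word from the combined vector.
# All shapes here are hypothetical.
rng = np.random.default_rng(2)
D, VOCAB = 16, 10
h_vm = rng.normal(size=D)  # S2VT decoder hidden state (trainable path)
h_lm = rng.normal(size=D)  # LSTM-LM hidden state (weights kept fixed)
W = rng.normal(scale=0.1, size=(VOCAB, 2 * D))
b = np.zeros(VOCAB)

logits = W @ np.concatenate([h_vm, h_lm]) + b
p = np.exp(logits - logits.max())
p /= p.sum()  # softmax over the vocabulary
```

Because the LM weights are frozen, gradients flow only through `h_vm` and `W` during training of the full caption network.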
Distributional Word Representations.
The S2VT network, like most image and video captioning models, represents words using a 1-of-N (one-hot) encoding. During training, the model learns to embed “one-hot” words into a lower-dimensional 500-d space by applying a linear transformation. However, the embedding is learned only from the limited and possibly noisy text in the caption data. There are many approaches [Mikolov et al.2013, Pennington et al.2014] that use large text corpora to learn vector-space representations of words that capture fine-grained semantic and syntactic regularities. We propose to take advantage of these to aid video description. Specifically, we replace the learned embedding of one-hot vectors with 300-dimensional GloVe vectors [Pennington et al.2014] pre-trained on 6B tokens from Gigaword and Wikipedia 2014. In addition to using the distributional vectors for the input, we also explore a variation where the model predicts both the one-hot word (trained on the softmax loss) and the word's distributional vector from the LSTM hidden state, using a Euclidean loss as the objective. Here the output vector w_t is computed as w_t = tanh(W_g · h_t + b_g), and the loss is given by:

L(w_t, w_glove) = || w_t − w_glove ||^2

where h_t is the LSTM output, w_glove is the word's GloVe embedding, and W_g, b_g are the weights and biases. The network then essentially becomes a multi-task model with two loss functions. However, we use this loss only to influence the weights learned by the network; the predicted word embedding itself is not used.
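The two-loss setup at a single time step can be sketched as below. The softmax cross-entropy stands in for the one-hot word objective and a squared Euclidean distance for the embedding objective; all names and toy shapes are hypothetical:

```python
import numpy as np

# Multi-task sketch: softmax cross-entropy on the one-hot word plus a
# Euclidean loss tying the hidden state to the word's GloVe vector.
rng = np.random.default_rng(3)
D, VOCAB, D_GLOVE = 16, 10, 300
h = rng.normal(size=D)  # LSTM output at this step
W_soft = rng.normal(scale=0.1, size=(VOCAB, D))
W_g = rng.normal(scale=0.1, size=(D_GLOVE, D))
b_g = np.zeros(D_GLOVE)

target_word = 4
glove_target = rng.normal(size=D_GLOVE)  # GloVe vector of the target word

logits = W_soft @ h
p = np.exp(logits - logits.max())
p /= p.sum()
ce_loss = -np.log(p[target_word])  # standard softmax loss

w_pred = np.tanh(W_g @ h + b_g)                    # predicted embedding
embed_loss = np.sum((w_pred - glove_target) ** 2)  # Euclidean loss

# The extra term only shapes the learned weights; w_pred itself is
# discarded at generation time, matching the description above.
total_loss = ce_loss + embed_loss
```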
The overall loss function of the video-caption network is non-convex, and difficult to optimize. In practice, using an ensemble of networks trained slightly differently can improve performance [Hansen and Salamon1990]. In our work we also present results of an ensemble by averaging the predictions of the best performing models.
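Averaging the predictions of several models is straightforward; for instance, combining per-step word distributions (toy numbers, hypothetical three-word vocabulary) might look like:

```python
import numpy as np

# Ensemble by averaging word distributions from differently trained models.
preds = [np.array([0.5, 0.3, 0.2]),
         np.array([0.4, 0.4, 0.2]),
         np.array([0.6, 0.2, 0.2])]
avg = np.mean(preds, axis=0)  # element-wise mean of the distributions
chosen = int(avg.argmax())    # the ensemble's word at this step
```

Since each input sums to 1, the mean is itself a valid distribution, so no renormalization is needed.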
4 Experiments

Our language model was trained on sentences from Gigaword, BNC, UkWaC, and Wikipedia. The vocabulary consisted of the 72,700 most frequent tokens that also have GloVe embeddings. Following the evaluation in S2VT [Venugopalan et al.2015a], we compare our models on the Youtube dataset [Chen and Dolan2011], as well as two large movie description corpora: MPII-MD [Rohrbach et al.2015] and M-VAD [Torabi et al.2015].
We also obtain human judgements using Amazon Mechanical Turk on a random subset of 200 video clips for each dataset. Each sentence was rated by 3 workers on a Likert scale of 1 to 5 (higher is better) for relevance and grammar. No video was shown during grammar evaluation. For the movie corpora, due to copyright, we only evaluate grammar.
4.1 Youtube Video Dataset Results
Comparison of the proposed techniques in Table 1 shows that Deep Fusion performs well on both METEOR and BLEU; incorporating GloVe embeddings substantially increases METEOR, and combining them both does best. Our final model is an ensemble (weighted average) of the GloVe model and the two GloVe+Deep Fusion models trained on the external and in-domain COCO [Lin et al.2014] sentences. We note here that the state-of-the-art on this dataset is achieved by HRNE [Pan et al.2015] (METEOR 33.1), which proposes a superior visual processing pipeline using attention to encode the video.
Human ratings also correlate well with the METEOR scores, confirming that our methods give a modest improvement in descriptive quality. However, incorporating linguistic knowledge significantly improves the grammaticality of the results, making them more comprehensible to human users. (Using the Wilcoxon Signed-Rank test, the improvements were significant on both relevance and grammar.)
We experimented with multiple ways to incorporate word embeddings: (1) GloVe input: replacing one-hot vectors with GloVe vectors at the LSTM input performed best. (2) Fine-tuning: initializing with GloVe and subsequently fine-tuning the embedding matrix reduced validation performance by 0.4 METEOR. (3) Input and predict: training the LSTM to both accept and predict GloVe vectors, as described in Section 3, performed similarly to (1). All scores reported in Tables 1 and 2 correspond to setting (1), with GloVe embeddings only as input.
[Table excerpt — Web Corpus row: METEOR 30.3, BLEU 38.1, Relevance 2.12, Grammar 4.05*]
4.2 Movie Description Results
Results on the movie corpora are presented in Table 2. Both MPII-MD and M-VAD have only a single ground truth description for each video, which makes both learning and evaluation very challenging (e.g., Fig. 3). METEOR scores are fairly low on both datasets since generated sentences are compared to a single reference translation. S2VT is a re-implementation of the base S2VT model with the new vocabulary and architecture (embedding dimension). We observe that the improvement external linguistic knowledge brings to METEOR scores on these challenging datasets is small but consistent. Again, human evaluations show a significant improvement in grammatical quality.
5 Related Work
Following the success of LSTM-based models on Machine Translation [Sutskever et al.2014, Bahdanau et al.2015] and image captioning [Vinyals et al.2015, Donahue et al.2015], recent video description works [Venugopalan et al.2015b, Venugopalan et al.2015a, Yao et al.2015] propose CNN-RNN based models that generate a vector representation for the video and “decode” it using an LSTM sequence model to generate a description. Venugopalan et al. (2015b) also incorporate external data such as images with captions to improve video description; in this work, however, our focus is on integrating external linguistic knowledge for video captioning. We specifically investigate the use of distributional semantic embeddings and LSTM-based language models trained on external text corpora to aid existing CNN-RNN based video description models.
LSTMs have proven to be very effective language models [Sundermeyer et al.2010]. Gulcehre et al. (2015) developed an LSTM model for machine translation that incorporates a monolingual language model for the target language, showing improved results. We utilize similar approaches (late fusion, deep fusion) to train an LSTM for translating video to text that exploits large monolingual English corpora (Wikipedia, BNC, UkWaC) to improve RNN-based video description networks. However, unlike Gulcehre et al. (2015), where the monolingual LM is used only to tune specific parameters of the translation network, the key advantage of our approach is that the output of the monolingual language model is used (as an input) when training the full underlying video description network.
Contemporaneous to us, Yu et al. (2015), Pan et al. (2015), and Ballas et al. (2016) propose video description models focusing primarily on improving the video representation itself using hierarchical visual pipelines and attention. Without the attention mechanism, their models achieve METEOR scores of 31.1, 32.1, and 31.6 respectively on the Youtube dataset. The interesting aspect, as demonstrated in our experiments (Table 1), is that the contribution of language alone is considerable and only slightly less than the visual contribution on this dataset. Hence, it is important to focus on both aspects to generate better descriptions.
6 Conclusion

This paper investigates multiple techniques to incorporate linguistic knowledge from text corpora to aid video captioning. We empirically evaluate our approaches on Youtube clips as well as two movie description corpora. Our results show significant improvements on human evaluations of grammar while modestly improving the overall descriptive quality of sentences on all datasets. While the proposed techniques are evaluated on a specific video-caption network, they are generic and can be applied to other video and image captioning models [Hendricks et al.2016, Venugopalan et al.2016]. The code and models are shared at http://vsubhashini.github.io/language_fusion.html.
Acknowledgments

This work was supported by NSF awards IIS-1427425 and IIS-1212798, ONR ATL Grant N00014-11-1-010, and DARPA under AFRL grant FA8750-13-2-0026. Raymond Mooney and Kate Saenko also acknowledge support from a Google grant. Lisa Anne Hendricks is supported by the National Defense Science and Engineering Graduate (NDSEG) Fellowship.
References

- [Bahdanau et al.2015] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. ICLR.
- [Ballas et al.2016] Nicolas Ballas, Li Yao, Chris Pal, and Aaron C. Courville. 2016. Delving deeper into convolutional networks for learning video representations. ICLR.
- [Chen and Dolan2011] David Chen and William Dolan. 2011. Collecting highly parallel data for paraphrase evaluation. In ACL.
- [Cho et al.2014] Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder–decoder approaches. Syntax, Semantics and Structure in Statistical Translation, page 103.
- [Denkowski and Lavie2014] Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In EACL.
- [Donahue et al.2015] Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In CVPR.
- [Gulcehre et al.2015] C. Gulcehre, O. Firat, K. Xu, K. Cho, L. Barrault, H.C. Lin, F. Bougares, H. Schwenk, and Y. Bengio. 2015. On using monolingual corpora in neural machine translation. arXiv preprint arXiv:1503.03535.
- [Hansen and Salamon1990] L. K. Hansen and P. Salamon. 1990. Neural network ensembles. IEEE TPAMI, 12(10):993–1001, Oct.
- [Hendricks et al.2016] Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, and Trevor Darrell. 2016. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR.
- [Hochreiter and Schmidhuber1997] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8).
- [Lin et al.2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In ECCV.
- [Mikolov et al.2013] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. NIPS.
- [Pan et al.2015] Pingbo Pan, Zhongwen Xu, Yi Yang, Fei Wu, and Yueting Zhuang. 2015. Hierarchical recurrent neural encoder for video representation with application to captioning. CVPR.
- [Papineni et al.2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL.
- [Pennington et al.2014] Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. In EMNLP.
- [Rohrbach et al.2015] Anna Rohrbach, Marcus Rohrbach, Niket Tandon, and Bernt Schiele. 2015. A dataset for movie description. In CVPR.
- [Sundermeyer et al.2010] M. Sundermeyer, R. Schluter, and H. Ney. 2010. LSTM neural networks for language modeling. In INTERSPEECH.
- [Sutskever et al.2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS.
- [Torabi et al.2015] Atousa Torabi, Christopher Pal, Hugo Larochelle, and Aaron Courville. 2015. Using descriptive video services to create a large data source for video annotation research. arXiv:1503.01070v1.
- [Venugopalan et al.2015a] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. 2015a. Sequence to sequence - video to text. ICCV.
- [Venugopalan et al.2015b] Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, and Kate Saenko. 2015b. Translating videos to natural language using deep recurrent neural networks. In NAACL.
- [Venugopalan et al.2016] S. Venugopalan, L.A. Hendricks, M. Rohrbach, R. Mooney, T. Darrell, and K. Saenko. 2016. Captioning images with diverse objects. arXiv preprint arXiv:1606.07770.
- [Vinyals et al.2015] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. CVPR.
- [Yao et al.2015] Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, and Aaron Courville. 2015. Describing videos by exploiting temporal structure. ICCV.
- [Yu et al.2015] Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, and Wei Xu. 2015. Video paragraph captioning using hierarchical recurrent neural networks. CVPR.