A Comprehensive Review on Recent Methods and Challenges of Video Description

11/30/2020
by Alok Singh, et al.

Video description involves generating natural language descriptions of the actions, events, and objects in a video. It has various applications, such as bridging the gap between language and vision for visually impaired people, automatically suggesting titles based on content, content-based video browsing, and video-guided machine translation [86]. In the past decade, considerable work has been done in this field on approaches and methods for video description, evaluation metrics, and datasets. To analyze the progress in the video description task, a comprehensive survey is needed that covers all the phases of video description approaches, with a special focus on recent deep learning approaches. In this work, we report a comprehensive survey on the phases of video description approaches, the datasets for video description, evaluation metrics, open competitions that motivate research on video description, open challenges in this field, and future research directions. In this survey, we cover the state-of-the-art approaches proposed for each dataset, along with their pros and cons. The availability of numerous benchmark datasets is a basic need for the growth of this research domain. Further, we categorize all the datasets into two classes: open-domain datasets and domain-specific datasets. From our survey, we observe that work in this field is developing at a fast pace, since the task of video description lies at the intersection of computer vision and natural language processing. Still, work on video description is far from saturation due to various challenges: the redundancy caused by similar frames, which affects the quality of visual features; the limited availability of datasets with more diverse content; and the lack of an effective evaluation metric.
