Video Captioning: a comparative review of where we are and which could be the route

04/12/2022
by Daniela Moctezuma, et al.

Video captioning is the process of describing the content of a sequence of images, capturing its semantic relationships and meanings. The task is arduous even for a single image, and considerably harder for a video (or image sequence). Video captioning has many relevant applications, such as handling the large volume of recordings produced by video surveillance or assisting visually impaired people, to mention a few. To analyze where our community's efforts on the video captioning task stand, and which route could be better to follow, this manuscript presents an extensive review of more than 105 papers from the period 2016 to 2021. As a result, the most-used datasets and metrics are identified, along with the main approaches and the best-performing ones. We compute a set of rankings based on several performance metrics to identify the method with the best results on the video captioning task. Finally, we draw some insights about possible next steps or opportunity areas for improving performance on this complex task.
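The abstract does not say how the per-metric rankings are combined, so the following is only a minimal sketch of one common option, mean-rank aggregation, and not necessarily the authors' procedure. The function name `rank_methods`, the method names, and all score values are hypothetical placeholders; BLEU-4, METEOR, and CIDEr are used only as illustrative metric names, and every metric is assumed to be higher-is-better.

```python
from statistics import mean


def rank_methods(scores):
    """Rank methods by their mean rank across metrics (lower mean rank is better).

    scores maps method name -> {metric name: value}. Assumes every method
    reports every metric and that higher values are better. Ties are not
    handled specially: tied methods keep their input order.
    """
    metrics = sorted({metric for values in scores.values() for metric in values})
    ranks = {name: [] for name in scores}
    for metric in metrics:
        # Best score first, so position 1 is the top method for this metric.
        ordered = sorted(scores, key=lambda name: scores[name][metric], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    # Sort by mean rank, best (smallest) first.
    return sorted(((name, mean(r)) for name, r in ranks.items()),
                  key=lambda item: item[1])


# Hypothetical scores for illustration only; not taken from any paper.
EXAMPLE_SCORES = {
    "method_a": {"BLEU-4": 0.40, "METEOR": 0.28, "CIDEr": 0.50},
    "method_b": {"BLEU-4": 0.42, "METEOR": 0.27, "CIDEr": 0.48},
    "method_c": {"BLEU-4": 0.38, "METEOR": 0.26, "CIDEr": 0.45},
}

if __name__ == "__main__":
    for name, mean_rank in rank_methods(EXAMPLE_SCORES):
        print(f"{name}: mean rank {mean_rank:.2f}")
```

Averaging ranks rather than raw scores keeps metrics with different scales from dominating the comparison, though it discards how large the gaps between methods are.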

research
04/22/2023

A Review of Deep Learning for Video Captioning

Video captioning (VC) is a fast-moving, cross-disciplinary area of resea...
research
04/18/2022

End-to-end Dense Video Captioning as Sequence Generation

Dense video captioning aims to identify the events of interest in an inp...
research
04/24/2017

Multi-Task Video Captioning with Video and Entailment Generation

Video captioning, the task of describing the content of a video, has see...
research
07/25/2021

Boosting Video Captioning with Dynamic Loss Network

Video captioning is one of the challenging problems at the intersection ...
research
08/17/2016

Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

We present our submission to the Microsoft Video to Language Challenge o...
research
11/13/2019

Crowd Video Captioning

Describing a video automatically with natural language is a challenging ...
research
12/12/2022

"Hey, Can You Add Captions?": The Critical Infrastructuring Practices of Neurodiverse People on TikTok

Accessibility efforts, how we can make the world usable and useful to as...
