Annotation Cleaning for the MSR-Video to Text Dataset

02/12/2021
by Haoran Chen, et al.

Video captioning is the task of having a machine describe the contents of a video in natural language. Many methods have been proposed for this task, and a large dataset called MSR Video to Text (MSR-VTT) is often used as the benchmark for testing their performance. However, we found that the human annotations in the dataset, i.e., the descriptions of the video contents, are quite noisy: there are many duplicate captions, and many captions contain grammatical problems. These problems may make it harder for video captioning models to learn. We cleaned the MSR-VTT annotations by removing these problems, then tested several typical video captioning models on the cleaned dataset. Experimental results showed that data cleaning boosted the performance of the models as measured by popular quantitative metrics. We also recruited subjects to evaluate the output of a model trained on the original dataset and on the cleaned dataset. This human behavior experiment showed that, when trained on the cleaned dataset, the model generated captions that were more coherent and more relevant to the contents of the video clips. The cleaned dataset is publicly available.
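To make the duplicate-caption problem concrete, the following is a minimal sketch of one way to drop repeated captions within each video. It is not the authors' cleaning procedure, which the abstract does not specify. The `(video_id, caption)` pair layout, the function names, and the normalization rule (lowercasing and collapsing whitespace) are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(caption: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(caption.lower().split())


def remove_duplicate_captions(annotations):
    """Keep the first occurrence of each caption per video.

    `annotations` is a list of (video_id, caption) pairs. This layout is a
    hypothetical one chosen for the example, not the MSR-VTT file format.
    """
    seen = defaultdict(set)  # video_id -> normalized captions already kept
    cleaned = []
    for video_id, caption in annotations:
        key = normalize(caption)
        # Skip empty captions and captions already seen for this video.
        if key and key not in seen[video_id]:
            seen[video_id].add(key)
            cleaned.append((video_id, caption.strip()))
    return cleaned


# Made-up sample data for illustration only.
sample = [
    ("video0", "a man is cooking"),
    ("video0", "A man is  cooking"),
    ("video0", "a man cooks in a kitchen"),
    ("video1", "a man is cooking"),
]
cleaned = remove_duplicate_captions(sample)
```

Duplicates are tracked per video rather than globally, so the same sentence may still describe two different clips. Grammatical cleaning is a separate problem that this sketch does not attempt.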


