Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation

02/11/2022
by   Ahmad Hammoudeh, et al.

This work generates captions for soccer videos using deep learning. To that end, the paper introduces a dataset, a model, and a three-level evaluation. The dataset consists of 22k caption-clip pairs and three visual features (images, optical flow, inpainting) extracted from 500 hours of SoccerNet videos. The model has three parts: a transformer learns language, ConvNets learn vision, and a fusion of the linguistic and visual features generates the captions. The paper proposes evaluating generated captions at three levels: syntax (standard metrics such as BLEU and CIDEr), meaning (the quality of the descriptions as judged by a domain expert), and corpus (the diversity of the generated captions). Semantics-related losses that prioritize selected words improved the diversity of the generated captions from 0.07 to 0.18; together with the additional visual features (optical flow, inpainting), these losses improved the normalized captioning score by 28%. The web page of this work: https://sites.google.com/view/soccercaptioning
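One way to read "semantics-related losses that prioritize selected words" is a token-level cross-entropy in which semantically important vocabulary (e.g. action words such as "goal" or "corner") receives a larger weight than function words. The sketch below illustrates that idea only; the function name, the keyword set, and the simple scalar-weighting scheme are assumptions for illustration, not the paper's exact loss.

```python
import math

def semantics_weighted_loss(log_probs, target_ids, keyword_ids, keyword_weight=2.0):
    """Weighted negative log-likelihood over one caption.

    log_probs:   per-step log-probabilities over the vocabulary
    target_ids:  ground-truth token id at each step
    keyword_ids: token ids to up-weight (illustrative 'selected words')
    """
    total, norm = 0.0, 0.0
    for step_log_probs, tok in zip(log_probs, target_ids):
        w = keyword_weight if tok in keyword_ids else 1.0
        total += -w * step_log_probs[tok]   # weighted NLL of the target token
        norm += w
    return total / norm                     # normalize by the total weight

# Toy example: vocabulary of 4 tokens, caption of length 3.
log_probs = [[math.log(p) for p in row] for row in
             [[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.6, 0.2, 0.1],
              [0.2, 0.2, 0.5, 0.1]]]
loss = semantics_weighted_loss(log_probs, [0, 1, 2], keyword_ids={1})
```

With `keyword_weight=1.0` and an empty keyword set this reduces to the ordinary mean token-level cross-entropy; raising the weight pushes the model to get the selected words right, which is one plausible mechanism behind the reported gain in caption diversity.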


