TennisVid2Text: Fine-grained Descriptions for Domain Specific Videos

11/26/2015
by Mohak Sukhwani et al.

Automatically describing videos has always been fascinating. In this work, we attempt to describe videos from a specific domain: broadcast videos of lawn tennis matches. Given a video shot from a tennis match, we intend to generate a textual commentary similar to what a human expert would write on a sports website. Unlike many recent works that focus on generating short captions, we are interested in generating semantically richer descriptions. This demands a detailed low-level analysis of the video content, especially the actions and interactions among subjects. We address this by limiting our domain to the game of lawn tennis. Rich descriptions are generated by leveraging a large corpus of human-created descriptions harvested from the Internet. We evaluate our method on a newly created tennis video dataset. Extensive analysis demonstrates that our approach addresses both the semantic correctness and the readability aspects of the task.

research
03/27/2023

Fine-grained Audible Video Description

We explore a new task for audio-visual-language modeling called fine-gra...
research
05/08/2020

Text Synopsis Generation for Egocentric Videos

Mass utilization of body-worn cameras has led to a huge corpus of availa...
research
07/26/2018

Move Forward and Tell: A Progressive Generator of Video Descriptions

We present an efficient framework that can generate a coherent paragraph...
research
11/07/2021

NarrationBot and InfoBot: A Hybrid System for Automated Video Description

Video accessibility is crucial for blind and low vision users for equita...
research
03/11/2020

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Captioning is a crucial and challenging task for video understanding. In...
research
09/22/2018

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

Automatic generation of textual video descriptions that are time-aligned...
research
05/10/2021

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

When people observe events, they are able to abstract key information an...
