Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data

03/31/2020
by washington-ramos, et al.

The rapid growth in the amount of published visual data, combined with users' limited time, creates a demand for processing untrimmed videos into shorter versions that convey the same information. Despite the remarkable progress made by summarization methods, most of them select only a few frames or skims, which creates visual gaps and breaks the video context. In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos. Our approach adaptively identifies the frames that are not relevant for conveying the information and removes them without creating gaps in the final video. Our agent is guided by both textual and visual cues when selecting which frames to remove to shrink the input video. Additionally, we propose a novel network, called Visually-guided Document Attention Network (VDAN), which generates a highly discriminative embedding space that represents both textual and visual data. Our experiments show that our method achieves the best performance in terms of F1 Score and coverage at the video segment level.
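To make the idea of a textually guided fast-forwarding agent more concrete, here is a minimal sketch under loudly stated assumptions. It assumes frames and the instructional text are already embedded in a shared space (the role VDAN plays in the paper), scores relevance with plain cosine similarity, and replaces the learned reinforcement learning policy with a simple greedy rule: from the current frame, jump to the most text-relevant frame within a bounded window. The names `cosine`, `relevance`, `fast_forward`, and the `max_skip` parameter are hypothetical; this is not the authors' implementation or reward function.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def relevance(frames: Sequence[Sequence[float]], doc: Sequence[float]) -> List[float]:
    """Score each frame embedding against the document embedding.

    ASSUMPTION: frames and doc live in one shared embedding space
    (standing in for what a joint text/visual model such as VDAN provides).
    """
    return [cosine(f, doc) for f in frames]


def fast_forward(frames: Sequence[Sequence[float]], doc: Sequence[float],
                 max_skip: int = 3) -> List[int]:
    """Return indices of kept frames.

    ASSUMPTION: a greedy rule stands in for a learned RL policy. From the
    current frame, the "action" is a jump of 1..max_skip frames, and we pick
    the jump that lands on the most text-relevant frame (first one on ties).
    Bounding the jump keeps every gap at most max_skip frames long.
    """
    if not frames:
        return []
    scores = relevance(frames, doc)
    n = len(frames)
    kept = [0]
    i = 0
    while i + 1 < n:
        # Candidate landing frames reachable with one bounded jump.
        candidates = range(i + 1, min(i + max_skip, n - 1) + 1)
        i = max(candidates, key=lambda j: scores[j])
        kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy 2-D embeddings: frames aligned with [1, 0] match the text.
    frames = [[1, 0], [0, 1], [1, 0], [0, 1], [0, 1], [1, 0]]
    doc = [1, 0]
    print(fast_forward(frames, doc, max_skip=2))
```

In this toy run the kept indices are `[0, 2, 3, 5]`: the walk prefers frames aligned with the document vector, but because each jump covers at most `max_skip` frames it still keeps frame 3 rather than leaping over a long irrelevant stretch. This only loosely mirrors the paper's stated goal of accelerating a video without creating large visual gaps; a real agent would learn its skip decisions from a reward signal instead of applying a fixed greedy rule.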

research
05/08/2018

FFNet: Video Fast-Forwarding via Reinforcement Learning

For many applications with limited computation, communication, storage a...
research
08/24/2022

Visual Subtitle Feature Enhanced Video Outline Generation

With the tremendously increasing number of videos, there is a great dema...
research
12/29/2019

Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network

The growth of Social Networks has fueled the habit of people logging the...
research
07/29/2020

Compare and Select: Video Summarization with Multi-Agent Reinforcement Learning

Video summarization aims at generating concise video summaries from the ...
research
04/22/2019

Tripping through time: Efficient Localization of Activities in Videos

Localizing moments in untrimmed videos via language queries is a new and...
research
01/09/2023

Cursive Caption Text Detection in Videos

Textual content appearing in videos represents an interesting index for ...
research
04/07/2022

Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer

Videos are created to express emotion, exchange information, and share e...