Transcript to Video: Efficient Clip Sequencing from Texts

07/25/2021
by Yu Xiong, et al.

Among the numerous videos shared on the web, well-edited ones tend to attract more attention. However, inexperienced users find it difficult to make well-edited videos because doing so requires professional expertise and immense manual labor. To meet the needs of non-experts, we present Transcript-to-Video, a weakly-supervised framework that takes text as input and automatically creates video sequences from an extensive collection of shots. Specifically, we propose a Content Retrieval Module to learn visual-language representations and a Temporal Coherent Module to model shot sequencing styles. For fast inference, we introduce an efficient search strategy for real-time video clip sequencing. Quantitative results and user studies demonstrate empirically that the proposed learning framework can retrieve content-relevant shots while creating video sequences that are plausible in terms of style. In addition, the run-time performance analysis shows that our framework can support real-world applications.
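The abstract does not spell out how shots are scored or searched, so the following is only a minimal illustrative sketch, not the authors' method. It assumes that each transcript sentence and each shot has already been embedded as a plain vector, uses cosine similarity as a stand-in for both content relevance and shot-to-shot coherence, and uses a small beam search as a stand-in for the efficient search strategy. The names `sequence_shots`, `beam_width`, and `coherence_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sequence_shots(
    sentence_vecs: Sequence[Sequence[float]],
    shot_vecs: Sequence[Sequence[float]],
    beam_width: int = 3,
    coherence_weight: float = 0.5,
) -> List[int]:
    """Pick one distinct shot index per sentence using beam search.

    Assumed scoring (illustrative only): text-to-shot cosine similarity
    for relevance, plus weighted cosine similarity between consecutive
    shots for coherence.
    """
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for sentence in sentence_vecs:
        relevance = [cosine(sentence, shot) for shot in shot_vecs]
        candidates: List[Tuple[float, List[int]]] = []
        for score, path in beams:
            for j, rel in enumerate(relevance):
                if j in path:
                    continue  # do not reuse a shot within one sequence
                coherence = cosine(shot_vecs[path[-1]], shot_vecs[j]) if path else 0.0
                candidates.append((score + rel + coherence_weight * coherence, path + [j]))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    # An empty result means no complete sequence could be built.
    return beams[0][1] if beams else []


# Toy data: two sentences, three candidate shots.
sentences = [[1.0, 0.0], [0.0, 1.0]]
shots = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(sequence_shots(sentences, shots, coherence_weight=0.0))  # → [0, 1]
```

Beam search keeps the cost per sentence proportional to `beam_width` times the number of shots rather than exploring every possible sequence, which is one common way to trade a little accuracy for speed; whether the paper uses this particular strategy is not stated in the abstract.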

research
10/20/2020

Real-time Localized Photorealistic Video Style Transfer

We present a novel algorithm for transferring artistic styles of semanti...
research
09/16/2023

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

Large-scale noisy web image-text datasets have been proven to be efficie...
research
08/21/2021

Flikcer – A Chrome Extension to Resolve Online Epileptogenic Visual Content with Real-Time Luminance Frequency Analysis

Video content with fast luminance variations, or with spatial patterns o...
research
04/22/2019

Tripping through time: Efficient Localization of Activities in Videos

Localizing moments in untrimmed videos via language queries is a new and...
research
02/10/2021

A Generic Object Re-identification System for Short Videos

Short video applications like TikTok and Kwai have been a great hit rece...
research
08/09/2021

Learning to Cut by Watching Movies

Video content creation keeps growing at an incredible pace; yet, creatin...
research
07/01/2022

Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature Decoupling

Ultrasound (US) is widely used for its advantages of real-time imaging, ...
