Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

04/06/2021
by   Medhini Narasimhan, et al.

We introduce a non-parametric approach for infinite video texture synthesis using a representation learned via contrastive learning. We take inspiration from Video Textures, which showed that plausible new videos could be generated from a single one by stitching its frames together in a novel yet consistent order. This classic work, however, was constrained by its use of hand-designed distance metrics, limiting its use to simple, repetitive videos. We draw on recent techniques from self-supervised learning to learn this distance metric, allowing us to compare frames in a manner that scales to more challenging dynamics, and to condition on other data, such as audio. We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning. To synthesize a texture, we randomly sample frames with high transition probabilities to generate diverse temporally smooth videos with novel sequences and transitions. The model naturally extends to an audio-conditioned setting without requiring any finetuning. Our model outperforms baselines on human perceptual scores, can handle a diverse range of input videos, and can combine semantic and audio-visual cues in order to synthesize videos that synchronize well with an audio signal.
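The core synthesis loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes you already have an (N, D) array of per-frame embeddings from a contrastive encoder (here just a hypothetical input), turns pairwise similarities into transition probabilities with a softmax, and randomly samples frames with high transition probability to build a new sequence.

```python
import numpy as np

def synthesize_texture(frame_embeddings, num_frames, temperature=0.1, seed=0):
    """Sample a frame index sequence from learned transition probabilities.

    `frame_embeddings`: (N, D) array of per-frame representations,
    assumed to come from a contrastive encoder (hypothetical input).
    """
    rng = np.random.default_rng(seed)
    emb = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    # Similarity between the frame following i and a candidate frame j
    # approximates how smooth a jump from i to j would look.
    sim = emb[1:] @ emb.T                       # (N-1, N)
    probs = np.exp(sim / temperature)
    probs /= probs.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    sequence = [int(rng.integers(len(emb) - 1))]
    for _ in range(num_frames - 1):
        i = sequence[-1]
        # Sample the next frame; clamp i for the final row, which has no successor.
        j = rng.choice(len(emb), p=probs[min(i, len(probs) - 1)])
        sequence.append(int(j))
    return sequence
```

Audio conditioning, as the abstract notes, can be layered on by re-weighting these transition probabilities with an audio-visual similarity score before sampling.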


Related research

02/15/2023 · Audio-Visual Contrastive Learning with Temporal Self-Supervision
We propose a self-supervised learning approach for videos that learns re...

10/28/2020 · Cycle-Contrast for Self-Supervised Video Representation Learning
We present Cycle-Contrastive Learning (CCL), a novel self-supervised met...

07/23/2020 · Sound2Sight: Generating Visual Dynamics from Sound and Context
Learning associations across modalities is critical for robust multimoda...

07/18/2022 · Audio Input Generates Continuous Frames to Synthesize Facial Video Using Generative Adversarial Networks
This paper presents a simple method for speech videos generation based o...

04/29/2020 · Image Morphing with Perceptual Constraints and STN Alignment
In image morphing, a sequence of plausible frames are synthesized and co...

11/04/2020 · Improved Algorithm for Seamlessly Creating Infinite Loops from a Video Clip, while Preserving Variety in Textures
This project implements the paper "Video Textures" by Szeliski. The aim ...

10/26/2020 · Contrastive Unsupervised Learning for Audio Fingerprinting
The rise of video-sharing platforms has attracted more and more people t...
