Motion and Context-Aware Audio-Visual Conditioned Video Prediction

12/09/2022
by Yating Xu, et al.

The existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames, obtained from a multimodal stochastic network and a frame encoder, to predict the next visual frame. However, directly inferring per-pixel intensities for the next visual frame from the latent codes is extremely challenging because the image space is high-dimensional. To address this, we propose to decouple audio-visual conditioned video prediction into motion modeling and appearance modeling. The first part is a multimodal motion estimation module that learns motion information, in the form of optical flow, from the given audio-visual clip. The second part is a context-aware refinement module that uses the predicted optical flow to warp the current visual frame into the next visual frame and then refines the result based on the given audio-visual context. Experimental results show that our method achieves competitive results on existing benchmarks.
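
The warping step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `warp_frame`, the backward-warping convention (each flow vector points from a pixel in the next frame to its source in the current frame), the bilinear interpolation, and the border clamping are all assumptions made for illustration. The learned flow estimation and the context-aware refinement are not shown.

```python
import numpy as np

def warp_frame(frame, flow):
    """Backward-warp `frame` with `flow` using bilinear interpolation.

    frame: array of shape (H, W) or (H, W, C), the current visual frame.
    flow:  array of shape (H, W, 2); flow[y, x] = (dx, dy) is ASSUMED to
           point from pixel (x, y) of the next frame to its source
           location in the current frame (hypothetical convention).
    """
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates in the current frame, clamped to the border.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each source coordinate.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets serve as the bilinear weights.
    wx = src_x - x0
    wy = src_y - y0
    if frame.ndim == 3:
        # Broadcast the weights over the channel axis.
        wx = wx[..., None]
        wy = wy[..., None]

    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bottom = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Usage: a zero flow field leaves the frame unchanged.
frame = np.arange(12, dtype=float).reshape(3, 4)
same = warp_frame(frame, np.zeros((3, 4, 2)))
```

Under this convention, a refinement stage like the one the abstract describes would take the warped frame as input and correct regions that warping alone cannot explain, such as newly revealed content.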

research
03/29/2018

Context-aware Synthesis for Video Frame Interpolation

Video frame interpolation algorithms typically estimate optical flow or ...
research
07/23/2020

Sound2Sight: Generating Visual Dynamics from Sound and Context

Learning associations across modalities is critical for robust multimoda...
research
10/23/2017

Fully Context-Aware Video Prediction

This paper proposes a new neural network design for unsupervised learnin...
research
09/08/2021

VideoModerator: A Risk-aware Framework for Multimodal Video Moderation in E-Commerce

Video moderation, which refers to remove deviant or explicit content fro...
research
12/09/2022

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

Previous studies have explored generating accurately lip-synced talking ...
research
07/16/2021

CCVS: Context-aware Controllable Video Synthesis

This presentation introduces a self-supervised learning approach to the ...
research
01/18/2023

Real-Time Viewport-Aware Optical Flow Estimation in 360-degree Videos for Visually-Induced Motion Sickness Mitigation

Visually-induced motion sickness (VIMS), a side effect of illusionary mo...
