LumièreNet: Lecture Video Synthesis from Audio

07/04/2019
by Byung-Hak Kim, et al.

We present LumièreNet, a simple, modular, and fully deep-learning-based architecture that synthesizes high-quality, full-pose headshot lecture videos from an instructor's new audio narration of any length. Unlike prior work, LumièreNet is composed entirely of trainable neural network modules that learn mapping functions from audio to video through intermediate, compact, and abstract pose-based latent codes. Our video demos are available at [22] and [23].
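The abstract describes a two-stage mapping: audio features are first encoded into compact pose-based latent codes, which are then decoded into video frames. The sketch below illustrates that modular structure in PyTorch. All module names, layer choices, and dimensions (AudioToLatent, LatentToFrame, 80-dim audio features, 128-dim latent codes) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch of an audio-to-video pipeline in the spirit of the
# abstract: audio features -> intermediate latent codes -> video frames.
# Module names, feature sizes, and layer choices are assumptions made for
# illustration; they are not taken from the paper.

class AudioToLatent(nn.Module):
    """Maps a sequence of audio features to per-frame latent codes."""
    def __init__(self, audio_dim=80, hidden=256, latent_dim=128):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, latent_dim)

    def forward(self, audio_feats):          # (B, T, audio_dim)
        h, _ = self.rnn(audio_feats)
        return self.proj(h)                  # (B, T, latent_dim)

class LatentToFrame(nn.Module):
    """Decodes each latent code into a low-resolution video frame."""
    def __init__(self, latent_dim=128, img_ch=3):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, img_ch, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):                    # (B, T, latent_dim)
        B, T, D = z.shape
        x = self.fc(z.reshape(B * T, D)).view(B * T, 256, 4, 4)
        frames = self.deconv(x)              # (B*T, img_ch, 32, 32)
        return frames.view(B, T, *frames.shape[1:])

# Usage: a 5-second clip of 80-dim mel features at 25 fps (125 steps).
audio = torch.randn(1, 125, 80)
z = AudioToLatent()(audio)
video = LatentToFrame()(z)
print(video.shape)  # torch.Size([1, 125, 3, 32, 32])
```

In the paper's setting the intermediate codes correspond to estimated pose-based representations; here they are simply learned vectors, which keeps the sketch self-contained.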


