A Novel Speech-Driven Lip-Sync Model with CNN and LSTM

05/02/2022
by   Xiaohong Li, et al.
Generating natural lip movement that is synchronized with speech is one of the most important tasks in creating realistic virtual characters. In this paper, we present a deep neural network that combines one-dimensional convolutions with an LSTM to generate the vertex displacements of a 3D template face model from variable-length speech input. The motion of the lower part of the face, represented by the vertex movement of 3D lip shapes, is consistent with the input speech. To make the network more robust to different sound signals, we adapt a trained speech recognition model to extract speech features, and we add a velocity loss term to reduce jitter in the generated facial animation. We recorded a series of videos of a Chinese adult speaking Mandarin and created a new speech-animation dataset to compensate for the lack of such public data. Qualitative and quantitative evaluations indicate that our model is able to generate smooth and natural lip movements synchronized with speech.
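The abstract does not give the form of the velocity loss term. As a rough illustration only, the sketch below assumes one common formulation: a mean squared error on the vertex displacements plus a weighted mean squared error on their frame-to-frame differences. The function names, the `velocity_weight` parameter, and the data layout (one flattened list of displacements per frame) are assumptions made for this example and are not taken from the paper.

```python
from typing import Iterable, List, Sequence

# Hypothetical layout: one inner sequence of flattened vertex
# displacements per animation frame.
Frames = Sequence[Sequence[float]]


def _mean_squared(values: Iterable[float]) -> float:
    """Mean of the squared values."""
    values = list(values)
    return sum(v * v for v in values) / len(values)


def position_loss(pred: Frames, target: Frames) -> float:
    """Mean squared error over every coordinate of every frame."""
    return _mean_squared(
        p - t
        for pred_frame, target_frame in zip(pred, target)
        for p, t in zip(pred_frame, target_frame)
    )


def _frame_deltas(frames: Frames) -> List[List[float]]:
    """Difference between each frame and the one before it."""
    return [
        [curr - prev for prev, curr in zip(prev_frame, curr_frame)]
        for prev_frame, curr_frame in zip(frames, frames[1:])
    ]


def velocity_loss(pred: Frames, target: Frames) -> float:
    """Mean squared error between predicted and target frame-to-frame motion.

    Assumed formulation: it penalises predictions whose motion between
    consecutive frames differs from the reference motion, which is what
    shows up visually as jitter.
    """
    if len(pred) < 2:
        return 0.0  # a single frame has no motion to compare
    return position_loss(_frame_deltas(pred), _frame_deltas(target))


def total_loss(pred: Frames, target: Frames, velocity_weight: float = 1.0) -> float:
    """Position error plus the weighted velocity term (weight is an assumption)."""
    return position_loss(pred, target) + velocity_weight * velocity_loss(pred, target)
```

With a smoothly moving reference such as `[[0.0], [1.0], [2.0]]` and a prediction that overshoots in the middle frame, the velocity term is larger than the position term, because a single misplaced frame distorts the motion both into and out of it.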
