A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

04/02/2023
by Bo-Kyeong Kim, et al.

Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite their remarkable results, modern talking-face generation models often entail high computational burdens, which limit their efficient deployment. This study aims to develop a lightweight model for speech-driven talking-face synthesis. We build a compact generator by removing the residual blocks from Wav2Lip, a popular talking-face generator, and reducing its channel width. We also present a knowledge distillation scheme to stably yet effectively train the small-capacity generator without adversarial learning. We reduce the number of parameters and MACs by 28× while retaining the performance of the original model. Moreover, to alleviate a severe performance drop when converting the whole generator to INT8 precision, we adopt a selective quantization method that uses FP16 for the quantization-sensitive layers and INT8 for the other layers. Using this mixed precision, we achieve up to a 19× speedup on edge GPUs without noticeably compromising the generation quality.
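
The selective quantization idea in the abstract (FP16 for quantization-sensitive layers, INT8 for the rest) can be illustrated with a small sketch. The abstract does not state how sensitivity is measured, so the criterion below is an assumption: it uses the relative weight error after symmetric per-tensor INT8 rounding as a stand-in. The layer names, the toy weights, and the threshold are all hypothetical and are not taken from the paper.

```python
import random


def quantize_int8(values):
    """Symmetric per-tensor INT8 fake quantization: quantize, then dequantize."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / 127.0
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def relative_error(values):
    """Relative L2 error introduced by INT8 fake quantization (assumed proxy)."""
    dequantized = quantize_int8(values)
    num = sum((a - b) ** 2 for a, b in zip(values, dequantized))
    den = sum(a * a for a in values)
    return (num / den) ** 0.5 if den > 0 else 0.0


def assign_precisions(layers, threshold):
    """Keep a layer in FP16 if its INT8 error exceeds the threshold, else use INT8."""
    return {
        name: ("FP16" if relative_error(weights) > threshold else "INT8")
        for name, weights in layers.items()
    }


# Hypothetical toy layers: one with evenly spread weights, one where a single
# large outlier stretches the INT8 range and rounds every small weight to zero.
rng = random.Random(0)
layers = {
    "conv_smooth": [rng.uniform(-1.0, 1.0) for _ in range(1000)],
    "conv_outlier": [rng.uniform(-0.01, 0.01) for _ in range(999)] + [10.0],
}

plan = assign_precisions(layers, threshold=0.01)  # threshold is an assumption
for name, precision in plan.items():
    print(name, precision, round(relative_error(layers[name]), 4))
```

In practice, sensitivity would more plausibly be judged by the effect on generated frames than by weight error alone; the sketch only shows the per-layer decision structure behind a mixed FP16/INT8 assignment.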

