Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation

03/09/2023
by Qi Chen, et al.

Audio-driven talking face generation has recently attracted broad interest from academia and industry. However, data acquisition and labeling for audio-driven talking face are labor-intensive and costly, and the resulting lack of data leads to poor synthesis quality. To alleviate this issue, we propose using TTS (Text-To-Speech) for data augmentation to improve the few-shot ability of the talking face system. The misalignment introduced by the TTS audio is resolved with soft-DTW, which is adopted in the talking face task for the first time. Moreover, features extracted by HuBERT are explored to exploit the underlying information in the audio, and are found to be superior to other features. The proposed method achieves 17 and user study preference, respectively, over the baseline model, which shows the effectiveness of improving few-shot learning for the talking face system with TTS augmentation.
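
The abstract uses soft-DTW to cope with the timing mismatch between TTS audio and the target sequence. As a rough illustration only, the sketch below implements the standard soft-DTW recurrence in plain Python, assuming each sequence is a list of feature vectors and the local cost is squared Euclidean distance. The frame representation, the cost function, the gamma values, and the helper names (`softmin`, `soft_dtw`, `framewise_sq_error`) are all hypothetical choices for this example, not details taken from the paper. It computes only the forward value; a training loss would normally be written in an autodiff framework.

```python
import math
from typing import List, Sequence

Frame = List[float]  # one feature vector per time step (hypothetical representation)


def softmin(values: Sequence[float], gamma: float) -> float:
    """Smoothed minimum -gamma * log(sum(exp(-v / gamma))), computed stably."""
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf  # every predecessor cell is unreachable
    lo = min(finite)
    # Shift by the minimum before exponentiating to avoid overflow/underflow.
    total = sum(math.exp(-(v - lo) / gamma) for v in finite)
    return lo - gamma * math.log(total)


def soft_dtw(x: Sequence[Frame], y: Sequence[Frame], gamma: float = 1.0) -> float:
    """Soft-DTW discrepancy between sequences x and y (lengths may differ).

    Assumption: the local cost is squared Euclidean distance between frames.
    As gamma -> 0 the value approaches the classic (hard) DTW cost.
    """
    n, m = len(x), len(y)
    # r[i][j] = soft-minimal cost of aligning x[:i] with y[:j]
    r = [[math.inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((a - b) ** 2 for a, b in zip(x[i - 1], y[j - 1]))
            # Predecessors: diagonal step, or a step that repeats a frame of x or y.
            r[i][j] = cost + softmin((r[i - 1][j - 1], r[i - 1][j], r[i][j - 1]), gamma)
    return r[n][m]


def framewise_sq_error(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Frame-by-frame squared error; assumes equal lengths and no alignment."""
    return sum(sum((a - b) ** 2 for a, b in zip(fx, fy)) for fx, fy in zip(x, y))


if __name__ == "__main__":
    target = [[0.0], [0.0], [1.0], [2.0], [3.0]]
    shifted = [[0.0], [1.0], [2.0], [3.0], [3.0]]  # same trajectory, one frame earlier
    print(framewise_sq_error(target, shifted))  # 3.0
    print(soft_dtw(target, shifted, gamma=1e-3))
```

In this toy case the shifted sequence follows the same trajectory one frame earlier, so the frame-wise error is 3.0 while soft-DTW with a small gamma stays close to zero. A larger gamma smooths the minimum so the value is differentiable everywhere, at the cost of values that can fall below zero.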


research
04/19/2021

Few-shot learning via tensor hallucination

Few-shot classification addresses the challenge of classifying examples ...
research
09/09/2023

AudRandAug: Random Image Augmentations for Audio Classification

Data augmentation has proven to be effective in training neural networks...
research
08/12/2018

Sample Mixed-Based Data Augmentation for Domestic Audio Tagging

Audio tagging has attracted increasing attention since last decade and h...
research
08/24/2022

Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio Text Augmentations

The absence of large labeled datasets remains a significant challenge in...
research
10/18/2021

Ortho-Shot: Low Displacement Rank Regularization with Data Augmentation for Few-Shot Learning

In few-shot classification, the primary goal is to learn representations...
research
04/24/2022

Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Although few-shot learning has attracted much attention from the fields ...
research
06/28/2021

Dizygotic Conditional Variational AutoEncoder for Multi-Modal and Partial Modality Absent Few-Shot Learning

Data augmentation is a powerful technique for improving the performance ...
