SingFake: Singing Voice Deepfake Detection

09/14/2023
by   Yongyi Zang, et al.
0

The rise of singing voice synthesis presents critical challenges to artists and industry stakeholders over unauthorized voice usage. Unlike synthesized speech, synthesized singing voices are typically released in songs containing strong background music that may hide synthesis artifacts. Additionally, singing voices present different acoustic and linguistic characteristics from speech utterances. These unique properties make singing voice deepfake detection a relevant but significantly different problem from synthetic speech detection. In this work, we propose the singing voice deepfake detection task. We first present SingFake, the first curated in-the-wild dataset consisting of 28.93 hours of bonafide and 29.40 hours of deepfake song clips in five languages from 40 singers. We provide a train/val/test split where the test sets include various scenarios. We then use SingFake to evaluate four state-of-the-art speech countermeasure systems trained on speech utterances. We find these systems lag significantly behind their performance on speech test data. When trained on SingFake, either using separated vocal tracks or song mixtures, these systems show substantial improvement. However, our evaluations also identify challenges associated with unseen singers, communication codecs, languages, and musical contexts, calling for dedicated research into singing voice deepfake detection. The SingFake dataset and related resources are available online.

READ FULL TEXT
research
09/05/2023

FSD: An Initial Chinese Dataset for Fake Song Detection

Singing voice synthesis and singing voice conversion have significantly ...
research
04/06/2022

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

In this work, we present the SOMOS dataset, the first large-scale mean o...
research
10/18/2021

FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

As increasing development of text-to-speech (TTS) and voice conversion (...
research
05/02/2022

How does a spontaneously speaking conversational agent affect user behavior?

This study investigated the effect of synthetic voice of conversational ...
research
06/25/2021

Evaluation of Deep-Learning-Based Voice Activity Detectors and Room Impulse Response Models in Reverberant Environments

State-of-the-art deep-learning-based voice activity detectors (VADs) are...
research
05/11/2020

End-To-End Speech Synthesis Applied to Brazilian Portuguese

Voice synthesis systems are popular in different applications, such as p...
research
04/07/2022

Arabic Text-To-Speech (TTS) Data Preparation

People may be puzzled by the fact that voice over recordings data sets e...

Please sign up or login with your details

Forgot password? Click here to reset