Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs

09/15/2023
by Md Awsafur Rahman, et al.

With the major advances that deep learning has brought to speech processing, many novel speech synthesis techniques now achieve incredibly realistic results. Because these methods generate realistic fake human voices, they can be used for malicious purposes such as impersonation, spreading fake news, spoofing, and media manipulation. Hence, the ability to distinguish synthetic from natural speech has become an urgent necessity. Moreover, being able to tell which algorithm was used to generate a synthetic speech track can be of preeminent importance in tracking down the culprit. In this paper, a novel strategy is proposed to attribute a synthetic speech track to the generator used to synthesize it. The proposed detector transforms the audio into a log-mel spectrogram, extracts features using a CNN, and classifies the track among five known and unknown algorithms, using semi-supervision and ensembling to significantly improve its robustness and generalizability. The proposed detector is validated on two evaluation datasets consisting of a total of 18,000 weakly perturbed (Eval 1) and 10,000 strongly perturbed (Eval 2) synthetic speech tracks. The proposed method outperforms other top teams in accuracy by 12-13 on Eval 2 and 1-2
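
The ensembling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes soft voting: each model's logits are turned into probabilities and averaged across models. The abstract does not specify the combination rule, so this is an assumption and not necessarily the authors' method. The class names, the `ensemble_predict` helper, and the logits are hypothetical, and a single explicit "unknown" class is assumed alongside the five known generators.

```python
import math

# Hypothetical label set: five known generators plus one "unknown" class.
CLASSES = ["alg0", "alg1", "alg2", "alg3", "alg4", "unknown"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Soft voting: average per-model probabilities, then take the argmax.

    This is an assumed combination rule, chosen only for illustration.
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    mean = [sum(p[i] for p in probs) / n_models for i in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=mean.__getitem__)
    return CLASSES[best], mean


# Made-up logits standing in for the outputs of three CNNs on one audio track.
logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0, 0.5],
    [0.0, 1.5, 0.2, 0.1, -0.5, 1.0],
    [0.2, 0.4, 0.1, 0.0, -0.2, 1.2],
]
label, probs = ensemble_predict(logits)
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh an uncertain one. Here the third model on its own favors "unknown", but the averaged distribution can still select a known generator.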


