Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features

With the rapid advancement of synthetic speech generation technologies, interest in distinguishing spoof speech from natural speech is growing in the research community. Identifying these synthetic signals is difficult not only for cutting-edge classification models but also for human listeners. To prevent potential adverse effects, it is crucial to detect spoof signals. From a forensics perspective, it is also important to predict which algorithm generated them in order to identify the forger. This requires an understanding of the underlying attributes of spoof signals that serve as a signature of the synthesizer. This study highlights the segments of speech signals that are critical in determining their authenticity, using Vocal Tract System (VTS) and Voice Source (VS) features. In this paper, we propose a system that detects spoof signals and also identifies the corresponding speech-generating algorithm. We achieve an algorithm classification accuracy of 99.58%. From our experiments, we found that a VS feature-based system gives more attention to the transitions between phonemes, while a VTS feature-based system gives more attention to the stationary segments of speech signals. We apply model fusion techniques to the VS-based and VTS-based systems, exploiting their complementary information to develop a robust classifier. On analyzing the confusion plots, we found that WaveRNN is poorly classified, indicating that its output is more natural. On the other hand, synthesizers such as Waveform Concatenation and Neural Source Filter are classified with the highest accuracy. The practical implications of this work can aid researchers in both the forensics community (leveraging artifacts) and the speech community (mitigating artifacts).
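The abstract does not say which fusion technique is applied to the VS-based and VTS-based systems. As an illustration only, the sketch below shows one common option, score-level (late) fusion by weighted averaging of per-class posteriors. The function name `fuse_scores`, the `weight` parameter, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_scores(p_vs, p_vts, weight=0.5):
    """Weighted average of two classifiers' class posteriors.

    p_vs, p_vts: arrays of shape (n_utterances, n_classes), assumed to be
    posteriors from a VS-based and a VTS-based model respectively.
    weight: hypothetical mixing weight given to the VS-based model.
    Returns the fused posteriors and the predicted class index per utterance.
    """
    p_vs = np.asarray(p_vs, dtype=float)
    p_vts = np.asarray(p_vts, dtype=float)
    if p_vs.shape != p_vts.shape:
        raise ValueError("both score matrices must have the same shape")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    fused = weight * p_vs + (1.0 - weight) * p_vts
    return fused, fused.argmax(axis=1)


# Toy posteriors for two utterances over three synthesizer classes
# (made-up numbers, for illustration only).
toy_vs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
toy_vts = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
fused, labels = fuse_scores(toy_vs, toy_vts, weight=0.5)
print(labels)
```

In a setup like this, the mixing weight would normally be tuned on a held-out validation set. A convex combination of two posterior vectors still sums to one, so the fused scores remain valid probabilities.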


