Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

05/25/2023
by   Rui Liu, et al.

Audio Deepfake Detection (ADD), an emerging topic, aims to detect fake audio generated by text-to-speech (TTS), voice conversion (VC), replay, and similar techniques. Traditional approaches take the mono signal as input and focus on robust feature extraction and effective classifier design. However, the dual-channel stereo information in an audio signal also carries important cues for deepfake detection, and this has not been studied in prior work. In this paper, we propose a novel ADD model, termed M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process. The model first projects the mono signal to a stereo signal using a pretrained stereo synthesizer, then employs a dual-branch neural architecture to process the left and right channel signals, respectively. In this way, it reveals the artifacts in the fake audio and thereby improves ADD performance. Experiments on the ASVspoof2019 database show that M2S-ADD outperforms all baselines that take mono input. We release the source code at <https://github.com/AI-S2-Lab/M2S-ADD>.
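The data flow described in the abstract (mono input, stereo conversion, one branch per channel, fused decision) can be illustrated with a toy sketch. This is not the authors' implementation: the delay-and-gain `toy_stereo_synthesizer` stands in for the pretrained stereo synthesizer, `branch_features` stands in for the neural branches, and the logistic scorer and its weights are all hypothetical choices made only to keep the example self-contained.

```python
import math
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def toy_stereo_synthesizer(
    mono: Sequence[float], delay: int = 2, gain: float = 0.8
) -> Tuple[Signal, Signal]:
    """Hypothetical stand-in for the pretrained stereo synthesizer.

    Left is a copy of the mono signal; right is a delayed, attenuated copy.
    """
    n = len(mono)
    left = list(mono)
    right = [0.0] * min(delay, n) + [gain * x for x in mono[: max(n - delay, 0)]]
    return left, right


def branch_features(channel: Sequence[float]) -> List[float]:
    """Hypothetical per-channel branch: mean energy and zero-crossing rate."""
    n = len(channel)
    if n == 0:
        return [0.0, 0.0]
    energy = sum(x * x for x in channel) / n
    crossings = sum(1 for a, b in zip(channel, channel[1:]) if a * b < 0)
    return [energy, crossings / max(n - 1, 1)]


def m2s_score(
    mono: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
    synthesizer: Callable[[Sequence[float]], Tuple[Signal, Signal]] = toy_stereo_synthesizer,
) -> float:
    """Convert mono to stereo, run one branch per channel, fuse, and score.

    Returns a value in (0, 1); what the score means depends entirely on the
    (assumed, untrained) weights supplied by the caller.
    """
    left, right = synthesizer(mono)
    fused = branch_features(left) + branch_features(right)  # concatenate both branches
    logit = bias + sum(w * f for w, f in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-logit))
```

Calling `m2s_score(mono, weights)` with four weights (two features per channel) returns a single score for the utterance. In the actual model, both the synthesizer and the branches are learned networks, so this sketch only mirrors the overall structure.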

research
05/23/2023

ADD 2023: the Second Audio Deepfake Detection Challenge

Audio deepfake detection is an emerging topic in the artificial intellig...
research
04/20/2021

Identification of fake stereo audio

Channel is one of the important criterions for digital audio quality. Ge...
research
09/05/2023

FSD: An Initial Chinese Dataset for Fake Song Detection

Singing voice synthesis and singing voice conversion have significantly ...
research
06/27/2019

Sensitivity to Haptic-Audio Envelope Asynchrony

We want to understand the human capabilities to perceive amplitude simil...
research
02/14/2022

Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

The past few years have witnessed the significant advances of speech syn...
research
10/26/2022

Acoustically-Driven Phoneme Removal That Preserves Vocal Affect Cues

In this paper, we propose a method for removing linguistic information f...
research
06/27/2022

A Topic-Attentive Transformer-based Model For Multimodal Depression Detection

Depression is one of the most common mental disorders, which imposes hea...
