Improved DeepFake Detection Using Whisper Features

06/02/2023
by   Piotr Kawa, et al.

With the recent influx of voice generation methods, the threat posed by audio DeepFakes (DF) is ever-increasing. Several detection methods have been presented as countermeasures. Many of them rely on so-called front-ends, which transform the raw audio to emphasize features crucial for assessing the genuineness of an audio sample. Our contribution is an investigation of the state-of-the-art Whisper automatic speech recognition model as a DF detection front-end. We compare various combinations of Whisper and well-established front-ends by training three detection models (LCNN, SpecRNet, and MesoNet) on the widely used ASVspoof 2021 DF dataset and then evaluating them on the DF In-The-Wild dataset. We show that using Whisper-based features improves detection for each model and outperforms recent results on the In-The-Wild dataset by reducing Equal Error Rate by 21
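The abstract reports its results as Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The sketch below shows one common way to estimate EER from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the label encoding (1 for bona fide, 0 for spoof) are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate EER from detector scores.

    Assumed conventions (not taken from the paper):
      - higher score = more likely bona fide
      - labels: 1 (or True) for bona fide, 0 (or False) for spoof
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    bona_fide = scores[labels]
    spoof = scores[~labels]
    if bona_fide.size == 0 or spoof.size == 0:
        raise ValueError("Both classes must be present to compute EER.")

    # Candidate thresholds: every observed score, plus +inf so that
    # the "reject everything" operating point is also considered.
    thresholds = np.concatenate([np.unique(scores), [np.inf]])

    # A sample is accepted as bona fide when score >= threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])      # false acceptance
    frr = np.array([(bona_fide < t).mean() for t in thresholds])   # false rejection

    # With finitely many samples the two curves rarely cross exactly,
    # so take the threshold where they are closest and average them.
    i = np.argmin(np.abs(far - frr))
    return float((far[i] + frr[i]) / 2)


if __name__ == "__main__":
    # Toy scores: one spoofed sample (0.6) outscores one bona fide sample (0.4).
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(equal_error_rate(toy_scores, toy_labels))
```

A lower EER means better detection. On small evaluation sets this nearest-crossing estimate is coarse, and interpolating between the two adjacent thresholds is a common refinement.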

research
05/29/2022

Speaker Identification using Speech Recognition

The audio data is increasing day by day throughout the globe with the in...
research
12/08/2021

Audio-Visual Synchronisation in the wild

In this paper, we consider the problem of audio-visual synchronisation a...
research
05/17/2019

The Audio Auditor: Participant-Level Membership Inference in Voice-Based IoT

Voice interfaces and assistants implemented by various services have bec...
research
11/29/2019

Improving Voice Separation by Incorporating End-to-end Speech Recognition

Despite recent advances in voice separation methods, many challenges rem...
research
03/30/2022

Does Audio Deepfake Detection Generalize?

Current text-to-speech algorithms produce realistic fakes of human voice...
research
05/17/2019

The Audio Auditor: Participant-Level Membership Inference in Internet of Things Voice Services

Voice interfaces and assistants implemented by various services have bec...
research
09/06/2021

Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model

An emerging trend in audio processing is capturing low-level speech repr...
