Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

08/19/2023
by Cunhang Fan, et al.

The rhythm of synthetic speech is usually too smooth, which causes the fundamental frequency (F0) of synthetic speech to differ significantly from that of real speech. The F0 feature is therefore expected to contain discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to model the F0 subband effectively and thereby improve FSD performance, we propose the spatial reconstructed local attention Res2Net (SR-LA Res2Net). Specifically, Res2Net is used as the backbone network to obtain multiscale information and is enhanced with a spatial reconstruction mechanism to avoid losing important information as the channel groups are repeatedly superimposed. Local attention is further designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that the proposed method obtains an equal error rate (EER) of 0.47%, achieving state-of-the-art performance among all single systems.
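
For readers unfamiliar with the metric, the EER quoted above is the operating point at which the false acceptance rate (spoofed speech accepted as real) equals the false rejection rate (real speech rejected). The following is a minimal sketch of how such a value can be approximated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the toy scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return an approximate EER as a fraction in [0, 1].

    Assumption: higher scores mean "more likely bona fide".
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above all
    # of them so that the "reject everything" operating point is covered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance: a spoofed utterance scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: a bona fide utterance scored below t.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy example (hypothetical scores, not data from the paper).
toy_bonafide = [0.9, 0.8, 0.7, 0.6]
toy_spoof = [0.1, 0.2, 0.3, 0.65]
print(f"EER = {100 * equal_error_rate(toy_bonafide, toy_spoof):.2f}%")
```

On this scale, an EER of 0.47% corresponds to a returned value of 0.0047. Official ASVspoof scoring relies on the challenge's own tooling, so this sketch only conveys what the number means.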

research
08/02/2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Recently, pioneer research works have proposed a large number of acousti...
research
03/02/2023

Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

In this paper, we propose a novel self-distillation method for fake spee...
research
06/28/2023

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Recently, deep learning-based beamforming algorithms have shown promisin...
research
06/27/2023

Multi-perspective Information Fusion Res2Net with RandomSpecmix for Fake Speech Detection

In this paper, we propose the multi-perspective information fusion (MPIF...
research
11/01/2022

Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection

In a hate speech detection model, we should consider two critical aspect...
research
08/20/2022

Fully Automated End-to-End Fake Audio Detection

The existing fake audio detection systems often rely on expert experienc...
research
10/21/2022

Adaptive re-calibration of channel-wise features for Adversarial Audio Classification

DeepFake Audio, unlike DeepFake images and videos, has been relatively l...
