HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

09/15/2023
by Hyun-seo Shin, et al.

Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps distinguish spoofed from bona fide utterances, may exist either locally or globally in the input features. The Conformer, which combines Transformer and CNN components, has a structure well suited to capturing both. However, because the Conformer was designed for sequence-to-sequence tasks, applying it directly to ADD may be sub-optimal. To address this limitation, we propose HM-Conformer, which adopts two components: (1) a hierarchical pooling method that progressively reduces the sequence length to eliminate duplicated information, and (2) a multi-level classification token aggregation method that uses classification tokens to gather information from different blocks. Owing to these components, HM-Conformer can efficiently detect spoofing evidence by processing various sequence lengths and aggregating them. In experiments on the ASVspoof 2021 Deepfake dataset, HM-Conformer achieved a 15.71% EER, showing competitive performance compared to recent systems.
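The two components named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It only shows the data flow: a classification (CLS) token is read out after each block, and the frame sequence is shortened between blocks. The block itself (`toy_block`), the max-pooling operator, the stride of 2, the number of blocks, and aggregation by concatenation are all assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def max_pool_frames(frames: List[Vector], stride: int = 2) -> List[Vector]:
    """Max-pool along the time axis with non-overlapping windows (assumed operator)."""
    pooled = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        # Element-wise maximum over the frames in this window.
        pooled.append([max(column) for column in zip(*window)])
    return pooled


def toy_block(cls_token: Vector, frames: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Hypothetical stand-in for a Conformer block.

    The CLS token is mixed with the mean of the frames; frames pass through unchanged.
    """
    dim = len(cls_token)
    mean = [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
    new_cls = [(c + m) / 2 for c, m in zip(cls_token, mean)]
    return new_cls, frames


def hm_forward(frames: List[Vector], num_blocks: int = 3,
               stride: int = 2) -> Tuple[Vector, List[int]]:
    """Collect one CLS token per block while the sequence shrinks between blocks."""
    cls_token = [0.0] * len(frames[0])
    collected, lengths = [], []
    for _ in range(num_blocks):
        cls_token, frames = toy_block(cls_token, frames)
        collected.append(cls_token)               # multi-level CLS readout
        lengths.append(len(frames))               # sequence length seen by this block
        frames = max_pool_frames(frames, stride)  # hierarchical pooling
    # Assumed aggregation: concatenate the per-block CLS tokens.
    aggregated = [value for token in collected for value in token]
    return aggregated, lengths


if __name__ == "__main__":
    toy_input = [[float(t), float(-t)] for t in range(8)]  # 8 frames, 2 features
    embedding, seq_lengths = hm_forward(toy_input)
    print(seq_lengths, len(embedding))
```

In this sketch each later block sees a shorter sequence, so its CLS token summarizes coarser temporal context, and the concatenated vector would then feed a classifier.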


