Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection

by Awais Khan, et al.

Voice spoofing attacks pose a significant threat to automatic speaker verification (ASV) systems. Existing anti-spoofing methods often target specific attack types, such as synthetic or replay attacks. In real-world scenarios, however, a countermeasure does not know how an attack was generated, which necessitates a unified solution. Current unified solutions struggle to detect spoofing artifacts, especially those produced by recent spoofing mechanisms. For instance, spoofing algorithms inject spectral or temporal anomalies that are challenging to identify. To this end, we present a spectra-temporal fusion that leverages frame-level and utterance-level coefficients. We introduce a novel local spectral deviation coefficient (SDC) to capture frame-level inconsistencies and employ a bi-LSTM-based network to extract sequential temporal coefficients (STC), which capture utterance-level artifacts. Our spectra-temporal fusion strategy combines these coefficients, and an auto-encoder generates spectra-temporal deviated coefficients (STDC) to enhance robustness. The proposed approach addresses multiple spoofing categories, including synthetic, replay, and partial deepfake attacks. Extensive evaluation on diverse datasets (ASVspoof2019, ASVspoof2021, VSDC, partial spoofs, and in-the-wild deepfakes) demonstrates its robustness across a wide range of voice applications.
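The frame-to-utterance idea can be illustrated with a minimal sketch. The abstract does not define the SDC or STC computations, so everything below is an assumption for illustration only: `local_spectral_deviation` is a hypothetical stand-in for a frame-level deviation feature (each frame compared with the mean of its neighbouring frames), `utterance_summary` is a simple statistical stand-in for the bi-LSTM-based utterance-level coefficients, and `fuse` merely concatenates the two views. The auto-encoder stage that produces the STDC is not modelled.

```python
from statistics import mean, pstdev


def local_spectral_deviation(frames, context=1):
    """Hypothetical frame-level feature (NOT the paper's SDC definition).

    For every frame, return the absolute difference between each spectral
    bin and the mean of that bin over the neighbouring frames.
    """
    n = len(frames)
    deviations = []
    for t in range(n):
        lo, hi = max(0, t - context), min(n, t + context + 1)
        neighbours = frames[lo:hi]
        local_mean = [mean(col) for col in zip(*neighbours)]
        deviations.append([abs(x - m) for x, m in zip(frames[t], local_mean)])
    return deviations


def utterance_summary(frames):
    """Assumed stand-in for utterance-level coefficients.

    Returns the per-bin mean followed by the per-bin standard deviation
    over time. The paper uses a bi-LSTM here; this is only a placeholder.
    """
    cols = list(zip(*frames))
    return [mean(c) for c in cols] + [pstdev(c) for c in cols]


def fuse(frames, context=1):
    """Concatenate pooled frame-level deviations with utterance statistics."""
    sdc_like = local_spectral_deviation(frames, context)
    # Pool frame-level deviations over time to get a fixed-length vector.
    frame_part = [mean(c) for c in zip(*sdc_like)]
    return frame_part + utterance_summary(frames)


# Toy input: 3 frames x 2 spectral bins, with a spike in the middle frame.
toy = [[0.0, 0.0], [3.0, 0.0], [0.0, 0.0]]
dev = local_spectral_deviation(toy)
fused = fuse(toy)
```

In this toy input the middle frame receives the largest deviation in the first bin, which is the kind of local inconsistency a frame-level feature is meant to expose, while the fused vector has a fixed length of three values per spectral bin regardless of utterance length.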




