Replay and Synthetic Speech Detection with Res2net Architecture

10/28/2020
by Xu Li, et al.

Existing approaches to replay and synthetic speech detection still generalize poorly to unseen spoofing attacks. This work proposes to leverage a novel model structure, Res2Net, to improve the generalizability of anti-spoofing countermeasures. Res2Net modifies the ResNet block to enable multiple feature scales. Specifically, it splits the feature maps within one block into multiple channel groups and adds a residual-like connection across the different channel groups. Such a connection increases the range of possible receptive fields, resulting in multiple feature scales. This multi-scale mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks. It also reduces the model size compared to ResNet-based models. Experimental results show that the Res2Net model consistently outperforms ResNet34 and ResNet50 by a large margin in both the physical access (PA) and logical access (LA) scenarios of the ASVspoof 2019 corpus. Moreover, integration with the squeeze-and-excitation (SE) block further enhances performance. For feature engineering, we investigate the generalizability of Res2Net combined with different acoustic features, and observe that the constant-Q transform (CQT) achieves the most promising performance in both the PA and LA scenarios. Our best single system outperforms other state-of-the-art single systems in both PA and LA of the ASVspoof 2019 corpus.
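
The channel-group splitting described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes the hierarchical rule from the original Res2Net design (the first group passes through unchanged, the second is filtered, and each later group is filtered after adding the previous group's output), uses one-dimensional signals instead of 2-D feature maps, and shares one fixed kernel across groups where a real model would learn separate filters. The names `conv1d_same`, `res2net_split`, and `impulse_spread` are hypothetical helpers introduced only for this illustration.

```python
from typing import List


def conv1d_same(signal: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded 1-D filtering that keeps the input length."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def res2net_split(groups: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Apply a residual-like connection across channel groups.

    Assumed rule (toy version, one shared kernel K):
        y_1 = x_1
        y_2 = K(x_2)
        y_i = K(x_i + y_{i-1})   for i >= 3
    """
    outputs: List[List[float]] = []
    for i, x in enumerate(groups):
        if i == 0:
            y = list(x)  # first group: identity
        elif i == 1:
            y = conv1d_same(x, kernel)  # second group: filtered once
        else:
            # later groups: add the previous group's output, then filter
            mixed = [a + b for a, b in zip(x, outputs[-1])]
            y = conv1d_same(mixed, kernel)
        outputs.append(y)
    return outputs


def impulse_spread(scales: int, length: int = 15) -> List[int]:
    """Count how many positions each group's output covers for a unit impulse.

    With a symmetric, all-positive kernel of width 3, this count equals the
    receptive field size of each group.
    """
    impulse = [0.0] * length
    impulse[length // 2] = 1.0
    groups = [list(impulse) for _ in range(scales)]
    outputs = res2net_split(groups, [1.0, 1.0, 1.0])
    return [sum(1 for v in y if v != 0.0) for y in outputs]


print(impulse_spread(4))  # [1, 3, 5, 7]
```

With four groups and a width-3 kernel, the groups see 1, 3, 5, and 7 input positions respectively, so a single block mixes several feature scales instead of one.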
