A Multi-Resolution Front-End for End-to-End Speech Anti-Spoofing

10/11/2021
by Wei Liu, et al.

Choosing a time-frequency resolution is a difficult but important step in speech signal classification tasks such as speech anti-spoofing. Performance can vary as much across time-frequency resolutions as it does across model architectures, which makes it hard to tell where an improvement actually comes from when a new network architecture is introduced as the classifier. In this paper, we propose a multi-resolution front-end for feature extraction in an end-to-end classification framework. A weighted combination of multiple time-frequency resolutions is learned automatically under the objective of the classification task. Features extracted at different time-frequency resolutions are weighted and concatenated as inputs to the subsequent networks, where the weights are predicted by a learnable neural network inspired by the weighting block in squeeze-and-excitation networks (SENet). Furthermore, we investigate refining the chosen time-frequency resolutions by pruning those with relatively low importance, which reduces the complexity and size of the model. The proposed method is evaluated on the speech anti-spoofing tasks of ASVspoof 2019, where comparisons with similar baselines show its superiority.
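
The weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the window lengths, the shared FFT size and hop (used here only so that all resolutions produce feature maps of the same shape), and the randomly initialized weight matrices standing in for learned parameters are all assumptions made for illustration. Treating the predicted weight as the importance score for pruning is likewise an assumption.

```python
import numpy as np

def log_spectrogram(signal, win_len, n_fft=512, hop=128):
    """Log-magnitude STFT using a Hann window of win_len samples, zero-padded to n_fft.

    Sharing n_fft and hop across window lengths (an assumption of this sketch)
    gives every resolution the same (n_fft // 2 + 1, n_frames) shape.
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(spec + 1e-6).T

def se_weights(features, w1, w2):
    """SE-style weighting: global average pool, then FC -> ReLU -> FC -> sigmoid.

    features has shape (R, F, T); the result has one weight per resolution.
    """
    squeezed = features.mean(axis=(1, 2))        # squeeze: (R,)
    hidden = np.maximum(0.0, w1 @ squeezed)      # excitation, hidden layer
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # weights in (0, 1)

def multi_resolution_front_end(signal, win_lens, w1, w2):
    """Extract one feature map per window length, weight each, and stack them."""
    features = np.stack([log_spectrogram(signal, w) for w in win_lens])
    weights = se_weights(features, w1, w2)
    weighted = features * weights[:, None, None]  # concatenated along axis 0
    return weighted, weights

def prune_resolutions(weights, k):
    """Indices of the k resolutions with the largest weights (hypothetical criterion)."""
    return np.sort(np.argsort(weights)[-k:])

# Usage with placeholder parameters; in an end-to-end system w1 and w2
# would be trained jointly with the classifier rather than sampled.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
win_lens = [128, 256, 512]
w1 = 0.1 * rng.standard_normal((4, len(win_lens)))
w2 = 0.1 * rng.standard_normal((len(win_lens), 4))
weighted, weights = multi_resolution_front_end(signal, win_lens, w1, w2)
print(weighted.shape, weights, prune_resolutions(weights, 2))
```

Here the weights are computed once per utterance from globally pooled features, which mirrors the squeeze step of an SE block; other pooling choices are possible and the abstract does not specify one.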

research
08/20/2020

Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV

This paper presents a simple but effective method that uses multi-resolu...
research
12/06/2020

Multi-task Learning Based Spoofing-Robust Automatic Speaker Verification System

Spoofing attacks posed by generating artificial speech can severely degr...
research
09/21/2023

The Impact of Silence on Speech Anti-Spoofing

The current speech anti-spoofing countermeasures (CMs) show excellent pe...
research
08/18/2019

A Dual-Staged Context Aggregation Method Towards Efficient End-To-End Speech Enhancement

In speech enhancement, an end-to-end deep neural network converts a nois...
research
10/27/2022

Time-Domain Based Embeddings for Spoofed Audio Representation

Anti-spoofing is the task of speech authentication. That is, identifying...
research
10/20/2019

Deep speech inpainting of time-frequency masks

In particularly noisy environments, transient loud intrusions can comple...
research
05/03/2022

Attentive activation function for improving end-to-end spoofing countermeasure systems

The main objective of the spoofing countermeasure system is to detect th...